6 changes: 6 additions & 0 deletions README.md
@@ -131,6 +131,12 @@ See the build [guide](BUILD.md).

cuML is compatible with scikit-learn version 1.4 or higher.

## Model serialization and security

cuML models can be serialized with `pickle` or `joblib` and loaded later for inference. cuML uses cloudpickle so that models trained with cuml.accel can be loaded and used with scikit-learn.
Review comment (Member):

I'm not sure what the average user is meant to do with the information about cloudpickle, besides starting to worry that we do "insecure stuff" deep in the library. We don't, so I propose removing this.

Suggested change
cuML models can be serialized with `pickle` or `joblib` and loaded later for inference. cuML uses cloudpickle so that models trained with cuml.accel can be loaded and used with scikit-learn.
cuML models can be serialized with `pickle` or `joblib` and loaded later for inference.
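A minimal sketch of the pickle round trip described above. Because cuML estimators need a GPU, the sketch assumes a hypothetical stand-in class, `TinyLinearModel`, which is not part of cuML; with a real fitted cuML estimator the `pickle.dump` / `pickle.load` calls would be the same.

```python
import os
import pickle
import tempfile


class TinyLinearModel:
    """Hypothetical stand-in for a fitted cuML estimator (not part of cuML)."""

    def __init__(self, coef, intercept):
        self.coef_ = coef
        self.intercept_ = intercept

    def predict(self, xs):
        return [self.coef_ * x + self.intercept_ for x in xs]


model = TinyLinearModel(coef=2.0, intercept=1.0)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    # Serialize the fitted model to disk.
    with open(path, "wb") as f:
        pickle.dump(model, f)
    # Later, e.g. in an inference job: load it back.
    # Only do this for files you created yourself or otherwise trust.
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored.predict([0.0, 1.0, 2.0]))  # [1.0, 3.0, 5.0]
```

`joblib.dump(model, path)` and `joblib.load(path)` follow the same pattern and carry the same trust requirement.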


**Only unpickle or deserialize from trusted sources.** The `pickle` module (and by extension `joblib`) is not secure: malicious payloads can execute arbitrary code during deserialization and compromise your system. **Do not unpickle or load data from untrusted or tampered sources.** This applies to `pickle.load()` / `pickle.loads()`, `joblib.load()`, and any file-based model loading. For details and patterns, see the [Model Serialization and Persistence](docs/source/pickling_cuml_models.ipynb) notebook and the [Python pickle security documentation](https://docs.python.org/3/library/pickle.html).
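To make the warning above concrete, here is a minimal, standard-library-only sketch of why `pickle.loads()` can execute code: any class can define `__reduce__`, and pickle stores the callable it returns in the byte stream and calls it while loading. The names `record` and `Payload` are hypothetical; `record` is a harmless stand-in for an attacker-chosen callable.

```python
import pickle

# Side effects triggered during unpickling (for demonstration only).
calls = []


def record(message):
    """Harmless stand-in for an attacker-chosen callable such as os.system."""
    calls.append(message)
    return message


class Payload:
    # pickle calls __reduce__ when serializing; the returned (callable, args)
    # pair is written into the bytes and *invoked* when the bytes are loaded.
    def __reduce__(self):
        return (record, ("code ran during pickle.loads()",))


blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # record(...) executes here; no Payload is rebuilt

print(calls)   # ['code ran during pickle.loads()']
print(result)  # code ran during pickle.loads()
```

A real attacker would reference a callable that already exists on the victim's machine, so the loading side needs no special code for the payload to run.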
Review comment (Member):

I'd remove the blanket "file-based model loading" part. It is vague and not correct: you could load from a file-based format like JSON or ONNX and be fine. Being precise is important for security topics, especially ones like this where there is already a lot of confusion and "fear and doom" out there.

Suggested change
**Only unpickle or deserialize from trusted sources.** The `pickle` module (and by extension `joblib`) is not secure: malicious payloads can execute arbitrary code during deserialization and compromise your system. **Do not unpickle or load data from untrusted or tampered sources.** This applies to `pickle.load()` / `pickle.loads()`, `joblib.load()`, and any file-based model loading. For details and patterns, see the [Model Serialization and Persistence](docs/source/pickling_cuml_models.ipynb) notebook and the [Python pickle security documentation](https://docs.python.org/3/library/pickle.html).
**Only unpickle or deserialize from trusted sources.** The `pickle` module (and by extension `joblib`) is not secure: malicious payloads can execute arbitrary code during deserialization and compromise your system. **Do not unpickle or load data from untrusted or tampered sources.** This applies to `pickle.load()` / `pickle.loads()`, `joblib.load()`, etc. For details and patterns, see the [Model Serialization and Persistence](docs/source/pickling_cuml_models.ipynb) notebook and the [Python pickle security documentation](https://docs.python.org/3/library/pickle.html).
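One possible way to act on the "untrusted or tampered sources" warning, sketched here as an assumption rather than a cuML or joblib feature: sign the pickled bytes with an HMAC when you produce them, and verify the signature before calling `pickle.loads()`. `SECRET_KEY`, `sign`, `dump_signed`, and `load_verified` are hypothetical names. This only proves the bytes came from someone holding the key; it does not make unpickling data from an untrusted signer safe.

```python
import hashlib
import hmac
import pickle

# Hypothetical secret, known only to the trusted producer and consumer of the file.
SECRET_KEY = b"replace-with-a-real-secret"


def sign(data: bytes) -> bytes:
    """Return an HMAC-SHA256 tag for the serialized bytes."""
    return hmac.new(SECRET_KEY, data, hashlib.sha256).digest()


def dump_signed(obj) -> tuple[bytes, bytes]:
    """Pickle obj and return (payload, signature)."""
    data = pickle.dumps(obj)
    return data, sign(data)


def load_verified(data: bytes, signature: bytes):
    """Unpickle only if the signature matches; otherwise raise before loading."""
    # compare_digest avoids leaking timing information about the expected tag.
    if not hmac.compare_digest(sign(data), signature):
        raise ValueError("signature mismatch: refusing to unpickle")
    return pickle.loads(data)


data, sig = dump_signed({"coef": 2.0, "intercept": 1.0})
print(load_verified(data, sig))  # {'coef': 2.0, 'intercept': 1.0}

# Any change to the bytes invalidates the signature, so pickle.loads() never runs.
tampered = data.replace(b"coef", b"evil")
try:
    load_verified(tampered, sig)
except ValueError as err:
    print(err)  # signature mismatch: refusing to unpickle
```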


## Contributing

Please see our [guide for contributing to cuML](CONTRIBUTING.md).