-
Notifications
You must be signed in to change notification settings - Fork 623
Add warning about the use of pickle for model deserialization to README. #7829
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -131,6 +131,12 @@ See the build [guide](BUILD.md). | |||||
|
|
||||||
| cuML is compatible with scikit-learn version 1.4 or higher. | ||||||
|
|
||||||
| ## Model serialization and security | ||||||
|
|
||||||
| cuML models can be serialized with `pickle` or `joblib` and loaded later for inference. cuML uses cloudpickle so that models trained with cuml.accel can be loaded and used with scikit-learn. | ||||||
|
|
||||||
| **Only unpickle or deserialize from trusted sources.** The `pickle` module (and by extension `joblib`) is not secure: malicious payloads can execute arbitrary code during deserialization and compromise your system. **Do not unpickle or load data from untrusted or tampered sources.** This applies to `pickle.load()` / `pickle.loads()`, `joblib.load()`, and any file-based model loading. For details and patterns, see the [Model Serialization and Persistence](docs/source/pickling_cuml_models.ipynb) notebook and the [Python pickle security documentation](https://docs.python.org/3/library/pickle.html). | ||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd remove the blanket file based loading part. It is vague and not correct (you could load from a file based format like a JSON or onnx and be fine). Being precise is important for security stuff, especially things like this where there is already a lot of confusion and "fear and doom" out there.
Suggested change
|
||||||
|
|
||||||
| ## Contributing | ||||||
|
|
||||||
| Please see our [guide for contributing to cuML](CONTRIBUTING.md). | ||||||
|
|
||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what the average user is meant to do with the information about cloudpickle, besides start worrying that we do "insecure stuff" deep in the library. We don't, hence I propose to remove this.