Add warning about the use of pickle for model deserialization to README.#7829
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review infoConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
📝 WalkthroughSummary by CodeRabbit
WalkthroughA new "Model serialization and security" section was added to README.md documenting that cuML models support pickle and joblib serialization via cloudpickle, along with explicit security warnings against deserializing from untrusted sources. Changes
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~3 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
jcrist
left a comment
There was a problem hiding this comment.
While not wrong, it feels a bit off to me to elevate this in the readme. Approving, but begrudgingly.
betatim
left a comment
There was a problem hiding this comment.
Two suggestions for improvement. Otherwise fine for me, though it seems a bit like an exercise in arse covering
|
|
||
| ## Model serialization and security | ||
|
|
||
| cuML models can be serialized with `pickle` or `joblib` and loaded later for inference. cuML uses cloudpickle so that models trained with cuml.accel can be loaded and used with scikit-learn. |
There was a problem hiding this comment.
I'm not sure what the average user is meant to do with the information about cloudpickle, besides start worrying that we do "insecure stuff" deep in the library. We don't, hence I propose to remove this.
| cuML models can be serialized with `pickle` or `joblib` and loaded later for inference. cuML uses cloudpickle so that models trained with cuml.accel can be loaded and used with scikit-learn. | |
| cuML models can be serialized with `pickle` or `joblib` and loaded later for inference. |
|
|
||
| cuML models can be serialized with `pickle` or `joblib` and loaded later for inference. cuML uses cloudpickle so that models trained with cuml.accel can be loaded and used with scikit-learn. | ||
|
|
||
| **Only unpickle or deserialize from trusted sources.** The `pickle` module (and by extension `joblib`) is not secure: malicious payloads can execute arbitrary code during deserialization and compromise your system. **Do not unpickle or load data from untrusted or tampered sources.** This applies to `pickle.load()` / `pickle.loads()`, `joblib.load()`, and any file-based model loading. For details and patterns, see the [Model Serialization and Persistence](docs/source/pickling_cuml_models.ipynb) notebook and the [Python pickle security documentation](https://docs.python.org/3/library/pickle.html). |
There was a problem hiding this comment.
I'd remove the blanket file based loading part. It is vague and not correct (you could load from a file based format like a JSON or onnx and be fine). Being precise is important for security stuff, especially things like this where there is already a lot of confusion and "fear and doom" out there.
| **Only unpickle or deserialize from trusted sources.** The `pickle` module (and by extension `joblib`) is not secure: malicious payloads can execute arbitrary code during deserialization and compromise your system. **Do not unpickle or load data from untrusted or tampered sources.** This applies to `pickle.load()` / `pickle.loads()`, `joblib.load()`, and any file-based model loading. For details and patterns, see the [Model Serialization and Persistence](docs/source/pickling_cuml_models.ipynb) notebook and the [Python pickle security documentation](https://docs.python.org/3/library/pickle.html). | |
| **Only unpickle or deserialize from trusted sources.** The `pickle` module (and by extension `joblib`) is not secure: malicious payloads can execute arbitrary code during deserialization and compromise your system. **Do not unpickle or load data from untrusted or tampered sources.** This applies to `pickle.load()` / `pickle.loads()`, `joblib.load()`, etc. For details and patterns, see the [Model Serialization and Persistence](docs/source/pickling_cuml_models.ipynb) notebook and the [Python pickle security documentation](https://docs.python.org/3/library/pickle.html). |
|
We have two 👍 and some suggestions for edits. We can do them later if we want to. |
|
/merge |
Documents pickle/joblib security in the README: only unpickle from trusted sources; malicious payloads can execute arbitrary code.