Add warning about the use of pickle for model deserialization to README. #7829

Merged
rapids-bot[bot] merged 1 commit into rapidsai:main from csadorf:add-warnings-to-readme on Feb 25, 2026

Contributor

@csadorf csadorf commented Feb 23, 2026

Documents pickle/joblib security in the README: only unpickle from trusted sources; malicious payloads can execute arbitrary code.
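To make the risk concrete, here is a minimal sketch of why unpickling untrusted bytes can run code. The `Payload` class is a hypothetical name invented for this illustration, and the harmless `os.getpid` stands in for whatever callable an attacker would choose (for example, a shell command).

```python
import os
import pickle


class Payload:
    """Hypothetical object whose pickled form calls a function on load."""

    def __reduce__(self):
        # pickle records "call os.getpid with no arguments" and performs
        # that call during loads(). An attacker controls both the callable
        # and its arguments.
        return (os.getpid, ())


blob = pickle.dumps(Payload())

# loads() does not rebuild a Payload. It calls os.getpid() and returns
# whatever that call returns.
result = pickle.loads(blob)
```

The receiving side never needs the `Payload` class to be defined: the byte stream only names the callable to invoke, which is why inspecting the loaded object afterwards is too late.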

@csadorf csadorf requested a review from a team as a code owner February 23, 2026 21:42
@csadorf csadorf requested a review from jcrist February 23, 2026 21:42
@csadorf csadorf added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels Feb 23, 2026

coderabbitai Bot commented Feb 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 249bd67 and 7e334e3.

📒 Files selected for processing (1)
  • README.md

📝 Walkthrough

Summary by CodeRabbit

  • Documentation
    • Added comprehensive security guidance for model serialization and deserialization, covering pickle and joblib options with explicit warnings about the risks of loading models from untrusted sources.

Walkthrough

A new "Model serialization and security" section was added to README.md documenting that cuML models support pickle and joblib serialization via cloudpickle, along with explicit security warnings against deserializing from untrusted sources.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation: README.md | Added a new section detailing model serialization support (pickle, joblib) and security warnings regarding deserialization from untrusted sources with references to external documentation. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main change: adding a warning about pickle use for model deserialization to the README. |
| Description check | ✅ Passed | The description is directly related to the changeset, explaining the security documentation about pickle/joblib and the risk of untrusted sources. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

Member

@jcrist jcrist left a comment

While not wrong, it feels a bit off to me to elevate this in the readme. Approving, but begrudgingly.

Member

@betatim betatim left a comment

Two suggestions for improvement. Otherwise fine for me, though it seems a bit like an exercise in arse covering

Comment thread README.md

## Model serialization and security

cuML models can be serialized with `pickle` or `joblib` and loaded later for inference. cuML uses cloudpickle so that models trained with cuml.accel can be loaded and used with scikit-learn.
Member

I'm not sure what the average user is meant to do with the information about cloudpickle, besides start worrying that we do "insecure stuff" deep in the library. We don't, hence I propose to remove this.

Suggested change
- cuML models can be serialized with `pickle` or `joblib` and loaded later for inference. cuML uses cloudpickle so that models trained with cuml.accel can be loaded and used with scikit-learn.
+ cuML models can be serialized with `pickle` or `joblib` and loaded later for inference.
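For reference, the round trip described in that README sentence can be sketched with the standard library alone. This is an illustration under assumptions: a `types.SimpleNamespace` stands in for a fitted estimator so the snippet runs without a GPU, and the attribute names `coef_` and `intercept_` are placeholders. With cuML installed, the same `pickle.dump` / `pickle.load` calls would be applied to the fitted model object, and `joblib.dump` / `joblib.load` follow the same pattern.

```python
import pickle
import tempfile
from pathlib import Path
from types import SimpleNamespace

# Stand-in for a fitted estimator (assumption: a real workflow would use a
# fitted cuML model here instead of this placeholder object).
model = SimpleNamespace(coef_=[0.5, -1.25], intercept_=2.0)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"

    # Serialize the model to disk.
    with path.open("wb") as f:
        pickle.dump(model, f)

    # Load it back. Only do this for files you created or otherwise trust.
    with path.open("rb") as f:
        restored = pickle.load(f)
```

The restored object is a separate copy with the same attribute values as the original.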

Comment thread README.md

cuML models can be serialized with `pickle` or `joblib` and loaded later for inference. cuML uses cloudpickle so that models trained with cuml.accel can be loaded and used with scikit-learn.

**Only unpickle or deserialize from trusted sources.** The `pickle` module (and by extension `joblib`) is not secure: malicious payloads can execute arbitrary code during deserialization and compromise your system. **Do not unpickle or load data from untrusted or tampered sources.** This applies to `pickle.load()` / `pickle.loads()`, `joblib.load()`, and any file-based model loading. For details and patterns, see the [Model Serialization and Persistence](docs/source/pickling_cuml_models.ipynb) notebook and the [Python pickle security documentation](https://docs.python.org/3/library/pickle.html).
Member

I'd remove the blanket file-based loading part. It is vague and not correct (you could load from a file-based format like JSON or ONNX and be fine). Being precise is important for security stuff, especially things like this where there is already a lot of confusion and "fear and doom" out there.

Suggested change
- **Only unpickle or deserialize from trusted sources.** The `pickle` module (and by extension `joblib`) is not secure: malicious payloads can execute arbitrary code during deserialization and compromise your system. **Do not unpickle or load data from untrusted or tampered sources.** This applies to `pickle.load()` / `pickle.loads()`, `joblib.load()`, and any file-based model loading. For details and patterns, see the [Model Serialization and Persistence](docs/source/pickling_cuml_models.ipynb) notebook and the [Python pickle security documentation](https://docs.python.org/3/library/pickle.html).
+ **Only unpickle or deserialize from trusted sources.** The `pickle` module (and by extension `joblib`) is not secure: malicious payloads can execute arbitrary code during deserialization and compromise your system. **Do not unpickle or load data from untrusted or tampered sources.** This applies to `pickle.load()` / `pickle.loads()`, `joblib.load()`, etc. For details and patterns, see the [Model Serialization and Persistence](docs/source/pickling_cuml_models.ipynb) notebook and the [Python pickle security documentation](https://docs.python.org/3/library/pickle.html).
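On the "tampered sources" point, the Python pickle documentation suggests signing data with `hmac` when integrity matters. The following is a minimal sketch of that idea, not something taken from the cuML README: the helper names `sign`, `dumps_signed`, and `loads_verified`, the 32-byte tag layout, and `SECRET_KEY` are all assumptions made for illustration. It detects modification by anyone who lacks the key; it does not make pickles from an untrusted author safe.

```python
import hashlib
import hmac
import pickle

# Hypothetical shared secret; in practice this would come from a secret store.
SECRET_KEY = b"replace-with-a-real-secret"


def sign(payload: bytes, key: bytes = SECRET_KEY) -> bytes:
    """Return an HMAC-SHA256 tag (32 bytes) for the serialized bytes."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def dumps_signed(obj, key: bytes = SECRET_KEY) -> bytes:
    """Pickle obj and prepend its authentication tag."""
    payload = pickle.dumps(obj)
    return sign(payload, key) + payload


def loads_verified(blob: bytes, key: bytes = SECRET_KEY):
    """Unpickle only if the tag matches; raise ValueError otherwise."""
    tag, payload = blob[:32], blob[32:]
    # compare_digest avoids leaking the match position through timing.
    if not hmac.compare_digest(tag, sign(payload, key)):
        raise ValueError("signature mismatch: refusing to unpickle")
    return pickle.loads(payload)


blob = dumps_signed({"coef_": [0.5, -1.25], "intercept_": 2.0})
params = loads_verified(blob)

# Flipping one bit of the payload invalidates the tag, so loading is refused
# before pickle ever sees the modified bytes.
tampered = blob[:-1] + bytes([blob[-1] ^ 0x01])
try:
    loads_verified(tampered)
    tamper_detected = False
except ValueError:
    tamper_detected = True
```

The key design choice is that verification happens before `pickle.loads` is called, so a modified byte stream is rejected without being interpreted.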

Member

betatim commented Feb 25, 2026

We have two 👍 and some suggestions for edits. We can do them later if we want to. :shipit:

Member

betatim commented Feb 25, 2026

/merge

@rapids-bot rapids-bot Bot merged commit a880de8 into rapidsai:main Feb 25, 2026
58 checks passed
@csadorf csadorf deleted the add-warnings-to-readme branch February 25, 2026 20:03
Labels

improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change)

4 participants