
Forward-merge release/25.12 into main#7529

Merged
gforsyth merged 7 commits into main from
release/25.12
Nov 25, 2025


@rapids-bot rapids-bot Bot commented Nov 24, 2025

Forward-merge triggered by push to release/25.12 that creates a PR to keep main up-to-date. If this PR is unable to be immediately merged due to conflicts, it will remain open for the team to manually merge. See forward-merger docs for more info.

jcrist and others added 7 commits November 19, 2025 22:57
This:

- Cleans up `SGD`, `MBSGDClassifier`, and `MBSGDRegressor`, following all the guidelines in #7317.
- Adds a new `fit_sgd` function to handle fitting a linear model using SGD. This was the last part of #6938 sans deprecation/removal of the solver classes themselves.
- Removes the undocumented `solver_model` attribute in favor of storing the fitted attributes on the models themselves.
- Adds support for all label types to `MBSGDClassifier`, bringing it in line with our other classifiers.
- Adds a validation check to `MBSGDClassifier` to ensure it's fitting a binary classification problem, since multiclass is currently not supported.
- Removes the `SGD.predictClass` method, which is now unused. It didn't validate that the `SGD` instance represented a classification problem, didn't handle classes other than [0, 1], and didn't match any standard method name or interface. Our other solvers only support regression problems, with the caller required to convert the output when solving a classification problem. I dropped it as a breaking change here since I doubt anyone is using it, but could back off to a deprecation if people feel strongly. Dropping it lets us rip out `target_dtype` sooner/easier.

Breaking Change Summary:
- Removal of `SGD.predictClass`
- `MBSGDClassifier.classes_` is now always a `numpy.ndarray` (mirroring the recent work on our other classifiers)
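As a rough illustration of the label handling described above (arbitrary label types, a binary-only check, and `classes_` stored as a `numpy.ndarray`), here is a minimal NumPy-only sketch. The helper names `encode_binary_labels` and `decode_labels` are hypothetical and are not taken from the cuML code base; the actual implementation in #7504 may differ.

```python
import numpy as np


def encode_binary_labels(y):
    """Map arbitrary binary labels to {0, 1}.

    Hypothetical helper, not cuML's implementation.
    Returns (classes, encoded), where classes is a numpy.ndarray.
    """
    y = np.asarray(y)
    # np.unique returns the sorted distinct labels; with return_inverse it
    # also returns each sample's index into that sorted array.
    classes, encoded = np.unique(y, return_inverse=True)
    if classes.shape[0] != 2:
        raise ValueError(
            f"Only binary classification is supported, got {classes.shape[0]} classes"
        )
    return classes, encoded.astype(np.float32)


def decode_labels(classes, scores, threshold=0.5):
    """Convert regression-style scores back into the original labels."""
    idx = (np.asarray(scores) > threshold).astype(np.intp)
    return classes[idx]


classes, encoded = encode_binary_labels(["spam", "ham", "spam"])
predicted = decode_labels(classes, [0.9, 0.1])
```

The solver only ever sees the 0/1 encoding; the caller converts scores back to the original labels, which matches the "caller converts the output" pattern described for the other solvers.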

With this cleanup, `target_dtype` is no longer used. After this is in we can remove that bit from our api decorators/base class to simplify our internals further.

Part of #7317.
Fixes #6938.

Authors:
  - Jim Crist-Harif (https://github.com/jcrist)

Approvers:
  - Simon Adorf (https://github.com/csadorf)

URL: #7504
Follow up to #7440, use S3 as the preprocessor cache location.

Contributes to rapidsai/build-planning#228

Authors:
  - Paul Taylor (https://github.com/trxcllnt)

Approvers:
  - Nate Rock (https://github.com/rockhowse)
  - Bradley Dice (https://github.com/bdice)

URL: #7510
With all the work in #7317, we're now at a point where `target_dtype` is no longer used.

This PR removes `target_dtype` and all supporting infra. This simplifies our decorators and base class, and reduces the amount of state stored on an estimator. Since this was all private implementation details, this is not a breaking change.

Authors:
  - Jim Crist-Harif (https://github.com/jcrist)

Approvers:
  - Anupam (https://github.com/aamijar)
  - Simon Adorf (https://github.com/csadorf)

URL: #7516
Closes #7143

This PR improves memory usage in UMAP when given a precomputed knn graph.
Previously, a user-given knn graph would occupy GPU memory throughout the full UMAP pipeline, even though it is not needed in the later steps of UMAP.

In this PR, if the user-given knn graph is in host memory, we keep it in host memory and copy it to device at the C++ level to allow better memory management.
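For context, a host-resident knn graph of the kind discussed here can be built with plain NumPy. The sketch below is illustrative only: `host_knn_graph` is a hypothetical helper using brute force, and the trailing comment assumes the graph is passed through UMAP's `precomputed_knn` parameter as an `(indices, distances)` tuple.

```python
import numpy as np


def host_knn_graph(X, n_neighbors):
    """Brute-force kNN on the host; returns (indices, distances).

    Hypothetical helper for illustration. Both outputs are NumPy arrays of
    shape (n_samples, n_neighbors) and therefore live in host memory.
    """
    X = np.asarray(X, dtype=np.float32)
    # Pairwise squared Euclidean distances, shape (n_samples, n_samples).
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    # Sort each row; each point is its own nearest neighbor (distance 0).
    indices = np.argsort(d2, axis=1, kind="stable")[:, :n_neighbors]
    distances = np.sqrt(np.take_along_axis(d2, indices, axis=1))
    return indices.astype(np.int64), distances.astype(np.float32)


X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
knn_indices, knn_dists = host_knn_graph(X, n_neighbors=2)

# Assumed usage (requires a GPU and cuML, so it is left as a comment):
#   from cuml.manifold import UMAP
#   UMAP(n_neighbors=2, precomputed_knn=(knn_indices, knn_dists)).fit(X)
```

Because the arrays stay on the host until they are needed, the graph does not have to hold GPU memory during the later stages of the pipeline.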

### This PR with precomputed knn graph on CPU
<img width="808" height="313" alt="Screenshot 2025-11-12 at 7 00 33 PM" src="https://github.com/user-attachments/assets/6c752f62-a1b2-4fb1-a44d-d86ed468915b" />

### Before with precomputed knn graph on CPU
<img width="828" height="316" alt="Screenshot 2025-11-12 at 7 01 12 PM" src="https://github.com/user-attachments/assets/8237fdd4-e0bb-48f5-bc46-71878ce14b33" />

Authors:
  - Jinsol Park (https://github.com/jinsolp)

Approvers:
  - Philip Hyunsu Cho (https://github.com/hcho3)
  - Simon Adorf (https://github.com/csadorf)
  - Tarang Jain (https://github.com/tarang-jain)

URL: #7481
Dropping xgboost from our CI for now while upstream builds are fixed.

Stopgap for #7520. Supersedes #7523.

Authors:
  - Jim Crist-Harif (https://github.com/jcrist)

Approvers:
  - https://github.com/jakirkham

URL: #7526
## Summary

This PR improves documentation quality and consistency across cuML's Dask multi-GPU estimators, adds a comprehensive multi-GPU guide, and fixes two minor bugs in KNeighborsClassifier and RandomForestClassifier.

## Changes

### New Documentation
- Added `dask_multigpu_guide.ipynb` - comprehensive guide for multi-GPU usage with Dask

### Documentation Improvements
- Standardized terminology: "multi-node multi-GPU", "Dask cuDF DataFrame"
- Fixed docstring formatting (parameter underlines, spacing, capitalization)
- Removed "experimental" language from stable APIs
- Added known limitations:
  - PCA: `random_state` parameter not supported in MNMG
  - LogisticRegression: labels must be of float32 dtype (documented with a code example)
  - UMAP: clarified this is for distributed inference only, not training
- Improved class docstrings with clearer descriptions
- Fixed typos and improved grammar throughout

### Bug Fixes
- **KNeighborsClassifier**: Added CuPy array support for label handling
- **RandomForestClassifier**: Fixed `unique()` handling for Dask Arrays vs DataFrames

Closes #7309. Fixes #3663.

Authors:
  - Simon Adorf (https://github.com/csadorf)

Approvers:
  - Jim Crist-Harif (https://github.com/jcrist)

URL: #7499
@rapids-bot rapids-bot Bot requested review from a team as code owners November 24, 2025 22:13
@rapids-bot rapids-bot Bot requested a review from bdice November 24, 2025 22:13

rapids-bot Bot commented Nov 24, 2025

FAILURE - Unable to forward-merge due to an error; a manual merge is necessary. Do not use the Resolve conflicts option in this PR. Instead, follow these instructions: https://docs.rapids.ai/maintainers/forward-merger/

IMPORTANT: When merging this PR, do not use the auto-merger (i.e. the /merge comment). Instead, an admin must manually merge by changing the merging strategy to Create a Merge Commit. Otherwise, history will be lost and the branches become incompatible.


@github-actions github-actions Bot added the conda, Cython / Python, CUDA/C++, and ci labels Nov 24, 2025

jcrist commented Nov 24, 2025

Gah, this is my fault. I muscle-memoried /merge instead of /merge nosquash in #7519. Not sure what to do here to fix this :/

@gforsyth gforsyth merged commit 92253f6 into main Nov 25, 2025
113 of 115 checks passed

6 participants