Improve Dask documentation and fix a few minor issues #7499

Merged
rapids-bot[bot] merged 12 commits into rapidsai:release/25.12 from csadorf:issue/7309-improve-dask-docs
Nov 24, 2025

Conversation

@csadorf
Contributor

@csadorf csadorf commented Nov 17, 2025

Summary

This PR improves documentation quality and consistency across cuML's Dask multi-GPU estimators, adds a comprehensive multi-GPU guide, and fixes two minor bugs in KNeighborsClassifier and RandomForestClassifier.

Changes

New Documentation

  • Added dask_multigpu_guide.ipynb - comprehensive guide for multi-GPU usage with Dask

Documentation Improvements

  • Standardized terminology: "multi-node multi-GPU", "Dask cuDF DataFrame"
  • Fixed docstring formatting (parameter underlines, spacing, capitalization)
  • Removed "experimental" language from stable APIs
  • Added known limitations:
    • PCA: random_state parameter not supported in MNMG
    • LogisticRegression: labels must have float32 dtype (documented with a code example)
    • UMAP: clarified this is for distributed inference only, not training
  • Improved class docstrings with clearer descriptions
  • Fixed typos and improved grammar throughout
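
The float32 label requirement listed above for LogisticRegression can be met with a single cast before fitting. The following is a minimal sketch, not the example from the PR: it uses a NumPy array for brevity (Dask arrays expose an `astype` method of the same name), and the helper `as_float32_labels` is a hypothetical name that is not part of cuML.

```python
import numpy as np


def as_float32_labels(y):
    """Return the labels cast to float32 (hypothetical helper, not a cuML API)."""
    # astype creates a converted copy; the input array is left unchanged.
    return y.astype(np.float32)


# Integer class labels, as they often come out of a data loader.
y = np.array([0, 1, 1, 0])
y32 = as_float32_labels(y)

print(y32.dtype)  # float32
```

Casting up front keeps the dtype requirement explicit in user code rather than relying on the estimator to convert labels implicitly.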

Bug Fixes

  • KNeighborsClassifier: Added CuPy array support for label handling
  • RandomForestClassifier: Fixed unique() handling for Dask Arrays vs DataFrames

Closes #7309. Fixes #3663.

@csadorf csadorf requested a review from a team as a code owner November 17, 2025 23:02
@csadorf csadorf requested a review from divyegala November 17, 2025 23:02
@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

@github-actions github-actions bot added the "Cython / Python" label Nov 17, 2025
@csadorf csadorf added the "non-breaking" and "improvement" labels Nov 17, 2025
@csadorf csadorf changed the base branch from main to release/25.12 November 17, 2025 23:02
@csadorf csadorf changed the title from "Improve Dask documentation" to "Improve Dask documentation and fix a few minor issues" Nov 17, 2025
@csadorf csadorf linked an issue Nov 17, 2025 that may be closed by this pull request
Member

@jcrist jcrist left a comment

Overall LGTM, nice improvement. Just a few small comments, do with them what you will.

Comment thread docs/source/dask_multigpu_guide.ipynb
Comment thread docs/source/dask_multigpu_guide.ipynb Outdated
@csadorf csadorf force-pushed the issue/7309-improve-dask-docs branch from a6b1bfb to 68d0448 November 24, 2025 16:38
@csadorf
Contributor Author

csadorf commented Nov 24, 2025

/merge

@rapids-bot rapids-bot Bot merged commit d0ea914 into rapidsai:release/25.12 Nov 24, 2025
191 of 194 checks passed
@csadorf csadorf deleted the issue/7309-improve-dask-docs branch November 24, 2025 22:13

Labels

  • Cython / Python: Cython or Python issue
  • improvement: Improvement / enhancement to an existing function
  • non-breaking: Non-breaking change

Development

Successfully merging this pull request may close these issues.

  • [DOC] Enhance Dask Multi-GPU Documentation
  • [BUG] Dask KNeighborsClassifier fit fails on cupy backed dask array

3 participants