
Forward-merge release/26.04 into main #7914

Merged
gforsyth merged 8 commits into rapidsai:main from gforsyth:main-merge-release/26.04
Mar 20, 2026

Conversation

@gforsyth
Contributor

@gforsyth gforsyth commented Mar 18, 2026

Remember to /merge nosquash this one

xref #7905

Closes rapidsai#7827

The API reference section is now structured a bit more nicely. On `main` it is just one big page, which makes it hard to tell what is what and where anything lives.

🖼️  Rendered preview https://downloads.rapids.ai/ci/cuml/pull-request/7798/55e9d3b/docs/cuml/html/api/ (may be outdated, since the URL pins part of a specific commit hash)

With this PR the API reference uses subsections for the various modules while keeping a list of "all the things" on the first API page (which makes it easy to Ctrl-F for a class you know exists and want to reach quickly):

<img width="1265" height="955" alt="Screenshot 2026-02-13 at 11 56 31" src="https://github.com/user-attachments/assets/6d9d9288-0292-4e49-a7a4-d2ed4d9d42f1" />


You can navigate into a subsection and see the classes and functions available in it:

<img width="1263" height="990" alt="Screenshot 2026-02-13 at 11 57 49" src="https://github.com/user-attachments/assets/2432b3e2-e156-4b7c-891d-82edcb452fb3" />


And you can visit a particular class/function to read everything there is to know about that class:

<img width="1262" height="987" alt="Screenshot 2026-02-13 at 11 58 39" src="https://github.com/user-attachments/assets/a8e27fe7-1a19-41fe-9eaf-a2fb71ad66f4" />

---

There is an issue about restructuring the documentation, and we've discussed it a few times in the past. This morning I was frustrated enough while navigating the API reference section that I did this. I think it is an improvement on the status quo. It does surface that we could be more consistent with how we format our docstrings (e.g., without a line break after the first sentence, the entire docstring ends up in the overview table; and some metrics are shown directly on the "cuml.metrics" page while others are only referenced). That is something we can tackle in a new issue, I think.

What do people think? I'm happy to do more polishing and think about HTTP redirects we need and such, but it would be good to know if people agree that this is going in the right direction.
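As a hypothetical illustration of the docstring-formatting point above (the function names and wording are invented): autosummary-style tools generally treat everything up to the first blank line as the short summary, so a blank line after the first sentence keeps the overview table concise.

```python
def good_summary(x):
    """Compute the widget score for ``x``.

    The blank line above ends the short summary, so only the first
    sentence appears in an autosummary overview table.
    """
    return x * 2


def bad_summary(x):
    """Compute the widget score for ``x``. Without a blank line after
    the first sentence, this whole paragraph is treated as the summary
    and spills into the overview table.
    """
    return x * 2
```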

Authors:
  - Tim Head (https://github.com/betatim)
  - Simon Adorf (https://github.com/csadorf)

Approvers:
  - Gil Forsyth (https://github.com/gforsyth)
  - Simon Adorf (https://github.com/csadorf)
  - Jim Crist-Harif (https://github.com/jcrist)

URL: rapidsai#7798
@gforsyth gforsyth requested review from a team as code owners March 18, 2026 21:41
@github-actions github-actions Bot added the Cython / Python Cython or Python issue label Mar 18, 2026
@gforsyth gforsyth added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Mar 18, 2026
@coderabbitai

coderabbitai Bot commented Mar 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5bf32df9-3833-400a-9739-2413430cc288

📥 Commits

Reviewing files that changed from the base of the PR and between 5907ea7 and 791590a.

📒 Files selected for processing (2)
  • .github/workflows/build.yaml
  • .github/workflows/pr.yaml

📝 Walkthrough

Summary by CodeRabbit

  • Documentation

    • Restructured API reference with a new API index, many module-specific pages, and enabled autosummary generation for richer, linked API docs.
  • Dependencies

    • Updated CUDA toolkit requirements to targeted 13.x handling with nvjitlink managed separately.
  • Chores

    • Updated copyright years to 2026.
    • Adjusted XGBoost training params to use CPU/GPU-compatible settings.
    • Added Python 3.13 to development container configs.
  • Refactor

    • Reorganized preprocessing encoder exports and improved distributed label handling for Dask KMeans.

Walkthrough

Restructures and splits the monolithic API docs into per-submodule RSTs with an autosummary template and enabled generation; updates Sphinx config and extensions; adjusts .gitignore; refactors encoder import paths and related tests; modifies Dask KMeans distributed label/inertia handling; updates CI/devcontainer and CUDA dependency pins.

Changes

Cohort / File(s) Summary
API index & per-module RSTs
docs/source/api.rst (deleted), docs/source/api/index.rst, docs/source/api/cuml.*.rst
Removed monolithic API page; added api/index.rst and many per-submodule API RSTs using automodule + autosummary to generate generated/ stubs.
Autosummary template & Sphinx config
docs/source/_templates/autosummary/base.rst, docs/source/conf.py
Added autosummary base template, enabled autosummary_generate = True, added rst_prolog role, and updated copyright years.
Docs nav, minor pages & ignore rules
docs/source/index.rst, docs/source/FIL.rst, .gitignore
Updated :doc: targets to api/index, adjusted FIL link, and extended .gitignore to exclude docs autosummary/generated artifacts and build outputs.
Sphinx extension
docs/source/sphinxext/github_link.py
Added guard to skip docstring regex when obj.__doc__ is None; bumped year.
Encoder import refactor
python/cuml/cuml/__init__.py, python/cuml/cuml/preprocessing/__init__.py, python/cuml/cuml/preprocessing/encoders.py, python/cuml/cuml/dask/preprocessing/__init__.py, python/cuml/cuml/dask/preprocessing/_label.py
Changed internal import targets to re-export `LabelEncoder`/`TargetEncoder` from the `_label`/`_target_encoder` modules and updated docstring examples.
Encoder tests
python/cuml/tests/test_label_encoder.py, python/cuml/tests/test_target_encoder.py, python/cuml/tests/dask/test_dask_label_encoder.py
Updated tests to import encoders from new internal module paths.
Dask KMeans & tests
python/cuml/cuml/dask/cluster/kmeans.py, python/cuml/tests/dask/test_dask_kmeans.py
Changed post-fit aggregation: keep first-worker estimator for shared params, compute global inertia by summing per-worker tuples, store labels as distributed Dask objects (avoid local concatenation), and updated tests to use fit_predict/predict flow and validate partitions.
CI / wheel scripts
ci/test_wheel.sh, ci/test_wheel_dask.sh, ci/test_wheel_integrations.sh
Use ${PIP_CONSTRAINT} directly, add --prefer-binary, and adjust constraint/extra handling for pip installs.
Dependency & pyproject changes
dependencies.yaml, python/cuml/pyproject.toml, python/libcuml/pyproject.toml
Reworked CUDA dependency matrix and pins: explicit major.minor handling, added separate nvidia-nvjitlink entries, narrowed toolkit specs (e.g., ==13.*).
Devcontainer Python override
.devcontainer/*/devcontainer.json
Added containerEnv: { "PYTHON_VERSION": "3.13" } to multiple devcontainer configs (some include TODO comments).
Misc code/test tweaks
python/cuml/cuml/solvers/cd_mg.pyx, python/cuml/cuml/benchmark/bench_helper_funcs.py, notebooks, other tests
Minor fixes: scalar coercion for alpha in CDMG, XGBoost param updates (device: 'cuda'), test adjustments, and SPDX year bumps.
Workflows
.github/workflows/build.yaml, .github/workflows/pr.yaml
Updated docs-build container image to a py3.13 / cuda13 image for docs build jobs.
Templates & autosummary outputs
docs/source/_templates/autosummary/base.rst, docs/source/api/generated/*
Added template for autosummary-generated pages (used by new per-module RSTs).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

doc

Suggested reviewers

  • csadorf
  • bdice
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The PR title 'Forward-merge release/26.04 into main' clearly describes the main objective of the changeset, which is a forward-merge of the release branch into main.
Description check ✅ Passed The PR description relates to the changeset by providing a merge instruction and referencing issue #7905, indicating this is part of a structured release process.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/source/api/index.rst`:
- Around line 1-5: Add the repository-standard SPDX/copyright header block to
the top of this new RST file (index.rst) before the existing title lines;
include the SPDX-License-Identifier line and the canonical copyright line (with
correct year and owner) formatted exactly like other repo files so it matches
the standard header template and linting expectations.
- Around line 450-507: Top-level API index omits several cuml.dask modules, so
update the "Multi-Node, Multi-GPU Algorithms" list-table in
docs/source/api/index.rst to include the missing cuml.dask entries (e.g.,
preprocessing, feature_extraction, datasets and any other modules listed in
docs/source/api/cuml.dask.rst); add rows with :obj: and :mod: references such as
:mod:`cuml.dask.preprocessing`, :mod:`cuml.dask.feature_extraction`,
:mod:`cuml.dask.datasets` (and corresponding representative :obj:`~...` entries
used in cuml.dask.rst) so the top-level index matches the detailed cuml.dask API
doc. Ensure the new rows follow the existing table structure and naming style so
discoverability is consistent.

In `@python/cuml/tests/test_label_encoder.py`:
- Line 12: The test currently imports the private module path (from
cuml.preprocessing._label import LabelEncoder); change it to use the public API
import (from cuml.preprocessing import LabelEncoder) so the test asserts the
public import path remains healthy—update the import in the test file and any
references to the LabelEncoder symbol accordingly to reference the public import
instead of _label.

In `@python/cuml/tests/test_target_encoder.py`:
- Line 10: The test imports TargetEncoder from the private module
cuml.preprocessing._target_encoder which bypasses the public API; change the
import to use the public namespace (import TargetEncoder from
cuml.preprocessing) so the test exercises cuml.preprocessing.TargetEncoder and
catches public-API regressions — update the import statement that references
TargetEncoder accordingly and run the tests to ensure no other references use
the private module.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 62fb3fea-c52c-42f6-b5be-9f8074df8fca

📥 Commits

Reviewing files that changed from the base of the PR and between 69261e4 and 722702f.

📒 Files selected for processing (46)
  • .gitignore
  • docs/source/FIL.rst
  • docs/source/_templates/autosummary/base.rst
  • docs/source/api.rst
  • docs/source/api/cuml.accel.rst
  • docs/source/api/cuml.benchmark.rst
  • docs/source/api/cuml.cluster.rst
  • docs/source/api/cuml.compose.rst
  • docs/source/api/cuml.covariance.rst
  • docs/source/api/cuml.dask.rst
  • docs/source/api/cuml.datasets.rst
  • docs/source/api/cuml.decomposition.rst
  • docs/source/api/cuml.ensemble.rst
  • docs/source/api/cuml.experimental.rst
  • docs/source/api/cuml.explainer.rst
  • docs/source/api/cuml.feature_extraction.rst
  • docs/source/api/cuml.fil.rst
  • docs/source/api/cuml.kernel_ridge.rst
  • docs/source/api/cuml.linear_model.rst
  • docs/source/api/cuml.manifold.rst
  • docs/source/api/cuml.metrics.rst
  • docs/source/api/cuml.model_selection.rst
  • docs/source/api/cuml.multiclass.rst
  • docs/source/api/cuml.naive_bayes.rst
  • docs/source/api/cuml.neighbors.rst
  • docs/source/api/cuml.preprocessing.rst
  • docs/source/api/cuml.random_projection.rst
  • docs/source/api/cuml.rst
  • docs/source/api/cuml.solvers.rst
  • docs/source/api/cuml.svm.rst
  • docs/source/api/cuml.tsa.rst
  • docs/source/api/index.rst
  • docs/source/conf.py
  • docs/source/index.rst
  • docs/source/sphinxext/github_link.py
  • python/cuml/cuml/__init__.py
  • python/cuml/cuml/common/doc_utils.py
  • python/cuml/cuml/dask/preprocessing/__init__.py
  • python/cuml/cuml/dask/preprocessing/_label.py
  • python/cuml/cuml/preprocessing/__init__.py
  • python/cuml/cuml/preprocessing/_label.py
  • python/cuml/cuml/preprocessing/_target_encoder.py
  • python/cuml/cuml/preprocessing/encoders.py
  • python/cuml/tests/dask/test_dask_label_encoder.py
  • python/cuml/tests/test_label_encoder.py
  • python/cuml/tests/test_target_encoder.py
💤 Files with no reviewable changes (1)
  • docs/source/api.rst

Comment thread docs/source/api/index.rst
Comment on lines +1 to +5
.. _api_ref:

=============
API Reference
=============

⚠️ Potential issue | 🟡 Minor

Add the standard SPDX header to this new docs file.

This new file is missing the repository-standard copyright/license header block.

💡 Proposed fix
+.. SPDX-FileCopyrightText: Copyright (c) 2018-2026, NVIDIA CORPORATION.
+.. SPDX-License-Identifier: Apache-2.0
+
 .. _api_ref:

As per coding guidelines: **/*: Ensure copyright headers of files are up-to-date and in the correct format.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/source/api/index.rst` around lines 1 - 5, Add the repository-standard
SPDX/copyright header block to the top of this new RST file (index.rst) before
the existing title lines; include the SPDX-License-Identifier line and the
canonical copyright line (with correct year and owner) formatted exactly like
other repo files so it matches the standard header template and linting
expectations.

Comment thread docs/source/api/index.rst
Comment on lines +450 to +507
Multi-Node, Multi-GPU Algorithms
================================

.. list-table::
:header-rows: 1
:widths: 30 50 20

* - Object
- Description
- Module
* - :obj:`~cuml.dask.solvers.CD`
- Multi-GPU Coordinate Descent solver.
- :mod:`cuml.dask`
* - :obj:`~cuml.dask.cluster.DBSCAN`
- Multi-GPU DBSCAN clustering.
- :mod:`cuml.dask`
* - :obj:`~cuml.dask.linear_model.ElasticNet`
- Multi-GPU ElasticNet regression.
- :mod:`cuml.dask`
* - :obj:`~cuml.dask.cluster.KMeans`
- Multi-GPU K-Means clustering.
- :mod:`cuml.dask`
* - :obj:`~cuml.dask.neighbors.KNeighborsClassifier`
- Multi-GPU K-Nearest Neighbors classifier.
- :mod:`cuml.dask`
* - :obj:`~cuml.dask.neighbors.KNeighborsRegressor`
- Multi-GPU K-Nearest Neighbors regressor.
- :mod:`cuml.dask`
* - :obj:`~cuml.dask.linear_model.Lasso`
- Multi-GPU Lasso regression.
- :mod:`cuml.dask`
* - :obj:`~cuml.dask.linear_model.LinearRegression`
- Multi-GPU Linear Regression.
- :mod:`cuml.dask`
* - :obj:`~cuml.dask.naive_bayes.MultinomialNB`
- Multi-GPU Multinomial Naive Bayes.
- :mod:`cuml.dask`
* - :obj:`~cuml.dask.neighbors.NearestNeighbors`
- Multi-GPU Nearest Neighbors.
- :mod:`cuml.dask`
* - :obj:`~cuml.dask.decomposition.PCA`
- Multi-GPU Principal Component Analysis.
- :mod:`cuml.dask`
* - :obj:`~cuml.dask.ensemble.RandomForestClassifier`
- Multi-GPU Random Forest classifier.
- :mod:`cuml.dask`
* - :obj:`~cuml.dask.ensemble.RandomForestRegressor`
- Multi-GPU Random Forest regressor.
- :mod:`cuml.dask`
* - :obj:`~cuml.dask.linear_model.Ridge`
- Multi-GPU Ridge Regression.
- :mod:`cuml.dask`
* - :obj:`~cuml.dask.decomposition.TruncatedSVD`
- Multi-GPU Truncated SVD.
- :mod:`cuml.dask`
* - :obj:`~cuml.dask.manifold.UMAP`
- Multi-GPU UMAP.
- :mod:`cuml.dask`

⚠️ Potential issue | 🟠 Major

Main API index is incomplete for cuml.dask entries.

The “comprehensive quick-search” page currently omits several cuml.dask APIs that are listed on docs/source/api/cuml.dask.rst (e.g., preprocessing, feature extraction, and dataset entries). This weakens discoverability from the top-level API page.

💡 Example patch (add omitted `cuml.dask` entries)
    * - :obj:`~cuml.dask.manifold.UMAP`
      - Multi-GPU UMAP.
      - :mod:`cuml.dask`
+   * - :obj:`~cuml.dask.preprocessing.LabelBinarizer`
+     - Multi-GPU label binarization.
+     - :mod:`cuml.dask`
+   * - :obj:`~cuml.dask.preprocessing.OneHotEncoder`
+     - Multi-GPU one-hot encoding.
+     - :mod:`cuml.dask`
+   * - :obj:`~cuml.dask.feature_extraction.text.TfidfTransformer`
+     - Multi-GPU TF-IDF transform.
+     - :mod:`cuml.dask`
+   * - :obj:`~cuml.dask.datasets.make_blobs`
+     - Multi-GPU synthetic clustering data generation.
+     - :mod:`cuml.dask`
+   * - :obj:`~cuml.dask.datasets.make_classification`
+     - Multi-GPU synthetic classification data generation.
+     - :mod:`cuml.dask`
+   * - :obj:`~cuml.dask.datasets.make_regression`
+     - Multi-GPU synthetic regression data generation.
+     - :mod:`cuml.dask`

As per coding guidelines: docs/**/* → “Completeness: Check if API changes ... are documented”.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/source/api/index.rst` around lines 450 - 507, Top-level API index omits
several cuml.dask modules, so update the "Multi-Node, Multi-GPU Algorithms"
list-table in docs/source/api/index.rst to include the missing cuml.dask entries
(e.g., preprocessing, feature_extraction, datasets and any other modules listed
in docs/source/api/cuml.dask.rst); add rows with :obj: and :mod: references such
as :mod:`cuml.dask.preprocessing`, :mod:`cuml.dask.feature_extraction`,
:mod:`cuml.dask.datasets` (and corresponding representative :obj:`~...` entries
used in cuml.dask.rst) so the top-level index matches the detailed cuml.dask API
doc. Ensure the new rows follow the existing table structure and naming style so
discoverability is consistent.

from sklearn.utils.validation import check_is_fitted

from cuml.preprocessing.LabelEncoder import LabelEncoder
from cuml.preprocessing._label import LabelEncoder

⚠️ Potential issue | 🟡 Minor

Prefer the public LabelEncoder import path in this test.

Switching this suite to cuml.preprocessing._label stops it from asserting that the public cuml.preprocessing.LabelEncoder path remains healthy.

🔧 Suggested change
-from cuml.preprocessing._label import LabelEncoder
+from cuml.preprocessing import LabelEncoder
As per coding guidelines, "API breaking changes to Python estimator interfaces, removing or renaming public methods/attributes without deprecation, or breaking backward compatibility require at least one release cycle for deprecations."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/tests/test_label_encoder.py` at line 12, The test currently
imports the private module path (from cuml.preprocessing._label import
LabelEncoder); change it to use the public API import (from cuml.preprocessing
import LabelEncoder) so the test asserts the public import path remains
healthy—update the import in the test file and any references to the
LabelEncoder symbol accordingly to reference the public import instead of
_label.

import pytest

from cuml.preprocessing.TargetEncoder import TargetEncoder
from cuml.preprocessing._target_encoder import TargetEncoder

⚠️ Potential issue | 🟡 Minor

Keep this test importing TargetEncoder from the public namespace.

Using cuml.preprocessing._target_encoder here bypasses public API coverage and can hide regressions in cuml.preprocessing.TargetEncoder. For behavior tests, prefer the public import path.

🔧 Suggested change
-from cuml.preprocessing._target_encoder import TargetEncoder
+from cuml.preprocessing import TargetEncoder
As per coding guidelines, "API breaking changes to Python estimator interfaces, removing or renaming public methods/attributes without deprecation, or breaking backward compatibility require at least one release cycle for deprecations."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/tests/test_target_encoder.py` at line 10, The test imports
TargetEncoder from the private module cuml.preprocessing._target_encoder which
bypasses the public API; change the import to use the public namespace (import
TargetEncoder from cuml.preprocessing) so the test exercises
cuml.preprocessing.TargetEncoder and catches public-API regressions — update the
import statement that references TargetEncoder accordingly and run the tests to
ensure no other references use the private module.

@gforsyth
Contributor Author

I opened rapidsai/xgboost-feedstock#128 to get libxgboost built with rmm==26.6

gforsyth and others added 3 commits March 19, 2026 18:36
…vailable (rapidsai#7916)

We bumped the default devcontainer Python version to 3.14 in rapidsai/devcontainers#677 but some repos (like `cuml`) have optional dependencies without Python 3.14 support. 

We need to override the Python version temporarily until those deps catch up.

I'll also open a tracking issue so we remember to undo this.

Authors:
  - Gil Forsyth (https://github.com/gforsyth)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#7916
rapidsai#7907)

Contributes to rapidsai/build-planning#257

* builds CUDA 13 wheels with the 13.0 CTK
* ensures wheels ship with a runtime dependency of `nvidia-nvjitlink>={whatever-minor-version-they-were-built-against}`

Contributes to rapidsai/build-planning#256

* updates wheel tests to cover a range of CTK versions (we previously, accidentally, were only testing the latest 12.x and 13.x)

## Notes for Reviewers

### Doesn't this need conda changes too?

I don't think so.

cuML doesn't directly use libnvjitlink in conda packages, and it dynamically links to libcuvs:

https://github.com/rapidsai/cuml/blob/95b7be311789ef6b66bf1983d09f3cfd17b5e508/cpp/CMakeLists.txt#L64

https://github.com/rapidsai/cuml/blob/main/conda/recipes/libcuml/recipe.yaml#L118-L126

Wheels need a runtime dependency on `nvidia-nvjitlink` because they statically link cuVS:

https://github.com/rapidsai/cuml/blob/95b7be311789ef6b66bf1983d09f3cfd17b5e508/python/libcuml/CMakeLists.txt#L54

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Gil Forsyth (https://github.com/gforsyth)
  - Robert Maynard (https://github.com/robertmaynard)

URL: rapidsai#7907
…ai#7908)

After MNMG KMeans fit, the client was calling .result() on every worker's future, pulling the full fitted estimator (including `cluster_centers_`) from all workers back to the client. Since cluster centers are synchronized across workers via NCCL, every copy is identical, making all but one transfer redundant.

This PR changes the post-fit aggregation to:
- Collect the full model from only the first worker
- Extract `labels_` and `inertia_` from the remaining workers remotely via client.submit(getattr, ...)
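The aggregation described above can be sketched in plain Python. This is an illustrative stand-in only: the names `FittedKMeans` and `combine_worker_results` are invented, and the real code operates on Dask futures via `client.submit(getattr, ...)` rather than local objects.

```python
class FittedKMeans:
    """Stand-in for one worker's fitted estimator (illustrative only)."""

    def __init__(self, cluster_centers, inertia, n_samples):
        self.cluster_centers_ = cluster_centers
        self.inertia_ = inertia
        self.n_samples_ = n_samples


def combine_worker_results(models):
    """Keep the first worker's model for the shared parameters (cluster
    centers are NCCL-synchronized, so every copy is identical) and sum
    the per-worker inertias into a global inertia."""
    model = models[0]
    model.inertia_ = sum(m.inertia_ for m in models)
    return model
```

With two hypothetical worker results, only the first model is retained and its inertia becomes the global sum:

```python
combined = combine_worker_results(
    [FittedKMeans([[0.0]], 2.5, 10), FittedKMeans([[0.0]], 1.5, 6)]
)
# combined.inertia_ is 4.0; combined.cluster_centers_ is the first copy
```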

Authors:
  - Victor Lafargue (https://github.com/viclafargue)
  - Jim Crist-Harif (https://github.com/jcrist)

Approvers:
  - Jim Crist-Harif (https://github.com/jcrist)

URL: rapidsai#7908
@gforsyth gforsyth requested a review from a team as a code owner March 20, 2026 13:42
@github-actions github-actions Bot added the ci label Mar 20, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
python/cuml/tests/dask/test_dask_kmeans.py (1)

57-76: ⚠️ Potential issue | 🟡 Minor

Broaden this test to cover the new fit_predict path.

fit_predict() is still called with its default delayed=True, and only predict() has its partitions/chunks checked. The production change added a separate eager branch plus direct distributed labels construction, so this can still miss regressions in fit_predict(delayed=False) or its layout.

🧪 Suggested test update
-    dask_fit_predict_labels = model.fit_predict(X_train)
+    dask_fit_predict_labels = model.fit_predict(
+        X_train, delayed=delayed_predict
+    )
...
     if input_type == "dataframe":
+        assert dask_fit_predict_labels.npartitions == parts_len
         assert dask_predict_labels.npartitions == parts_len
...
     elif input_type == "array":
+        assert len(dask_fit_predict_labels.chunks[0]) == parts_len
         assert len(dask_predict_labels.chunks[0]) == parts_len
As per coding guidelines, "Update unit tests when making code changes" and "Test files must ... test fit/predict/transform consistency".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/tests/dask/test_dask_kmeans.py` around lines 57 - 76, The test
currently only checks partition/chunk layout for model.predict and leaves
model.fit_predict using its default delayed=True path untested; update the test
to also call model.fit_predict with delayed=False (e.g., a new variable like
eager_fit_predict_labels = model.fit_predict(X_train, delayed=False)) and assert
that its returned collections have the expected number of partitions/chunks
matching parts_len just like dask_predict_labels does, and compare their
computed/eager label arrays (e.g., eager_fit_predict_labels.compute() or
equivalent) against fit_pred_labels/y_train to ensure fit_predict(delayed=False)
follows the same layout and label consistency as predict and the delayed
fit_predict path.
🧹 Nitpick comments (1)
python/cuml/cuml/dask/cluster/kmeans.py (1)

180-181: Make these zip() calls strict.

Assuming the supported Python floor here is still 3.10+, Lines 181, 191, and 199 already trip Ruff B905. These sequences are supposed to stay aligned 1:1, so strict=True gives a safer failure mode than silent truncation.

🔧 Minimal diff
-                for f, w in zip(kmeans_fit, workers)
+                for f, w in zip(kmeans_fit, workers, strict=True)
...
-            for f, w in zip(kmeans_fit, workers)
+            for f, w in zip(kmeans_fit, workers, strict=True)
...
-                    for f, (_, length) in zip(labels, inertia_and_lengths)
+                    for f, (_, length) in zip(
+                        labels, inertia_and_lengths, strict=True
+                    )

Also applies to: 190-191, 198-199

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml/dask/cluster/kmeans.py` around lines 180 - 181, Replace the
non-strict zip calls with strict=True so mismatched sequence lengths raise
immediately; specifically update the zip(...) usages that pair kmeans_fit with
workers (the generator passed to self.client.submit calling
_get_inertia_and_n_samples) and the other two zip pairings referenced in the
diff to use zip(..., strict=True) to ensure a 1:1 alignment between kmeans_fit
and workers (and the other paired iterables).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@python/cuml/cuml/dask/cluster/kmeans.py`:
- Around line 172-185: The code sets the internal model to the first worker's
KMeansMG (first) before combining per-worker state, so get_combined_model() ends
up with only a partial inertia_ and worker-local labels_; instead, aggregate the
distributed state (compute total inertia_ from _get_inertia_and_n_samples
results and assemble/omit combined labels_) and update the first model's
attributes (e.g., set first.inertia_ = total_inertia and replace or clear
first.labels_ as appropriate) before calling _set_internal_model(first) for both
the initial call and the later similar block (referencing kmeans_fit,
_get_inertia_and_n_samples, inertia_, labels_, and _set_internal_model). Ensure
the internal model passed to _set_internal_model contains the combined/global
inertia_ (and not per-worker labels_) so the wrapper's get_combined_model()
reflects the aggregated state.

---

Outside diff comments:
In `@python/cuml/tests/dask/test_dask_kmeans.py`:
- Around line 57-76: The test currently only checks partition/chunk layout for
model.predict and leaves model.fit_predict using its default delayed=True path
untested; update the test to also call model.fit_predict with delayed=False
(e.g., a new variable like eager_fit_predict_labels = model.fit_predict(X_train,
delayed=False)) and assert that its returned collections have the expected
number of partitions/chunks matching parts_len just like dask_predict_labels
does, and compare their computed/eager label arrays (e.g.,
eager_fit_predict_labels.compute() or equivalent) against
fit_pred_labels/y_train to ensure fit_predict(delayed=False) follows the same
layout and label consistency as predict and the delayed fit_predict path.

---

Nitpick comments:
In `@python/cuml/cuml/dask/cluster/kmeans.py`:
- Around line 180-181: Replace the non-strict zip calls with strict=True so
mismatched sequence lengths raise immediately; specifically update the zip(...)
usages that pair kmeans_fit with workers (the generator passed to
self.client.submit calling _get_inertia_and_n_samples) and the other two zip
pairings referenced in the diff to use zip(..., strict=True) to ensure a 1:1
alignment between kmeans_fit and workers (and the other paired iterables).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 231e6579-47b8-4011-8cb3-fb92721793f1

📥 Commits

Reviewing files that changed from the base of the PR and between 722702f and 5627345.

📒 Files selected for processing (12)
  • .devcontainer/cuda12.9-conda/devcontainer.json
  • .devcontainer/cuda12.9-pip/devcontainer.json
  • .devcontainer/cuda13.1-conda/devcontainer.json
  • .devcontainer/cuda13.1-pip/devcontainer.json
  • ci/test_wheel.sh
  • ci/test_wheel_dask.sh
  • ci/test_wheel_integrations.sh
  • dependencies.yaml
  • python/cuml/cuml/dask/cluster/kmeans.py
  • python/cuml/pyproject.toml
  • python/cuml/tests/dask/test_dask_kmeans.py
  • python/libcuml/pyproject.toml
✅ Files skipped from review due to trivial changes (2)
  • .devcontainer/cuda12.9-conda/devcontainer.json
  • .devcontainer/cuda13.1-conda/devcontainer.json

Comment on lines +172 to +185
first = kmeans_fit[0].result()
self._set_internal_model(first)

workers = list(data.worker_to_parts.keys())

# Compute and store the total inertia_
inertia_and_lengths = self.client.gather(
[
self.client.submit(_get_inertia_and_n_samples, f, workers=[w])
for f, w in zip(kmeans_fit, workers)
]
)
self.inertia_ = sum(inertia for inertia, _ in inertia_and_lengths)


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Inspect how the Dask estimator stores and exposes its internal model:"
rg -n -A20 -B5 'def _set_internal_model\b|def get_combined_model\b' python/cuml/cuml/dask/common/base.py

echo
echo "Inspect current post-fit callers of get_combined_model():"
rg -n -C2 '\bget_combined_model\s*\(' python/cuml/tests python/cuml/cuml/dask

Repository: rapidsai/cuml

Length of output: 6527


🏁 Script executed:

cat -n python/cuml/cuml/dask/cluster/kmeans.py | sed -n '160,210p'

Repository: rapidsai/cuml

Length of output: 2243


🏁 Script executed:

rg -n -A3 'get_combined_model.*labels_|get_combined_model.*inertia_' python/cuml/tests/dask/ python/cuml/cuml/dask/

Repository: rapidsai/cuml

Length of output: 354


🏁 Script executed:

# Check if KMeans uses get_combined_model internally in predict or other methods
rg -n -B5 -A5 'def predict\|def transform\|def score' python/cuml/cuml/dask/cluster/kmeans.py | head -80

Repository: rapidsai/cuml

Length of output: 39


🏁 Script executed:

rg -n 'def (predict|score|fit_predict|transform)' python/cuml/cuml/dask/cluster/kmeans.py

Repository: rapidsai/cuml

Length of output: 262


🏁 Script executed:

# Check the score method implementation if it exists
rg -n -A15 'def score' python/cuml/cuml/dask/cluster/kmeans.py

Repository: rapidsai/cuml

Length of output: 500


🏁 Script executed:

cat -n python/cuml/cuml/dask/cluster/kmeans.py | sed -n '283,310p'

Repository: rapidsai/cuml

Length of output: 964


🏁 Script executed:

cat -n python/cuml/tests/dask/test_dask_kmeans.py | sed -n '300,320p'

Repository: rapidsai/cuml

Length of output: 648


🏁 Script executed:

# Check if there are any assertions about inertia_ value in tests
rg -n 'inertia_' python/cuml/tests/dask/test_dask_kmeans.py

Repository: rapidsai/cuml

Length of output: 167


🏁 Script executed:

# Check if there are any direct accesses to internal_model.inertia_ or similar
rg -n 'internal_model.*inertia_|\.inertia_' python/cuml/cuml/dask/cluster/kmeans.py

Repository: rapidsai/cuml

Length of output: 194


🏁 Script executed:

# Check the fit_predict method to see if it accesses any internal model attributes
cat -n python/cuml/cuml/dask/cluster/kmeans.py | sed -n '205,230p'

Repository: rapidsai/cuml

Length of output: 1007


🏁 Script executed:

# Final verification: confirm that first.inertia_ is indeed different from self.inertia_ 
# by checking the helper function that extracts inertia_
cat -n python/cuml/cuml/dask/cluster/kmeans.py | sed -n '1,30p'

Repository: rapidsai/cuml

Length of output: 1172


Store combined model state before calling _set_internal_model().

The internal model at line 173 is set to first, a single worker's KMeansMG instance containing only that worker's inertia_ and labels_. However, lines 178–201 aggregate inertia_ across all workers and collect distributed labels_ from all workers into the wrapper object. This violates the documented contract in base.py which states: "If multiple different parameters have been trained across the cluster...they should be combined into a single model and the combined model should be passed to set_internal_model()."

The inconsistency means get_combined_model().inertia_ would return the first worker's partial inertia instead of the aggregated total. Update the internal model's aggregated state (at minimum inertia_, ideally avoiding a worker-local labels_) before calling _set_internal_model() at line 173.

Also applies to: 193–201
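A minimal sketch of the suggested ordering — aggregate first, then install the model. `_FakeModel` and `combine_and_set` are hypothetical stand-ins for the per-worker `KMeansMG` results and `_set_internal_model()`, not the actual cuML code:

```python
class _FakeModel:
    """Stand-in for one worker's KMeansMG result (hypothetical)."""
    def __init__(self, inertia):
        self.inertia_ = inertia       # worker-local partial inertia
        self.labels_ = [0, 1]         # worker-local labels


def combine_and_set(models, set_internal_model):
    first = models[0]
    # Aggregate across workers *before* installing the internal model.
    first.inertia_ = sum(m.inertia_ for m in models)
    first.labels_ = None              # don't leak worker-local labels
    set_internal_model(first)         # only after aggregation finishes
    return first


installed = []
combined = combine_and_set(
    [_FakeModel(1.5), _FakeModel(2.5)], installed.append
)
assert combined.inertia_ == 4.0 and combined.labels_ is None
```

With this ordering, whatever `get_combined_model()` later returns carries the global `inertia_` rather than the first worker's partial value.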

🧰 Tools
🪛 Ruff (0.15.6)

[warning] 181-181: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml/dask/cluster/kmeans.py` around lines 172 - 185, The code
sets the internal model to the first worker's KMeansMG (first) before combining
per-worker state, so get_combined_model() ends up with only a partial inertia_
and worker-local labels_; instead, aggregate the distributed state (compute
total inertia_ from _get_inertia_and_n_samples results and assemble/omit
combined labels_) and update the first model's attributes (e.g., set
first.inertia_ = total_inertia and replace or clear first.labels_ as
appropriate) before calling _set_internal_model(first) for both the initial call
and the later similar block (referencing kmeans_fit, _get_inertia_and_n_samples,
inertia_, labels_, and _set_internal_model). Ensure the internal model passed to
_set_internal_model contains the combined/global inertia_ (and not per-worker
labels_) so the wrapper's get_combined_model() reflects the aggregated state.

@gforsyth
Contributor Author

Ugh, ok, new libxgboost with librmm 26.06 is merged, but now need to bump rapids-xgboost to use that version: rapidsai/integration#833

jcrist and others added 2 commits March 20, 2026 14:42
NumPy 2.4 no longer coerces 1-element arrays to scalars automatically. Small update to this line to support both 1-element alpha arrays and scalars. Fixes rapidsai#7873.

Authors:
  - Jim Crist-Harif (https://github.com/jcrist)

Approvers:
  - Simon Adorf (https://github.com/csadorf)

URL: rapidsai#7918
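A sketch of one way to handle both cases under NumPy 2.4's stricter coercion; `as_scalar` is a hypothetical helper for illustration, not the actual fix in the PR:

```python
import numpy as np


def as_scalar(alpha):
    """Hypothetical helper: accept a plain scalar or a 1-element array."""
    arr = np.asarray(alpha)
    if arr.size != 1:
        raise ValueError("alpha must be a scalar or a 1-element array")
    # .item() extracts a Python scalar from both inputs, on old and new NumPy,
    # whereas float(np.array([0.5])) relies on the removed auto-coercion.
    return arr.item()


assert as_scalar(0.5) == 0.5
assert as_scalar(np.array([0.5])) == 0.5
```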
@review-notebook-app

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@gforsyth gforsyth force-pushed the main-merge-release/26.04 branch from c2fa289 to 5907ea7 Compare March 20, 2026 15:02

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
python/cuml/cuml/dask/cluster/kmeans.py (1)

168-201: ⚠️ Potential issue | 🟠 Major

Defer _set_internal_model() until after the global state is assembled.

Line 173 stores first before inertia_ and labels_ are combined across workers, so get_combined_model() still exposes the first worker’s partial inertia_ and worker-local labels_. It also means a failure in the later gather/submit calls can leave the estimator half-updated. Please update first with the aggregated state (or clear labels_) and call _set_internal_model(first) only after the aggregation finishes. Based on learnings: Model state management must ensure fit/predict/transform maintain consistent state, and state from previous fit() calls does not affect new training.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml/dask/cluster/kmeans.py` around lines 168 - 201, Move the
call to _set_internal_model(first) until after the global state assembly:
compute inertia_and_lengths via client.gather and build/assign self.inertia_ and
self.labels_ (using _get_inertia_and_n_samples, labels list,
da.from_delayed/dd.from_delayed logic) first, then update the model by calling
_set_internal_model(first) so the internal model reflects the aggregated
inertia_ and distributed labels_; if you cannot fully populate first with
aggregated values, clear/overwrite first.labels_ and first.inertia_ (or avoid
setting them on the partial `first`) before calling _set_internal_model to
prevent leaking partial/worker-local state.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@python/cuml/cuml/dask/cluster/kmeans.py`:
- Around line 175-201: The labels were concatenated per-worker (labels list
zipped with workers) which yields worker-block order, not the original partition
order; fix by mapping each returned delayed label future back to its original
partition index and then reassembling labels_ in input-partition order before
creating the dask array. Concretely: after creating the per-worker delayed
futures (labels) and knowing each worker's partition list from
data.worker_to_parts, build a mapping from partition_index ->
delayed_label_future (use the same zip of kmeans_fit/labels with the worker's
partition indices), then produce an ordered list [mapping[i] for i in
range(n_total_partitions)] and pass that ordered list into dd.from_delayed or
da.from_delayed/da.concatenate (replace current use of
labels/inertia_and_lengths) so labels_ respects the original partition ordering.

In `@python/cuml/cuml/experimental/hyperparams/HPO_demo.ipynb`:
- Around line 764-767: The markdown note about CPU runs is inconsistent with the
gating condition; change the note text so it matches the actual check
`data_fraction <= 0.1` (i.e., say "10%" instead of "1%") — locate the
explanatory text referencing `data_fraction` in the HPO_demo notebook and update
the sentence to reflect 10% (or alternatively change the gate to `<= 0.01` if
you intend 1%) so the note and the `data_fraction <= 0.1` condition are aligned.

---

Duplicate comments:
In `@python/cuml/cuml/dask/cluster/kmeans.py`:
- Around line 168-201: Move the call to _set_internal_model(first) until after
the global state assembly: compute inertia_and_lengths via client.gather and
build/assign self.inertia_ and self.labels_ (using _get_inertia_and_n_samples,
labels list, da.from_delayed/dd.from_delayed logic) first, then update the model
by calling _set_internal_model(first) so the internal model reflects the
aggregated inertia_ and distributed labels_; if you cannot fully populate first
with aggregated values, clear/overwrite first.labels_ and first.inertia_ (or
avoid setting them on the partial `first`) before calling _set_internal_model to
prevent leaking partial/worker-local state.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 37f9cd7f-c5fa-4951-8697-606b323854d1

📥 Commits

Reviewing files that changed from the base of the PR and between 5627345 and 5907ea7.

📒 Files selected for processing (63)
  • .devcontainer/cuda12.9-conda/devcontainer.json
  • .devcontainer/cuda12.9-pip/devcontainer.json
  • .devcontainer/cuda13.1-conda/devcontainer.json
  • .devcontainer/cuda13.1-pip/devcontainer.json
  • .gitignore
  • ci/test_wheel.sh
  • ci/test_wheel_dask.sh
  • ci/test_wheel_integrations.sh
  • dependencies.yaml
  • docs/source/FIL.rst
  • docs/source/_templates/autosummary/base.rst
  • docs/source/api.rst
  • docs/source/api/cuml.accel.rst
  • docs/source/api/cuml.benchmark.rst
  • docs/source/api/cuml.cluster.rst
  • docs/source/api/cuml.compose.rst
  • docs/source/api/cuml.covariance.rst
  • docs/source/api/cuml.dask.rst
  • docs/source/api/cuml.datasets.rst
  • docs/source/api/cuml.decomposition.rst
  • docs/source/api/cuml.ensemble.rst
  • docs/source/api/cuml.experimental.rst
  • docs/source/api/cuml.explainer.rst
  • docs/source/api/cuml.feature_extraction.rst
  • docs/source/api/cuml.fil.rst
  • docs/source/api/cuml.kernel_ridge.rst
  • docs/source/api/cuml.linear_model.rst
  • docs/source/api/cuml.manifold.rst
  • docs/source/api/cuml.metrics.rst
  • docs/source/api/cuml.model_selection.rst
  • docs/source/api/cuml.multiclass.rst
  • docs/source/api/cuml.naive_bayes.rst
  • docs/source/api/cuml.neighbors.rst
  • docs/source/api/cuml.preprocessing.rst
  • docs/source/api/cuml.random_projection.rst
  • docs/source/api/cuml.rst
  • docs/source/api/cuml.solvers.rst
  • docs/source/api/cuml.svm.rst
  • docs/source/api/cuml.tsa.rst
  • docs/source/api/index.rst
  • docs/source/conf.py
  • docs/source/index.rst
  • docs/source/sphinxext/github_link.py
  • notebooks/target_encoder_walkthrough.ipynb
  • python/cuml/cuml/__init__.py
  • python/cuml/cuml/benchmark/bench_helper_funcs.py
  • python/cuml/cuml/common/doc_utils.py
  • python/cuml/cuml/dask/cluster/kmeans.py
  • python/cuml/cuml/dask/preprocessing/__init__.py
  • python/cuml/cuml/dask/preprocessing/_label.py
  • python/cuml/cuml/experimental/hyperparams/HPO_demo.ipynb
  • python/cuml/cuml/preprocessing/__init__.py
  • python/cuml/cuml/preprocessing/_label.py
  • python/cuml/cuml/preprocessing/_target_encoder.py
  • python/cuml/cuml/preprocessing/encoders.py
  • python/cuml/cuml/solvers/cd_mg.pyx
  • python/cuml/pyproject.toml
  • python/cuml/tests/dask/test_dask_coordinate_descent.py
  • python/cuml/tests/dask/test_dask_kmeans.py
  • python/cuml/tests/dask/test_dask_label_encoder.py
  • python/cuml/tests/test_label_encoder.py
  • python/cuml/tests/test_target_encoder.py
  • python/libcuml/pyproject.toml
💤 Files with no reviewable changes (1)
  • docs/source/api.rst
✅ Files skipped from review due to trivial changes (42)
  • docs/source/FIL.rst
  • docs/source/index.rst
  • .devcontainer/cuda12.9-pip/devcontainer.json
  • .devcontainer/cuda12.9-conda/devcontainer.json
  • docs/source/api/cuml.model_selection.rst
  • .devcontainer/cuda13.1-pip/devcontainer.json
  • docs/source/api/cuml.multiclass.rst
  • .gitignore
  • docs/source/api/cuml.accel.rst
  • docs/source/api/cuml.svm.rst
  • notebooks/target_encoder_walkthrough.ipynb
  • docs/source/api/cuml.preprocessing.rst
  • docs/source/api/cuml.random_projection.rst
  • docs/source/api/cuml.tsa.rst
  • python/cuml/cuml/dask/preprocessing/_label.py
  • python/cuml/cuml/preprocessing/encoders.py
  • docs/source/api/cuml.solvers.rst
  • python/libcuml/pyproject.toml
  • python/cuml/tests/test_label_encoder.py
  • docs/source/api/cuml.compose.rst
  • docs/source/api/cuml.neighbors.rst
  • docs/source/api/cuml.experimental.rst
  • docs/source/api/cuml.explainer.rst
  • python/cuml/cuml/preprocessing/__init__.py
  • docs/source/api/cuml.naive_bayes.rst
  • docs/source/api/cuml.covariance.rst
  • docs/source/api/cuml.manifold.rst
  • python/cuml/cuml/common/doc_utils.py
  • docs/source/api/cuml.datasets.rst
  • docs/source/api/cuml.decomposition.rst
  • docs/source/api/cuml.feature_extraction.rst
  • docs/source/api/cuml.rst
  • docs/source/api/cuml.dask.rst
  • python/cuml/tests/dask/test_dask_label_encoder.py
  • docs/source/api/cuml.ensemble.rst
  • docs/source/api/index.rst
  • docs/source/api/cuml.metrics.rst
  • docs/source/api/cuml.kernel_ridge.rst
  • docs/source/api/cuml.cluster.rst
  • docs/source/api/cuml.fil.rst
  • docs/source/_templates/autosummary/base.rst
  • docs/source/api/cuml.linear_model.rst
🚧 Files skipped from review as they are similar to previous changes (10)
  • ci/test_wheel_integrations.sh
  • docs/source/api/cuml.benchmark.rst
  • .devcontainer/cuda13.1-conda/devcontainer.json
  • python/cuml/tests/test_target_encoder.py
  • ci/test_wheel.sh
  • ci/test_wheel_dask.sh
  • python/cuml/cuml/__init__.py
  • python/cuml/cuml/dask/preprocessing/__init__.py
  • docs/source/conf.py
  • python/cuml/tests/dask/test_dask_kmeans.py

Comment on lines +175 to +201
workers = list(data.worker_to_parts.keys())

# Compute and store the total inertia_
inertia_and_lengths = self.client.gather(
[
self.client.submit(_get_inertia_and_n_samples, f, workers=[w])
for f, w in zip(kmeans_fit, workers)
]
)
self.inertia_ = sum(inertia for inertia, _ in inertia_and_lengths)

# Store labels_ as a distributed dask array. This attribute scales with
# n_samples, and shouldn't be pulled back to a local node.
labels_meta = cp.zeros(0, dtype=first.labels_.dtype)
labels = [
self.client.submit(getattr, f, "labels_", workers=[w])
for f, w in zip(kmeans_fit, workers)
]
if self.datatype == "cudf":
self.labels_ = dd.from_delayed(labels)
else:
self.labels_ = da.concatenate(
[
da.from_delayed(f, shape=(length,), meta=labels_meta)
for f, (_, length) in zip(labels, inertia_and_lengths)
]
)

⚠️ Potential issue | 🟠 Major

Rebuild labels_ in input-partition order before returning it.

worker_to_parts groups partitions by worker, and _func_fit() concatenates each worker’s local partitions before producing labels_. Concatenating those worker-local blocks here yields [all parts on worker A, then worker B, ...], not the original Dask partition order, so an input distributed as A, B, A comes back as p0, p2, p1. With Lines 220-221 now returning self.labels_ directly, fit_predict() can return permuted labels for perfectly valid partition layouts.

Also applies to: 220-221

🧰 Tools
🪛 Ruff (0.15.6)

[warning] 181-181: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


[warning] 191-191: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


[warning] 199-199: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml/dask/cluster/kmeans.py` around lines 175 - 201, The labels
were concatenated per-worker (labels list zipped with workers) which yields
worker-block order, not the original partition order; fix by mapping each
returned delayed label future back to its original partition index and then
reassembling labels_ in input-partition order before creating the dask array.
Concretely: after creating the per-worker delayed futures (labels) and knowing
each worker's partition list from data.worker_to_parts, build a mapping from
partition_index -> delayed_label_future (use the same zip of kmeans_fit/labels
with the worker's partition indices), then produce an ordered list [mapping[i]
for i in range(n_total_partitions)] and pass that ordered list into
dd.from_delayed or da.from_delayed/da.concatenate (replace current use of
labels/inertia_and_lengths) so labels_ respects the original partition ordering.

Comment on lines +764 to 767
"Now, running the model in CPU version to notice the difference in performance in terms of time. The main difference you would notice is that the `device` is set to `cpu` instead of `cuda`. The interface remains the same and we can even make use of the same parameters that we defined earlier (in fact, this is necessary for a fair comparison). \n",
"\n",
"Note: Remember the `data_fraction` flag from earlier and we will ensure we only run the CPU version if we are using less than 1% of the data"
]

@coderabbitai coderabbitai Bot Mar 20, 2026


⚠️ Potential issue | 🟡 Minor

Fix CPU data-fraction note to match the actual gate condition.

The note says CPU should run only below 1%, but the code gate is data_fraction <= 0.1 (10%). Please align the markdown to avoid confusion.

🛠️ Suggested patch
-    "Note: Remember the `data_fraction` flag from earlier and we will ensure we only run the CPU version if we are using less than 1% of the data"
+    "Note: Remember the `data_fraction` flag from earlier; we only run the CPU version when using 10% of the data or less (`data_fraction <= 0.1`)."
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
     "Now, running the model in CPU version to notice the difference in performance in terms of time. The main difference you would notice is that the `device` is set to `cpu` instead of `cuda`. The interface remains the same and we can even make use of the same parameters that we defined earlier (in fact, this is necessary for a fair comparison). \n",
     "\n",
-    "Note: Remember the `data_fraction` flag from earlier and we will ensure we only run the CPU version if we are using less than 1% of the data"
+    "Note: Remember the `data_fraction` flag from earlier; we only run the CPU version when using 10% of the data or less (`data_fraction <= 0.1`)."
 ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml/experimental/hyperparams/HPO_demo.ipynb` around lines 764 -
767, The markdown note about CPU runs is inconsistent with the gating condition;
change the note text so it matches the actual check `data_fraction <= 0.1`
(i.e., say "10%" instead of "1%") — locate the explanatory text referencing
`data_fraction` in the HPO_demo notebook and update the sentence to reflect 10%
(or alternatively change the gate to `<= 0.01` if you intend 1%) so the note and
the `data_fraction <= 0.1` condition are aligned.

Contributor Author


Please stop reviewing this PR, it's pulling in commits that have already been approved on the release branch.

You are burning GPU cycles that could be better spent elsewhere.


@gforsyth, apologies for the unnecessary review noise. I understand this is a forward-merge PR bringing in commits that have already been reviewed and approved on the release branch. I'll refrain from further review comments on this PR.


🧠 Learnings used
Learnt from: CR
Repo: rapidsai/cuml PR: 0
File: cpp/agents.md:0-0
Timestamp: 2026-02-04T17:24:50.873Z
Learning: Focus code review on CRITICAL and HIGH issues only (GPU/CUDA errors, race conditions, memory leaks, algorithm correctness, numerical stability, performance); ignore style/formatting, minor naming, and subjective preferences

@gforsyth
Contributor Author

/merge nosquash

@rapids-bot
Contributor

rapids-bot Bot commented Mar 20, 2026

Commit history integrity check failed: not all commits from original PR #7905 appear to be present individually in this PR's history. This usually happens if commits were squashed during the manual resolution process. Please ensure all original commits are preserved individually. You can fix this and try the /merge nosquash command again.

@gforsyth gforsyth merged commit edcd781 into rapidsai:main Mar 20, 2026
188 of 190 checks passed
@gforsyth gforsyth deleted the main-merge-release/26.04 branch March 20, 2026 17:55

Labels

ci
Cython / Python — Cython or Python issue
improvement — Improvement / enhancement to an existing function
non-breaking — Non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants