Skip to content

Fix Dask KMeans performance regression#7819

Merged
rapids-bot[bot] merged 1 commit intorapidsai:mainfrom
viclafargue:fix-dask-kmeans-performance-regression
Feb 24, 2026
Merged

Fix Dask KMeans performance regression#7819
rapids-bot[bot] merged 1 commit intorapidsai:mainfrom
viclafargue:fix-dask-kmeans-performance-regression

Conversation

@viclafargue
Copy link
Copy Markdown
Contributor

Closes #7749

The KMeans parameter struct on cuML side used to be calloc'ed, and is now constructed on stack. Because of this, Dask Kmeans used to use raft::random::GeneratorType::GenPhilox (value of 0) as its random number generator type and now uses raft::random::GeneratorType::GenPC (constructor default of 1). This somehow causes the KMeans initialization or convergence steps to be of lower quality. Converges in ~40 steps instead of 2.

This PR fixes the issue by setting raft::random::GeneratorType::GenPhilox as the default inside of the cuML parameter struct default values.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Feb 20, 2026

No actionable comments were generated in the recent review. 🎉


📝 Walkthrough

Summary by CodeRabbit

  • Chores
    • Updated copyright year information in cluster module

Walkthrough

The change updates the default RNG state initialization in KMeansParams to explicitly specify the GenPhilox generator type in addition to the seed value. The copyright year header is also incremented from 2025 to 2026.

Changes

Cohort / File(s) Summary
KMeansParams RNG initialization
cpp/include/cuml/cluster/kmeans_params.hpp
Updated rng_state default initialization from {0} to {0, raft::random::GeneratorType::GenPhilox} to explicitly specify the generator type. Copyright year updated from 2025 to 2026.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title directly describes the main change: fixing a Dask KMeans performance regression, which is the primary objective of this PR.
Description check ✅ Passed The description clearly explains the regression cause, technical details of the fix, and references the linked issue #7749.
Linked Issues check ✅ Passed The PR fixes the performance regression by restoring the original RNG generator type (GenPhilox) as default, directly addressing the root cause identified in issue #7749.
Out of Scope Changes check ✅ Passed All changes are limited to the KMeansParams header file RNG initialization and copyright update, directly related to fixing the reported issue.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@aamijar aamijar added non-breaking Non-breaking change bug Something isn't working labels Feb 23, 2026
@viclafargue
Copy link
Copy Markdown
Contributor Author

/merge

@rapids-bot rapids-bot Bot merged commit 58da454 into rapidsai:main Feb 24, 2026
102 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working CUDA/C++ non-breaking Non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] KMeansMG performance regression between 25.06 and 25.10

4 participants