Init checks for Dask KMeans #7391

Merged
rapids-bot[bot] merged 4 commits into rapidsai:main from
viclafargue:init-checks-dask-kmeans
Oct 31, 2025

Conversation

@viclafargue
Contributor

Closes #7389

@viclafargue viclafargue requested a review from a team as a code owner October 27, 2025 11:19
@viclafargue viclafargue requested a review from dantegd October 27, 2025 11:19
@github-actions github-actions bot added the "Cython / Python" (Cython or Python issue) label Oct 27, 2025
Comment thread python/cuml/cuml/cluster/kmeans.pyx Outdated
Member


While you're here: I'm not 100% sure whether this check should be skipped in multi-GPU execution. When refactoring, I excluded it in multi-GPU since we weren't running it there before.

Is the multi-gpu implementation robust to a single node having fewer rows than the requested n_clusters? It doesn't seem to error when invoked in that setup, but I'm also not sure if it provides good results.

Contributor Author

@viclafargue viclafargue Oct 27, 2025


I think the check was missing in the multi-GPU implementation. I don't know for sure either, but I suspect this is a rare case that probably won't yield good results, especially for the scalable/parallel k-means++ initialization. Better safe than sorry: we should probably alert the user in this case.
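The check being discussed can be sketched as a small validation helper that errors out before fitting when there are fewer rows than requested clusters. This is a hypothetical illustration of the kind of guard the PR adds; the helper name and message are not cuml's actual code:

```python
def check_kmeans_init(n_samples: int, n_clusters: int) -> None:
    """Raise early if there are fewer rows than requested clusters.

    Hypothetical sketch of the init check discussed above: with
    n_samples < n_clusters, k-means cannot place distinct centroids,
    so it is better to alert the user than to silently proceed.
    """
    if n_samples < n_clusters:
        raise ValueError(
            f"n_samples={n_samples} should be >= n_clusters={n_clusters}."
        )
```

In a Dask setting the relevant count is the total number of rows across all workers, so such a check would typically run on the aggregated shape rather than per-partition.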

Member

@divyegala divyegala left a comment


Maybe also add a check that oversampling_factor > 0?
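The suggested `oversampling_factor` guard could look like the following. This is an illustrative sketch only, not cuml's actual validation code; the k-means|| initialization samples roughly `oversampling_factor * n_clusters` candidate centroids per round, so a non-positive value is meaningless:

```python
def check_oversampling_factor(oversampling_factor: float) -> None:
    # Hypothetical validation sketch: k-means|| oversampling must be
    # strictly positive for any candidate centroids to be drawn.
    if oversampling_factor <= 0:
        raise ValueError(
            f"oversampling_factor must be > 0, got {oversampling_factor}"
        )
```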

@viclafargue viclafargue force-pushed the init-checks-dask-kmeans branch from aaa89f5 to be8c66b October 31, 2025 13:33
@viclafargue viclafargue added the "bug" (Something isn't working) and "non-breaking" (Non-breaking change) labels Oct 31, 2025
@viclafargue
Contributor Author

/merge

@rapids-bot rapids-bot Bot merged commit 550aba7 into rapidsai:main Oct 31, 2025
103 checks passed
@jcrist jcrist mentioned this pull request Oct 31, 2025
rapids-bot Bot pushed a commit that referenced this pull request Oct 31, 2025
During a recent refactor we removed the `KMeansMG` class, viewing it as internal. It turns out this class was used by a few external projects.

Since we still need to support external users accessing the non-dask multi-gpu implementation, we'll want a public way to do so that isn't the private `_fit` method. Additionally, since we want to special case the `MG` case a little more, making it a separate class (even if as a thin shim) makes sense.

This PR:

- Brings back the `KMeansMG` class
- Adds a check that `random_state` is non-None in the `KMeansMG` case, ensuring external users also set `random_state` properly
- Removes mutation of kwargs in the dask `KMeans` case (as suggested [here](#7417 (comment)))
- Simplifies and moves the multi-gpu `kmeans++`/`oversampling_factor` check (as suggested [here](#7391 (comment)))

Fixes #7387.
Fixes #7389.

Authors:
  - Jim Crist-Harif (https://github.com/jcrist)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #7420
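The `random_state` requirement described in the follow-up commit above can be sketched as a thin shim class. This is a hypothetical skeleton, not cuml's real `KMeansMG` (which wraps the non-dask multi-GPU implementation); only the non-None check is the point here:

```python
class KMeansMG:
    """Illustrative shim for a multi-GPU KMeans entry point.

    Hypothetical sketch: requires an explicit random_state so that
    every worker initializes centroids identically, as the commit
    message above describes for external users of the MG path.
    """

    def __init__(self, n_clusters=8, random_state=None, **kwargs):
        if random_state is None:
            raise ValueError(
                "KMeansMG requires an explicit random_state so all "
                "workers produce the same centroid initialization."
            )
        self.n_clusters = n_clusters
        self.random_state = random_state
        self.kwargs = kwargs  # remaining KMeans parameters, passed through
```

Making the shim a separate class (rather than a flag on `KMeans`) keeps the special-cased MG behavior, such as this stricter validation, out of the single-GPU code path.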
vardhan30016 pushed a commit to vardhan30016/cuml that referenced this pull request Nov 7, 2025

Labels

bug: Something isn't working
Cython / Python: Cython or Python issue
non-breaking: Non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Multi Node KMeans result doesn't match Single Node

4 participants