Fallback to CPU for sparse inputs for KMeans #6448
Merged
rapids-bot[bot] merged 5 commits into rapidsai:branch-25.04 on Mar 25, 2025
- Introduced a new method in KMeans to dispatch to the CPU implementation when sparse arrays are detected during fitting.
- Updated the is_sparse function to use cupyx's and scipy's issparse method for better compatibility.
- Introduced a new test to verify that KMeans correctly dispatches to CPU when fitting with sparse input.
- Ensured that the model's attributes and predictions are validated as numpy arrays when using sparse data.
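The sparse check described in these commit messages could look roughly like the sketch below. This is an illustration under stated assumptions, not the code from this PR: the helper body is hypothetical, and only `scipy.sparse.issparse` and `cupyx.scipy.sparse.issparse` are real library calls. Both libraries are imported lazily so the helper still works when one of them is missing.

```python
def is_sparse(X):
    """Return True if X is a SciPy or CuPy sparse container.

    Hypothetical stand-in for the utility mentioned above; the body is an
    assumption, not the actual cuML implementation.
    """
    try:
        import scipy.sparse

        # issparse accepts both sparse matrices and sparse arrays.
        if scipy.sparse.issparse(X):
            return True
    except ImportError:
        pass  # SciPy is not installed; fall through to the CuPy check.

    try:
        import cupyx.scipy.sparse

        if cupyx.scipy.sparse.issparse(X):
            return True
    except ImportError:
        pass  # CuPy is not installed; nothing else to check.

    return False
```

The switch from `isspmatrix` to `issparse` matters because, in recent SciPy releases, `isspmatrix` only recognizes the older sparse matrix classes, while `issparse` also recognizes the newer sparse array classes such as `csr_array`.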
This is not taking advantage of some of the existing infrastructure for this: cuml/python/cuml/cuml/internals/base.pyx, lines 783 to 797 in 6774818
viclafargue approved these changes on Mar 18, 2025 and left a comment:
LGTM, just minor recommendations
Co-authored-by: Victor Lafargue <viclafargue@nvidia.com>
/merge
This PR adds support for sparse input arrays in KMeans by dispatching to the CPU implementation when a sparse array is detected during fitting. It also makes the sparse array detection utilities more robust and consistent across the codebase.
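The dispatch idea can be sketched with a toy estimator. Everything here is hypothetical and exists only for illustration: the class, the `backend_` attribute, the `FakeSparse` placeholder, and the signature of `_should_dispatch_cpu` are assumptions, and the real method in cuML may take different arguments.

```python
class FakeSparse:
    """Placeholder for a sparse container (hypothetical, illustration only)."""


class DispatchingEstimator:
    """Toy estimator that records which backend fit() would use.

    Not cuML's KMeans: the method signature and attributes are assumptions.
    """

    def __init__(self, is_sparse):
        # is_sparse is any callable that returns True for sparse inputs.
        self._is_sparse = is_sparse
        self.backend_ = None

    def _should_dispatch_cpu(self, X):
        # Fall back to the CPU path whenever the input is sparse.
        return bool(self._is_sparse(X))

    def fit(self, X):
        # A real estimator would call its CPU or GPU implementation here;
        # this sketch only records the decision.
        self.backend_ = "cpu" if self._should_dispatch_cpu(X) else "gpu"
        return self


est = DispatchingEstimator(is_sparse=lambda X: isinstance(X, FakeSparse))
sparse_backend = est.fit(FakeSparse()).backend_
dense_backend = est.fit([[0.0, 1.0], [1.0, 0.0]]).backend_
```

Keeping the decision in one small predicate means the fitting code does not need to know how sparsity is detected, only whether to fall back.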
Fixes scikit-learn test `test_kmeans_results[float64-lloyd-sparse_array]` in combination with #6442.

Changes
- Added `_should_dispatch_cpu` method to KMeans to handle sparse input arrays
- Updated `is_sparse` utility function to use `issparse` instead of `isspmatrix` for better compatibility
- Updated `input_utils.py` to use the new `issparse` method

Testing