Expose kmeans to python #729

Merged: rapids-bot[bot] merged 29 commits into rapidsai:branch-25.06 from benfred:python_kmeans on Apr 30, 2025

benfred (Contributor) commented Feb 26, 2025

No description provided.

benfred added the labels improvement (Improves an existing functionality) and non-breaking (Introduces a non-breaking change) on Feb 26, 2025
benfred self-assigned this on Feb 26, 2025
copy-pr-bot (Bot) commented Feb 26, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

cjnolet (Member) commented Feb 27, 2025

@benfred this looks great, but one of the things we're being asked for quite a bit today is to expose the hierarchical kmeans to Python. Any chance we can also expose those functions? I don't mind doing it as a follow-up, given that this PR is already feature complete.

benfred (Contributor, Author) commented Feb 27, 2025

/ok to test

benfred marked this pull request as ready for review on February 27, 2025 22:20
benfred requested review from a team as code owners on February 27, 2025 22:20
benfred changed the base branch from branch-25.04 to branch-25.06 on April 10, 2025 20:53
Comment thread on cpp/include/cuvs/cluster/kmeans.h
Comment thread on cpp/src/cluster/kmeans.cuh (outdated):

rmm::device_uvector<char> workspace(n_samples * sizeof(IndexT), stream);

rmm::device_uvector<DataT> x_norms(n_samples, stream);
Reviewer (Contributor) commented:

Can the newer mdarray/mdspan API be used here, both for allocating the memory and for the calls to RAFT functions that accept it?

benfred (Contributor, Author) replied:

In the last commit I've switched to the newer mdarray functions where possible. In some cases, such as the workspace, a device_uvector is still expected, so I've left those as is.

Comment thread on cpp/src/cluster/kmeans.cuh
Comment thread on python/cuvs/cuvs/cluster/kmeans/kmeans.pyx
Comment thread on python/cuvs/cuvs/cluster/kmeans/kmeans.pyx
benfred requested a review from lowener on April 24, 2025 04:54
cjnolet (Member) commented Apr 30, 2025

/merge

rapids-bot (Bot) merged commit f2d70ae into rapidsai:branch-25.06 on Apr 30, 2025
68 checks passed
github-project-automation (Bot) moved this from In Progress to Done in Unstructured Data Processing on Apr 30, 2025

VoVAllen commented May 1, 2025

Cheers! When will the corresponding version be released?

bdice (Contributor) commented May 1, 2025

@VoVAllen Nightlies of cuVS are available now (https://anaconda.org/rapidsai-nightly/cuvs/files), see install commands here (select "Nightly"): https://docs.rapids.ai/install/

The 25.06 release will come out around June 4-5.

haochengxi commented:
Thank you for your excellent work! Have you considered exposing the int8 k-means functionality to Python as well?

Labels

CMake, cpp, improvement (Improves an existing functionality), non-breaking (Introduces a non-breaking change), Python

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[FEA] Expose both regular and hierarchical kmeans to C/Python

6 participants