
[FEA] Inertia Computation for Balanced KMeans and Add Option for Weighted Inertia #1880

Merged
rapids-bot[bot] merged 17 commits into rapidsai:main from tarang-jain:h-inertia on Mar 11, 2026

Conversation

@tarang-jain
Contributor

@tarang-jain tarang-jain commented Mar 4, 2026

Closes #1762

@tarang-jain tarang-jain self-assigned this Mar 4, 2026
@tarang-jain tarang-jain added the cpp, feature request, and non-breaking labels Mar 4, 2026
Contributor

@jinsolp jinsolp left a comment


Thanks @tarang-jain LGTM!

Comment thread cpp/src/cluster/kmeans_cluster_cost.cu
Comment thread cpp/include/cuvs/cluster/kmeans.hpp
divyegala previously approved these changes Mar 6, 2026
Comment thread c/src/cluster/kmeans.cpp
Comment thread cpp/include/cuvs/cluster/kmeans.hpp
Comment thread cpp/src/cluster/detail/kmeans_balanced.cuh
Comment thread cpp/src/cluster/kmeans.cuh Outdated
Comment thread cpp/src/cluster/kmeans_balanced.cuh Outdated
@divyegala divyegala dismissed their stale review March 6, 2026 23:39

Accidentally approved

@tarang-jain
Contributor Author

/merge

@rapids-bot rapids-bot Bot merged commit 53879fd into rapidsai:main Mar 11, 2026
223 of 227 checks passed
rapids-bot Bot pushed a commit that referenced this pull request Mar 31, 2026
Merge after #1880

This PR adds support for streaming, out-of-core kmeans clustering, where the dataset resides on the host. The idea is simple:

Batched accumulation of centroid updates: data is processed in batches, and batch-wise means and cluster counts are accumulated until every batch, i.e. the full pass over the dataset, has been processed.
This PR only adds a batch-size parameter that controls how much of the dataset is loaded at a time to compute cluster assignments and (weighted) centroid adjustments. A centroid update, i.e. a single kmeans iteration, completes only once the whole dataset pass has finished and the accumulated sums have been averaged.
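
The batched accumulation described above can be sketched in plain Python. This is only an illustration of the idea, not the cuVS implementation: the function names `assign` and `batched_kmeans_iteration`, their parameters, and the choice to accumulate weighted sums (rather than per-batch means) are all assumptions made for this example. It also accumulates a weighted inertia alongside the centroid sums, on the assumption that inertia is the weighted sum of squared distances to the assigned centroid.

```python
def assign(point, centroids):
    """Return (index of the nearest centroid, squared distance to it)."""
    best_k, best_d = 0, float("inf")
    for k, c in enumerate(centroids):
        d = sum((p - q) ** 2 for p, q in zip(point, c))
        if d < best_d:
            best_k, best_d = k, d
    return best_k, best_d


def batched_kmeans_iteration(data, weights, centroids, batch_size):
    """Run one kmeans iteration over `data`, `batch_size` rows at a time.

    Hypothetical helper: per-cluster weighted sums and weights are
    accumulated across batches, and the centroids are only updated
    once the full dataset pass has completed.
    """
    n_clusters, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(n_clusters)]
    counts = [0.0] * n_clusters
    inertia = 0.0

    for start in range(0, len(data), batch_size):
        # Only this slice would need to be resident on the device.
        batch = data[start:start + batch_size]
        batch_w = weights[start:start + batch_size]
        for x, w in zip(batch, batch_w):
            k, d = assign(x, centroids)
            for j in range(dim):
                sums[k][j] += w * x[j]
            counts[k] += w
            inertia += w * d

    # Average the accumulated sums; an empty cluster keeps its old centroid.
    new_centroids = [
        [s / counts[k] for s in sums[k]] if counts[k] > 0 else list(centroids[k])
        for k in range(n_clusters)
    ]
    return new_centroids, inertia


data = [[0.0], [1.0], [10.0], [11.0]]
weights = [1.0, 1.0, 1.0, 1.0]
init = [[0.0], [10.0]]
centroids, inertia = batched_kmeans_iteration(data, weights, init, batch_size=3)
```

Because assignments within one iteration always use the centroids from the start of that iteration, the result does not depend on the batch size in this sketch; the batch size only bounds how much data is processed at once.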

Authors:
  - Tarang Jain (https://github.com/tarang-jain)

Approvers:
  - Victor Lafargue (https://github.com/viclafargue)
  - Anupam (https://github.com/aamijar)
  - Micka (https://github.com/lowener)
  - Jinsol Park (https://github.com/jinsolp)
  - Ben Frederickson (https://github.com/benfred)

URL: #1886

Labels

cpp, feature request, non-breaking

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[BUG] cuVS hierarchical KMeans does not compute inertia (always zero)

4 participants