HDBSCAN with NN Descent build option #7339
Conversation
viclafargue
left a comment
Thanks @jinsolp, great work! It looks like we could maybe consolidate the UMAP and HDBSCAN C++ all-neighbors and NN Descent structs, unless there is a reason we do not expose the intermediate graph degree and termination threshold parameters in HDBSCAN. Also, the distinction between knn and nnd parameters might help readability; it would be nice to do the same for UMAP. Just a bunch of ideas for follow-up PRs. Again, great work!
@viclafargue thanks for the review!
That is a good idea! The reason I don't have intermediate graph degree and termination threshold exposed in HDBSCAN is that they don't affect the results as much as graph degree and max iterations, but we might as well expose them since we're already doing that for UMAP!
This is also something I had in mind, so we will probably have to deprecate the existing parameters if we choose to do so!
|
@viclafargue exposed the other NN Descent parameters to match UMAP! Would be nice if you could take a final look before we merge this 🙂
Greptile Overview
Greptile Summary
This PR adds NN Descent as an alternative KNN graph-building algorithm for HDBSCAN, mirroring the existing UMAP functionality. The implementation introduces a new `build_algo` parameter (`brute_force`/`nn_descent`) with configurable options via `build_kwds`, enabling faster clustering on large datasets. The change propagates through the stack:
- C++ headers define the new `GRAPH_BUILD_ALGO` enum and parameter structs,
- the runner dispatches to cuVS's all-neighbors API based on build algorithm and data location (device/host),
- Cython bindings expose the new types, and
- the Python API validates parameters while auto-adjusting incompatible configurations (e.g., `graph_degree >= min_samples + 1`).

Memory-type selection logic now uses host memory when `knn_n_clusters > 1` to support datasets larger than GPU memory via overlapping cluster partitioning.
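The memory-type selection described above can be illustrated with a small standalone sketch. This is not cuML's actual code; the function name and return values are hypothetical, and only the decision rule (host memory once `knn_n_clusters > 1`) follows the summary.

```python
# Illustrative sketch (not cuML's implementation) of the memory-type
# selection logic: host memory is chosen when knn_n_clusters > 1 so the
# dataset can exceed GPU memory via overlapping cluster partitioning.
def select_memory_type(knn_n_clusters: int) -> str:
    if knn_n_clusters > 1:
        return "host"    # batched graph build over data partitions
    return "device"      # whole dataset resident on the GPU


print(select_memory_type(1))  # single partition: keep data on device
print(select_memory_type(4))  # partitioned build: stage data in host memory
```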
Critical Issues
-
Compilation Error (cpp/include/cuml/cluster/hdbscan.hpp:138): Missing comma between
CLUSTER_SELECTION_METHODandGRAPH_BUILD_ALGOenum definitions will cause build failure. -
Test Validation Gap (test_hdbscan.py:1223-1248): The new test passes
build_kwds={"knn_n_clusters": n_clusters, "nnd_graph_degree": 32}to both brute_force and nn_descent algorithms.knn_n_clustersis documented as applying to both, butnnd_graph_degree(NN Descent specific) may be silently ignored for brute_force, reducing test effectiveness. Additionally, the 0.9 ARI threshold is permissive and may not catch subtle regressions. -
Parameter Namespace Mismatch Risk (headers.pxd:23-34): Cython declares
nn_descent_params_hdbscanandgraph_build_paramsunder nested namespaceML::HDBSCAN::Common::graph_build_params, but the C++ header places them underML::HDBSCAN::Common. Any mismatch will cause silent memory corruption or segfaults at runtime. -
User Confusion from Auto-Adjustment (runner.h:113-121): When
graph_degree < min_samples + 1, the code silently increases bothgraph_degreeandintermediate_graph_degree(to 2×graph_degree). Users who explicitly setintermediate_graph_degreemay be surprised their value is overridden without error.
Confidence: 2/5 - The compilation error and potential Cython namespace mismatch require immediate attention before merge. The test coverage gaps and auto-adjustment behavior need validation to ensure correctness.
5 files reviewed, no comments
/merge
Closes #6836