Plumb metric and metric_kwds through to UMAP with nn_descent#6304
Plumb metric and metric_kwds through to UMAP with nn_descent#6304rapids-bot[bot] merged 5 commits intobranch-25.04from
metric and metric_kwds through to UMAP with nn_descent#6304Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
/ok to test |
e827f53 to
afcff55
Compare
jcrist
left a comment
There was a problem hiding this comment.
Rebased on main and marked ready for review.
| umap_params.nn_descent_params.n_clusters = <uint64_t> build_kwds.get("nnd_n_clusters", 1) | ||
| # Forward metric & metric_kwds to nn_descent | ||
| umap_params.nn_descent_params.metric = <RaftDistanceType> umap_params.metric | ||
| umap_params.nn_descent_params.metric_arg = umap_params.p |
There was a problem hiding this comment.
The actual fix is here (plumbing through the metric options to nn_descent_params). In the long run we should redesign the C++ layer to remove duplicate options - for now just ensuring they're forwarded correctly seems sufficient.
Everything else here is a simplification of the current pre-existing code.
viclafargue
left a comment
There was a problem hiding this comment.
LGTM, but we would have to make sure that NN Descent is actually compatible with all the metrics we make available. cc @jinsolp
|
A quick browse through the code I believe all metrics are handled fine by The behavior before was incorrect ( Planning to merge once CI is green. |
|
|
Ah! Thanks @jinsolp! I'm curious - right now after plumbing the
Then whenever we get around to moving to |
|
Yes I think that would be great! : ) |
|
Looking at finishing this up today. @jinsolp, can you confirm that the existing support is for If so, then validating would make |
|
Right now with NN Descent raft, we have a distance epilogue that takes care of the sqrt part. Summarizing; UMAP + NN Descent works with Noticing that the distance epilogues are no longer in the cuvs version, we can add the L2SqrtExpanded type in cuvs to enable its use with cuML UMAP in the future. |
|
Hmmm, that seems a bit convoluted. For the As is, it looks like we always pass in the |
|
The default distance epilogue is set to return itself (i.e. is an identity op) declared here as part of the NN Descent types.
The distance epilogues are actually needed to make NN Descent work with HDBSCAN. We will need them in cuvs too eventually : ) (or find a smarter way to do this)
You are right about this!
This should be fine for now 🙂 |
Previously we were erroneously missing this plumbing, leaving the `metric` and `metric_arg` as the defaults when `nn_descent` is used. In the long run we should redo how the parameters are passed when `nn_descent` is enabled to avoid this duplication, but for now fixing the plumbing to be more correct seems fine.
At least on my aarch64 linux box these aren't failing anymore.
Also consolidates `metric` parameter handling in umap and friends.
d275218 to
e65b152
Compare
|
Done - we should now:
Should be ready for another round of review. |
jinsolp
left a comment
There was a problem hiding this comment.
LGTM! Thank you for this PR!
viclafargue
left a comment
There was a problem hiding this comment.
LGTM! Just small change request
csadorf
left a comment
There was a problem hiding this comment.
One suggestion, but LGTM.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## branch-25.04 #6304 +/- ##
================================================
- Coverage 68.53% 67.39% -1.14%
================================================
Files 204 204
Lines 13199 13230 +31
================================================
- Hits 9046 8917 -129
- Misses 4153 4313 +160 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
/merge |
Follows [this PR](rapidsai/cuml#6304) from cuml. Signed-off-by: Rishi Chandra <[email protected]>
Previously we were erroneously missing this plumbing, leaving the
metricandmetric_argas the defaults whennn_descentis used. In the long run we should redo how the parameters are passed whennn_descentis enabled to avoid this duplication (there's a few other uselessly exposed params likereturn_distances), but for now fixing the plumbing to be more correct seems fine.On top of #6303, leaving as draft for now until that's merged.
Also re-enables tests on ARM, fixes #5441.