Plumb metric and metric_kwds through to UMAP with nn_descent#6304

Merged
rapids-bot[bot] merged 5 commits into branch-25.04 from forward-metric-umap-nndescent
Mar 11, 2025

Conversation

@jcrist
Member

@jcrist jcrist commented Feb 7, 2025

Previously we were erroneously missing this plumbing, leaving the metric and metric_arg as the defaults when nn_descent is used. In the long run we should redo how the parameters are passed when nn_descent is enabled to avoid this duplication (there are a few other uselessly exposed params like return_distances), but for now fixing the plumbing to be more correct seems fine.

On top of #6303, leaving as draft for now until that's merged.

Also re-enables tests on ARM, fixes #5441.

@copy-pr-bot

copy-pr-bot Bot commented Feb 7, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions Bot added the Cython / Python label Feb 7, 2025
@jcrist jcrist added the bug, non-breaking, and Cython / Python labels and removed the Cython / Python label Feb 7, 2025
@cjnolet
Member

cjnolet commented Feb 13, 2025

/ok to test

@jcrist jcrist mentioned this pull request Feb 28, 2025
9 tasks
@jcrist jcrist force-pushed the forward-metric-umap-nndescent branch 2 times, most recently from e827f53 to afcff55 Compare March 5, 2025 23:10
@jcrist jcrist marked this pull request as ready for review March 5, 2025 23:10
@jcrist jcrist requested a review from a team as a code owner March 5, 2025 23:10
@jcrist jcrist requested review from bdice and viclafargue March 5, 2025 23:10
Member Author

@jcrist jcrist left a comment

Rebased on main and marked ready for review.

umap_params.nn_descent_params.n_clusters = <uint64_t> build_kwds.get("nnd_n_clusters", 1)
# Forward metric & metric_kwds to nn_descent
umap_params.nn_descent_params.metric = <RaftDistanceType> umap_params.metric
umap_params.nn_descent_params.metric_arg = umap_params.p
Member Author

The actual fix is here (plumbing through the metric options to nn_descent_params). In the long run we should redesign the C++ layer to remove duplicate options - for now just ensuring they're forwarded correctly seems sufficient.

Everything else here is a simplification of the current pre-existing code.
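
A minimal sketch of the plumbing issue described above, in plain Python rather than the actual Cython. The two dataclasses, their default values, and the `configure` helper are hypothetical stand-ins for the real parameter structs; only the field names `metric`, `metric_arg`, `p`, `n_clusters`, and the `nnd_n_clusters` key come from the diff.

```python
from dataclasses import dataclass, field


@dataclass
class NNDescentParams:
    # Hypothetical stand-in for the nested nn_descent parameter struct.
    # The defaults are assumptions chosen for illustration.
    metric: str = "L2Expanded"
    metric_arg: float = 2.0
    n_clusters: int = 1


@dataclass
class UMAPParams:
    # Hypothetical stand-in for the top-level UMAP parameter struct.
    metric: str = "L2SqrtExpanded"
    p: float = 2.0
    nn_descent_params: NNDescentParams = field(default_factory=NNDescentParams)


def configure(metric, p, build_kwds, forward_metric=True):
    """Build params; optionally forward the metric options to nn_descent."""
    params = UMAPParams(metric=metric, p=p)
    params.nn_descent_params.n_clusters = int(build_kwds.get("nnd_n_clusters", 1))
    if forward_metric:
        # The fix: copy the duplicated options into the nested struct.
        params.nn_descent_params.metric = params.metric
        params.nn_descent_params.metric_arg = params.p
    return params


before = configure("CosineExpanded", 2.0, {}, forward_metric=False)
after = configure("CosineExpanded", 2.0, {}, forward_metric=True)
print(before.nn_descent_params.metric)  # L2Expanded (user's choice silently ignored)
print(after.nn_descent_params.metric)   # CosineExpanded
```

Without the forwarding step the nested struct keeps its own default, which is how a user-selected metric could be dropped without any error.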

Contributor

@viclafargue viclafargue left a comment

LGTM, but we would have to make sure that NN Descent is actually compatible with all the metrics we make available. cc @jinsolp

@jcrist
Member Author

jcrist commented Mar 6, 2025

From a quick browse through the code, I believe all metrics are handled fine by nn_descent. Just to be sure, I've also updated an existing metrics test to be parametrized on build_algo; this test runs fine locally.

The behavior before was incorrect (build_algo="nn_descent" would silently ignore the user-selected metric), so I consider this PR strictly an improvement even on the off chance some parameter combo will error after this change. I'm fairly confident this won't happen though.

Planning to merge once CI is green.

@jinsolp
Contributor

jinsolp commented Mar 6, 2025

Right now NN Descent in cuvs supports L2Expanded, CosineExpanded, and InnerProduct (link),
but NN Descent in raft supports only L2Expanded (link).

@jcrist
Member Author

jcrist commented Mar 6, 2025

/merge (just realized Victor's review wasn't an actual approval)

@jcrist
Member Author

jcrist commented Mar 6, 2025

> but in Raft Nn Descent supports only L2Expanded

Ah! Thanks @jinsolp! I'm curious: right now, after plumbing the metric through to nn_descent properly here, I'm not seeing an error (and the added test passes), but you're right that it doesn't look like the metrics are supported after all. I don't see any validation in the raft code flow that would error on an invalid metric. Perhaps the right additional fix here then is:

  • Add validation on the python side for nn_descent to error on unsupported metrics
  • Keep plumbing the args through properly, even though they appear to be mostly ignored on the raft side for now
  • Keep the test, but update it to check for errors on unsupported metrics

Then whenever we get around to moving to cuvs instead we can increase the supported metrics pretty straightforwardly.

@jinsolp
Contributor

jinsolp commented Mar 7, 2025

Yes I think that would be great! : )

@jcrist
Member Author

jcrist commented Mar 10, 2025

Looking at finishing this up today. @jinsolp, can you confirm that the existing support is for L2Expanded (called "sqeuclidean" in python) and not L2SqrtExpanded (called "euclidean" or "l2" in python)? Both raft and cuvs use else branches for this (and I had a hard time finding if any validation was done upstream to cull the set of DistanceTypes that may reach the branch).

If so, then validating would make build_algo="nn_descent" fail with the default configuration, since we default to metric="euclidean". If this is an issue, would it be possible to get L2SqrtExpanded support into the raft (and cuvs) implementations this release to resolve this?
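
The concern can be illustrated with a small sketch. The name pairs come from the comment above; the dictionary, the supported set, and the `is_supported` helper are hypothetical and only model the scenario being asked about (nn_descent limited to L2Expanded).

```python
# Python metric names mapped to DistanceType names, as stated in the comment above.
PY_TO_DISTANCE_TYPE = {
    "sqeuclidean": "L2Expanded",
    "euclidean": "L2SqrtExpanded",
    "l2": "L2SqrtExpanded",
}

# Assumption under discussion: nn_descent handles only L2Expanded.
NN_DESCENT_SUPPORTED = {"L2Expanded"}


def is_supported(metric):
    """Return True if the Python-level metric maps to a supported DistanceType."""
    return PY_TO_DISTANCE_TYPE[metric] in NN_DESCENT_SUPPORTED


default_metric = "euclidean"
print(is_supported(default_metric))  # False: strict validation would reject the default
print(is_supported("sqeuclidean"))   # True
```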

@jinsolp
Contributor

jinsolp commented Mar 10, 2025

Right now with NN Descent raft, we have a distance epilogue that takes care of the sqrt part.
(So NN Descent in raft just calculates the L2Expanded, but it's sqrt-ed in the distance epilogue here)
The distance epilogue is made and passed on from the UMAP side (written here)

Summarizing: UMAP + NN Descent works with "euclidean" because UMAP's distance epilogue sqrt-s the L2Expanded distance computed in NN Descent (L2Expanded being the only metric NN Descent supports in raft as of now), making it an L2SqrtExpanded type.
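
A rough numeric sketch of that summary, in plain Python. It assumes L2Expanded means the squared Euclidean distance computed in expanded form; the function names are hypothetical and this is not the raft kernel.

```python
import math


def l2_expanded(x, y):
    # Squared Euclidean distance in expanded form: ||x||^2 + ||y||^2 - 2 * <x, y>
    xx = sum(a * a for a in x)
    yy = sum(b * b for b in y)
    xy = sum(a * b for a, b in zip(x, y))
    return xx + yy - 2.0 * xy


def sqrt_epilogue(value):
    # Applied to each computed distance; clamp guards against tiny negative
    # values caused by floating-point rounding in the expanded form.
    return math.sqrt(max(value, 0.0))


x = [0.0, 0.0]
y = [3.0, 4.0]
squared = l2_expanded(x, y)         # 25.0
euclidean = sqrt_epilogue(squared)  # 5.0
print(squared, euclidean)
```

Since the square root is monotonic, applying it afterwards does not change which neighbors are nearest; it only changes the reported distance values.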

Noticing that the distance epilogues are no longer in the cuvs version, we can add the L2SqrtExpanded type in cuvs to enable its use with cuML UMAP in the future.

@jcrist
Member Author

jcrist commented Mar 10, 2025

Hmmm, that seems a bit convoluted. For the cuvs implementation, I'd hope we could avoid the callback and just handle the different DistanceTypes natively (unless there's a good reason to do this with a user-provided callback instead).

As is, it looks like we always pass in the epilogue on the cuml umap side, which seems incorrect if we're computing L2Expanded instead of L2SqrtExpanded? I think this is wrong, happy to change it as part of this PR for now. As written, I guess this means we only support L2SqrtExpanded for nn_descent in cuml for now.

@jinsolp
Contributor

jinsolp commented Mar 10, 2025

The default distance epilogue is set to return itself (i.e. is an identity op) declared here as part of the NN Descent types.
So it doesn't do anything (i.e. the distance is calculated as L2Expanded) if nothing specific is passed as the distance epilogue. UMAP inherits this epilogue type and overrides the () operator to return an sqrt-ed value of its input.
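
The structure described here might be sketched as follows. The class names and the single-argument call signature are hypothetical simplifications in Python, not the actual C++ types.

```python
import math


class IdentityEpilogue:
    """Hypothetical default epilogue: returns its input unchanged."""

    def __call__(self, value):
        return value


class SqrtEpilogue(IdentityEpilogue):
    """Hypothetical UMAP-side override: returns the square root of its input."""

    def __call__(self, value):
        return math.sqrt(value)


def finalize(distances, epilogue=None):
    # Apply the epilogue to every computed distance; identity if none is given.
    epilogue = epilogue if epilogue is not None else IdentityEpilogue()
    return [epilogue(d) for d in distances]


raw = [4.0, 9.0, 16.0]                # squared distances (L2Expanded)
print(finalize(raw))                  # [4.0, 9.0, 16.0]
print(finalize(raw, SqrtEpilogue()))  # [2.0, 3.0, 4.0]
```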

> unless there's a good reason to do this with a user-provided callback instead

The distance epilogues are actually needed to make NN Descent work with HDBSCAN. We will need them in cuvs too eventually : ) (or find a smarter way to do this)

> which seems incorrect if we're computing L2Expanded instead of L2SqrtExpanded?

You are right about this!

> I guess this means we only support L2SqrtExpanded for nn_descent in cuml for now.

This should be fine for now 🙂

jcrist added 4 commits March 10, 2025 14:13
Previously we were erroneously missing this plumbing, leaving the
`metric` and `metric_arg` as the defaults when `nn_descent` is used. In
the long run we should redo how the parameters are passed when
`nn_descent` is enabled to avoid this duplication, but for now fixing
the plumbing to be more correct seems fine.
At least on my aarch64 linux box these aren't failing anymore.
Also consolidates `metric` parameter handling in umap and friends.
@jcrist jcrist force-pushed the forward-metric-umap-nndescent branch from d275218 to e65b152 Compare March 10, 2025 21:15
@jcrist
Member Author

jcrist commented Mar 10, 2025

Done - we should now:

  • Properly forward the parameters through to libcuml/raft/etc...
  • Error nicely on unsupported metrics for umap and friends (I consolidated the logic here to be more uniform too)

Should be ready for another round of review.
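
As a rough illustration of the second point, validation on the Python side could look something like the sketch below. The function name, the error message, and the supported set are hypothetical; the set follows the conclusion earlier in this thread that only L2SqrtExpanded ("euclidean" or "l2") works with nn_descent for now.

```python
# Assumption from the thread: nn_descent effectively supports only L2SqrtExpanded.
NN_DESCENT_METRICS = {"euclidean", "l2"}


def check_metric(metric, build_algo):
    """Raise a clear error instead of silently ignoring an unsupported metric."""
    if build_algo == "nn_descent" and metric not in NN_DESCENT_METRICS:
        supported = ", ".join(sorted(NN_DESCENT_METRICS))
        raise ValueError(
            f"build_algo='nn_descent' does not support metric={metric!r}; "
            f"supported metrics: {supported}"
        )
    return metric


check_metric("euclidean", "nn_descent")     # passes
check_metric("cosine", "brute_force_knn")   # other build_algo values pass through
try:
    check_metric("cosine", "nn_descent")
except ValueError as exc:
    message = str(exc)
    print(message)
```

A test parametrized on build_algo can then assert that the error is raised for unsupported combinations and that supported ones run normally.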

@jcrist jcrist requested a review from jinsolp March 10, 2025 21:17
Contributor

@jinsolp jinsolp left a comment

LGTM! Thank you for this PR!

Contributor

@viclafargue viclafargue left a comment

LGTM! Just a small change request.

Comment thread python/cuml/cuml/manifold/umap_utils.pyx Outdated
@jcrist jcrist requested a review from viclafargue March 11, 2025 14:56
Contributor

@csadorf csadorf left a comment

One suggestion, but LGTM.

Comment thread python/cuml/cuml/manifold/umap.pyx
@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 67.39%. Comparing base (4384f17) to head (dfd6c65).
Report is 2 commits behind head on branch-25.04.

Additional details and impacted files
@@               Coverage Diff                @@
##           branch-25.04    #6304      +/-   ##
================================================
- Coverage         68.53%   67.39%   -1.14%     
================================================
  Files               204      204              
  Lines             13199    13230      +31     
================================================
- Hits               9046     8917     -129     
- Misses             4153     4313     +160     

@jcrist
Member Author

jcrist commented Mar 11, 2025

/merge

@rapids-bot rapids-bot Bot merged commit 0426c9a into branch-25.04 Mar 11, 2025
@jcrist jcrist deleted the forward-metric-umap-nndescent branch March 11, 2025 16:45
rishic3 added a commit to NVIDIA/spark-rapids-ml that referenced this pull request May 12, 2025
Follows [this PR](rapidsai/cuml#6304) from cuml.

Signed-off-by: Rishi Chandra <[email protected]>

Labels

bug Something isn't working Cython / Python Cython or Python issue non-breaking Non-breaking change

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[BUG] segfault in UMAP pytests in ARM GHA jobs

6 participants