Fix UMAP outlier issue by checking for outliers and shuffling by jinsolp · Pull Request #7131 · rapidsai/cuml

jinsolp · 2025-08-25T20:23:40Z

Closing #6454

The main difference between our simplicial set embedding and CPU UMAP was in negative sampling.
The negative sampling stage should use the updated values (the values after the gradients have been added).

The work is dispatched to two kernels (with three usages) based on `n_components`. Each was fixed as follows.

optimize_batch_kernel_reg (n_components=2): update the current_reg register value (used later in the negative sampling stage) along with grads
optimize_batch_kernel (with shared memory): distinguish current_buffer (which used to hold ONLY the gradient) from grads_buffer. Now current_buffer and grads_buffer correspond to the current_reg and grads registers in the register-approach kernel.
optimize_batch_kernel (without shared memory): untouched because the grads are applied directly to global memory. This updated value in global memory is read directly for negative sampling later on.
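
The effect of reading a stale position during negative sampling can be sketched in plain Python. This is only an illustration of the idea, not the CUDA kernel: `edge_update`, its parameters, and the constants `a`, `b`, `gamma` are assumptions chosen for the example, and `pos` merely plays the role that `current_reg` plays in the register kernel.

```python
def clip(value, limit=4.0):
    """Clamp a gradient component to [-limit, limit]."""
    return max(-limit, min(limit, value))


def dist_sq(p, q):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def edge_update(current, other, negatives, a=1.577, b=0.895,
                gamma=1.0, alpha=1.0, use_updated=True):
    """Process one edge: an attractive step, then one repulsive step per negative.

    use_updated=True  -> negative sampling sees the position after earlier steps.
    use_updated=False -> negative sampling keeps reading the original position.
    Returns the final position of `current` (original position + all gradients).
    """
    pos = list(current)            # working copy (illustrative stand-in for current_reg)
    grads = [0.0] * len(current)   # accumulated updates (illustrative stand-in for grads)

    # Attractive step towards the connected point.
    d2 = dist_sq(pos, other)
    coeff = 0.0
    if d2 > 0.0:
        coeff = -2.0 * a * b * d2 ** (b - 1.0) / (a * d2 ** b + 1.0)
    for d in range(len(pos)):
        g = clip(coeff * (pos[d] - other[d])) * alpha
        grads[d] += g
        if use_updated:
            pos[d] += g            # keep the working copy in sync with the gradient

    # Repulsive steps away from each negative sample.
    for neg in negatives:
        d2 = dist_sq(pos, neg)
        if d2 == 0.0:
            continue
        coeff = 2.0 * gamma * b / ((0.001 + d2) * (a * d2 ** b + 1.0))
        for d in range(len(pos)):
            g = clip(coeff * (pos[d] - neg[d])) * alpha
            grads[d] += g
            if use_updated:
                pos[d] += g

    return [c + g for c, g in zip(current, grads)]


if __name__ == "__main__":
    cur, oth, negs = [0.0, 0.0], [3.0, 0.0], [[0.3, 0.0]]
    print("updated:", edge_update(cur, oth, negs, use_updated=True))
    print("stale:  ", edge_update(cur, oth, negs, use_updated=False))
```

In this toy setup the attractive step carries the point past the negative sample, so the repulsive push computed from the stale position points the opposite way from the one computed from the updated position.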

Visualizations 2D

50K samples randomly selected for plotting.
From the left:

CPU KNN + CPU UMAP
GPU KNN + CPU UMAP
GPU KNN + GPU UMAP Before fix
GPU KNN + GPU UMAP After fix in this PR

Using dataset 639K x 384

Using dataset 1.8M x 384

Visualizations 3D

50K samples randomly selected for plotting.
Plotting the same datasets with n_components=3 (which uses the second kernel).
From the left:

GPU KNN + CPU UMAP
GPU KNN + GPU UMAP Before fix
GPU KNN + GPU UMAP After fix in this PR

Using dataset 639K x 384 (was already doing pretty well without outliers, still doing well)

Using dataset 1.8M x 384 (had outliers before the fix)

viclafargue

Thanks @jinsolp! It looks great.

viclafargue · 2025-08-26T08:57:23Z

-                           truncate_gradient(rounding, current_buffer[d * TPB_X]));
+      raft::myAtomicAdd<T>((T*)cur_write + d, truncate_gradient(rounding, grads_buffer[d * TPB_X]));


Importantly, when random_state is set, current != cur_write and other != oth_write, because updates accumulate in a separate buffer to allow high-precision deterministic accumulation. It looks like we may still have outliers in this case? But I guess that is acceptable for now.

jinsolp · 2025-09-05T21:49:23Z

We still have to add a unit test for outlier checking. The plan is to grab a large enough dataset that originally fails (i.e. has outliers).
We can use the all-neighbors Python API to build the KNN graph quickly, give that as a precomputed graph to both CPU UMAP and cuML UMAP, and compare the resulting embeddings.

E.g. get the min/max values of the CPU embedding, and check whether all values in our embedding are within a certain threshold of that range.
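
A minimal sketch of that range check, assuming embeddings are given as lists of rows and using a hypothetical helper name and tolerance (neither comes from the PR; the actual test may use different thresholds and array types):

```python
def within_reference_range(reference, candidate, tolerance=0.5):
    """Check that `candidate` stays inside the padded bounding box of `reference`.

    For each dimension, the allowed interval is [min, max] of the reference
    embedding, widened on both sides by `tolerance` times the reference span.
    """
    n_dims = len(reference[0])
    for d in range(n_dims):
        ref_column = [row[d] for row in reference]
        lo, hi = min(ref_column), max(ref_column)
        pad = tolerance * (hi - lo)
        for row in candidate:
            if not (lo - pad <= row[d] <= hi + pad):
                return False  # at least one point is an outlier in dimension d
    return True


if __name__ == "__main__":
    cpu_embedding = [[0.0, 0.0], [10.0, 5.0], [5.0, 10.0]]
    gpu_embedding = [[1.0, 1.0], [12.0, 9.0]]
    print(within_reference_range(cpu_embedding, gpu_embedding))
```

Padding relative to the reference span, rather than by a fixed amount, keeps the check independent of the overall scale of the embedding.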

viclafargue

Thanks @jinsolp! It looks great! However, I believe that we would have to apply the shuffling before make_epochs_per_sample is called (see comment).

csadorf · 2025-09-25T18:12:59Z

 import pytest
 import scipy.sparse as scipy_sparse
 import umap
+from cuvs.neighbors import all_neighbors, nn_descent


We should not directly import the cuvs Python API in cuML. If we do, then we need to add cuvs (not just libcuvs) to our test dependencies. CC @divyegala

We can get rid of this, but if we do, we have to run the full end-to-end CPU UMAP on a not-so-small dataset for comparison (because outliers don't show up with small datasets).

We can consider adding cuvs to the test dependencies, but then let's make sure to guard the cuvs import with pytest.importorskip. I'll let @divyegala chime in on this.

Yeah, it is fine as a testing dependency. Please point me to the commit that adds cuvs as so, I will verify that we don't leak the dependency by mistake.

Using importorskip for now. Left an issue: #7279

It looks like you are still using a direct import here?
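
The guard discussed in this thread can be sketched as follows. This is an illustration of `pytest.importorskip` only, with a hypothetical test name and body; it is not the test that was merged.

```python
import pytest


def test_needs_optional_dependency():
    # If "cuvs.neighbors" cannot be imported, pytest reports this test as
    # skipped instead of failing at collection time with an ImportError.
    neighbors = pytest.importorskip("cuvs.neighbors")
    # Illustrative body: only runs when the optional dependency is present.
    assert hasattr(neighbors, "all_neighbors")
```

Calling `pytest.importorskip` inside the test (or binding its result at module level) replaces a plain `from cuvs.neighbors import ...` statement, which would otherwise make the whole test module fail to import when cuvs is not installed.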

copy-pr-bot · 2025-09-29T19:10:35Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


csadorf

Thanks a lot!

divyegala · 2025-09-30T20:14:10Z

/merge

fix outlier issue in optimize_batch_kernel_reg kernel

6e71fce

jinsolp requested a review from a team as a code owner August 25, 2025 20:23

jinsolp requested review from bdice and dantegd August 25, 2025 20:23

github-actions Bot assigned jinsolp Aug 25, 2025

github-actions Bot added the CUDA/C++ label Aug 25, 2025

jinsolp added the bug and non-breaking labels Aug 25, 2025

jinsolp changed the title ~~Fix UMAP outlier issue by using updated gradients in negative sampling.~~ Fix UMAP outlier issue by using updated values in negative sampling. Aug 25, 2025

jinsolp removed request for bdice and dantegd August 25, 2025 20:30

distinguish grad vs current_buf in optimize_batch_kernel

4055864

jinsolp force-pushed the fix-umap-outlier branch from ee73d15 to 4055864 Compare August 25, 2025 21:39

viclafargue approved these changes Aug 26, 2025


jinsolp added 2 commits September 4, 2025 16:06

Merge branch 'rapidsai:branch-25.10' into fix-umap-outlier

eae41a5

shuffling for removing outliers for dense points

c00cd27

jinsolp changed the title ~~Fix UMAP outlier issue by using updated values in negative sampling.~~ Fix UMAP outlier issue by checking for outliers and shuffling Sep 5, 2025

Merge branch 'branch-25.10' into fix-umap-outlier

6bf6c78

viclafargue requested changes Sep 10, 2025


Comment thread cpp/src/umap/simpl_set_embed/algo.cuh Outdated

Comment thread cpp/src/umap/simpl_set_embed/algo.cuh

Merge branch 'branch-25.10' into fix-umap-outlier

6a3d96c

viclafargue approved these changes Sep 16, 2025


jinsolp mentioned this pull request Sep 16, 2025

[BUG] UMAP with random state seed produces noisy results. #7176

Closed

csadorf mentioned this pull request Sep 22, 2025

Fix UMAP outlier issue #6662

Closed

csadorf linked an issue Sep 22, 2025 that may be closed by this pull request

cuml.UMAP embeddings result in outliers #6454

Closed

jinsolp added 4 commits September 22, 2025 19:59

smaller threshold

a5af5c0

Merge branch 'branch-25.10' into fix-umap-outlier

4e4e5dd

heuristing thresholds for smaller data

9cc2b9a

outlier test

5cf16e3

jinsolp requested a review from a team as a code owner September 22, 2025 22:19

jinsolp requested a review from betatim September 22, 2025 22:19

github-actions Bot added the Cython / Python label Sep 22, 2025

Intron7 mentioned this pull request Sep 22, 2025

[BUG] Unexpected UMAP behavior after scVI correction scverse/rapids-singlecell#463

Closed

jinsolp force-pushed the fix-umap-outlier branch from 815a018 to 5cf16e3 Compare September 23, 2025 19:10

jinsolp and others added 3 commits September 23, 2025 19:11

flag

b474e8c

Merge branch 'branch-25.10' into fix-umap-outlier

10c8637

Merge branch 'branch-25.10' into fix-umap-outlier

52b6f1c

csadorf requested changes Sep 25, 2025


divyegala reviewed Sep 26, 2025


Comment thread cpp/src/umap/simpl_set_embed/algo.cuh Outdated

Comment thread cpp/src/umap/simpl_set_embed/algo.cuh Outdated

Comment thread cpp/src/umap/simpl_set_embed/algo.cuh Outdated

jinsolp added 2 commits September 29, 2025 10:42

Merge branch 'branch-25.10' into fix-umap-outlier

95211f7

feedback

823fb73

jinsolp added 3 commits September 29, 2025 20:17

parameterize test

4953056

Merge branch 'branch-25.10' into fix-umap-outlier

0461dea

Merge branch 'branch-25.10' into fix-umap-outlier

4a91537

divyegala approved these changes Sep 30, 2025


Fixup the pytest.importorskip logic.

edae314

csadorf approved these changes Sep 30, 2025


Merge branch 'branch-25.10' into fix-umap-outlier

e3d8c83

rapids-bot Bot merged commit e736d05 into rapidsai:branch-25.10 Oct 1, 2025
200 of 202 checks passed

jinsolp deleted the fix-umap-outlier branch October 1, 2025 00:31
