[BUG] n_neighbors should be smaller than the graph degree computed by nn descent #6091

@thorstenwagner

I updated to the latest cuML (from 23.12). I'm fitting a UMAP to a dataset with 32 features and 400k samples.

With 23.12 I did that with n_neighbors=200 and n_components=2 and it worked. With the latest version (24.08) I get:

Traceback (most recent call last):
  File "/mnt/data/twagner/Projects/TomoTwin/results/test_runs/test.py", line 11, in <module>
    reducer.fit(np.random.randn(52000,32))
  File "/opt/user_software/miniconda3_envs/tomotwin2/lib/python3.11/site-packages/cuml/internals/api_decorators.py", line 188, in wrapper
    ret = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/user_software/miniconda3_envs/tomotwin2/lib/python3.11/site-packages/cuml/internals/api_decorators.py", line 393, in dispatch
    return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/user_software/miniconda3_envs/tomotwin2/lib/python3.11/site-packages/cuml/internals/api_decorators.py", line 190, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "base.pyx", line 687, in cuml.internals.base.UniversalBase.dispatch_func
  File "umap.pyx", line 668, in cuml.manifold.umap.UMAP.fit
RuntimeError: RAFT failure at file=/opt/conda/conda-bld/work/cpp/src/umap/knn_graph/algo.cuh line=115: n_neighbors should be smaller than the graph degree computed by nn descent
Obtained 25 stack frames

Fitting starts working once n_neighbors is reduced to 64, which seems to be the default graph degree according to this documentation: https://docs.rapids.ai/api/cuvs/stable/cpp_api/neighbors_nn_descent/

Here is a script to reproduce the issue:

import cuml
import numpy as np
reducer = cuml.UMAP(
    n_neighbors=200,
    n_components=2,
    n_epochs=None,  # means automatic selection
    min_dist=0.0,
    random_state=19,
    metric="euclidean"
)
reducer.fit(np.random.randn(400000,32))
print("Done")

Interestingly, when I reduce the number of samples from 400k to 50k it also works.

Any ideas what I'm doing wrong?
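
A possible workaround, sketched under assumptions that this report does not confirm: in 24.08, cuml.UMAP appears to accept `build_algo` and `build_kwds` arguments, and the automatic mode appears to switch from brute-force kNN to NN-descent above roughly 50k rows, which would match the behaviour described above. The helper `umap_build_options`, the 50k threshold, and the `nnd_graph_degree` / `nnd_intermediate_graph_degree` keys are illustrative and should be checked against the cuML documentation for the installed version.

```python
# Hypothetical helper, not part of cuML. It only builds keyword arguments.
# ASSUMPTION: cuml.UMAP (24.08) accepts `build_algo` and `build_kwds`, and
# `build_kwds` understands `nnd_graph_degree` and
# `nnd_intermediate_graph_degree`. Verify against your installed version.

NND_DEFAULT_GRAPH_DEGREE = 64       # default named in the linked cuVS docs
AUTO_BRUTE_FORCE_MAX_ROWS = 50_000  # assumed switch-over point of "auto"


def umap_build_options(n_samples, n_neighbors, prefer_brute_force=False):
    """Return extra UMAP kwargs so n_neighbors fits the kNN graph degree."""
    if prefer_brute_force or n_samples <= AUTO_BRUTE_FORCE_MAX_ROWS:
        # Brute-force kNN has no graph-degree limit, but is slower on big data.
        return {"build_algo": "brute_force_knn"}
    # Make the NN-descent graph at least as wide as the requested n_neighbors.
    graph_degree = max(NND_DEFAULT_GRAPH_DEGREE, n_neighbors)
    return {
        "build_algo": "nn_descent",
        "build_kwds": {
            "nnd_graph_degree": graph_degree,
            # Assumed rule of thumb: keep the default 2:1 ratio (128 vs 64).
            "nnd_intermediate_graph_degree": 2 * graph_degree,
        },
    }


def fit_umap(data, n_neighbors=200):
    """Fit cuml.UMAP with the options above (requires a GPU and cuML)."""
    import cuml  # imported lazily so the helper itself runs without a GPU

    options = umap_build_options(data.shape[0], n_neighbors)
    reducer = cuml.UMAP(
        n_neighbors=n_neighbors,
        n_components=2,
        min_dist=0.0,
        random_state=19,
        metric="euclidean",
        **options,
    )
    return reducer.fit(data)
```

With the reproduction script above, this would be called as `fit_umap(np.random.randn(400000, 32), n_neighbors=200)`. Passing `prefer_brute_force=True` to the helper trades speed for the 23.12-style exact kNN search.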
