I updated cuml from 23.12 to the latest version (24.08). I'm fitting a UMAP to a dataset with 32 features and 400k samples.
With 23.12 I did that with n_neighbors=200 and n_components=2 and it worked. With 24.08 I get:
Traceback (most recent call last):
  File "/mnt/data/twagner/Projects/TomoTwin/results/test_runs/test.py", line 11, in <module>
    reducer.fit(np.random.randn(52000,32))
  File "/opt/user_software/miniconda3_envs/tomotwin2/lib/python3.11/site-packages/cuml/internals/api_decorators.py", line 188, in wrapper
    ret = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/user_software/miniconda3_envs/tomotwin2/lib/python3.11/site-packages/cuml/internals/api_decorators.py", line 393, in dispatch
    return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/user_software/miniconda3_envs/tomotwin2/lib/python3.11/site-packages/cuml/internals/api_decorators.py", line 190, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "base.pyx", line 687, in cuml.internals.base.UniversalBase.dispatch_func
  File "umap.pyx", line 668, in cuml.manifold.umap.UMAP.fit
RuntimeError: RAFT failure at file=/opt/conda/conda-bld/work/cpp/src/umap/knn_graph/algo.cuh line=115: n_neighbors should be smaller than the graph degree computed by nn descent
Obtained 25 stack frames
The largest n_neighbors value for which it still works is 64, which seems to be the default according to this documentation: https://docs.rapids.ai/api/cuvs/stable/cpp_api/neighbors_nn_descent/
Here is a script to reproduce the issue:
import cuml
import numpy as np

reducer = cuml.UMAP(
    n_neighbors=200,
    n_components=2,
    n_epochs=None,  # means automatic selection
    min_dist=0.0,
    random_state=19,
    metric="euclidean"
)
reducer.fit(np.random.randn(400000,32))
print("Done")
Interestingly, n_neighbors=200 also works when I reduce the number of samples from 400k to 50k.
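In case it helps with narrowing this down, a sweep like the following could be used to locate the threshold. It is only a minimal sketch: `largest_working` and `cuml_try_fit` are hypothetical helper names, not part of cuml, and the sweep assumes a failing fit raises `RuntimeError` as in the traceback above.

```python
def largest_working(try_fit, candidates):
    """Return the largest candidate for which try_fit(candidate) does not
    raise RuntimeError, or None if every candidate fails."""
    working = None
    for value in sorted(candidates):
        try:
            try_fit(value)
        except RuntimeError:
            # Same exception type as the RAFT failure in the traceback.
            continue
        working = value
    return working


def cuml_try_fit(n_neighbors, n_samples=400000, n_features=32):
    """Fit UMAP once with the settings from the reproduction script."""
    # Imported lazily so largest_working can be used without a GPU.
    import cuml
    import numpy as np

    reducer = cuml.UMAP(
        n_neighbors=n_neighbors,
        n_components=2,
        min_dist=0.0,
        random_state=19,
        metric="euclidean",
    )
    reducer.fit(np.random.randn(n_samples, n_features))


# Example (needs a GPU and cuml installed):
# print(largest_working(cuml_try_fit, [16, 32, 64, 65, 128, 200]))
```

The same helper could sweep the sample count instead by fixing n_neighbors=200 and passing a function that varies `n_samples`.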
Any ideas what I'm doing wrong?