Fix UMAP issues with large inputs #6245

Merged
rapids-bot[bot] merged 33 commits into rapidsai:branch-25.04 from
viclafargue:fix-umap-large-inputs
Feb 13, 2025

Conversation

@viclafargue
Contributor

Answers #6204

Contributor

@wphicks wphicks left a comment

Looking good! Let's just make sure to IWYU for the new uses of uint64_t. Using C++ types (std::uint64_t) in our non-CUDA code would be a bonus, but it shouldn't block merge. I've also called out some spots where we could use uniform initialization syntax rather than a bare cast.
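
As a minimal sketch of what this review comment is asking for (not code from the PR), the snippet below includes `<cstdint>` directly, uses `std::uint64_t`, and widens with brace initialization instead of a bare C-style cast. The helper name `element_count` and its parameters are hypothetical and chosen only for illustration.

```cpp
#include <cstdint>  // include-what-you-use: declares std::uint32_t and std::uint64_t

// Hypothetical helper (not from the PR): number of elements in an
// n_rows x n_cols matrix, computed without 32-bit overflow.
inline std::uint64_t element_count(std::uint32_t n_rows, std::uint32_t n_cols)
{
  // Brace initialization widens n_rows to 64 bits before the multiply.
  // Unlike a bare cast such as (uint64_t)n_rows, it is rejected at compile
  // time if the conversion would be narrowing.
  return std::uint64_t{n_rows} * n_cols;
}
```

Brace initialization only compiles here because widening `std::uint32_t` to `std::uint64_t` cannot lose information. Converting from a signed runtime value would still need an explicit `static_cast`.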

Comment thread cpp/include/cuml/common/callback.hpp Outdated
Comment thread cpp/include/cuml/manifold/common.hpp
Comment thread cpp/src/umap/fuzzy_simpl_set/naive.cuh Outdated
Comment thread cpp/src/tsne/tsne_runner.cuh Outdated
Comment thread cpp/src/umap/fuzzy_simpl_set/naive.cuh Outdated
Comment thread cpp/src/umap/knn_graph/algo.cuh Outdated
Comment thread cpp/src/umap/simpl_set_embed/algo.cuh Outdated
@copy-pr-bot

copy-pr-bot Bot commented Jan 23, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@wphicks
Contributor

wphicks commented Jan 23, 2025

@viclafargue That last commit was unsigned. Could you sign it and push that up?

Member

@cjnolet cjnolet left a comment

@viclafargue and I discussed this briefly last week, but given the nature of the 64-bit hardcoded changes here, I would like to see at least a small benchmark before this is merged so that we can feel comfortable that this doesn’t have a huge impact on the runtime.

@divyegala divyegala requested a review from a team as a code owner February 3, 2025 18:42
@divyegala divyegala requested a review from vyasr February 3, 2025 18:42
@github-actions github-actions Bot added the CMake label Feb 3, 2025
@divyegala divyegala changed the base branch from branch-25.02 to branch-25.04 February 3, 2025 18:43
@divyegala divyegala requested a review from a team as a code owner February 3, 2025 18:43
Comment thread cpp/src/umap/simpl_set_embed/optimize_batch_kernel.cuh Outdated
Comment thread cpp/src/umap/simpl_set_embed/optimize_batch_kernel.cuh Outdated
Comment thread cpp/src/umap/simpl_set_embed/optimize_batch_kernel.cuh Outdated
Member

@cjnolet cjnolet left a comment

Same issues as raft, but this is hardcoding types in perf-critical code. Please use templates; they allow us to switch quickly.
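
To illustrate the templating request in general terms, here is a small host-side sketch in which the index type is a template parameter rather than a hardcoded `uint64_t`. The function `sum_flat` and the parameter name `idx_t` are hypothetical stand-ins; the real kernels in the PR are CUDA code and will look different.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a perf-critical routine: sums a flattened
// n_rows x n_cols buffer. idx_t is the index type under discussion.
template <typename idx_t>
double sum_flat(const std::vector<float>& data, idx_t n_rows, idx_t n_cols)
{
  double total = 0.0;
  // The product is computed in idx_t, so the caller must pick an idx_t
  // wide enough to hold n_rows * n_cols.
  const idx_t n = n_rows * n_cols;
  for (idx_t i = 0; i < n; ++i) {
    total += data[static_cast<std::size_t>(i)];
  }
  return total;
}
```

With this shape, switching widths is a change at the call site, for example `sum_flat<int>(v, 2, 3)` versus `sum_flat<std::uint64_t>(v, 2, 3)`, which also makes it straightforward to benchmark both variants.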

@github-actions github-actions Bot removed the conda (conda issue), Cython / Python (Cython or Python issue), and CMake labels Feb 7, 2025
@wphicks
Contributor

wphicks commented Feb 7, 2025

/ok to test

@cjnolet
Member

cjnolet commented Feb 7, 2025

/ok to test

Comment thread cpp/src/umap/simpl_set_embed/optimize_batch_kernel.cuh
@viclafargue
Contributor Author

This benchmark was run a while back, just before the last changes. It suggests that there is no performance drop when switching to uint64_t. However, it could still be preferable to implement a dispatching mechanism that stores the indices on 32 bits below a certain number of rows. To avoid delaying the merge of this PR, I propose opening a separate PR based on this one to handle this properly.
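
One possible shape for the dispatching mechanism mentioned above is sketched below. It is an assumption-laden illustration, not the follow-up implementation: the names `fits_in_int32`, `index_width_bytes`, and `dispatch_index_width` are hypothetical, and the threshold used here (the total element count must fit in a signed 32-bit integer) is only one plausible criterion.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical check: true if n_rows * n_cols fits in a signed 32-bit int.
// The division avoids overflowing the product itself.
inline bool fits_in_int32(std::uint64_t n_rows, std::uint64_t n_cols)
{
  const std::uint64_t limit =
    static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());
  if (n_cols == 0) { return true; }
  return n_rows <= limit / n_cols;
}

// Hypothetical templated entry point; a real one would run the algorithm
// with idx_t as its index type. Here it only reports the index width.
template <typename idx_t>
int index_width_bytes()
{
  return static_cast<int>(sizeof(idx_t));
}

// Runtime dispatch: 32-bit indices for small inputs, 64-bit otherwise.
inline int dispatch_index_width(std::uint64_t n_rows, std::uint64_t n_cols)
{
  return fits_in_int32(n_rows, n_cols) ? index_width_bytes<std::int32_t>()
                                       : index_width_bytes<std::uint64_t>();
}
```

The trade-off of this design is that both instantiations are compiled, which increases binary size and build time in exchange for keeping narrow indices on small inputs.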

UMAP_branch-25.04_bench.csv
UMAP_53d276c_bench.csv

@divyegala
Member

/ok to test

@jcrist
Member

jcrist commented Feb 12, 2025

/ok to test

@dantegd
Member

dantegd commented Feb 13, 2025

/ok to test

Member

@cjnolet cjnolet left a comment

Approving for now, but can you please create an issue to follow up on the hardcoding of uint64_t in umap.cu? It would be nice for us to figure out a good strategy to determine whether we should be using uint64_t or int based on the dataset size, rather than hardcoding everywhere. cc @dantegd

@dantegd
Member

dantegd commented Feb 13, 2025

/merge

@rapids-bot rapids-bot Bot merged commit 9c0166a into rapidsai:branch-25.04 Feb 13, 2025
@viclafargue
Contributor Author

Here is the issue #6310

Labels

bug (Something isn't working), CUDA/C++, non-breaking (Non-breaking change)

7 participants