Skip to content

[BUG] Fix device access check in build_sorted_mst#1083

Merged
rapids-bot[bot] merged 10 commits intorapidsai:branch-25.08from
tarang-jain:fix-device-access
Jul 9, 2025
Merged

[BUG] Fix device access check in build_sorted_mst#1083
rapids-bot[bot] merged 10 commits intorapidsai:branch-25.08from
tarang-jain:fix-device-access

Conversation

@tarang-jain
Copy link
Copy Markdown
Contributor

@tarang-jain tarang-jain commented Jul 3, 2025

X may be in managed memory.
Fixes rapidsai/cuml#6948

@tarang-jain tarang-jain requested a review from a team as a code owner July 3, 2025 21:47
@github-actions github-actions Bot added the cpp label Jul 3, 2025
@tarang-jain tarang-jain added bug Something isn't working non-breaking Introduces a non-breaking change labels Jul 3, 2025
@cjnolet cjnolet moved this to In Progress in Unstructured Data Processing Jul 3, 2025
@jinsolp
Copy link
Copy Markdown
Contributor

jinsolp commented Jul 3, 2025

Did you run into an issue with the current check? Because we previously discussed this issue when we were adding a hotfix for UMAP for the 25.06 release. (link).

@tarang-jain
Copy link
Copy Markdown
Contributor Author

@jinsolp I just saw the hotfix. Yes I did run into an issue. In the cuml accel tests for hdbscan, the host side connect_knn_graph will be called, but the Mutual Reachabilty distance across components is actually embedded inside the reduction op, which is not handled by the host side code path at the moment (seems to be hard coded to pairwise).

Secondly, there is an edge case arising because of this code path, which is not handled by raft's mst solver. That edge case is that the mst solver is called on a graph that has only a single edge. I am opening a PR in raft to handle this.

The second one is causing the cuML optional check (rapidsai/cuml#6948) to fail.

@tarang-jain
Copy link
Copy Markdown
Contributor Author

How about a boolean to force one code path over the other (when the X matrix is device accessible)?

@jinsolp
Copy link
Copy Markdown
Contributor

jinsolp commented Jul 3, 2025

How about a boolean to force one code path over the other (when the X matrix is device accessible)?

I see, and I agree. I guess this is what this PR is doing?

@tarang-jain
Copy link
Copy Markdown
Contributor Author

@jinsolp I'll update this PR. At this time, it is forcing the device route if X is device accessible, whereas for UMAP you prefer to use the host path despite it being device accessible. A boolean override variable is the only way to satisfy both these scenarios.

@tarang-jain tarang-jain requested a review from jinsolp July 4, 2025 00:26
Copy link
Copy Markdown
Contributor

@jinsolp jinsolp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Comment thread cpp/src/cluster/detail/mst.cuh Outdated
@tarang-jain
Copy link
Copy Markdown
Contributor Author

/merge

@rapids-bot rapids-bot Bot merged commit 5500811 into rapidsai:branch-25.08 Jul 9, 2025
53 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in Unstructured Data Processing Jul 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working cpp non-breaking Introduces a non-breaking change

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[CI] test_single_linkage_sklearn_compare failure with cudf-pandas enabled

4 participants