Fix Dask DBSCAN tree reduction #7398

Merged
rapids-bot[bot] merged 2 commits into rapidsai:main from viclafargue:fix-dask-dbscan-tree-reduction on Oct 30, 2025

@viclafargue
Contributor

Closes #7341

The tree reduction in Dask DBSCAN could deadlock: the communicator waited for communication requests to complete even on passive ranks, which had not posted any request.
Additionally, the UCX comms were initialized with an incorrect number of ranks when some Dask workers failed to register with the scheduler.
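
To illustrate the deadlock, here is a minimal Python sketch. It is not the code in `tree_reduction.cuh`. It assumes a binary tree schedule that reduces toward rank 0, and the names `tree_reduction_roles`, `requests_to_wait_on`, and `idle_ranks_per_step` are hypothetical. The point is that a rank with no partner in a given step posts no request, so it should have nothing to wait on.

```python
def tree_reduction_roles(n_ranks):
    """Yield (stride, roles) for each step of an assumed binary tree reduction.

    roles maps rank -> (kind, partner), where kind is "recv", "send" or "idle".
    """
    stride = 1
    while stride < n_ranks:
        roles = {}
        for rank in range(n_ranks):
            if rank % (2 * stride) == 0 and rank + stride < n_ranks:
                # Receiver: merges the labels sent by rank + stride.
                roles[rank] = ("recv", rank + stride)
            elif rank % (2 * stride) == stride:
                # Sender: ships its labels to rank - stride.
                roles[rank] = ("send", rank - stride)
            else:
                # Passive rank: no partner in this step, so no request is posted.
                roles[rank] = ("idle", None)
        yield stride, roles
        stride *= 2


def requests_to_wait_on(role):
    """Return the requests a rank posted in one step.

    A passive rank returns an empty list and must skip the wait entirely;
    waiting on a request that was never posted is what would block forever.
    """
    kind, partner = role
    return [] if kind == "idle" else [(kind, partner)]


def idle_ranks_per_step(n_ranks):
    """List the passive ranks of each step, in step order."""
    return [
        sorted(rank for rank, role in roles.items() if role[0] == "idle")
        for _, roles in tree_reduction_roles(n_ranks)
    ]
```

Under this assumed schedule, 2 ranks never produce a passive rank, while 3 or more ranks always leave at least one rank passive in some step, so an unconditional wait would only be exercised with more than 2 ranks.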

Member

@jcrist left a comment

Mostly LGTM. Was there a test before that would fail when run on a multi-gpu system? If not, is there an easy way to add one?

Comment thread cpp/src/dbscan/mergelabels/tree_reduction.cuh Outdated
@viclafargue added the bug and non-breaking labels on Oct 28, 2025
@viclafargue
Contributor Author

> Was there a test before that would fail when run on a multi-gpu system? If not, is there an easy way to add one?

Almost all of the existing tests would hang when run on more than 2 GPUs.

@viclafargue requested a review from divyegala on October 29, 2025 09:19
@viclafargue
Contributor Author

/merge

rapids-bot[bot] merged commit f986a40 into rapidsai:main on Oct 30, 2025
108 checks passed

Labels

bug, CUDA/C++, Cython / Python, non-breaking

Development

Successfully merging this pull request may close these issues.

[BUG]DBSCAN inconsistent results / hangs multiple GPUS

4 participants