Enable HDBSCAN gpu training and cpu inference #6108
Merged
rapids-bot[bot] merged 4 commits into rapidsai:branch-24.12 on Oct 17, 2024
divyegala
commented
Oct 16, 2024
| - statsmodels | ||
| - umap-learn==0.5.6 | ||
| - pynndescent | ||
| - setuptools # Needed on Python 3.12 for dask-glm, which requires pkg_resources but Python 3.12 doesn't have setuptools by default |
Member
Author
dask-glm was removed by PR #6028 so this is now unnecessary
KyleFromNVIDIA
approved these changes
Oct 17, 2024
bdice
approved these changes
Oct 17, 2024
dantegd
approved these changes
Oct 17, 2024
Member
dantegd
left a comment
looks great to me, had a question about a comment but that's all
Comment on lines +172 to +174
| # These attributes have to be reassigned to the CPU model | ||
| # as the raw arrays because the reference HDBSCAN implementation | ||
| # reconstructs the objects from the raw arrays |
Member
This happens in the setters in the hdbscan library, right?
Member
Author
Essentially it happens in the getters. Here's the issue, consider the `CondensedTree` object:
- We use the setter to assign a `CondensedTree` object to `self.condensed_tree_` from cuML to hdbscan
- The getter for `self.condensed_tree_` checks if it has a value already. If it does, it assumes that the value is raw numpy arrays and creates another `CondensedTree` object without any value sanitization

That's why I re-assigned the raw arrays, so when the hdbscan library internally calls the getters it reconstructs the object correctly.
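The getter behaviour described above can be illustrated with a minimal sketch. `CondensedTreeSketch` and `CpuModelSketch` are hypothetical stand-ins written for this illustration, not the hdbscan or cuML classes, and a plain list stands in for the raw numpy arrays.

```python
class CondensedTreeSketch:
    """Hypothetical wrapper; stands in for a tree object built from raw arrays."""

    def __init__(self, raw_tree):
        # No validation of raw_tree, mirroring "without any value sanitization".
        self.raw_tree = raw_tree


class CpuModelSketch:
    """Hypothetical CPU model whose getter rebuilds the wrapper on each access."""

    def __init__(self):
        self._condensed_tree = None

    @property
    def condensed_tree_(self):
        if self._condensed_tree is None:
            raise AttributeError("model has not been fitted")
        # Assumes the stored value is raw data and wraps it blindly.
        return CondensedTreeSketch(self._condensed_tree)

    @condensed_tree_.setter
    def condensed_tree_(self, value):
        self._condensed_tree = value


raw = [0.0, 1.0, 2.0]  # stand-in for the raw arrays

# Assigning an already-built object: the getter wraps it a second time.
wrong = CpuModelSketch()
wrong.condensed_tree_ = CondensedTreeSketch(raw)
double_wrapped = isinstance(wrong.condensed_tree_.raw_tree, CondensedTreeSketch)

# Assigning the raw data: the getter reconstructs a single, correct wrapper.
right = CpuModelSketch()
right.condensed_tree_ = raw
correctly_wrapped = right.condensed_tree_.raw_tree is raw
```

In this sketch, `double_wrapped` ends up true for the first model, which is the failure mode the comment describes, while assigning the raw data avoids it.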
wphicks
approved these changes
Oct 17, 2024
Member
Author
/merge
Until now, we supported all combinations of GPU/CPU interoperability except the one mentioned in the title. This was because the CPU HDBSCAN package was missing attribute setters. With scikit-learn-contrib/hdbscan#657, attribute setters are now available, which allows us to transfer GPU-trained attributes to the CPU model. This feature is available as part of `hdbscan=0.8.39`.
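As a rough sketch of how calling code might guard on that release, the helper below compares a dotted version string against 0.8.39. The names `version_tuple`, `MIN_HDBSCAN_WITH_SETTERS`, and `has_attribute_setters` are hypothetical and not part of cuML or hdbscan; the parsing assumes plain numeric components and would need a real version parser for pre-release strings.

```python
MIN_HDBSCAN_WITH_SETTERS = (0, 8, 39)  # release named in the PR description


def version_tuple(version):
    """Turn a dotted release string such as '0.8.39' into (0, 8, 39)."""
    # Only the first three components are used; each must be purely numeric.
    return tuple(int(part) for part in version.split(".")[:3])


def has_attribute_setters(version):
    """Return True if the given hdbscan version is at least 0.8.39."""
    return version_tuple(version) >= MIN_HDBSCAN_WITH_SETTERS
```

The installed version string could be obtained with `importlib.metadata.version("hdbscan")` and passed to `has_attribute_setters`.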