Skip to content

Deprecating data_on_host parameter for UMAP#6953

Merged
rapids-bot[bot] merged 14 commits intorapidsai:branch-25.08from
jinsolp:deprecate-data-on-host
Jul 3, 2025
Merged

Deprecating data_on_host parameter for UMAP#6953
rapids-bot[bot] merged 14 commits intorapidsai:branch-25.08from
jinsolp:deprecate-data-on-host

Conversation

@jinsolp
Copy link
Copy Markdown
Contributor

@jinsolp jinsolp commented Jul 1, 2025

Closing #6886

@jinsolp jinsolp self-assigned this Jul 1, 2025
@jinsolp jinsolp requested a review from a team as a code owner July 1, 2025 17:54
@jinsolp jinsolp requested review from jcrist and vyasr July 1, 2025 17:54
@github-actions github-actions Bot added the Cython / Python Cython or Python issue label Jul 1, 2025
@jinsolp jinsolp requested review from csadorf and removed request for vyasr July 1, 2025 17:54
@jinsolp jinsolp added breaking Breaking change improvement Improvement / enhancement to an existing function labels Jul 1, 2025
Comment thread python/cuml/cuml/manifold/umap.pyx Outdated
Comment thread python/cuml/cuml/manifold/umap.pyx Outdated
Comment thread python/cuml/cuml/manifold/umap.pyx Outdated
Comment thread python/cuml/cuml/manifold/umap.pyx
Comment thread python/cuml/cuml/manifold/umap.pyx Outdated
Comment thread python/cuml/cuml/tests/test_umap.py Outdated
@jinsolp
Copy link
Copy Markdown
Contributor Author

jinsolp commented Jul 2, 2025

Made changes to reflect all suggestions, cleaned up the code a bit, and added a separate test for param handling.

@csadorf
Copy link
Copy Markdown
Contributor

csadorf commented Jul 2, 2025

Approved pending that comments by @jcrist are addressed.

Copy link
Copy Markdown
Member

@jcrist jcrist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (assuming CI passes). Thanks @jinsolp!

Comment thread python/cuml/cuml/manifold/umap.pyx Outdated
Comment on lines +761 to +766
# for getting n_rows of the dataset
_, self.n_rows, _, _ = \
input_to_cuml_array(X, order='C', check_dtype=np.float32,
convert_to_dtype=(np.float32
if convert_dtype
else None))
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we hold on to merging this for a while? I added this part since we need

  1. get number of rows
  2. decide build algo given auto using number of rows
  3. decide mem type given configured build algo (which should return the number of rows)

But I think calling input_to_cuml_array just for the sake of getting rows will add to mem usage. Let me figure this out before we merge this PR.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nice catch, this is indeed not something we want to call here.

For getting the number of rows in X I believe you should be able to use len on every data type we accept as X. This works for 2D and 1D arrays (cupy, numpy, cuml), as well as dataframes (pandas or cudf).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jinsolp I've added the "DO NOT MERGE" label to prevent accidental merge. Just remove it (or ask me to remove it in case you lack permission) once this is resolved.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like len doesn't work on sparse inputs. Please use X.shape[0] instead, which should also work with all the inputs stated above (but also will work with sparse inputs)

@csadorf csadorf added the DO NOT MERGE Hold off on merging; see PR for details label Jul 2, 2025
@jinsolp jinsolp removed the DO NOT MERGE Hold off on merging; see PR for details label Jul 2, 2025
Comment thread python/cuml/cuml/manifold/umap.pyx Outdated
@jcrist
Copy link
Copy Markdown
Member

jcrist commented Jul 3, 2025

/merge

@rapids-bot rapids-bot Bot merged commit fdebb9e into rapidsai:branch-25.08 Jul 3, 2025
101 of 107 checks passed
@jinsolp jinsolp deleted the deprecate-data-on-host branch July 3, 2025 22:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaking Breaking change Cython / Python Cython or Python issue improvement Improvement / enhancement to an existing function

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants