Deprecating data_on_host parameter for UMAP #6953
rapids-bot[bot] merged 14 commits into rapidsai:branch-25.08
Made changes to reflect all suggestions, cleaned up the code a bit, and added a separate test for param handling.
Approved, pending resolution of the comments by @jcrist.
# for getting n_rows of the dataset
_, self.n_rows, _, _ = \
    input_to_cuml_array(X, order='C', check_dtype=np.float32,
                        convert_to_dtype=(np.float32
                                          if convert_dtype
                                          else None))
Can we hold off on merging this for a while? I added this part because we need to:
- get the number of rows
- decide the build algo from the number of rows when it is set to auto
- decide the mem type given the configured build algo (which should return the number of rows)
But I think calling input_to_cuml_array just to get the number of rows will increase memory usage. Let me figure this out before we merge this PR.
Ah, nice catch, this is indeed not something we want to call here.
For getting the number of rows in X I believe you should be able to use len on every data type we accept as X. This works for 2D and 1D arrays (cupy, numpy, cuml), as well as dataframes (pandas or cudf).
@jinsolp I've added the "DO NOT MERGE" label to prevent accidental merge. Just remove it (or ask me to remove it in case you lack permission) once this is resolved.
Looks like len doesn't work on sparse inputs. Please use X.shape[0] instead, which works with all the inputs listed above and with sparse inputs as well.
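For illustration, here is a minimal sketch of the difference between len(X) and X.shape[0]. The helper get_n_rows and the classes DenseLike and SparseLike are hypothetical names made up for this example, not cuML code; the stand-ins only mimic how dense inputs and scipy.sparse matrices respond to len() so the snippet runs without any GPU libraries.

```python
def get_n_rows(X):
    """Return the row count from X.shape without converting or copying X."""
    return X.shape[0]


class DenseLike:
    """Hypothetical stand-in for a dense array or dataframe."""

    def __init__(self, n_rows, n_cols):
        self.shape = (n_rows, n_cols)

    def __len__(self):
        # For dense 2D inputs, len() agrees with shape[0].
        return self.shape[0]


class SparseLike:
    """Hypothetical stand-in for a sparse input: has .shape, rejects len()."""

    def __init__(self, n_rows, n_cols):
        self.shape = (n_rows, n_cols)

    def __len__(self):
        # scipy.sparse matrices raise TypeError on len() in the same way.
        raise TypeError("sparse matrix length is ambiguous; use shape[0]")


dense = DenseLike(100, 8)
sparse = SparseLike(100, 8)

# shape[0] works for both kinds of input.
n_dense = get_n_rows(dense)
n_sparse = get_n_rows(sparse)

# len() only works for the dense stand-in.
try:
    len(sparse)
    len_works_on_sparse = True
except TypeError:
    len_works_on_sparse = False
```

Reading the shape attribute avoids the conversion done by input_to_cuml_array, so it should not add to memory usage.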
/merge
Merged fdebb9e into rapidsai:branch-25.08
Closing #6886