Skip to content

Support set_output and get_feature_names_out in cuml.accel#6942

Merged
rapids-bot[bot] merged 4 commits intorapidsai:branch-25.08from
jcrist:cuml-accel-set-output
Jun 30, 2025
Merged

Support set_output and get_feature_names_out in cuml.accel#6942
rapids-bot[bot] merged 4 commits intorapidsai:branch-25.08from
jcrist:cuml-accel-set-output

Conversation

@jcrist
Copy link
Copy Markdown
Member

@jcrist jcrist commented Jun 27, 2025

This adds support for set_output and get_feature_names_out in cuml.accel. This API addition affects any Transformer exposed through cuml.accel (currently only KMeans, TSNE, PCA, and TruncatedSVD). The support is automatic and implemented in the ProxyBase class, so any newly wrapped estimators should automatically support it. Likewise, a test is added to auto-test any estimators exposing these methods.

Fixes #6606.

@jcrist jcrist self-assigned this Jun 27, 2025
@jcrist jcrist requested a review from a team as a code owner June 27, 2025 17:10
@jcrist jcrist requested review from csadorf and viclafargue June 27, 2025 17:10
@jcrist jcrist added Cython / Python Cython or Python issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change cuml-accel Issues related to cuml.accel labels Jun 27, 2025
Copy link
Copy Markdown
Member Author

@jcrist jcrist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments for reviewers

Comment thread python/cuml/cuml/accel/estimator_proxy.py
Comment thread python/cuml/cuml_accel_tests/test_estimator_proxy.py
Comment thread python/cuml/cuml_accel_tests/test_set_output.py
jcrist added 3 commits June 29, 2025 21:06
This adds support for `set_output` and `get_feature_names_out` in
`cuml.accel` wrapped estimators.
umap-learn <= 0.5.7 doesn't properly implement `get_feature_names_out`;
this method won't match the proper signature until then. Rather than
hack around this in `cuml-accel` to fit the incorrect signature in
`umap-learn`, I'd rather xfail this test for now.

There is very little likelihood of a user actually running into this.
`umap.UMAP` doesn't subclass from `TransformerMixin`, which means it
doesn't have a `set_output` method (the only reason
`get_feature_names_out` would be called in the wild).
@jcrist jcrist force-pushed the cuml-accel-set-output branch from 1b2315d to 1095b41 Compare June 30, 2025 04:07
Copy link
Copy Markdown
Contributor

@csadorf csadorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for the detailed commentary motivating various choices.

LGTM, but I'm a bit surprised that it has not led to a reduction in the number of xfailed tests.

Comment thread python/cuml/cuml_accel_tests/test_core.py
Comment thread python/cuml/cuml_accel_tests/test_estimator_proxy.py
Comment thread python/cuml/cuml_accel_tests/test_set_output.py
@jcrist
Copy link
Copy Markdown
Member Author

jcrist commented Jun 30, 2025

LGTM, but I'm a bit surprised that it has not led to a reduction in the number of xfailed tests.

I was also a bit surprised by that. AFAICT sklearn doesn't test set_output handling as part of the generic check_estimator, so we didn't have a failing test for each estimator that hadn't implemented it before.

@jcrist
Copy link
Copy Markdown
Member Author

jcrist commented Jun 30, 2025

/merge

@rapids-bot rapids-bot Bot merged commit ea8fd1d into rapidsai:branch-25.08 Jun 30, 2025
66 of 67 checks passed
@jcrist jcrist deleted the cuml-accel-set-output branch June 30, 2025 22:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cuml-accel Issues related to cuml.accel Cython / Python Cython or Python issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add scikit-learn set_output() API support to cuml.accel

3 participants