
New Estimator Proxy architecture#6613

Merged
rapids-bot[bot] merged 21 commits into rapidsai:branch-25.06 from jcrist:new-estimator-proxy
May 9, 2025

Conversation

@jcrist
Member

@jcrist jcrist commented May 1, 2025

This PR adds a new ProxyBase base class for defining proxy estimators by composing one CPU and one GPU estimator, instead of inheriting from the GPU estimator. This lets us better match sklearn's expected interface, since the exposed class can look much more like the proxied sklearn class than one subclassed from the cuml estimator.

This is accomplished by defining a new InteropMixin mixin class to handle converting hyperparameters/attributes to/from the proxied CPU estimator. The mixin defines the public as_sklearn/from_sklearn methods, and the private methods are also used by ProxyBase.

To port an estimator from the old ProxyMixin to the new ProxyBase:

  • Switch the cuml estimator to subclass from Base instead of UniversalBase
  • Add InteropMixin to the cuml estimator, and define the required methods. Remove any old methods and decorators used by UniversalBase.
  • Switch the class in cuml.accel._wrappers.* to subclass from ProxyBase. Any special casing for certain methods should be moved to this class from the cuml proper estimator class
  • Add tests
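As a rough illustration of the composition pattern these steps describe, here is a minimal, self-contained sketch. All class and method names below (CPUModel, GPUModel, params_to_cpu) are hypothetical stand-ins for illustration only; the real InteropMixin and ProxyBase APIs live in cuml and are not reproduced here.

```python
# Sketch of composition-based proxying: the proxy wraps a GPU estimator
# rather than subclassing it, and converts to a plain CPU estimator on
# demand. Names are illustrative, not the actual cuml API.

class CPUModel:
    """Stand-in for the sklearn estimator being proxied."""
    def __init__(self, alpha=1.0):
        self.alpha = alpha

class GPUModel:
    """Stand-in for the cuml estimator."""
    def __init__(self, alpha=1.0):
        self.alpha = alpha

class ProxyModel:
    """Wraps a GPU estimator (composition instead of inheritance), so the
    public surface can mirror the sklearn class exactly."""
    _cpu_class = CPUModel
    _gpu_class = GPUModel

    def __init__(self, alpha=1.0):
        self._gpu = self._gpu_class(alpha=alpha)

    def params_to_cpu(self):
        # Explicit per-parameter mapping, as in the PR's review snippets.
        return {"alpha": self._gpu.alpha}

    def as_sklearn(self):
        # Build a plain CPU estimator from the translated hyperparameters.
        return self._cpu_class(**self.params_to_cpu())

cpu = ProxyModel(alpha=0.5).as_sklearn()
print(type(cpu).__name__, cpu.alpha)  # CPUModel 0.5
```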

One notable caveat of this approach is that the old device selection support is removed from ported estimators. Ported estimators that previously would run on CPU under using_device_type no longer will; they will run only on GPU. This is a breaking change. That said, I don't believe we have a large user base doing this, and there's a clear transition path to using cuml.accel instead of using_device_type.

In this PR I've handled porting all of the currently handled linear model estimators. There are still a few remaining todos:

  • Add warning for old methods that supported device selection if device isn't GPU
  • Fix pickling (should be easy to do, just haven't done it yet)
  • Expand tests for the generic ProxyBase functionality
  • Unxfail any newly passing sklearn test suite tests

I don't intend to port any more estimators in this PR. Converting the other proxies to the new system can be done incrementally in follow up PRs.

Part of #6502.

@copy-pr-bot

copy-pr-bot Bot commented May 1, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions Bot added the Cython / Python Cython or Python issue label May 1, 2025
@jcrist jcrist added breaking Breaking change improvement Improvement / enhancement to an existing function labels May 1, 2025
@jcrist
Member Author

jcrist commented May 1, 2025

/ok to test

@csadorf csadorf linked an issue May 5, 2025 that may be closed by this pull request
@jcrist
Member Author

jcrist commented May 6, 2025

/ok to test

@jcrist jcrist force-pushed the new-estimator-proxy branch from 8da0f7b to 926d044 on May 7, 2025 04:40
@jcrist
Member Author

jcrist commented May 7, 2025

/ok to test

Comment thread python/cuml/cuml/accel/estimator_proxy.py
Comment thread python/cuml/cuml/internals/interop.py
Comment thread python/cuml/cuml/internals/interop.py
Comment thread python/cuml/cuml/internals/interop.py
Comment thread python/cuml/cuml/internals/interop.py
Comment thread python/cuml/cuml/internals/interop.py
@jcrist jcrist force-pushed the new-estimator-proxy branch from 926d044 to 8475f08 on May 7, 2025 19:33
@jcrist jcrist marked this pull request as ready for review May 7, 2025 19:34
@jcrist jcrist requested a review from a team as a code owner May 7, 2025 19:34
@jcrist jcrist requested review from dantegd and viclafargue May 7, 2025 19:34
@jcrist jcrist changed the title [WIP] New Estimator Proxy architecture New Estimator Proxy architecture May 7, 2025
@jcrist
Member Author

jcrist commented May 7, 2025

Ok, this is ready for review. I haven't updated the sklearn xfail list yet (was waiting on #6629 to be merged first, will hack away at this now that it's in), but everything else should be relatively good to go.

@jcrist jcrist force-pushed the new-estimator-proxy branch from 8475f08 to b452112 on May 7, 2025 22:25
Member

@dantegd dantegd left a comment

Overall really like the new design!

Comment thread python/cuml/cuml/accel/_wrappers/sklearn/linear_model.py
Comment thread python/cuml/cuml/accel/_wrappers/sklearn/linear_model.py
Comment thread python/cuml/cuml/accel/estimator_proxy.py
Comment thread python/cuml/cuml/accel/estimator_proxy.py
Comment thread python/cuml/cuml/linear_model/elastic_net.pyx Outdated
Comment on lines +188 to +196
return {
"alpha": model.alpha,
"l1_ratio": model.l1_ratio,
"fit_intercept": model.fit_intercept,
"tol": tol,
"max_iter": model.max_iter,
"selection": model.selection,
}
Member

The method is significantly more verbose. That's great for flexibility, but I was wondering if it might be worth avoiding having to repeat all the parameters here, perhaps by using get_param_names minus the exceptions or something like that?

Contributor

I much prefer the explicit verbosity for clarity over any kind of logic that we might get wrong at this point. I think that reducing verbosity and explicitness would be better done in a follow-up.

# TODO: do we want to be changing the tolerance?
tol = 0.1 * self.tol

return {
Member

Same as the comment above. This seems error prone; do we have tests to check that these lists are correct?

Contributor

I would expect sklearn test regressions to signal mistakes here.

Comment thread python/cuml/cuml/linear_model/ridge.pyx Outdated
Comment thread python/cuml/cuml/tests/test_sklearn_import_export.py
@csadorf csadorf self-requested a review May 8, 2025 17:26
@csadorf
Contributor

csadorf commented May 8, 2025

Just noting that I'd like to get a chance for review prior to merge.

Contributor

@csadorf csadorf left a comment

Just a couple of comments, but this looks great!

Comment thread python/cuml/cuml/accel/estimator_proxy.py

jcrist added 2 commits May 8, 2025 19:49
This adds a new mixin class for defining methods for converting cuml
estimators to/from their CPU-based counterparts.

This replaces the functionality defined currently in `UniversalBase`,
decoupling the conversion methods (``as_sklearn``, ``from_sklearn``)
from the current device selection handling.

The new estimator proxy base class will then make use of the methods
defined in this mixin class.
This adds a new base class for defining proxy estimators, making use of
the new `InteropMixin`. With this base class, proxy estimators will be
true sklearn estimators, wrapping both a sklearn estimator and a cuml
estimator (composition instead of inheritance). This lets us achieve
better compatibility with sklearn, since we can fully replicate the
required interface without requiring massive changes to existing cuml
estimators.

The migration plan for porting `cuml.accel` estimators is then:

- Move a cuml estimator back to subclassing from `Base` instead of
  `UniversalBase`.
- Add in the `InteropMixin` and define the required methods
- Define a new subclass of `ProxyBase` in the proper wrappers file. Add
  in any required special-cased method definitions as needed.
jcrist added 18 commits May 8, 2025 19:49
We no longer support selecting CPU execution with `using_device_type`
for methods on classes ported to `InteropMixin`. From our understanding,
very very few users made use of these, so we're opting for a breaking
change in removing this functionality. Users are instead encouraged to
make use of the upstream libraries (e.g. sklearn) directly, and use the
`from_sklearn`/`as_sklearn` methods to coerce to/from cuML as needed.

Since old code using `using_device_type` won't error (but will silently
always use GPU), we add a warning if the device type is explicitly set
to CPU execution on any method previously decorated with
`enable_device_selection`.
Pickling is now implemented in a way where loading the pickle:

- Works in environments where `cuml` is not installed
- Works if `cuml.accel` is not enabled

In both cases, the CPU estimator is returned instead.

Also expands tests for the `ProxyBase` estimator class. Still more to do
here, but we're getting closer to full coverage.
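A hedged sketch of the pickling behaviour described above: the proxy pickles down to its plain CPU estimator, so loading the pickle needs neither cuml nor cuml.accel. All names here are illustrative stand-ins, not the actual implementation.

```python
# Sketch: __reduce__ serializes only the CPU estimator, so unpickling
# works in an environment without the GPU library installed. Names are
# hypothetical, not the real cuml code.
import pickle

class CPUEstimator:
    """Stand-in for the sklearn estimator."""
    def __init__(self, alpha=1.0):
        self.alpha = alpha

def _rehydrate(cpu):
    # In the real design this could re-wrap the CPU estimator in a proxy
    # when cuml.accel is active; here it simply returns it as-is.
    return cpu

class ProxyEstimator:
    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def _to_cpu(self):
        return CPUEstimator(alpha=self.alpha)

    def __reduce__(self):
        # Pickle as (callable, args): only the CPU estimator is stored.
        return (_rehydrate, (self._to_cpu(),))

restored = pickle.loads(pickle.dumps(ProxyEstimator(alpha=0.25)))
print(type(restored).__name__, restored.alpha)  # CPUEstimator 0.25
```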
This:
- Makes the separation between `ProxyBase` and the legacy `ProxyMixin`
  clearer.
- Makes it possible to import cuml without sklearn installed. Once
  sklearn is a required dependency (and the `ProxyMixin` is removed),
  `is_proxy` can be moved back into `estimator_proxy.py` where it makes
  more sense to be located.
- Support `__sklearn_is_fitted__` since the default `check_is_fitted`
  doesn't work for `ProxyBase`.
- Oops, `LinearRegression` _can_ be multi-target. Fix the `InteropMixin`
  implementations as needed.
Previously we only did the sane things a user would normally do, but the
sklearn test suite wants to muck with internals. We now proxy everything
to help pass more of the sklearn test suite.
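The "proxy everything" idea can be sketched with Python's attribute hooks: unknown reads and all writes forward to the wrapped estimator, so external mutation of internals (as the sklearn test suite does) stays consistent. This is a minimal illustration, not the actual code.

```python
# Sketch of full attribute forwarding between a proxy and its wrapped
# estimator. Names are illustrative only.
class Wrapped:
    """Stand-in for the wrapped estimator."""
    def __init__(self):
        self.coef_ = [1.0, 2.0]

class Proxy:
    def __init__(self, wrapped):
        # Bypass our own __setattr__ while storing the wrapped object.
        object.__setattr__(self, "_wrapped", wrapped)

    def __getattr__(self, name):
        # Called only when normal lookup fails; forward to the wrapped object.
        return getattr(self._wrapped, name)

    def __setattr__(self, name, value):
        # Forward all writes so state lives in exactly one place.
        setattr(self._wrapped, name, value)

p = Proxy(Wrapped())
p.coef_ = [3.0]          # write lands on the wrapped estimator
print(p.coef_)           # [3.0]
```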
Currently cuml does this wrong when it does it at all. For now we avoid
forwarding this fit attribute.
- Validate hyperparameters before fit using sklearn's `_validate_params`
- Fixup repr to use the native `BaseEstimator` repr, which plays nicely
  with sklearn's pprint features in meta-estimators.
Sklearn generates the parameterized tests in `test_common` dynamically.
Before when our proxy estimators weren't subclasses of `BaseEstimator`
they were fully ignored by these tests. Now that they're true
`BaseEstimator` subclasses, there are suddenly a lot more tests for
them. We pass _most_ of the checks, the ones that we don't have been
added to the xfail list.

There are a few other newly failing tests. I went through them
individually - in most cases the issue was that before we had a bug, and
that bug just happened to make the test not fail. A good example of this
is the `ElasticNet` tests - there's a few that run `ElasticNet` with two
different sets of hyperparameters and compare the results. Before we
were silently dropping the relevant hyperparameters, so both estimators
were identical and would of course pass. Now they're not, and we don't
have identical `coef_` since one is running on GPU and one on CPU. But
the scores of both estimators are comparable.

We also remove several xfailed tests. All in all, I think this PR is a
net positive, and afaict doesn't introduce new bugs per the sklearn test
suite.
@jcrist jcrist force-pushed the new-estimator-proxy branch from b452112 to 063c939 on May 9, 2025 02:49
Contributor

@viclafargue viclafargue left a comment

Thanks @jcrist! LGTM, just a few questions/comments

Comment thread python/cuml/cuml/accel/estimator_proxy.py
Comment thread python/cuml/cuml/accel/estimator_proxy.py
Comment thread python/cuml/cuml/accel/estimator_proxy.py
Comment thread python/cuml/cuml/accel/estimator_proxy.py
Comment thread python/cuml/cuml/accel/_wrappers/sklearn/linear_model.py
@jcrist
Member Author

jcrist commented May 9, 2025

/merge

@rapids-bot rapids-bot Bot merged commit 23d2fb5 into rapidsai:branch-25.06 May 9, 2025
88 checks passed
@jcrist jcrist deleted the new-estimator-proxy branch May 9, 2025 21:36
Ofek-Haim pushed a commit to Ofek-Haim/cuml that referenced this pull request May 13, 2025

Authors:
  - Jim Crist-Harif (https://github.com/jcrist)

Approvers:
  - Simon Adorf (https://github.com/csadorf)
  - Victor Lafargue (https://github.com/viclafargue)

URL: rapidsai#6613

Labels: breaking (Breaking change), Cython / Python (Cython or Python issue), improvement (Improvement / enhancement to an existing function)

Successfully merging this pull request may close these issues: Implement new proxy estimator class using composition

6 participants