New Estimator Proxy architecture #6613
rapids-bot[bot] merged 21 commits into rapidsai:branch-25.06
Conversation
/ok to test

/ok to test
8da0f7b to 926d044
/ok to test
926d044 to 8475f08
Ok, this is ready for review. I haven't updated the sklearn xfail list yet (was waiting on #6629 to be merged first, will hack away at this now that it's in), but everything else should be relatively good to go.
8475f08 to b452112
dantegd left a comment
Overall really like the new design!
    return {
        "alpha": model.alpha,
        "l1_ratio": model.l1_ratio,
        "fit_intercept": model.fit_intercept,
        "tol": tol,
        "max_iter": model.max_iter,
        "selection": model.selection,
    }
The method is significantly more verbose, which is great for flexibility, but I was wondering if it might be worth avoiding having to repeat all the parameters here again, perhaps by using get_param_names minus the exceptions or something like that?
I much prefer the explicit verbosity for clarity over any kind of logic that we might get wrong at this point. I think that reducing verbosity and explicitness would be better done in a follow-up.
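For context, a rough sketch of what the less verbose alternative floated above might look like (hypothetical, not adopted in this PR; the `get_param_names` method and the override handling are assumptions):

```python
# Hypothetical sketch only: build the CPU-parameter dict from the estimator's
# declared parameter names, special-casing only the exceptions, instead of
# spelling every parameter out by hand.
def params_from_model(model, overrides):
    params = {
        name: getattr(model, name)
        for name in model.get_param_names()  # assumed method on cuml estimators
        if name not in overrides
    }
    params.update(overrides)
    return params

# e.g. params_from_model(model, {"tol": 0.1 * model.tol})
```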
    # TODO: do we want to be changing the tolerance?
    tol = 0.1 * self.tol

    return {
Same as the comment above. This seems error prone, or do we have tests to check that these lists are correct?
I would expect sklearn test regressions to signal mistakes here.
Just noting that I'd like to get a chance for review prior to merge.
csadorf left a comment
Just a couple of comments, but this looks great!
This adds a new mixin class for defining methods for converting cuml estimators to/from their CPU-based counterparts. This replaces the functionality currently defined in `UniversalBase`, decoupling the conversion methods (`as_sklearn`, `from_sklearn`) from the current device selection handling. The new estimator proxy base class will then make use of the methods defined in this mixin class.
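For example, the public conversion methods allow an explicit round trip between cuml and sklearn estimators. A sketch, assuming `ElasticNet` has been ported to the mixin; data and parameters are illustrative:

```python
import numpy as np
from cuml.linear_model import ElasticNet

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
y = X @ np.arange(1.0, 6.0) + rng.normal(scale=0.1, size=100)

gpu_model = ElasticNet(alpha=0.5).fit(X, y)

# Convert the fitted cuml estimator to its sklearn counterpart...
cpu_model = gpu_model.as_sklearn()

# ...and back to a cuml estimator again.
gpu_again = ElasticNet.from_sklearn(cpu_model)
```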
This adds a new base class for defining proxy estimators, making use of the new `InteropMixin`. With this base class, proxy estimators will be true sklearn estimators, wrapping both a sklearn estimator and a cuml estimator (composition instead of inheritance). This lets us achieve better compatibility with sklearn, since we can fully replicate the required interface without requiring massive changes to existing cuml estimators.

The migration plan for porting `cuml.accel` estimators is then:

- Move a cuml estimator back to subclassing from `Base` instead of `UniversalBase`.
- Add in the `InteropMixin` and define the required methods.
- Define a new subclass of `ProxyBase` in the proper wrappers file (see the sketch after this list). Add in any required special-cased method definitions as needed.
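A hypothetical sketch of that last step, defining a proxy in a wrappers file; the `_cpu_class`/`_gpu_class` attribute names and import path are guesses for illustration, not the actual `ProxyBase` API:

```python
# Illustrative only -- attribute names and import path are assumptions.
import sklearn.linear_model

import cuml.linear_model
from cuml.accel.estimator_proxy import ProxyBase  # path assumed


class ElasticNet(ProxyBase):
    # Compose the CPU and GPU implementations rather than inheriting from the
    # cuml estimator; the proxy forwards methods between the two.
    _cpu_class = sklearn.linear_model.ElasticNet
    _gpu_class = cuml.linear_model.ElasticNet

    # Any special-cased method definitions would be added here as needed.
```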
We no longer support selecting CPU execution with `using_device_type` for methods on classes ported to `InteropMixin`. From our understanding, very very few users made use of these, so we're opting for a breaking change in removing this functionality. Users are instead encouraged to make use of the upstream libraries (e.g. sklearn) directly, and use the `from_sklearn`/`as_sklearn` methods to coerce to/from cuML as needed. Since old code using `using_device_type` won't error (but will silently always use GPU), we add a warning if the device type is explicitly set to CPU execution on any method previously decorated with `enable_device_selection`.
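A sketch of the transition path mentioned above, assuming `Ridge` is among the ported estimators (names and data are illustrative):

```python
import numpy as np
from cuml.linear_model import Ridge  # assumed ported to InteropMixin

X = np.random.default_rng(0).normal(size=(50, 3))
y = X.sum(axis=1)
model = Ridge().fit(X, y)

# Old pattern -- on ported estimators this now always runs on GPU, and emits a
# warning if the device type was explicitly set to "cpu":
#
#     with cuml.using_device_type("cpu"):
#         model.predict(X)
#
# Supported pattern -- convert explicitly and use sklearn directly:
cpu_model = model.as_sklearn()
cpu_model.predict(X)
```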
Pickling is now implemented in a way where loading the pickle:

- Works in environments where `cuml` is not installed
- Works if `cuml.accel` is not enabled

In both cases, the CPU estimator is returned instead.

Also expands tests for the `ProxyBase` estimator class. Still more to do here, but we're getting closer to full coverage.
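To illustrate the pickling behavior described above, a sketch assuming the script runs under `cuml.accel` (so the sklearn import below resolves to the accelerated proxy):

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression  # proxied when cuml.accel is enabled

X = np.random.default_rng(1).normal(size=(50, 3))
y = X.sum(axis=1)

blob = pickle.dumps(LinearRegression().fit(X, y))

# Loading in this environment gives back the proxy. Loading the same bytes in
# an environment without cuml installed (or without cuml.accel enabled) gives
# the plain sklearn LinearRegression instead, so the pickle stays usable.
restored = pickle.loads(blob)
restored.predict(X)
```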
This:

- Makes the separation between `ProxyBase` and the legacy `ProxyMixin` clearer.
- Makes it possible to import cuml without sklearn installed.

Once sklearn is a required dependency (and the `ProxyMixin` is removed), `is_proxy` can be moved back into `estimator_proxy.py`, where it makes more sense to be located.
- Support `__sklearn_is_fitted__`, since the default `check_is_fitted` doesn't work for `ProxyBase` (see the sketch below).
- Oops, `LinearRegression` _can_ be multi-target. Fix the `InteropMixin` implementations as needed.
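A minimal sketch of why `__sklearn_is_fitted__` matters for a composition-based proxy; the bookkeeping shown here is hypothetical, not the PR's actual implementation:

```python
from sklearn.utils.validation import check_is_fitted


class _ProxySketch:
    """check_is_fitted() normally looks for attributes ending in "_" on the
    instance itself; a composition-based proxy stores those on the wrapped
    estimators, so it answers the question explicitly instead."""

    def __init__(self):
        self._gpu = None  # hypothetical slot for the wrapped, fitted estimator

    def fit(self, X, y):
        self._gpu = object()  # stand-in for fitting the wrapped GPU estimator
        return self

    def __sklearn_is_fitted__(self):
        return self._gpu is not None


check_is_fitted(_ProxySketch().fit(None, None))  # passes
# check_is_fitted(_ProxySketch())                # would raise NotFittedError
```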
Previously we only did the sane things a user would normally do, but the sklearn test suite wants to muck with internals. We now proxy everything to help pass more of the sklearn test suite.
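Conceptually, "proxy everything" means attribute access the proxy doesn't handle itself gets forwarded to the wrapped estimator. A toy sketch (not the actual `ProxyBase` logic, which is more involved):

```python
class _ForwardingSketch:
    def __init__(self, wrapped):
        # Bypass our own __setattr__ so "_wrapped" lands on the proxy itself.
        object.__setattr__(self, "_wrapped", wrapped)

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for everything the proxy
        # doesn't define itself -- including fitted attributes like `coef_`.
        return getattr(self._wrapped, name)

    def __setattr__(self, name, value):
        # Tests that poke at internals (e.g. setting attributes directly)
        # still end up mutating the wrapped estimator.
        setattr(self._wrapped, name, value)
```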
Currently cuml does this wrong when it does it at all. For now we avoid forwarding this fit attribute.
- Validate hyperparameters before fit using sklearn's `_validate_params`.
- Fix up the repr to use the native `BaseEstimator` repr, which plays nicely with sklearn's pprint features in meta-estimators (both points are sketched below).
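A small sketch of these two points, assuming the proxy really is a `BaseEstimator` subclass; the class and parameter constraints here are illustrative only:

```python
from sklearn.base import BaseEstimator


class _ValidatedSketch(BaseEstimator):
    # sklearn's declarative constraint table, checked by _validate_params().
    _parameter_constraints = {"alpha": [float, int]}

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        # Hyperparameters are validated with sklearn's own machinery before
        # any work is dispatched to the GPU implementation.
        self._validate_params()
        return self


# The repr comes from BaseEstimator, so sklearn's pprint (e.g. inside
# Pipeline or GridSearchCV reprs) renders it like any other estimator:
print(_ValidatedSketch(alpha=0.5))  # _ValidatedSketch(alpha=0.5)
```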
Sklearn generates the parameterized tests in `test_common` dynamically. Before, when our proxy estimators weren't subclasses of `BaseEstimator`, they were fully ignored by these tests. Now that they're true `BaseEstimator` subclasses, there are suddenly a lot more tests for them. We pass _most_ of the checks; the ones that we don't have been added to the xfail list.

There are a few other newly failing tests. I went through them individually - in most cases the issue was that before we had a bug, and that bug just happened to make the test not fail. A good example of this is the `ElasticNet` tests - there are a few that run `ElasticNet` with two different sets of hyperparameters and compare the results. Before, we were silently dropping the relevant hyperparameters, so both estimators were identical and would of course pass. Now they're not, and we don't have identical `coef_` since one is running on GPU and one on CPU. But the scores of both estimators are comparable.

We also remove several xfailed tests. All in all, I think this PR is a net positive, and afaict doesn't introduce new bugs per the sklearn test suite.
b452112 to 063c939
viclafargue left a comment
Thanks @jcrist! LGTM, just a few questions/comments
/merge
This PR adds a new `ProxyBase` base class for defining proxy estimators via composing one CPU and one GPU estimator, instead of inheriting from the GPU estimator. This lets us better match sklearn's expected interface, since the exposed class can look much more like the proxied sklearn class than one subclassed from the cuml estimator.

This is accomplished by defining a new `InteropMixin` mixin class to handle converting hyperparameters/attributes to/from the proxied CPU estimator. The mixin defines the public `as_sklearn`/`from_sklearn` methods, and the private methods are also used by `ProxyBase`.

To port an estimator from the old `ProxyMixin` to the new `ProxyBase`:

- Switch the cuml estimator to subclass from `Base` instead of `UniversalBase`
- Add `InteropMixin` to the `cuml` estimator, and define the required methods. Remove any old methods and decorators used by `UniversalBase`.
- Switch the class in `cuml.accel._wrappers.*` to subclass from `ProxyBase`. Any special casing for certain methods should be moved to this class from the `cuml` proper estimator class
- Add tests

One notable caveat of this approach is that the old device selection support will be removed from ported estimators. Ported estimators that previously would run on CPU with `using_device_type` no longer will - they will only run on GPU. This is a breaking change. That said, I don't believe we have a large user base doing this, and there's a clear transition path to using `cuml.accel` instead of `using_device_type`.

In this PR I've handled porting all of the currently handled linear model estimators. There are still a few remaining todos:

- [x] Add warning for old methods that supported device selection if device isn't GPU
- [x] Fix pickling (should be easy to do, just haven't done it yet)
- [x] Expand tests for the generic `ProxyBase` functionality
- [x] Unxfail any newly passing sklearn test suite tests

I don't intend to port any more estimators in this PR. Converting the other proxies to the new system can be done incrementally in follow up PRs.

Part of rapidsai#6502.

Authors:
- Jim Crist-Harif (https://github.com/jcrist)

Approvers:
- Simon Adorf (https://github.com/csadorf)
- Victor Lafargue (https://github.com/viclafargue)

URL: rapidsai#6613