Add MinMaxScaler, MaxAbsScaler, and PolynomialFeatures to cuml.accel#8032
Add MinMaxScaler, MaxAbsScaler, and PolynomialFeatures to cuml.accel#8032rapids-bot[bot] merged 6 commits intorapidsai:mainfrom
MinMaxScaler, MaxAbsScaler, and PolynomialFeatures to cuml.accel#8032Conversation
- Convert all non-parameter attributes set on an array-api estimator, not just public ones - Add support for `_params_from_cpu` overrides on `ArrayAPIProxyBase`. - Use a `classproperty` to define `_parameter_constraints`
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughSummary by CodeRabbit
WalkthroughAdds cuml.accel ArrayAPI proxy support for three sklearn preprocessing estimators (MinMaxScaler, MaxAbsScaler, PolynomialFeatures), updates estimator-proxy plumbing for class-level properties and CPU param extraction, extends docs with CPU-fallback details, and expands/adjusts related tests and xfail list. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Review rate limit: 9/10 reviews remaining, refill in 6 minutes. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
python/cuml/cuml_accel_tests/integration/test_preprocessing.py (1)
82-96:⚠️ Potential issue | 🟠 MajorBug: Parameterized test doesn't use the
clsparameter.The test is parameterized over
StandardScaler,MinMaxScaler, andMaxAbsScaler, but the test body hardcodesStandardScaler()on lines 86 and 88. Theclsparameter is never used.🐛 Proposed fix to use the parameterized class
`@pytest.mark.parametrize`("cls", [StandardScaler, MinMaxScaler, MaxAbsScaler]) def test_scaler_partial_fit(cls): X, _ = make_blobs(n_samples=100, centers=3, random_state=42) - model = StandardScaler().fit(X) + model = cls().fit(X) - model2 = StandardScaler() + model2 = cls() model2.partial_fit(X[:25]) assert model2.n_samples_seen_ == 25 model2.partial_fit(X[25:]) assert model2.n_samples_seen_ == X.shape[0] sol = model.transform(X) res = model2.transform(X) np.testing.assert_allclose(sol, res)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@python/cuml/cuml_accel_tests/integration/test_preprocessing.py` around lines 82 - 96, The test_scaler_partial_fit is parameterized with cls but incorrectly instantiates StandardScaler directly; replace both hardcoded StandardScaler() calls (the one assigned to model and the one assigned to model2) with cls() so the test uses the parameterized class, ensuring model = cls().fit(X) and model2 = cls() before calling model2.partial_fit and model2.transform to validate behavior across StandardScaler, MinMaxScaler, and MaxAbsScaler.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@python/cuml/cuml_accel_tests/integration/test_preprocessing.py`:
- Around line 82-96: The test_scaler_partial_fit is parameterized with cls but
incorrectly instantiates StandardScaler directly; replace both hardcoded
StandardScaler() calls (the one assigned to model and the one assigned to
model2) with cls() so the test uses the parameterized class, ensuring model =
cls().fit(X) and model2 = cls() before calling model2.partial_fit and
model2.transform to validate behavior across StandardScaler, MinMaxScaler, and
MaxAbsScaler.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 09c9503c-10cc-4c29-a2f4-dfb39e93fc54
📒 Files selected for processing (7)
docs/source/cuml-accel/faq.rstdocs/source/cuml-accel/limitations.rstpython/cuml/cuml/accel/_overrides/sklearn/preprocessing.pypython/cuml/cuml/accel/estimator_proxy.pypython/cuml/cuml_accel_tests/integration/test_preprocessing.pypython/cuml/cuml_accel_tests/test_estimator_proxy.pypython/cuml/cuml_accel_tests/upstream/scikit-learn/xfail-list.yaml
💤 Files with no reviewable changes (1)
- python/cuml/cuml_accel_tests/upstream/scikit-learn/xfail-list.yaml
|
/merge |
|
|
||
| ``MaxAbsScaler`` will fall back to CPU in the following cases: | ||
|
|
||
| - If ``X`` is sparse. |
There was a problem hiding this comment.
(potential follow-up) Delegate to cuML's implementation for sparse inputs.
This adds support for
MinMaxScaler,MaxAbsScalerandPolynomialFeaturestocuml.accel, through sklearn's array-api (using the recently addedArrayAPIProxyBase). I had to make a few additional modifications to support these without regressions:_parameter_constraintsas a class attribute on the proxy classes. This fixed a few existing xfails._ArrayAPIWrapper._internal_modelinstance, not just the public ones._params_from_cpuonArrayAPIProxyBase, for cases where certain parameter combinations don't support the array-api.Fixes #8014.
Fixes #8030.
Fixes #8031.