Add `MinMaxScaler`, `MaxAbsScaler`, and `PolynomialFeatures` to `cuml.accel` by jcrist · Pull Request #8032 · rapidsai/cuml

jcrist · 2026-04-29T20:10:32Z

This adds support for MinMaxScaler, MaxAbsScaler and PolynomialFeatures to cuml.accel, through sklearn's array-api (using the recently added ArrayAPIProxyBase). I had to make a few additional modifications to support these without regressions:

- Expose `_parameter_constraints` as a class attribute on the proxy classes. This fixed a few existing xfails.
- Coerce all non-parameter attributes set on the `_ArrayAPIWrapper._internal_model` instance, not just the public ones.
- Add support for customizing `_params_from_cpu` on `ArrayAPIProxyBase`, for cases where certain parameter combinations don't support the array-api.
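
For readers unfamiliar with the pattern, exposing `_parameter_constraints` at class level can be sketched with a small descriptor. This is a minimal illustration under stated assumptions, not the code in this PR: `CpuScaler`, `ScalerProxy`, and the `_cpu_class` attribute are hypothetical stand-ins, and the constraint values are made up.

```python
class classproperty:
    """Read-only property evaluated on the class rather than on an instance."""

    def __init__(self, fget):
        self.fget = fget

    def __get__(self, instance, owner=None):
        # `owner` is the class the attribute was looked up on. If it is
        # missing, fall back to the type of the instance.
        if owner is None:
            owner = type(instance)
        return self.fget(owner)


class CpuScaler:
    # Hypothetical stand-in for a scikit-learn estimator class.
    _parameter_constraints = {"feature_range": [tuple], "copy": ["boolean"]}


class ScalerProxy:
    # Hypothetical proxy class; `_cpu_class` is an illustrative name.
    _cpu_class = CpuScaler

    @classproperty
    def _parameter_constraints(cls):
        # Forward the lookup so the proxy always mirrors the CPU class,
        # whether accessed on the class or on an instance.
        return cls._cpu_class._parameter_constraints
```

Because the lookup works on the class itself (`ScalerProxy._parameter_constraints`), code that introspects constraints without instantiating the estimator sees the same mapping as it would on the CPU class.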

Fixes #8014.
Fixes #8030.
Fixes #8031.

- Convert all non-parameter attributes set on an array-api estimator, not just public ones
- Add support for `_params_from_cpu` overrides on `ArrayAPIProxyBase`.
- Use a `classproperty` to define `_parameter_constraints`
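
The idea behind a `_params_from_cpu` override can be sketched as follows. Everything here is an assumption for illustration: the exception class, the `plan` helper, the `FakeCpuModel` stand-in, and the `strategy="exotic"` restriction are invented, and the real hook in `ArrayAPIProxyBase` may differ in signature and behavior.

```python
class UnsupportedOnGPU(ValueError):
    """Hypothetical signal that a configuration has to run on CPU."""


class ProxyBaseSketch:
    @classmethod
    def _params_from_cpu(cls, cpu_model):
        # Default behavior: assume every CPU hyperparameter is supported.
        return dict(cpu_model.get_params())

    @classmethod
    def plan(cls, cpu_model):
        # Hypothetical helper: decide where the estimator would run.
        try:
            return "gpu", cls._params_from_cpu(cpu_model)
        except UnsupportedOnGPU as exc:
            return "cpu", str(exc)


class RestrictedProxySketch(ProxyBaseSketch):
    @classmethod
    def _params_from_cpu(cls, cpu_model):
        params = super()._params_from_cpu(cpu_model)
        # Invented restriction, purely for illustration.
        if params.get("strategy") == "exotic":
            raise UnsupportedOnGPU("strategy='exotic' is not supported")
        return params


class FakeCpuModel:
    """Minimal object exposing a scikit-learn style `get_params`."""

    def __init__(self, **params):
        self._params = params

    def get_params(self):
        return dict(self._params)
```

With this shape, a subclass only has to override one classmethod to opt specific parameter combinations out of the accelerated path, and the caller falls back to CPU without further special-casing.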

coderabbitai · 2026-04-29T20:19:27Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: fc9a3c3c-2f08-4d25-b115-ca2e7ba7d1cf

📥 Commits

Reviewing files that changed from the base of the PR and between ae233f4 and be40842.

📒 Files selected for processing (1)

python/cuml/cuml_accel_tests/integration/test_preprocessing.py

🚧 Files skipped from review as they are similar to previous changes (1)

python/cuml/cuml_accel_tests/integration/test_preprocessing.py

📝 Walkthrough

Summary by CodeRabbit

New Features
- Added MinMaxScaler, MaxAbsScaler, and PolynomialFeatures to GPU-accelerated preprocessing with automatic CPU fallback for unsupported cases.
Documentation
- Expanded estimator support list and detailed conditions that trigger CPU fallback for preprocessing.
Tests
- Extended integration tests for the new preprocessors and parameter-constraint introspection; updated upstream test expectations removing some xfail entries.

Walkthrough

Adds cuml.accel ArrayAPI proxy support for three sklearn preprocessing estimators (MinMaxScaler, MaxAbsScaler, PolynomialFeatures), updates estimator-proxy plumbing for class-level properties and CPU param extraction, extends docs with CPU-fallback details, and expands/adjusts related tests and xfail list.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation: `docs/source/cuml-accel/faq.rst`, `docs/source/cuml-accel/limitations.rst` | Expanded estimator lists in FAQ and added CPU-fallback/limitations details for `MinMaxScaler`, `MaxAbsScaler`, and `PolynomialFeatures`. |
| Preprocessing Proxies: `python/cuml/cuml/accel/_overrides/sklearn/preprocessing.py` | Added `MinMaxScaler`, `MaxAbsScaler`, and `PolynomialFeatures` ArrayAPI proxy classes; exported sklearn internals for `PolynomialFeatures`; added `_params_from_cpu` handling and updated `__all__`. |
| Proxy Core: `python/cuml/cuml/accel/estimator_proxy.py` | Introduced `classproperty`; made `_parameter_constraints` a class-level property sourced from CPU class; added plumbing for `_params_from_cpu` overrides; refined attribute sync/conversion logic. |
| Tests: `python/cuml/cuml_accel_tests/integration/test_preprocessing.py`, `python/cuml/cuml_accel_tests/test_estimator_proxy.py` | Added/expanded integration tests for `MinMaxScaler`, `MaxAbsScaler`, `PolynomialFeatures`; parameterized partial-fit tests; added proxy parameter-constraint introspection test. |
| Test Config: `python/cuml/cuml_accel_tests/upstream/scikit-learn/xfail-list.yaml` | Removed xfail entries for certain sklearn parameter-validation tests (DBSCAN, KMeans, SpectralClustering, LedoitWolf). |
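
The "attribute sync/conversion" change listed for the proxy core can be pictured with a short sketch. The names below (`sync_fitted_attributes`, `FakeDeviceArray`, `to_host`, `FakeModel`) are hypothetical and only illustrate the idea of converting every non-parameter attribute, including private ones. They are not taken from `estimator_proxy.py`.

```python
class FakeDeviceArray:
    """Hypothetical stand-in for an array living on the GPU."""

    def __init__(self, values):
        self.values = list(values)


def to_host(value):
    # Convert device arrays to plain lists; pass other values through.
    return value.values if isinstance(value, FakeDeviceArray) else value


def sync_fitted_attributes(internal_model, param_names, convert):
    """Return converted copies of every non-parameter instance attribute."""
    synced = {}
    for name, value in vars(internal_model).items():
        if name in param_names:
            continue  # hyperparameters are handled separately
        # No filter on a leading underscore: private state is converted too.
        synced[name] = convert(value)
    return synced


class FakeModel:
    def __init__(self):
        self.copy = True  # hyperparameter
        self.scale_ = FakeDeviceArray([0.5, 2.0])  # public fitted attribute
        self._cached_min = FakeDeviceArray([1.0, -3.0])  # private attribute
```

Skipping only the hyperparameter names, rather than anything starting with an underscore, is what keeps private fitted state consistent with the public attributes after conversion.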

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Use scikit-learn's array-api to accelerate StandardScaler #8020 — Extends ArrayAPI proxy approach to sklearn preprocessing proxies and related estimator-proxy plumbing (closely related).
SpectralClustering in cuml.accel #7804 — Related estimator-proxy plumbing changes enabling additional sklearn wrapper support.
Add cuml.accel support for StandardScaler #7766 — Prior preprocessing/integration work touching estimator_proxy and preprocessing proxies.

Suggested labels

feature request

Suggested reviewers

divyegala
csadorf
betatim

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 13.04%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely describes the main change: adding three preprocessing estimators to cuml.accel. |
| Description check | ✅ Passed | The description is directly related to the changeset, explaining what was added and why, including supporting modifications and issue references. |
| Linked Issues check | ✅ Passed | The PR successfully implements all three requirements: MinMaxScaler, MaxAbsScaler, and PolynomialFeatures support in cuml.accel with proper documentation and tests. |
| Out of Scope Changes check | ✅ Passed | All changes are within scope, including documentation updates, preprocessing estimator exports, proxy framework enhancements, and comprehensive test coverage for the new functionality. |

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

python/cuml/cuml_accel_tests/integration/test_preprocessing.py (1)

82-96: ⚠️ Potential issue | 🟠 Major

Bug: Parameterized test doesn't use the cls parameter.

The test is parameterized over StandardScaler, MinMaxScaler, and MaxAbsScaler, but the test body hardcodes StandardScaler() on lines 86 and 88. The cls parameter is never used.

🐛 Proposed fix to use the parameterized class

 @pytest.mark.parametrize("cls", [StandardScaler, MinMaxScaler, MaxAbsScaler])
 def test_scaler_partial_fit(cls):
     X, _ = make_blobs(n_samples=100, centers=3, random_state=42)
 
-    model = StandardScaler().fit(X)
+    model = cls().fit(X)
 
-    model2 = StandardScaler()
+    model2 = cls()
     model2.partial_fit(X[:25])
     assert model2.n_samples_seen_ == 25
     model2.partial_fit(X[25:])
     assert model2.n_samples_seen_ == X.shape[0]
 
     sol = model.transform(X)
     res = model2.transform(X)
     np.testing.assert_allclose(sol, res)
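
To see what the parameterized test is meant to exercise for a min-max style scaler, here is a toy, dependency-free sketch. `RunningMinMax` is a hypothetical single-feature class written for illustration. It is not scikit-learn's or cuML's implementation, only a model of the property the test checks: fitting in chunks should match fitting all at once.

```python
class RunningMinMax:
    """Toy single-feature min-max scaler with batch and incremental fitting."""

    def __init__(self):
        self.n_samples_seen_ = 0
        self.data_min_ = float("inf")
        self.data_max_ = float("-inf")

    def partial_fit(self, values):
        values = list(values)
        # Fold the new chunk into the running extrema.
        self.data_min_ = min([self.data_min_, *values])
        self.data_max_ = max([self.data_max_, *values])
        self.n_samples_seen_ += len(values)
        return self

    def fit(self, values):
        self.__init__()  # discard any previous state
        return self.partial_fit(values)

    def transform(self, values):
        span = self.data_max_ - self.data_min_
        span = span if span != 0 else 1.0  # avoid dividing by zero
        return [(v - self.data_min_) / span for v in values]


data = [3.0, -1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
full = RunningMinMax().fit(data)
chunked = RunningMinMax().partial_fit(data[:3]).partial_fit(data[3:])
```

With `StandardScaler()` hardcoded in the test body, only the running mean and variance path is checked. Using `cls()` also covers incremental state like the running extrema shown here.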

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml_accel_tests/integration/test_preprocessing.py` around lines
82 - 96, The test_scaler_partial_fit is parameterized with cls but incorrectly
instantiates StandardScaler directly; replace both hardcoded StandardScaler()
calls (the one assigned to model and the one assigned to model2) with cls() so
the test uses the parameterized class, ensuring model = cls().fit(X) and model2
= cls() before calling model2.partial_fit and model2.transform to validate
behavior across StandardScaler, MinMaxScaler, and MaxAbsScaler.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 09c9503c-10cc-4c29-a2f4-dfb39e93fc54

📥 Commits

Reviewing files that changed from the base of the PR and between c811e80 and ae233f4.

📒 Files selected for processing (7)

docs/source/cuml-accel/faq.rst
docs/source/cuml-accel/limitations.rst
python/cuml/cuml/accel/_overrides/sklearn/preprocessing.py
python/cuml/cuml/accel/estimator_proxy.py
python/cuml/cuml_accel_tests/integration/test_preprocessing.py
python/cuml/cuml_accel_tests/test_estimator_proxy.py
python/cuml/cuml_accel_tests/upstream/scikit-learn/xfail-list.yaml

💤 Files with no reviewable changes (1)

python/cuml/cuml_accel_tests/upstream/scikit-learn/xfail-list.yaml

betatim

Looks nice

jcrist · 2026-04-30T12:18:45Z

/merge

csadorf

Nice!

csadorf · 2026-04-30T14:49:54Z

+
+``MaxAbsScaler`` will fall back to CPU in the following cases:
+
+- If ``X`` is sparse.


(potential follow-up) Delegate to cuML's implementation for sparse inputs.

jcrist added 5 commits April 29, 2026 14:09

Expose _parameter_constraints as class attribute

d749f48

Add support for MinMaxScaler to cuml.accel

80b15ba

Add support for MaxAbsScaler to cuml.accel

6c5c264

A few Proxy improvements

f73641c

- Convert all non-parameter attributes set on an array-api estimator, not just public ones
- Add support for `_params_from_cpu` overrides on `ArrayAPIProxyBase`.
- Use a `classproperty` to define `_parameter_constraints`

Add support for PolynomialFeatures to cuml.accel

ae233f4

jcrist self-assigned this Apr 29, 2026

jcrist requested a review from a team as a code owner April 29, 2026 20:10

jcrist requested a review from betatim April 29, 2026 20:10

jcrist added the improvement, non-breaking, and cuml-accel labels Apr 29, 2026

github-actions Bot added the Cython / Python label Apr 29, 2026

coderabbitai Bot reviewed Apr 29, 2026

Respond to feedback

be40842

betatim approved these changes Apr 30, 2026

rapids-bot Bot merged commit 49d106a into rapidsai:main Apr 30, 2026
171 of 174 checks passed

jcrist deleted the more-preprocessors branch April 30, 2026 12:18

csadorf reviewed Apr 30, 2026

csadorf mentioned this pull request Apr 30, 2026

Accelerate StandardScaler, MinMaxScaler, and LabelEncoder via sklearn's array API dispatch #8013

Closed

rapids-bot[bot] merged 6 commits into rapidsai:main from jcrist:more-preprocessors
