Add MinMaxScaler, MaxAbsScaler, and PolynomialFeatures to cuml.accel#8032

Merged
rapids-bot[bot] merged 6 commits into rapidsai:main from jcrist:more-preprocessors
Apr 30, 2026

@jcrist
Member

@jcrist jcrist commented Apr 29, 2026

This adds support for MinMaxScaler, MaxAbsScaler, and PolynomialFeatures to cuml.accel through sklearn's array-api support (using the recently added ArrayAPIProxyBase). I had to make a few additional modifications to support these without regressions:

  • Expose _parameter_constraints as a class attribute on the proxy classes. This fixed a few existing xfails.
  • Coerce all non-parameter attributes set on the _ArrayAPIWrapper._internal_model instance, not just the public ones.
  • Add support for customizing _params_from_cpu on ArrayAPIProxyBase, for cases where certain parameter combinations don't support the array-api.
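
As a rough illustration of the last bullet, the sketch below shows how an overridable parameter hook can route unsupported parameter combinations to a CPU path. Everything here is hypothetical: the classes, the `mode` parameter, the exception, and even the hook's signature are invented for illustration and are not taken from cuml's `ArrayAPIProxyBase`.

```python
class UnsupportedOnAccelerator(Exception):
    """Hypothetical signal that a parameter combination needs the CPU path."""


class BaseProxy:
    """Hypothetical proxy base; this is not cuml's ArrayAPIProxyBase."""

    def __init__(self, **params):
        self.params = params

    @classmethod
    def _params_from_cpu(cls, params):
        # Default hook: every parameter combination is accepted unchanged.
        return dict(params)

    def backend(self):
        # Ask the hook first; fall back to CPU if it rejects the parameters.
        try:
            self._params_from_cpu(self.params)
        except UnsupportedOnAccelerator:
            return "cpu"
        return "accelerated"


class PickyProxy(BaseProxy):
    """Hypothetical subclass that rejects one made-up parameter value."""

    @classmethod
    def _params_from_cpu(cls, params):
        if params.get("mode") == "exotic":  # made-up parameter and value
            raise UnsupportedOnAccelerator("mode='exotic' needs the CPU path")
        return super()._params_from_cpu(params)


print(PickyProxy(mode="plain").backend())
print(PickyProxy(mode="exotic").backend())
```

Keeping the decision inside a per-class hook means the base class does not need to know which parameters each estimator can or cannot accelerate.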

Fixes #8014.
Fixes #8030.
Fixes #8031.

jcrist added 5 commits April 29, 2026 14:09
- Convert all non-parameter attributes set on an array-api estimator,
  not just public ones
- Add support for `_params_from_cpu` overrides on `ArrayAPIProxyBase`.
- Use a `classproperty` to define `_parameter_constraints`
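
A `classproperty` is not a Python built-in, so a minimal sketch of the general technique may help. The descriptor below is one common way to write it and is only assumed to resemble what the PR adds; `CpuEstimator`, `ProxyEstimator`, and `_cpu_class` are hypothetical stand-ins.

```python
class classproperty:
    """Minimal read-only descriptor: like @property, but evaluated on the class."""

    def __init__(self, fget):
        self.fget = fget

    def __get__(self, instance, owner=None):
        # Works for both Class.attr and instance.attr lookups.
        if owner is None:
            owner = type(instance)
        return self.fget(owner)


class CpuEstimator:
    """Hypothetical stand-in for a scikit-learn estimator class."""

    _parameter_constraints = {"copy": ["boolean"]}


class ProxyEstimator:
    """Hypothetical proxy that forwards class-level metadata to its CPU class."""

    _cpu_class = CpuEstimator

    @classproperty
    def _parameter_constraints(cls):
        # Looked up lazily, so the proxy always reflects the CPU class.
        return cls._cpu_class._parameter_constraints


print(ProxyEstimator._parameter_constraints)
print(ProxyEstimator()._parameter_constraints is CpuEstimator._parameter_constraints)
```

Because the value is computed on access, code that introspects the attribute on the class itself (rather than on an instance) sees the same constraints as the wrapped CPU class.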
@jcrist jcrist self-assigned this Apr 29, 2026
@jcrist jcrist requested a review from a team as a code owner April 29, 2026 20:10
@jcrist jcrist requested a review from betatim April 29, 2026 20:10
@jcrist jcrist added the improvement, non-breaking, and cuml-accel labels Apr 29, 2026
@github-actions github-actions Bot added the Cython / Python label Apr 29, 2026
@coderabbitai

coderabbitai Bot commented Apr 29, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: fc9a3c3c-2f08-4d25-b115-ca2e7ba7d1cf

📥 Commits

Reviewing files that changed from the base of the PR and between ae233f4 and be40842.

📒 Files selected for processing (1)
  • python/cuml/cuml_accel_tests/integration/test_preprocessing.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • python/cuml/cuml_accel_tests/integration/test_preprocessing.py

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added MinMaxScaler, MaxAbsScaler, and PolynomialFeatures to GPU-accelerated preprocessing with automatic CPU fallback for unsupported cases.
  • Documentation

    • Expanded estimator support list and detailed conditions that trigger CPU fallback for preprocessing.
  • Tests

    • Extended integration tests for the new preprocessors and parameter-constraint introspection; updated upstream test expectations removing some xfail entries.

Walkthrough

Adds cuml.accel ArrayAPI proxy support for three sklearn preprocessing estimators (MinMaxScaler, MaxAbsScaler, PolynomialFeatures), updates estimator-proxy plumbing for class-level properties and CPU param extraction, extends docs with CPU-fallback details, and expands/adjusts related tests and xfail list.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation: `docs/source/cuml-accel/faq.rst`, `docs/source/cuml-accel/limitations.rst` | Expanded estimator lists in FAQ and added CPU-fallback/limitations details for MinMaxScaler, MaxAbsScaler, and PolynomialFeatures. |
| Preprocessing Proxies: `python/cuml/cuml/accel/_overrides/sklearn/preprocessing.py` | Added MinMaxScaler, MaxAbsScaler, and PolynomialFeatures ArrayAPI proxy classes; exported sklearn internals for PolynomialFeatures; added `_params_from_cpu` handling and updated `__all__`. |
| Proxy Core: `python/cuml/cuml/accel/estimator_proxy.py` | Introduced `classproperty`; made `_parameter_constraints` a class-level property sourced from CPU class; added plumbing for `_params_from_cpu` overrides; refined attribute sync/conversion logic. |
| Tests: `python/cuml/cuml_accel_tests/integration/test_preprocessing.py`, `python/cuml/cuml_accel_tests/test_estimator_proxy.py` | Added/expanded integration tests for MinMaxScaler, MaxAbsScaler, PolynomialFeatures; parameterized partial-fit tests; added proxy parameter-constraint introspection test. |
| Test Config: `python/cuml/cuml_accel_tests/upstream/scikit-learn/xfail-list.yaml` | Removed xfail entries for certain sklearn parameter-validation tests (DBSCAN, KMeans, SpectralClustering, LedoitWolf). |
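
The attribute sync/conversion change noted for the proxy core (converting all non-parameter attributes, not just public ones) can be pictured with the sketch below. It is a simplified illustration under stated assumptions, not cuml's code: `FakeDeviceArray`, `InternalModel`, `Target`, and `sync_attributes` are all hypothetical names.

```python
import inspect


class FakeDeviceArray:
    """Hypothetical stand-in for an array held on the accelerator."""

    def __init__(self, values):
        self.values = list(values)

    def to_host(self):
        return list(self.values)


class InternalModel:
    """Hypothetical fitted estimator with public and private fitted state."""

    def __init__(self, copy=True):
        self.copy = copy  # constructor parameter, left alone by the sync

    def fit(self, column):
        self.max_abs_ = FakeDeviceArray([max(abs(v) for v in column)])
        self._n_seen = len(column)  # private, non-array attribute
        self._cache = FakeDeviceArray(column)  # private array attribute
        return self


class Target:
    """Hypothetical object that receives the host-side attributes."""


def sync_attributes(internal_model, target):
    """Copy every non-parameter attribute to target, moving arrays to host."""
    signature = inspect.signature(type(internal_model).__init__)
    param_names = set(signature.parameters) - {"self"}
    for name, value in vars(internal_model).items():
        if name in param_names:
            continue  # constructor parameters are handled elsewhere
        if isinstance(value, FakeDeviceArray):
            value = value.to_host()
        setattr(target, name, value)
    return target


synced = sync_attributes(InternalModel().fit([-3.0, 1.5, 2.0]), Target())
print(vars(synced))
```

Filtering on "is this a constructor parameter" rather than on a leading underscore is what lets private fitted state such as `_cache` be converted along with public attributes.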

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

feature request

Suggested reviewers

  • divyegala
  • csadorf
  • betatim
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 13.04% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely describes the main change: adding three preprocessing estimators to cuml.accel. |
| Description check | ✅ Passed | The description is directly related to the changeset, explaining what was added and why, including supporting modifications and issue references. |
| Linked Issues check | ✅ Passed | The PR successfully implements all three requirements: MinMaxScaler, MaxAbsScaler, and PolynomialFeatures support in cuml.accel with proper documentation and tests. |
| Out of Scope Changes check | ✅ Passed | All changes are within scope, including documentation updates, preprocessing estimator exports, proxy framework enhancements, and comprehensive test coverage for the new functionality. |

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
python/cuml/cuml_accel_tests/integration/test_preprocessing.py (1)

82-96: ⚠️ Potential issue | 🟠 Major

Bug: Parameterized test doesn't use the cls parameter.

The test is parameterized over StandardScaler, MinMaxScaler, and MaxAbsScaler, but the test body hardcodes StandardScaler() on lines 86 and 88. The cls parameter is never used.

🐛 Proposed fix to use the parameterized class
 @pytest.mark.parametrize("cls", [StandardScaler, MinMaxScaler, MaxAbsScaler])
 def test_scaler_partial_fit(cls):
     X, _ = make_blobs(n_samples=100, centers=3, random_state=42)
 
-    model = StandardScaler().fit(X)
+    model = cls().fit(X)
 
-    model2 = StandardScaler()
+    model2 = cls()
     model2.partial_fit(X[:25])
     assert model2.n_samples_seen_ == 25
     model2.partial_fit(X[25:])
     assert model2.n_samples_seen_ == X.shape[0]
 
     sol = model.transform(X)
     res = model2.transform(X)
     np.testing.assert_allclose(sol, res)
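
The same check can be tried outside pytest against plain scikit-learn on CPU. This is a sketch assuming scikit-learn and NumPy are installed; the helper name `check_partial_fit_matches_fit` is hypothetical, and the body mirrors the proposed fix rather than the test as finally merged.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.preprocessing import MaxAbsScaler, MinMaxScaler, StandardScaler


def check_partial_fit_matches_fit(cls):
    """Fit in one shot and in two batches, then compare the transforms."""
    X, _ = make_blobs(n_samples=100, centers=3, random_state=42)

    full = cls().fit(X)

    incremental = cls()
    incremental.partial_fit(X[:25])
    assert incremental.n_samples_seen_ == 25
    incremental.partial_fit(X[25:])
    # After both batches every sample has been seen exactly once.
    assert incremental.n_samples_seen_ == X.shape[0]

    np.testing.assert_allclose(full.transform(X), incremental.transform(X))


# Exercise each scaler class, which is what the parametrization is meant to do.
for scaler_cls in (StandardScaler, MinMaxScaler, MaxAbsScaler):
    check_partial_fit_matches_fit(scaler_cls)
```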
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml_accel_tests/integration/test_preprocessing.py` around lines
82 - 96, The test_scaler_partial_fit is parameterized with cls but incorrectly
instantiates StandardScaler directly; replace both hardcoded StandardScaler()
calls (the one assigned to model and the one assigned to model2) with cls() so
the test uses the parameterized class, ensuring model = cls().fit(X) and model2
= cls() before calling model2.partial_fit and model2.transform to validate
behavior across StandardScaler, MinMaxScaler, and MaxAbsScaler.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 09c9503c-10cc-4c29-a2f4-dfb39e93fc54

📥 Commits

Reviewing files that changed from the base of the PR and between c811e80 and ae233f4.

📒 Files selected for processing (7)
  • docs/source/cuml-accel/faq.rst
  • docs/source/cuml-accel/limitations.rst
  • python/cuml/cuml/accel/_overrides/sklearn/preprocessing.py
  • python/cuml/cuml/accel/estimator_proxy.py
  • python/cuml/cuml_accel_tests/integration/test_preprocessing.py
  • python/cuml/cuml_accel_tests/test_estimator_proxy.py
  • python/cuml/cuml_accel_tests/upstream/scikit-learn/xfail-list.yaml
💤 Files with no reviewable changes (1)
  • python/cuml/cuml_accel_tests/upstream/scikit-learn/xfail-list.yaml

Member

@betatim betatim left a comment

Looks nice

@jcrist
Member Author

jcrist commented Apr 30, 2026

/merge

@rapids-bot rapids-bot Bot merged commit 49d106a into rapidsai:main Apr 30, 2026
171 of 174 checks passed
@jcrist jcrist deleted the more-preprocessors branch April 30, 2026 12:18
Contributor

@csadorf csadorf left a comment

Nice!


``MaxAbsScaler`` will fall back to CPU in the following cases:

- If ``X`` is sparse.
Contributor

(potential follow-up) Delegate to cuML's implementation for sparse inputs.

Labels

  • cuml-accel: Issues related to cuml.accel
  • Cython / Python: Cython or Python issue
  • improvement: Improvement / enhancement to an existing function
  • non-breaking: Non-breaking change

Development

Successfully merging this pull request may close these issues.

  • Add cuml.accel support for PolynomialFeatures
  • Add cuml.accel support for MaxAbsScaler
  • Add cuml.accel support for MinMaxScaler

4 participants