Validate that X is 2 dimensional#7889
Validate that X is 2 dimensional#7889rapids-bot[bot] merged 4 commits intorapidsai:release/26.04from
Conversation
cuml has not been strict about requiring `X` be a 2-dimensional input. In the future we want to be strict about requiring 2D X, to better align with sklearn conventions and expectations. Here we add a deprecation warning if X isn't 2D, but continue to accept it everywhere we already do. If `cuml.accel` is enabled, we instead error, with the same error that sklearn would raise. In the next release we'll instead error, improving our compatibility.
📝 WalkthroughSummary by CodeRabbit
WalkthroughThis PR enforces 2D input validation across cuML's input handling pipeline, introduces a new Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Comment Tip You can customize the high-level summary generated by CodeRabbit.Configure the |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@python/cuml/cuml/internals/validation.py`:
- Around line 107-112: The deprecation warning emitted in
cuml/internals/validation.py uses warnings.warn without a stacklevel, so it
points to this internal function instead of the user's callsite; update the
warnings.warn call in this module to pass an appropriate stacklevel (e.g.,
stacklevel=2 or 3) so the warning points to the user's code that passed non-2D
input (i.e., modify the warnings.warn(...) invocation in this validation code to
include stacklevel=<n>).
In `@python/cuml/tests/test_target_encoder.py`:
- Around line 36-38: Fix the typo in the test comment: change "tarnsform" to
"transform" above the pytest.warns block that asserts FutureWarning for
encoder.transform(df.category); update the comment text so it correctly reads
"Warns in transform".
In `@python/cuml/tests/test_tsne.py`:
- Around line 231-233: The current test_components_exception uses np.array([[]])
which is empty and can raise for zero features instead of testing
TSNE(n_components=3).fit; update the test to pass a minimally valid 2D X with at
least one sample and fewer features than n_components (e.g., shape (1,1) or
(2,1)) so the ValueError originates from the n_components check in TSNE.fit, and
(optionally) assert the raised exception message mentions n_components to ensure
the correct error path in TSNE.fit is exercised.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 5febae2b-ff2c-488a-ab2c-b760fb4a38b6
📒 Files selected for processing (12)
python/cuml/cuml/internals/outputs.pypython/cuml/cuml/internals/validation.pypython/cuml/cuml/preprocessing/TargetEncoder.pypython/cuml/cuml/preprocessing/label.pypython/cuml/cuml_accel_tests/upstream/scikit-learn/xfail-list.yamlpython/cuml/tests/test_coordinate_descent.pypython/cuml/tests/test_dbscan.pypython/cuml/tests/test_label_binarizer.pypython/cuml/tests/test_label_encoder.pypython/cuml/tests/test_target_encoder.pypython/cuml/tests/test_tsne.pypython/cuml/tests/test_validation.py
|
/merge |
3cc7c9c
into
rapidsai:release/26.04
sklearnrequires that X is 2-dimensional, and errors nicely otherwise.cumldoesn't uniformly require that X is 2-dimensional. Some estimators error on non-2D-X, but they usually do so accidentally, not as part of the input validation code. This leads to a bunch of xfailed tests in our sklearn compatibility test suite (both in thecuml.accelupstream tests, as well as intest_sklearn_compatibility.py).This PR:
Xis non-2-dimensional. In 26.06 we'll remove the deprecation warning and error instead.cuml.accelis enabled, this warning is an error matching the sklearn error message instead. This lets us un-xfail a bunch of tests right now.TargetEncoder, only a few other locations needed it. Since it was so common forTargetEncoder, I added a deprecation test there as well to check that everything still worked on 1D inputs.reflectto supportreset="type", for setting the type on fit-like functions alone (and notn_features_in_/feature_names_in_). This was needed for 2 "transformers" sklearn (and cuml) supports that are meant to operate onyinstead ofX(LabelEncoderandLabelBinarizer). These estimators operate onyalone and shouldn't supportn_features_in_/feature_names_in_. They also shouldn't validate that the array input is 2 dimensional, sinceycan be 1D. This is a stop-gap solution as we refactor our validation functions - in the long run we might removeresetentirely fromreflectand instead move setting the input type to the validation functions (with thereflectdecorator only remaining for coercing outputs).As per our deprecation policy, I've marked this PR as "breaking" since it adds a new deprecation warning around non-2-dimensional X. All prior working code should continue to work, users providing 1D X should just see a warning.