
Add check_features#7874

Merged
rapids-bot[bot] merged 8 commits into rapidsai:main from jcrist:check-features
Mar 11, 2026

Conversation

@jcrist
Member

@jcrist jcrist commented Mar 10, 2026

This PR resolves many xfailures in test_sklearn_compatibility and the cuml.accel upstream sklearn test suite. In particular, it targets the check_n_features_in_after_fitting check (although several other checks were fixed while implementing).

This is a precursor to adding support for feature_names_in_ - once this PR lands, the next step is to add storing/checking that attribute within check_features as well. In the long run I expect most estimators will move to calling check_features from within a larger check_inputs ingestion routine (similar to sklearn's validate_data), but we're taking incremental steps to get there.

Summary:

  • Add check_features and corresponding test to cuml.internals.validation
  • Call check_features(self, X, reset=True) from within reflect; drop all calls to the private _set_n_features_in method.
  • Call check_features(self, X) in all inference methods that expect an X aligned with the original fit features. Note that this should not be called in an inverse_transform method, since the input there is not aligned with the original X.
  • Update xfailed tests accordingly.
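The reset=True/reset=False split described above can be sketched in a few lines. This is a minimal illustration, not cuml's actual code: the names (check_features, _get_n_features, n_features_in_) follow the PR description, but the real implementation in cuml.internals.validation handles many more input types.

```python
def _get_n_features(X):
    """Best-effort count of the number of features (columns) in X."""
    shape = getattr(X, "shape", None)
    if shape is not None and len(shape) == 2:
        return shape[1]
    raise TypeError(
        f"Unable to determine the number of features in a {type(X).__name__}"
    )


def check_features(estimator, X, reset=False):
    """Record (reset=True) or validate (reset=False) the feature count of X."""
    n_features = _get_n_features(X)
    if reset:
        # fit path: record the feature count for later validation
        estimator.n_features_in_ = n_features
        return
    # inference path: compare against the count recorded at fit time
    expected = getattr(estimator, "n_features_in_", None)
    if expected is not None and n_features != expected:
        raise ValueError(
            f"X has {n_features} features, but {type(estimator).__name__} "
            f"is expecting {expected} features as input."
        )
```

With this shape, fit-like methods call check_features(self, X, reset=True) and inference methods call check_features(self, X), matching the bullet points above.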

In addition, the following tangential issues were found and fixed while writing this PR:

  • Make predict_proba/predict_log_proba available on SVC/LinearSVC only if probability=True. This is needed for the sklearn generic tests to work properly - if a method is available it will be called, so we want hasattr(model, "predict_proba") to return True only if probability=True. This matches upstream sklearn behavior. The change is non-breaking: any user who was calling these methods before will still have them work fine; it only changes the type of error raised when they weren't supported. This fixed several xfails in our test suite.
  • Add the requires_positive_X tag to ComplementNB/CategoricalNB. This was needed so the sklearn generic tests would generate positive data for these estimators; otherwise invalid data would be generated, masking the issues we actually have. This fixed several xfails in our test suite.
  • Fix the partial_fit methods in cuml.naive_bayes and cuml.IncrementalPCA to better distinguish between first and subsequent fits. This fixed several xfails in our test suite.
  • Simplify implementation of OrdinalEncoder/OneHotEncoder. These estimators don't follow the patterns we expect for Base subclasses yet, and were implementing n_features_in_ themselves (among other things). This is non-breaking, but will look larger on the diff.
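The hasattr gating in the first bullet can be illustrated with a small stand-in for sklearn's available_if decorator. The real code uses sklearn.utils.metaestimators.available_if; this self-contained descriptor (and the toy Model class) only exists to show the mechanism: raising AttributeError from the descriptor makes hasattr() return False.

```python
import functools


class _AvailableIf:
    """Descriptor that hides a method unless a check on the instance passes."""

    def __init__(self, check):
        self.check = check

    def __call__(self, fn):
        # Used as a decorator: capture the wrapped method.
        self.fn = fn
        return self

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if not self.check(obj):
            # AttributeError here is what makes hasattr(obj, name) be False.
            raise AttributeError(
                f"{type(obj).__name__!r} object has no attribute {self.name!r}"
            )
        return functools.partial(self.fn, obj)


def available_if(check):
    return _AvailableIf(check)


class Model:
    """Toy classifier standing in for SVC/LinearSVC."""

    def __init__(self, probability=False):
        self.probability = probability

    @available_if(lambda est: est.probability)
    def predict_proba(self, X):
        return "probabilities"
```

With probability=False, hasattr(model, "predict_proba") is False, so the sklearn generic tests skip the method instead of calling it.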

Most other changes are pretty small, mechanical additions to add check_features calls in the correct spots. Hopefully a straightforward review.

Working towards #6650, part of #7428.

jcrist added 6 commits March 10, 2026 12:47
Adds calls to `check_features` everywhere they're needed.

Removes now-passing xfails in `test_sklearn_compatibility.py`.

While hacking through this, I also found a few unrelated bugs that
got lumped in with this commit:

- Adds `requires_positive_X` tag to `ComplementNB`/`CategoricalNB`. This
  tag affects sklearn's generic tests and led to some xfails now
  passing. Was necessary to get the tests to run without errors.
- Fixes some state management around `partial_fit` for naive bayes and
  incremental PCA. These are tested by the sklearn generic tests.
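The first-vs-subsequent-call distinction in partial_fit mentioned above follows a common pattern: detect the first call by the absence of fitted state, initialize on that call, and validate the feature count on every later call. The sketch below is illustrative only (a toy running-mean estimator, not cuml's naive bayes or incremental PCA code).

```python
import numpy as np


class RunningMean:
    """Toy estimator tracking a per-feature mean across partial_fit batches."""

    def partial_fit(self, X):
        X = np.asarray(X, dtype=np.float64)
        first_call = not hasattr(self, "n_samples_seen_")
        if first_call:
            # First batch: initialize state from this batch's shape.
            self.n_features_in_ = X.shape[1]
            self.mean_ = np.zeros(X.shape[1])
            self.n_samples_seen_ = 0
        elif X.shape[1] != self.n_features_in_:
            raise ValueError(
                f"X has {X.shape[1]} features, expected {self.n_features_in_}"
            )
        # Incremental mean update over all samples seen so far.
        total = self.n_samples_seen_ + X.shape[0]
        self.mean_ = (self.mean_ * self.n_samples_seen_ + X.sum(axis=0)) / total
        self.n_samples_seen_ = total
        return self
```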
@jcrist jcrist self-assigned this Mar 10, 2026
@jcrist jcrist requested a review from a team as a code owner March 10, 2026 22:23
@jcrist jcrist requested a review from betatim March 10, 2026 22:23
@jcrist jcrist added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change cuml-accel Issues related to cuml.accel sklearn-api-compat Issues around cuml matching sklearn API conventions/standards labels Mar 10, 2026
@github-actions github-actions Bot added the Cython / Python Cython or Python issue label Mar 10, 2026
@coderabbitai

coderabbitai Bot commented Mar 10, 2026

No actionable comments were generated in the recent review. 🎉


📥 Commits

Reviewing files that changed from the base of the PR and between bd8a3b9 and 2f0e73f.

📒 Files selected for processing (1)
  • python/cuml/cuml_accel_tests/upstream/scikit-learn/xfail-list.yaml
💤 Files with no reviewable changes (1)
  • python/cuml/cuml_accel_tests/upstream/scikit-learn/xfail-list.yaml

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • LinearSVC now exposes predict_proba and predict_log_proba when probability=True.
  • Bug Fixes

    • Centralized feature validation added across many estimators and transformers for consistent errors on feature-mismatch.
    • Naive Bayes fit/partial-fit behavior and metadata handling improved.
    • Encoders and preprocessing now manage feature metadata more robustly.
  • Tests

    • Tests and expected failures updated to reflect stricter validation and behavior changes.

Walkthrough

Introduces a centralized check_features(estimator, X, reset=False) and helper _get_n_features in cuml.internals.validation; removes Base._set_n_features_in; replaces direct feature-count setting with check_features calls across many estimators/encoders, updates Naive Bayes context usage, adjusts tests and xfail expectations.

Changes

  • Core validation & base outputs (python/cuml/cuml/internals/validation.py, python/cuml/cuml/internals/base.py, python/cuml/cuml/internals/outputs.py): Adds check_features and _get_n_features; removes Base._set_n_features_in; replaces direct feature-size updates with centralized validation and exports check_features.
  • Preprocessing & encoders (python/cuml/cuml/preprocessing/encoders.py, python/cuml/cuml/preprocessing/onehotencoder_mg.py, python/cuml/cuml/preprocessing/ordinalencoder_mg.py, python/cuml/cuml/preprocessing/TargetEncoder.py): Replaces CheckFeaturesMixIn usage with centralized check_features; adds _set_features on BaseEncoder; removes setting/exposing n_features_in_ in TargetEncoder.
  • Decomposition / PCA / TSVD (python/cuml/cuml/decomposition/base_mg.py, python/cuml/cuml/decomposition/incremental_pca.py, python/cuml/cuml/decomposition/pca.pyx, python/cuml/cuml/decomposition/tsvd.pyx): Use check_features(..., reset=True) for initial-fit feature setup; add check_features in transform paths; incremental PCA adjusts first-call initialization.
  • Clustering, kmeans, projections (python/cuml/cuml/cluster/hdbscan/hdbscan.pyx, python/cuml/cuml/cluster/kmeans.pyx, python/cuml/cuml/random_projection/random_projection.py): Insert check_features validations in membership_vector, approximate_predict, KMeans predict/transform, and RandomProjection.transform.
  • Ensembles & neighbors (python/cuml/cuml/ensemble/randomforestclassifier.py, python/cuml/cuml/ensemble/randomforestregressor.py, python/cuml/cuml/neighbors/nearest_neighbors.pyx, python/cuml/cuml/neighbors/kernel_density.py): Add check_features in predict/predict_proba/kneighbors/score_samples and simplify some sparse-data guards.
  • SVM, linear models, kernel ridge, SVR (python/cuml/cuml/svm/linear_svc.py, python/cuml/cuml/svm/svc.py, python/cuml/cuml/svm/svr.py, python/cuml/cuml/linear_model/base.py, python/cuml/cuml/linear_model/base_mg.pyx, python/cuml/cuml/kernel_ridge/kernel_ridge.py): Add check_features validation in predict/decision_function/predict_proba paths; add @available_if gating for probability methods and refine NotFittedError messaging.
  • Naive Bayes & related refactors (python/cuml/cuml/naive_bayes/naive_bayes.py): Swaps reflect for run_in_internal_context, inserts check_features calls and first-call branching in fit/partial_fit/predict flows, and adds _more_static_tags() on several NB classes.
  • Other estimators/utilities (python/cuml/cuml/covariance/ledoit_wolf.py, python/cuml/cuml/manifold/umap/umap.pyx, python/cuml/cuml/experimental/linear_model/lars.pyx): Add check_features to score/transform/predict paths; lars removes explicit _set_n_features_in/_set_output_type calls.
  • Tests & sklearn-compat adjustments (python/cuml/cuml_accel_tests/upstream/scikit-learn/xfail-list.yaml, python/cuml/tests/test_validation.py, python/cuml/tests/test_linear_svm.py, python/cuml/tests/*): Adds tests for _get_n_features/check_features, removes many xfail entries for n_features_in checks, and updates SVC/LinearSVC probability-related tests and expectations.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


Suggested reviewers

  • betatim
  • dantegd
  • csadorf
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 56.67%, which is below the required threshold of 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Title check ✅: The title 'Add check_features' is concise and clearly describes the main change: introducing a new validation helper function.
  • Description check ✅: The description is comprehensive and directly related to the changeset, explaining the motivation, implementation details, and related fixes.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
python/cuml/cuml/svm/linear_svc.py (1)

320-375: ⚠️ Potential issue | 🟠 Major

cuML's LinearSVC introduces API divergence from sklearn by adding unsupported parameters and methods.

The probability parameter and predict_proba/predict_log_proba methods do not exist in sklearn's LinearSVC. Sklearn recommends wrapping LinearSVC with CalibratedClassifierCV to obtain probability estimates, or using SVC(kernel="linear", probability=True) instead.

Per coding guidelines, function and parameter names must match scikit-learn without justification. Either remove the probability parameter and probability methods entirely, or document this as an intentional API deviation with clear justification for why cuML diverges here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml/svm/linear_svc.py` around lines 320 - 375, The PR adds
sklearn-incompatible API changes to LinearSVC by introducing the probability
parameter and methods predict_proba and predict_log_proba; either remove these
members from the LinearSVC class (remove the probability attribute, the
`@available_if` wrappers, and the predict_proba/predict_log_proba methods) to
match scikit-learn, or if you intend to keep them, add a clear, explicit
justification in the class docstring and module-level docs describing the
intentional divergence and rationale (mention recommended alternatives like
CalibratedClassifierCV and SVC(kernel="linear", probability=True)); update any
generate_docstring/reflect annotations and tests referencing
predict_proba/predict_log_proba accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@python/cuml/cuml/internals/validation.py`:
- Around line 57-65: The _get_n_features branch that treats list/tuple inputs
assumes nested rows are only lists/tuples and returns 1 otherwise, which
miscounts cases like [np.array(...), ...] and also swallows errors with a
blanket except; update the logic in _get_n_features to: (1) explicitly handle
empty X by raising a clear ValueError, (2) inspect the first row using a robust
"array-like" check (e.g., isinstance(row, (list, tuple, np.ndarray)) or
hasattr(row, "__len__")) to determine n_features correctly for array-like rows
such as np.ndarray, (3) avoid a bare except—only catch specific exceptions if
needed and re-raise or convert them to a meaningful error about inconsistent row
formats, and (4) ensure functions that rely on n_features_in_ (the callers)
receive the correct integer or a clear error when ambiguous; locate this logic
in the _get_n_features function to make the changes.

In `@python/cuml/cuml/manifold/umap/umap.pyx`:
- Around line 1415-1419: Run the dimensionality check (the len(X.shape) != 2
guard that raises "Reshape your data: X should be two dimensional") before
calling check_features(self, X) inside the predict/transform path so 1D inputs
hit the sklearn-style reshape error first; then remove the later manual n_cols
!= self.n_features_in_ block because check_features already enforces
feature-count consistency; ensure references are to check_is_fitted, the
len(X.shape) 2D check, check_features(self, X), and the n_cols !=
self.n_features_in_ manual check so the correct spots are updated.

In `@python/cuml/cuml/naive_bayes/naive_bayes.py`:
- Around line 858-863: On the partial_fit path, when first_call is True and the
estimator is the discrete branch, require that the caller passes a non-None
classes parameter instead of silently inferring classes from the first batch; in
practice modify partial_fit (the method that calls self._set_output_type(X) and
check_features) to check if first_call and classes is None and raise ValueError
(matching scikit-learn behavior), and when classes is provided use it to
initialize self.classes_ rather than inferring from y in that first batch.

In `@python/cuml/cuml/neighbors/nearest_neighbors.pyx`:
- Around line 777-779: kneighbors currently uses an old width check via
X.shape[1] after calling check_features, causing divergent error paths and
failures for array-like inputs; update kneighbors to call check_is_fitted()
first, rely on check_features(self, X) to validate/normalize the input (remove
any subsequent direct uses of X.shape[1] or similar legacy width checks), then
branch to _kneighbors_dense/_kneighbors_sparse using the validated/normalized X
returned or ensured by check_features, and ensure the returned neighbors
distances/indices preserve the input type (cuDF/pandas/NumPy) consistent with
check_features behavior.

In `@python/cuml/cuml/preprocessing/encoders.py`:
- Around line 42-47: Update OneHotEncoder.transform to use the existing
_check_features helper so transform performs the same feature validation as
fit/fit_transform: call check_is_fitted(self), then invoke
self._check_features(X, reset=False) (which calls check_features and
sets/validates feature_names_in_), and ensure transform validates input types
(pandas/cuDF/ndarray) and shapes against the fitted
n_features_in_/feature_names_in_. Preserve the fitted column order when
selecting columns (do not iterate X.columns directly), raise on missing or extra
columns rather than silently dropping/reordering, and make sure the returned
encoded block type matches the input type (DataFrame for DataFrame inputs,
ndarray for ndarray inputs).
- Around line 62-63: The bug is that calling self._check_features(reset=True)
unconditionally inside _fit() causes the second validation of explicit
categories (via _check_input_fit/_check_input with is_categories=True) to
overwrite fitted metadata; change the flow so that _check_features(reset=True)
is only called when validating the main training X (not when validating explicit
categories), i.e., detect is_categories before resetting and skip the reset for
category-validation paths; ensure _fit() still clears previous model state and
initializes learned attributes (n_features_in_, feature_names_in_, etc.) before
any validation and use the existing _check_input/_check_input_fit calls to
validate inputs without mutating fitted metadata for categories.

In `@python/cuml/tests/test_sklearn_compatibility.py`:
- Around line 289-290: Remove the new xfail entries for "check_fit2d_predict1d"
(and the similar entries referenced at lines 359-360) from the xfail mapping so
we don't codify 1D-input regressions; instead fix the estimators that fail
validation by updating their predict/transform implementations (e.g.,
SVC.predict and any custom transformer/estimator transform methods) to perform
the same input-shape checks and error raising as scikit-learn for 1D arrays
(match function/parameter names and edge-case behavior), then run the
sklearn_compatibility tests to confirm check_fit2d_predict1d passes rather than
being marked xfail.



📥 Commits

Reviewing files that changed from the base of the PR and between 5f2909d and f925fa9.

📒 Files selected for processing (35)
  • python/cuml/cuml/cluster/hdbscan/hdbscan.pyx
  • python/cuml/cuml/cluster/kmeans.pyx
  • python/cuml/cuml/covariance/ledoit_wolf.py
  • python/cuml/cuml/decomposition/base_mg.py
  • python/cuml/cuml/decomposition/incremental_pca.py
  • python/cuml/cuml/decomposition/pca.pyx
  • python/cuml/cuml/decomposition/tsvd.pyx
  • python/cuml/cuml/ensemble/randomforestclassifier.py
  • python/cuml/cuml/ensemble/randomforestregressor.py
  • python/cuml/cuml/experimental/linear_model/lars.pyx
  • python/cuml/cuml/internals/base.py
  • python/cuml/cuml/internals/outputs.py
  • python/cuml/cuml/internals/validation.py
  • python/cuml/cuml/kernel_ridge/kernel_ridge.py
  • python/cuml/cuml/linear_model/base.py
  • python/cuml/cuml/linear_model/base_mg.pyx
  • python/cuml/cuml/manifold/umap/umap.pyx
  • python/cuml/cuml/naive_bayes/naive_bayes.py
  • python/cuml/cuml/neighbors/kernel_density.py
  • python/cuml/cuml/neighbors/nearest_neighbors.pyx
  • python/cuml/cuml/preprocessing/TargetEncoder.py
  • python/cuml/cuml/preprocessing/encoders.py
  • python/cuml/cuml/preprocessing/onehotencoder_mg.py
  • python/cuml/cuml/preprocessing/ordinalencoder_mg.py
  • python/cuml/cuml/random_projection/random_projection.py
  • python/cuml/cuml/svm/linear_svc.py
  • python/cuml/cuml/svm/svc.py
  • python/cuml/cuml/svm/svr.py
  • python/cuml/cuml_accel_tests/upstream/scikit-learn/xfail-list.yaml
  • python/cuml/tests/test_base.py
  • python/cuml/tests/test_incremental_pca.py
  • python/cuml/tests/test_linear_svm.py
  • python/cuml/tests/test_sklearn_compatibility.py
  • python/cuml/tests/test_svm.py
  • python/cuml/tests/test_validation.py
💤 Files with no reviewable changes (4)
  • python/cuml/cuml_accel_tests/upstream/scikit-learn/xfail-list.yaml
  • python/cuml/cuml/experimental/linear_model/lars.pyx
  • python/cuml/cuml/internals/base.py
  • python/cuml/tests/test_base.py

Comment thread python/cuml/cuml/internals/validation.py Outdated
Comment thread python/cuml/cuml/manifold/umap/umap.pyx
Comment thread python/cuml/cuml/naive_bayes/naive_bayes.py
Comment thread python/cuml/cuml/neighbors/nearest_neighbors.pyx
Comment thread python/cuml/cuml/preprocessing/encoders.py Outdated
Comment thread python/cuml/cuml/preprocessing/encoders.py Outdated
Comment thread python/cuml/tests/test_sklearn_compatibility.py
Contributor

@viclafargue viclafargue left a comment


Thanks! LGTM, even though a second pair of eyes would be welcome.

Comment thread python/cuml/cuml/experimental/linear_model/lars.pyx
Comment thread python/cuml/cuml/experimental/linear_model/lars.pyx

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
python/cuml/cuml/preprocessing/encoders.py (1)

335-348: ⚠️ Potential issue | 🟠 Major

Validate DataFrame schema, not just width.

check_features(self, X) only checks n_features_in_. For DataFrame inputs with the same width but a different column order, this path still encodes in X.columns order, so the output block layout changes after fit. Reindex against self._features and raise on missing/extra columns before the transform loop.

Proposed fix
 def transform(self, X):
     """Transform X using one-hot encoding."""
     check_is_fitted(self)
     check_features(self, X)

     X = self._check_input(X)
+    if hasattr(X, "columns"):
+        missing = [f for f in self._features if f not in X.columns]
+        extra = [c for c in X.columns if c not in self._features]
+        if missing or extra:
+            raise ValueError(
+                f"X columns must match the fitted feature set. Missing={missing}, extra={extra}"
+            )
+        X = X[self._features]

@@
-            for feature in X.columns:
+            for feature in self._features:
                 encoder = self._encoders[feature]
In scikit-learn 1.5.0, does `OneHotEncoder.transform` preserve the fitted feature order for DataFrame inputs and reject missing or extra columns even when the column count matches?

As per coding guidelines, predict/transform methods must call check_is_fitted(), validate input dimensions against fitted dimensions, handle input type correctly (cuDF, pandas, NumPy), and ensure output type is consistent with input type.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml/preprocessing/encoders.py` around lines 335 - 348, The
transform method only checks n_features_in_ via check_features(self, X) and then
iterates X.columns, which allows a DataFrame with the same width but reordered,
missing, or extra columns to silently produce a different encoding layout;
modify transform to first validate the DataFrame schema against the fitted
feature list (self._features) by checking for missing or unexpected columns and
raising an error if any are found, then reindex X to self._features (preserving
fitted column order) before calling self._check_input and iterating encoders in
self._encoders; ensure error messages reference the offending column names and
maintain input-type handling (cuDF/pandas/NumPy) and existing check_is_fitted()
usage.
python/cuml/cuml/internals/validation.py (1)

56-71: ⚠️ Potential issue | 🟠 Major

Don't infer a feature count from empty Python sequences.

[] currently returns 0 here, so any fit-like path that calls check_features(..., reset=True) can record a bogus n_features_in_ before the real validation layer rejects the input. The broad except Exception keeps the same “guess a width” behavior for other malformed row objects. This helper is now on the hot path for many estimators, so it should fail fast on empty/ambiguous sequence inputs instead of synthesizing a feature count.

Proposed fix
 def _get_n_features(X):
     if isinstance(X, (list, tuple)):
         if len(X) == 0:
-            return 0
+            raise TypeError(
+                "Unable to determine the number of features from an empty input"
+            )
         row = X[0]
@@
         if not isinstance(row, (str, bytes, dict)):
             try:
                 return len(row)
-            except Exception:
+            except TypeError:
                 pass
         return 1
In scikit-learn 1.5.0 source, what does `sklearn.utils.validation._num_features([])` do for an empty Python sequence, and does it infer `0` features or raise?

As per coding guidelines, behavior for edge cases (empty arrays, single sample) must match scikit-learn.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml/internals/validation.py` around lines 56 - 71, The helper
_get_n_features currently returns 0 for empty Python sequences which causes
downstream code (e.g., check_features and n_features_in_) to record a bogus
feature count; change _get_n_features so that if X is a list or tuple and len(X)
== 0 it raises a ValueError (with a clear message like "Cannot infer number of
features from an empty sequence") instead of returning 0, leaving the existing
try/except behavior for non-empty rows intact so malformed row objects still
fall through to later validation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@python/cuml/cuml/preprocessing/encoders.py`:
- Around line 42-47: The _set_features helper only sets self.feature_names_in_
when X has a columns attribute, so if you refit from a DataFrame to a plain
array the old feature names linger; update _set_features (used by encoders' fit
paths) so that when X has no "columns" attribute it deletes/clears
self.feature_names_in_ (e.g., delattr or pop) before calling
check_features(self, X, reset=True), otherwise ensure feature_names_in_ is
removed on refit to reflect the new input without column names.



📥 Commits

Reviewing files that changed from the base of the PR and between f925fa9 and bd8a3b9.

📒 Files selected for processing (7)
  • python/cuml/cuml/internals/validation.py
  • python/cuml/cuml/neighbors/nearest_neighbors.pyx
  • python/cuml/cuml/preprocessing/encoders.py
  • python/cuml/cuml/preprocessing/onehotencoder_mg.py
  • python/cuml/cuml/preprocessing/ordinalencoder_mg.py
  • python/cuml/cuml_accel_tests/upstream/scikit-learn/xfail-list.yaml
  • python/cuml/tests/test_validation.py
💤 Files with no reviewable changes (1)
  • python/cuml/cuml_accel_tests/upstream/scikit-learn/xfail-list.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • python/cuml/cuml/preprocessing/ordinalencoder_mg.py

Comment thread python/cuml/cuml/preprocessing/encoders.py
@jcrist
Member Author

jcrist commented Mar 11, 2026

/merge

@rapids-bot rapids-bot Bot merged commit 415760f into rapidsai:main Mar 11, 2026
243 of 249 checks passed
Contributor

@csadorf csadorf left a comment


LGTM!

Comment thread python/cuml/cuml/naive_bayes/naive_bayes.py


4 participants