
FIX Raise TypeError when sparse input is not supported#7728

Merged
rapids-bot[bot] merged 10 commits into rapidsai:main from betatim:fix-sparse-exception
Feb 11, 2026

Conversation

@betatim
Member

@betatim betatim commented Jan 28, 2026

This makes cuml more compatible with scikit-learn by raising the exception type that it expects.

Tackling one of the many xfail'ed checks in the compatibility testing.

This changes the type from NotImplementedError to TypeError, which is technically a breaking change. However, I don't think this will affect many people, as it only matters for those who pass sparse data to an estimator that doesn't support it and then rely on catching the exception. I can't think of a way to handle this via a "deprecation cycle".

It is a small change that fixes several estimators at the same time and gives us a better overview of those that truly fail this check.
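To illustrate the breaking aspect for downstream code: callers that wrapped a fit/convert call in a try/except block need to update the exception type they catch. A minimal sketch, assuming only the behavior described above; `reject_sparse` is a hypothetical stand-in, not cuml's actual API:

```python
import scipy.sparse as sp


def reject_sparse(X):
    # Stand-in for an estimator path that does not support sparse input.
    # After this PR it raises TypeError (previously NotImplementedError).
    if sp.issparse(X):
        raise TypeError("Sparse matrix passed, but dense data is required")
    return X


X = sp.csr_matrix([[0.0, 1.0], [2.0, 0.0]])

# Old calling code caught NotImplementedError; it must now catch TypeError.
try:
    reject_sparse(X)
except TypeError as exc:
    caught = str(exc)
```

Code that catches `Exception` or checks the message rather than the type is unaffected.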

@betatim betatim requested a review from a team as a code owner January 28, 2026 11:50
@betatim betatim requested a review from csadorf January 28, 2026 11:50
@betatim betatim added the improvement (Improvement / enhancement to an existing function) and breaking (Breaking change) labels Jan 28, 2026
@github-actions github-actions Bot added the Cython / Python (Cython or Python issue) label Jan 28, 2026

coderabbitai Bot commented Feb 4, 2026

📝 Walkthrough

Summary by CodeRabbit

  • Bug Fixes

    • Passing sparse inputs now raises a clear TypeError indicating dense data is required.
  • Tests

    • Compatibility tests updated: many prior sparse expected-fail markers removed so sparse checks now run; two projection estimators’ expected messages adjusted to reflect observed ValueError behavior on small/sparse inputs.

Walkthrough

Sparse-input handling in internals changed: from_input now raises TypeError with an explicit "sparse matrix passed, dense required" message; tests updated accordingly. Many per-estimator xfail entries for sklearn sparse-data compatibility were removed and two random-projection xfail messages were adjusted.

Changes

Cohort / File(s) Summary
Array & exception tests
python/cuml/cuml/internals/array.py, python/cuml/tests/test_exceptions.py
Changed sparse-input error from NotImplementedError to TypeError with a message indicating sparse matrices are unsupported; updated tests to expect TypeError and match on "sparse".
Sklearn compatibility tests
python/cuml/tests/test_sklearn_compatibility.py
Removed many check_estimator_sparse_tag / sparse-related xfail entries from PER_ESTIMATOR_XFAIL_CHECKS; adjusted GaussianRandomProjection and SparseRandomProjection entries to map sparse-tag failures to ValueError messages about small-dataset failures.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • csadorf
  • viclafargue
  • jcrist
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'FIX Raise TypeError when sparse input is not supported' directly and clearly summarizes the main change: converting the exception type from NotImplementedError to TypeError for sparse input handling.
Description check ✅ Passed The description explains the rationale for the change (scikit-learn compatibility), acknowledges it as a breaking change, notes the impact scope, and describes the benefit of fixing multiple estimators simultaneously.



No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments
python/cuml/tests/test_sklearn_compatibility.py (1)

572-572: Consider aligning the SparseRandomProjection rationale with GaussianRandomProjection's.

Line 543 says "...raises ValueError with small datasets even though it supports sparse inputs" which clearly explains why check_estimator_sparse_tag fails despite sparse being supported. Line 572 omits the "even though it supports sparse inputs" clarification. Since SparseRandomProjection also inherits SparseInputTagMixin and declares sparse support, the same tag-behavior nuance applies.

Proposed fix
-        "check_estimator_sparse_tag": "SparseRandomProjection raises ValueError with small datasets",
+        "check_estimator_sparse_tag": "SparseRandomProjection raises ValueError with small datasets even though it supports sparse inputs",



@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@python/cuml/tests/test_sklearn_compatibility.py`:
- Around line 702-704: The xfail rationale masks a tag-behavior mismatch:
scikit-learn's check_estimator_sparse_tag detects that input_tags.sparse=True
(via SparseInputTagMixin on GaussianRandomProjection and SparseRandomProjection)
but fit() rejects CSR sparse input; update the PR to either remove
SparseInputTagMixin (so input_tags.sparse is not declared) if sparse inputs
truly aren't supported, or amend the xfail rationale text to explicitly state
that the failure is due to the mismatch between input_tags.sparse and actual
fit() behavior (i.e., tag declares sparse support but fit() raises ValueError
for small datasets) so reviewers know the check is catching a tag inconsistency
rather than just the Johnson–Lindenstrauss small-dataset constraint.
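The tag inconsistency the check is catching can be illustrated with a toy stand-in (this is not cuml's or scikit-learn's actual tag API; the class, attribute, and message are hypothetical):

```python
import numpy as np
import scipy.sparse as sp


class ProjectionLike:
    """Toy model of the mismatch: the tag promises sparse support,
    but fit() raises for the small sparse input the check uses."""

    # Analogous to input_tags.sparse=True from SparseInputTagMixin.
    accepts_sparse = True

    def fit(self, X):
        # The Johnson-Lindenstrauss bound needs enough samples, so tiny
        # inputs raise ValueError regardless of sparsity. The sparse-tag
        # check happens to use a small matrix, so it trips this path.
        if X.shape[0] < 10:
            raise ValueError("n_samples too small for the target dimension")
        return self


est = ProjectionLike()
dense_ok = est.fit(np.zeros((20, 5))) is est  # enough samples: fine

try:
    est.fit(sp.csr_matrix((3, 5)))  # small sparse input, as in the check
    sparse_raised = False
except ValueError:
    sparse_raised = True
# accepts_sparse is True yet fit() raised: the inconsistency flagged above.
```

Updating the xfail rationale to name this mismatch (rather than only the small-dataset constraint) is the lighter of the two options the comment proposes.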

Comment thread on python/cuml/tests/test_sklearn_compatibility.py (outdated)
@jcrist
Member

jcrist commented Feb 11, 2026

/merge

@rapids-bot rapids-bot Bot merged commit 7aa29db into rapidsai:main Feb 11, 2026
89 of 91 checks passed
@betatim betatim deleted the fix-sparse-exception branch February 11, 2026 14:31
dantegd added a commit to dantegd/cuml that referenced this pull request Feb 17, 2026

Authors:
  - Tim Head (https://github.com/betatim)
  - Simon Adorf (https://github.com/csadorf)

Approvers:
  - Simon Adorf (https://github.com/csadorf)
  - Jim Crist-Harif (https://github.com/jcrist)

URL: rapidsai#7728

Labels

  • breaking (Breaking change)
  • Cython / Python (Cython or Python issue)
  • improvement (Improvement / enhancement to an existing function)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants