Make `r2_score` compatible with Scikit-Learn by jcrist · Pull Request #6337 · rapidsai/cuml

jcrist · 2025-02-19T20:07:35Z

This makes cuml.metrics.r2_score compatible with the same function in sklearn.

In particular, we now support the sample_weight and multioutput options.

Moved the implementation to use cupy instead of calling functions in libcuml. The other regression metrics already used cupy, so there's precedence here.
Since the implementation no longer uses cython, we can move the file to .py and delete a bunch of cython code. Woot.
All the regression metrics share some input setup and processing; the code paths here were updated to share code, there should be no change in functionality in the other metrics.
The testing for the regression metrics was updated to improve coverage of options and ensure compatibility with sklearn.

This also adds a sample_weight keyword to RegressorMixin.score, to improve scikit-learn compatibility for our regressor models. A corresponding test was added using LinearRegression.

cjnolet

Not sure why this is ripping out the calls to the c++ layer but as I understand it that’s not the ideal route to the 0cc path. Just want to flag this before it’s merged.

I feel this is moving the wrong direction. There are definitely some functions that still call cupy but in general we have been moving the other direction (we implement in cupy for convenience and will move things to c++ later / as needed).

csadorf · 2025-02-20T14:52:32Z

Not sure why this is ripping out the calls to the c++ layer but as I understand it that’s not the ideal route to the 0cc path. Just want to flag this before it’s merged.

I feel this is moving the wrong direction. There are definitely some functions that still call cupy but in general we have been moving the other direction (we implement in cupy for convenience and will move things to c++ later / as needed).

@cjnolet I fully agree that we should evaluate this further in broader context, but I would recommend that we do so after the patch release. Right now, I think the priority should be to reach parity and address our current failures.

Would you say that this is important enough that we need to address this immediately or can we reevaluate post-GTC?

cjnolet · 2025-02-20T16:47:42Z

@csadorf we should be calling c++ primitives when they exist and not removing them from the Python layer. The changes in this PR are a step backwards as they are and I'd like that to be fixed before they are merged.

jcrist · 2025-02-20T17:11:24Z

we should be calling c++ primitives when they exist and not removing them from the Python layer.

Why though? For this function, performance is likely uncritical, and the current c++ implementation is missing several features that we need in cuml. Having two code paths (one that calls the C++ function when possible, falling back to cupy otherwise) adds extra complication, especially when we're already using cupy for all the other metrics in this file. From a maintenance perspective I'd rather keep the code simple and isolated in cuml, rather than requiring we call a C++ function split across 2 repos just because it exists.

csadorf · 2025-02-20T17:47:51Z

@csadorf we should be calling c++ primitives when they exist and not removing them from the Python layer. The changes in this PR are a step backwards as they are and I'd like that to be fixed before they are merged.

The motivation for this change is to overcome a current limitation of the 0cc layer. I would be happy to classify this as tech debt if need be.

Taking advantage of existing primitives instead of replicating them is a win in my book as long as we maintain performance. In this particular case, we could easily achieve that by calling the c++ primitives whenever possible and only drop that if our benchmarks show no speed-up across all scales.

As it stands, it seems like this change is too controversial as is, so I’d say we just document the limitation for the patch release and revisit this at a later point.

cjnolet

Talked w/ Dante about this offline. I see the vision here. Approving.

dantegd · 2025-02-21T06:24:05Z

/merge

This updates `r2_score` to match the interface provided by `sklearn.metrics.r2_score`. In particular, we now support the `sample_weight` and `multioutput` options. - Moved the implementation to use `cupy` instead of calling functions in `libcuml`. The other regression metrics already used `cupy`, so there's precedence here. - Since the implementation no longer uses cython, we can move the file to `.py` and delete a bunch of cython code. Woot. - All the regression metrics share some input setup and processing; the code paths here were updated to share code, there should be no change in functionality in the other metrics. - The testing for the regression metrics was updated to improve coverage of options and ensure compatibility with sklearn.

All scikit-learn regressors support a `sample_weight` keyword in their `score` method. This adds support for that (improving sklearn compatibility) as well as a test.

PRs being backported: - [x] #6234 - [x] #6306 - [x] #6320 - [x] #6319 - [x] #6327 - [x] #6333 - [x] #6142 - [x] #6223 - [x] #6235 - [x] #6317 - [x] #6331 - [x] #6326 - [x] #6332 - [x] #6347 - [x] #6348 - [x] #6337 - [x] #6355 - [x] #6354 - [x] #6322 - [x] #6353 - [x] #6359 - [x] #6364 - [x] #6363 - [x] [FIL BATCH_TREE_REORG fix for SM90, 100 and 120](a3e419a) --------- Co-authored-by: William Hicks <whicks@nvidia.com>

jcrist requested review from a team as code owners February 19, 2025 20:07

jcrist requested review from cjnolet and vyasr February 19, 2025 20:07

github-actions Bot added Cython / Python Cython or Python issue CMake labels Feb 19, 2025

jcrist added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change cuml-cpu and removed Cython / Python Cython or Python issue CMake labels Feb 19, 2025

jcrist commented Feb 19, 2025

View reviewed changes

Comment thread python/cuml/cuml/metrics/regression.py

Comment thread python/cuml/cuml/tests/test_metrics.py

Comment thread python/cuml/cuml/tests/test_linear_model.py

Comment thread python/cuml/cuml/ensemble/randomforestregressor.pyx

csadorf reviewed Feb 19, 2025

View reviewed changes

Comment thread python/cuml/cuml/metrics/regression.py

cjnolet requested changes Feb 20, 2025

View reviewed changes

dantegd requested changes Feb 20, 2025

View reviewed changes

Comment thread python/cuml/cuml/metrics/regression.pxd

dantegd reviewed Feb 20, 2025

View reviewed changes

Comment thread python/cuml/cuml/metrics/regression.py

cjnolet approved these changes Feb 20, 2025

View reviewed changes

github-actions Bot added Cython / Python Cython or Python issue CMake labels Feb 20, 2025

jcrist force-pushed the r2-score-compat branch from e502405 to 96c9e7d Compare February 20, 2025 19:48

csadorf approved these changes Feb 20, 2025

View reviewed changes

dantegd approved these changes Feb 21, 2025

View reviewed changes

dantegd mentioned this pull request Feb 21, 2025

Backport release 25.04 PRs for patch release version 25.02.01 #6329

Merged

24 tasks

jcrist added 2 commits February 21, 2025 18:44

Add sample_weight to RegressorMixin.score

6093906

All scikit-learn regressors support a `sample_weight` keyword in their `score` method. This adds support for that (improving sklearn compatibility) as well as a test.

jcrist added 2 commits February 21, 2025 18:44

Deprecate handle and convert_dtype in r2_score

6b16101

Fixup docstring

ce4cf39

jcrist force-pushed the r2-score-compat branch from fe7c4d0 to ce4cf39 Compare February 21, 2025 18:44

rapids-bot Bot merged commit 23a08a3 into rapidsai:branch-25.04 Feb 21, 2025

jcrist deleted the r2-score-compat branch February 21, 2025 22:33

dantegd added a commit to dantegd/cuml that referenced this pull request Feb 23, 2025

Backport PR rapidsai#6337

6c05eff

dantegd added a commit to dantegd/cuml that referenced this pull request Feb 26, 2025

Backport PR rapidsai#6337

63db155

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make `r2_score` compatible with Scikit-Learn#6337

Make `r2_score` compatible with Scikit-Learn#6337
rapids-bot[bot] merged 4 commits intorapidsai:branch-25.04from
jcrist:r2-score-compat

jcrist commented Feb 19, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cjnolet left a comment •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

csadorf commented Feb 20, 2025

Uh oh!

cjnolet commented Feb 20, 2025

Uh oh!

jcrist commented Feb 20, 2025

Uh oh!

csadorf commented Feb 20, 2025

Uh oh!

cjnolet left a comment

Uh oh!

dantegd commented Feb 21, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

jcrist commented Feb 19, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cjnolet left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

csadorf commented Feb 20, 2025

Uh oh!

cjnolet commented Feb 20, 2025

Uh oh!

jcrist commented Feb 20, 2025

Uh oh!

csadorf commented Feb 20, 2025

Uh oh!

cjnolet left a comment

Choose a reason for hiding this comment

Uh oh!

dantegd commented Feb 21, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

cjnolet left a comment •

edited

Loading