Make `r2_score` compatible with Scikit-Learn #6337
rapids-bot[bot] merged 4 commits into rapidsai:branch-25.04
Not sure why this is ripping out the calls to the C++ layer, but as I understand it, that's not the ideal route to the 0cc path. Just want to flag this before it's merged.
I feel this is moving in the wrong direction. There are definitely some functions that still call cupy, but in general we have been moving the other way (we implement in cupy for convenience and move things to C++ later, as needed).
@cjnolet I fully agree that we should evaluate this further in a broader context, but I would recommend that we do so after the patch release. Right now, I think the priority should be to reach parity and address our current failures. Is this important enough that we need to address it immediately, or can we reevaluate post-GTC?
@csadorf we should be calling C++ primitives when they exist, not removing them from the Python layer. The changes in this PR are a step backwards as they stand, and I'd like that fixed before they are merged.
Why, though? For this function, performance is likely not critical, and the current C++ implementation is missing several features that we need in cuml. Having two code paths (one that calls the C++ function when possible, falling back to cupy otherwise) adds extra complication, especially when we're already using …
The motivation for this change is to overcome a current limitation of the 0cc layer. I would be happy to classify this as tech debt if need be. Taking advantage of existing primitives instead of replicating them is a win in my book, as long as we maintain performance. In this particular case, we could easily achieve that by calling the C++ primitives whenever possible, and only drop them if our benchmarks show no speed-up at any scale. As it stands, this change seems too controversial, so I'd say we just document the limitation for the patch release and revisit this later.
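A minimal sketch of the hybrid approach discussed in this thread: call the native (C++) primitive when it covers the inputs, and fall back to an array implementation otherwise. It assumes the native path handles only unweighted 1-D input. `_native_r2` and `_array_r2` are hypothetical placeholders, not cuml functions, and NumPy stands in for CuPy so the sketch runs on CPU.

```python
import numpy as np


def _native_r2(y_true, y_pred):
    # Hypothetical stand-in for an existing compiled (C++) primitive.
    # Assumed to support only unweighted, single-output inputs.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def _array_r2(y_true, y_pred, sample_weight):
    # Array-library fallback (NumPy here; CuPy offers the same calls)
    # covering the weighted case the native path is assumed to lack.
    w = np.asarray(sample_weight, dtype=np.float64)
    mean = np.average(y_true, weights=w)
    ss_res = np.sum(w * (y_true - y_pred) ** 2)
    ss_tot = np.sum(w * (y_true - mean) ** 2)
    return float(1.0 - ss_res / ss_tot)


def r2_score(y_true, y_pred, sample_weight=None):
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    if y_true.ndim != 1 or y_true.shape != y_pred.shape:
        raise ValueError("this sketch only handles matching 1-D inputs")
    # Use the native primitive whenever it covers the request...
    if sample_weight is None:
        return _native_r2(y_true, y_pred)
    # ...and fall back to the array implementation otherwise.
    return _array_r2(y_true, y_pred, sample_weight)
```

The trade-off debated here shows up directly: every new option (such as `multioutput`) widens the condition guarding the fast path, and both branches need their own test coverage.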
cjnolet left a comment:
Talked w/ Dante about this offline. I see the vision here. Approving.
Force-pushed from e502405 to 96c9e7d.
/merge
This updates `r2_score` to match the interface provided by `sklearn.metrics.r2_score`. In particular, we now support the `sample_weight` and `multioutput` options.

- Moved the implementation to use `cupy` instead of calling functions in `libcuml`. The other regression metrics already used `cupy`, so there's precedent here.
- Since the implementation no longer uses Cython, we can move the file to `.py` and delete a bunch of Cython code. Woot.
- All the regression metrics share some input setup and processing; the code paths here were updated to share code, so there should be no change in functionality in the other metrics.
- The testing for the regression metrics was updated to improve coverage of options and ensure compatibility with sklearn.
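A minimal sketch of the behaviour described above: a shared input-preparation helper plus an `r2_score` accepting `sample_weight` and `multioutput`. Per output j it computes R²_j = 1 - Σ w_i (y_ij - ŷ_ij)² / Σ w_i (y_ij - ȳ_j)², with ȳ_j the weighted mean. It follows the semantics documented for `sklearn.metrics.r2_score` with its default `force_finite=True` (a constant target scores 1.0 for a perfect fit, else 0.0); it is not the code merged in this PR. `_prepare_regression_inputs` is a hypothetical name. NumPy is used so it runs anywhere; the same calls exist in CuPy, but treating it as a drop-in is an untested assumption.

```python
import numpy as np


def _prepare_regression_inputs(y_true, y_pred, sample_weight=None):
    """Hypothetical shared helper: validate and reshape metric inputs.

    Returns 2-D float arrays of shape (n_samples, n_outputs) and either
    None or a 1-D weight array of length n_samples.
    """
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    if y_true.ndim == 1:
        y_true = y_true.reshape(-1, 1)
    if y_pred.ndim == 1:
        y_pred = y_pred.reshape(-1, 1)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    if sample_weight is not None:
        sample_weight = np.asarray(sample_weight, dtype=np.float64)
        if sample_weight.shape != (y_true.shape[0],):
            raise ValueError("sample_weight must have shape (n_samples,)")
    return y_true, y_pred, sample_weight


def r2_score(y_true, y_pred, *, sample_weight=None, multioutput="uniform_average"):
    y_true, y_pred, sample_weight = _prepare_regression_inputs(y_true, y_pred, sample_weight)
    w = 1.0 if sample_weight is None else sample_weight[:, None]

    # Per-output weighted residual and total sums of squares.
    numerator = (w * (y_true - y_pred) ** 2).sum(axis=0)
    mean = np.average(y_true, axis=0, weights=sample_weight)
    denominator = (w * (y_true - mean) ** 2).sum(axis=0)

    # Finite conventions for constant targets: perfect fit -> 1.0, otherwise 0.0.
    nonzero_num = numerator != 0
    nonzero_den = denominator != 0
    scores = np.ones(y_true.shape[1])
    valid = nonzero_num & nonzero_den
    scores[valid] = 1.0 - numerator[valid] / denominator[valid]
    scores[nonzero_num & ~nonzero_den] = 0.0

    if isinstance(multioutput, str):
        if multioutput == "raw_values":
            return scores
        if multioutput == "uniform_average":
            return float(np.average(scores))
        if multioutput == "variance_weighted":
            # Weight each output by its (weighted) total sum of squares.
            if not nonzero_den.any():
                return 1.0 if not nonzero_num.any() else 0.0
            return float(np.average(scores, weights=denominator))
        raise ValueError(f"unknown multioutput={multioutput!r}")
    # Array-like multioutput: explicit per-output weights.
    return float(np.average(scores, weights=np.asarray(multioutput, dtype=np.float64)))
```

Putting validation and reshaping in one helper is what lets the other regression metrics share the same input path: each metric then only computes its own per-output statistic and applies the common `multioutput` reduction.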
All scikit-learn regressors support a `sample_weight` keyword in their `score` method. This adds support for that (improving sklearn compatibility) as well as a test.
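A minimal sketch of how a mixin-level `score` can forward `sample_weight` to the metric. `RegressorMixin` here is a hypothetical stand-in, not cuml's class; `_weighted_r2` is a single-output helper that omits the zero-variance handling of a full `r2_score`, and `MeanRegressor` is a toy model for the usage example.

```python
import numpy as np


def _weighted_r2(y_true, y_pred, sample_weight=None):
    # Single-output weighted R^2 (stand-in for a full r2_score metric).
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    if sample_weight is None:
        w = np.ones_like(y_true)
    else:
        w = np.asarray(sample_weight, dtype=np.float64)
    ss_res = np.sum(w * (y_true - y_pred) ** 2)
    ss_tot = np.sum(w * (y_true - np.average(y_true, weights=w)) ** 2)
    return float(1.0 - ss_res / ss_tot)


class RegressorMixin:
    """Hypothetical sketch of a scikit-learn-style regressor mixin."""

    def score(self, X, y, sample_weight=None):
        # Forward sample_weight to the metric, as scikit-learn regressors do.
        return _weighted_r2(y, self.predict(X), sample_weight=sample_weight)


class MeanRegressor(RegressorMixin):
    """Toy model for the usage example: always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)
```

Because `score` lives on the mixin, every regressor that inherits from it picks up the keyword without per-model changes.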
Force-pushed from fe7c4d0 to ce4cf39.
PRs being backported:

- [x] #6234
- [x] #6306
- [x] #6320
- [x] #6319
- [x] #6327
- [x] #6333
- [x] #6142
- [x] #6223
- [x] #6235
- [x] #6317
- [x] #6331
- [x] #6326
- [x] #6332
- [x] #6347
- [x] #6348
- [x] #6337
- [x] #6355
- [x] #6354
- [x] #6322
- [x] #6353
- [x] #6359
- [x] #6364
- [x] #6363
- [x] [FIL BATCH_TREE_REORG fix for SM90, 100 and 120](a3e419a)

---------

Co-authored-by: William Hicks <whicks@nvidia.com>
This makes `cuml.metrics.r2_score` compatible with the same function in `sklearn`. In particular, we now support the `sample_weight` and `multioutput` options.

- Moved the implementation to use `cupy` instead of calling functions in `libcuml`. The other regression metrics already used `cupy`, so there's precedent here.
- Since the implementation no longer uses Cython, we can move the file to `.py` and delete a bunch of Cython code. Woot.

This also adds a `sample_weight` keyword to `RegressorMixin.score`, to improve scikit-learn compatibility for our regressor models. A corresponding test was added using `LinearRegression`.