Ignore cudf's __dataframe__ deprecation.#6229
Ignore cudf's __dataframe__ deprecation.#6229raydouglass merged 3 commits intorapidsai:branch-25.02from
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
/ok to test |
|
Help on this PR is welcome! Please feel free to push if you can fix any of the remaining test failures. |
|
|
||
|
|
||
| # Ignore FutureWarning: Using `__dataframe__` is deprecated | ||
| @pytest.mark.filterwarnings("ignore::FutureWarning") |
There was a problem hiding this comment.
This makes sure we only ignore the dataframe warning and not all FutureWarnings. Also remvose the need for the comment I'd say
| @pytest.mark.filterwarnings("ignore::FutureWarning") | |
| @pytest.mark.filterwarnings("ignore:Support for loading dataframes via the `__dataframe__` interchange protocol is deprecated") |
(same for the other occurrence)
There was a problem hiding this comment.
I'm unable to reproduce getting a warning in this test (and don't see how one could be generated). I think this one can just be dropped.
There was a problem hiding this comment.
@jcrist What version of cudf are you using? Only recent 25.02 nightlies will show this.
There was a problem hiding this comment.
25.02.00a273. I see the warning at the other location, but not here.
There was a problem hiding this comment.
I cannot reproduce this one either.
There was a problem hiding this comment.
I filed #6239 as a follow-up with this filter removed. My hope is that this PR passes CI and we can merge it as-is, then follow up with that PR.
|
Looking at the failed CI jobs I see a lot of: Which makes me think that somewhere in |
|
rapidsai/rmm#1775 would cause this warning, but we searched the RAPIDS code base extensively to make sure there were no internal uses of this that would trigger deprecations... I am looking now to see what we might have missed. |
|
I still don't see anything and can't reproduce locally. I am trying to rerun. |
|
I'm taking a look too. |
|
Seems like rerunning CI has fixed the problem. I suspect there was some intermediate state where the RMM PR had been merged but not all artifacts / dependencies agreed on how it was supposed to be used until the dependency tree (notably cudf) was rebuilt? Not sure. |
|
|
||
|
|
||
| # Ignore FutureWarning: Using `__dataframe__` is deprecated | ||
| @pytest.mark.filterwarnings("ignore::FutureWarning") |
There was a problem hiding this comment.
I'm unable to reproduce getting a warning in this test (and don't see how one could be generated). I think this one can just be dropped.
|
|
||
|
|
||
| # Ignore FutureWarning: Using `__dataframe__` is deprecated | ||
| @pytest.mark.filterwarnings("ignore::FutureWarning") |
There was a problem hiding this comment.
AFAICT this test checks that GridSearchCV works with cudf, which is no longer true after __dataframe__ was deprecated. IMO we should delete the test (or ask the cudf team to reconsider). Filtering the warning only puts things off until __dataframe__ is removed when it'll just break again.
There was a problem hiding this comment.
Are you able to reproduce this warning locally? If so, can you please try commenting out the __dataframe__ implementation in dataframe.py in cudf and try again? Is cuml using __dataframe__ explicitly or is it an implicit path such that a different path would be taken if this implementation doesn't exist?
There was a problem hiding this comment.
That's a good point! From reading the sklearn code I think sklearn has a path that would be taken once the __dataframe__ path is fully removed, so maybe filtering out the warning for now is fine. I'll need to get a local cudf build setup to try, will do later today.
There was a problem hiding this comment.
Just realized dataframe.py isn't in cython, so this was an easy quick check. I can confirm that once the __dataframe__ code is removed from cudf then things work again (though I can't say how efficiently). Using filterwarnings here seems fine (though with the recommendation to a more specific filter that Tim made above).
There was a problem hiding this comment.
The old __dataframe__ code path wasn't actually "efficient" in any meaningful way. There is no device data transport happening, only metadata. That's actually the core problem with this protocol: it doesn't specify who is responsible for actually transferring data across device boundaries, leading to consumers having to make per-library distinctions. That's discussed a bit more in rapidsai/cudf#17403.
There was a problem hiding this comment.
Thanks for testing so quickly, I was just spinning up my own dev environment to be able to verify this claim myself!
There was a problem hiding this comment.
Would prefer a more specific filter as well.
There was a problem hiding this comment.
I filed #6239 as a follow-up with the proposal above to make the filter more specific. My hope is that this PR passes CI and we can merge it as-is, then follow up with that PR.
|
Thanks for all the reviews. If CI passes, I think we should merge this as-is so that CI is unblocked.
|
|
CUDA 11.8 ARM wheel tests are failing with a message that is similar to some test failures we've seen popping up in cuVS. I do not know the root cause for this. Perhaps we can request an admin-merge on this PR or #6239 and handle the CUDA 11.8 ARM wheel tests separately. |
Currently CI is failing due to rapidsai/cudf#17736.
The
__dataframe__protocol appears to be used internally by scikit-learn: https://github.com/scikit-learn/scikit-learn/blob/311bf6badd74bb69081eb90e2643f15706d3473c/sklearn/utils/validation.py#L389Errors look like:
This PR ignores the
FutureWarningto allow CI to pass.