Support numeric_only for simple groupby aggregations for pandas 2.0 compatibility #9889
There's one failure left to resolve,
The test failure is resolved by 8b877b5.
jrbourbeau
left a comment
Thanks @j-bennet! This will be nice to have -- looking forward to seeing it merged
…ception was raised.
dask/dataframe/_compat.py
Outdated
warnings.filterwarnings(
    "ignore",
    message="The default value of numeric_only in",
    message="Dropping of nuisance columns",
In pandas 1.3, some of the aggs emit this warning on non-numeric data, and it's different from the warning pandas 1.5 emits. We were already catching the 1.5 warning, but not this one.
There was one test where this is still needed. I'm inclined to just handle it as follow-up work.
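As a rough illustration of the filter discussed above, here is a minimal sketch using only the standard library. `noisy_agg` is a hypothetical stand-in for a pandas aggregation, and its warning text is illustrative apart from the "Dropping of nuisance columns" prefix taken from the diff.

```python
import warnings


def noisy_agg():
    # Hypothetical stand-in for a pandas aggregation on non-numeric data.
    # Only the message prefix comes from the diff; the rest is illustrative.
    warnings.warn("Dropping of nuisance columns (illustrative text)", FutureWarning)
    return 42


with warnings.catch_warnings(record=True) as caught:
    # Record anything that is not explicitly ignored below.
    warnings.simplefilter("always")
    # `message` is a regex matched against the start of the warning text,
    # so a stable prefix is enough. This filter is added last and therefore
    # takes precedence over the "always" filter.
    warnings.filterwarnings(
        "ignore",
        message="Dropping of nuisance columns",
        category=FutureWarning,
    )
    result = noisy_agg()

print(result, len(caught))
```

Because the match is on a message prefix, a second filter with a different prefix would be needed for the pandas 1.5 wording.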
maybe_raise = not (
    func.__name__ == "agg"
    and len(args) > 0
    and args[0] not in NUMERIC_ONLY_NOT_IMPLEMENTED
)
Noting that this is to catch when operations in NUMERIC_ONLY_NOT_IMPLEMENTED are being used inside an agg(...) call
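To make the condition easier to follow, here is a small sketch that evaluates the same boolean expression in isolation. The contents of `NUMERIC_ONLY_NOT_IMPLEMENTED` and the helper name `should_maybe_raise` are assumptions for illustration only; the real list is defined in dask.

```python
# Hypothetical contents; the real list is defined in dask.
NUMERIC_ONLY_NOT_IMPLEMENTED = ["mean", "std", "var"]


def should_maybe_raise(func_name, args):
    """Evaluate the condition from the diff for a given call.

    The result is False only when the call is agg(<op>) and <op> is NOT
    in NUMERIC_ONLY_NOT_IMPLEMENTED.
    """
    return not (
        func_name == "agg"
        and len(args) > 0
        and args[0] not in NUMERIC_ONLY_NOT_IMPLEMENTED
    )


cases = {
    "agg with op outside the list": should_maybe_raise("agg", ("sum",)),
    "agg with op in the list": should_maybe_raise("agg", ("mean",)),
    "direct call, not agg": should_maybe_raise("sum", ()),
}
print(cases)
```

With these assumed list contents, only `agg("sum")` yields `False`; an `agg("mean")` call and a direct non-agg call both yield `True`.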
dask/dataframe/groupby.py
Outdated
with warnings.catch_warnings():
    warnings.filterwarnings(
        "ignore",
        message="In a future version, the Index constructor will not infer numeric dtypes",
        category=FutureWarning,
    )
Is this related to other changes in this PR or known flaky tests?
This makes a flaky test happy.
assert_eq(ddf.groupby(ddf.w).y.nunique(), df.groupby(df.w).y.nunique())
assert_eq(ddf.y.groupby(ddf.w).count(), df.y.groupby(df.w).count())
I'm curious why these are indented. Are we emitting warnings now? I would have expected us to match warnings from pandas and, since pandas didn't appear to be warning before, I'm confused about why we would be.
pandas was warning before, but we weren't. Now we do, so we have to catch it.
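The indentation in question comes from wrapping the assertions in a warning-handling context manager. A minimal sketch of that pattern, assuming a hypothetical `groupby_count` stand-in rather than the real dask call, and an illustrative warning message:

```python
import warnings


def groupby_count():
    # Hypothetical stand-in for a call that now emits a FutureWarning.
    warnings.warn("The default value of numeric_only in (illustrative)", FutureWarning)
    return 3


def run_and_record(func):
    """Call func and return its value plus any FutureWarnings it emitted."""
    with warnings.catch_warnings(record=True) as record:
        warnings.simplefilter("always")  # make sure nothing is deduplicated away
        value = func()
    future = [w for w in record if issubclass(w.category, FutureWarning)]
    return value, future


value, future_warnings = run_and_record(groupby_count)
print(value, len(future_warnings))
```

In a pytest suite the same idea is usually written with `pytest.warns(FutureWarning)` around the assertions, which is why they end up indented.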
…heck warning behavior.
Co-authored-by: James Bourbeau <[email protected]>
The last failure with minimal dependencies does not seem to be related... possibly flaky? https://github.com/dask/dask/actions/runs/4079092927/jobs/7030092277
Hmm I've not seen those before. My guess is they're somehow related to the changes in this PR
jrbourbeau
left a comment
Turns out there were a couple of bugs in older versions of pandas that weren't straightforward to work around. I pushed a9e27d9, which just skips those specific configurations for now. I'll also push up a PR that bumps our minimum pandas version (it's been a while since we've done that).
def test_unknown_categoricals(shuffle_method):
    # TODO: Remove the filterwarnings below
When can we do this TODO?
numeric_only for simple groupby aggregations for pandas 2.0 compatibility
This PR adds `dtypes` property to `GroupBy`, this will also fix some upstream dask breaking changes introduced in: dask/dask#9889
Issue was discovered in: #12768 (comment)
Authors:
- GALI PREM SAGAR (https://github.com/galipremsagar)
- Vyas Ramasubramani (https://github.com/vyasr)
Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
- Ashwin Srinath (https://github.com/shwina)
URL: #12783
Partially implement `numeric_only` on GroupBy operations, to align the behavior with Pandas. This PR only includes changes for aggs that are using `_single_agg` internally. More complicated aggs will have to be handled separately.
Xref #9736.
Xref #9471.
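For context, this is roughly what the pandas behavior being aligned with looks like. The sketch uses plain pandas (not dask) with made-up sample data, and passes `numeric_only=True` explicitly so it does not depend on the version-specific default.

```python
import pandas as pd

# Made-up sample data with one numeric and one non-numeric value column.
df = pd.DataFrame(
    {
        "key": ["a", "a", "b"],
        "x": [1, 2, 3],
        "label": ["p", "q", "r"],
    }
)

# With numeric_only=True the non-numeric "label" column is excluded
# from the aggregation instead of being summed or raising.
result = df.groupby("key").sum(numeric_only=True)
print(result)
```

Passing the argument explicitly avoids relying on the default, which changed in pandas 2.0.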
pre-commit run --all-files