fix: Support varying quantile values per group in group_by aggregation #25606

Open
wtn wants to merge 3 commits into pola-rs:main from wtn:quantile

@wtn
Contributor

@wtn wtn commented Dec 3, 2025

Fixes #20951 and its duplicate #25888.

When using group_by() with a quantile parameter that varies per group (e.g., pl.col.quantile.first()), all groups incorrectly received the same quantile value instead of each group using its own.

Reproduction

import polars as pl

df = pl.DataFrame({
    "value": [1, 2, 1, 2],
    "quantile": [0, 0, 1, 1],
})
df.group_by("quantile").agg(pl.col("value").quantile(pl.col("quantile").first()))
# Expected: quantile=0 -> 1.0, quantile=1 -> 2.0
# Actual: both groups returned 1.0

Cause

AggQuantileExpr::evaluate_on_groups() always called get_quantile(), which evaluates the quantile expression against the full dataframe and returns a single scalar. This worked for literal quantile values but failed when the quantile expression varied per group (e.g., a first() aggregation).
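
To make the cause concrete, here is a minimal pure-Python model of the difference between evaluating first() once against the whole column and evaluating it once per group. This is not the Polars code: the function names and the dict-of-row-indices group layout are assumptions made for illustration.

```python
# Data from the reproduction above.
quantile_col = [0, 0, 1, 1]

# Hypothetical group layout: group key -> row indices belonging to that group.
groups = {0: [0, 1], 1: [2, 3]}


def first_on_full_frame(column):
    """Model of the old behaviour: one scalar for the whole column."""
    return column[0]


def first_per_group(column, groups):
    """Model of the intended behaviour: one value per group."""
    return {key: column[idx[0]] for key, idx in groups.items()}


scalar_q = first_on_full_frame(quantile_col)
per_group_q = first_per_group(quantile_col, groups)
```

In this model `scalar_q` is 0 for every group, so each group is asked for its minimum, which matches the "both groups returned 1.0" symptom.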

Fix

Added agg_varying_quantile which accepts a slice of quantile values (one per group) and computes quantile per group using the existing aggregation helpers.

polars-core changes:

  • Added agg_helper_idx_on_all_with_idx and _agg_helper_slice_with_idx helpers that pass the group index to closures
  • Added agg_varying_quantile_generic that iterates over groups with their corresponding quantile values
  • Added agg_varying_quantile methods to Float32Chunked, Float64Chunked, integer ChunkedArray, Series, and Column

polars-expr changes:

  • AggQuantileExpr::evaluate_on_groups() now detects whether the quantile is uniform (literal/scalar) or varies per group, and dispatches to the appropriate path
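
The shape of the fix can be sketched in pure Python, under several assumptions: groups are modelled as lists of row indices, the quantile uses linear interpolation, and `quantile_linear`, `agg_with_group_idx`, and `agg_quantile` are hypothetical names that only mirror the idea of the Rust helpers listed above rather than their real signatures.

```python
def quantile_linear(values, q):
    """Linear-interpolated quantile of a non-empty list (illustrative helper)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return float(ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo))


def agg_with_group_idx(groups, f):
    """Call f(group_idx, row_indices) per group, so f can look up per-group data."""
    return [f(group_idx, idx) for group_idx, idx in enumerate(groups)]


def agg_quantile(values, groups, quantiles):
    """`quantiles` is either one number (uniform) or one number per group (varying)."""
    if isinstance(quantiles, (int, float)):
        per_group = [float(quantiles)] * len(groups)  # uniform path
    else:
        if len(quantiles) != len(groups):
            raise ValueError("expected one quantile per group")
        per_group = list(quantiles)  # varying path

    def one_group(group_idx, idx):
        if not idx:
            return None  # empty group has no quantile
        return quantile_linear([values[i] for i in idx], per_group[group_idx])

    return agg_with_group_idx(groups, one_group)


values = [1, 2, 1, 2]
groups = [[0, 1], [2, 3]]
uniform = agg_quantile(values, groups, 0)       # same quantile for every group
varying = agg_quantile(values, groups, [0, 1])  # one quantile per group
```

With the reproduction data, the uniform call gives 1.0 for both groups, while the varying call gives 1.0 and 2.0, which is the expected output stated in the PR description.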

@github-actions github-actions Bot added fix Bug fix python Related to Python Polars rust Related to Rust Polars labels Dec 3, 2025
@wtn wtn marked this pull request as ready for review December 3, 2025 18:00
@wtn wtn force-pushed the quantile branch 2 times, most recently from 1269250 to ea1dbcd Compare December 3, 2025 18:27
@codecov

codecov Bot commented Dec 3, 2025

Codecov Report

❌ Patch coverage is 93.93939% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.43%. Comparing base (0a50a14) to head (228ed1d).
⚠️ Report is 188 commits behind head on main.

Files with missing lines Patch % Lines
...polars-core/src/frame/group_by/aggregations/mod.rs 93.61% 6 Missing ⚠️
...s-core/src/frame/group_by/aggregations/dispatch.rs 90.56% 5 Missing ⚠️
crates/polars-expr/src/expressions/aggregation.rs 97.56% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             main   #25606    +/-   ##
========================================
  Coverage   81.43%   81.43%            
========================================
  Files        1801     1801            
  Lines      246750   246930   +180     
  Branches     3081     3081            
========================================
+ Hits       200936   201084   +148     
- Misses      45028    45060    +32     
  Partials      786      786            

Comment thread crates/polars-core/src/frame/group_by/aggregations/mod.rs Outdated
@wtn wtn force-pushed the quantile branch 6 times, most recently from 652fb47 to 2fffe21 Compare December 10, 2025 19:52
@aparna2198
Contributor

@orlp @alexander-beedie @ritchie46 @reswqa @c-peters @MarcoGorelli Can we get a review here? We have a lot of duplicate issues arising for this one. Thanks.

@wtn wtn force-pushed the quantile branch 2 times, most recently from 8251863 to 9b2f5dc Compare December 29, 2025 22:14
return None;
}
let take = { ca.take_unchecked(idx) };
take._quantile(quantile, method).unwrap_unchecked()
Member

I know this was in the original code, but can you make this just a normal unwrap()?

Contributor Author

Changed unwrap_unchecked() to unwrap().

let ca = ca.rechunk();
agg_helper_idx_on_all_with_idx::<K, _>(groups, |(group_idx, idx)| {
debug_assert!(idx.len() <= ca.len());
if idx.is_empty() {
Member

Why not match len here for a single-element fast path like below?

Contributor Author

Added single-element fast path with match idx.len().
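
As a rough illustration of what a length-based fast path looks like, here is a pure-Python sketch. It is not the Rust code from the PR: `group_quantile` is a hypothetical name, the general case assumes linear interpolation, and null handling is ignored.

```python
def group_quantile(values, idx, q):
    """Quantile of values[idx], branching on the group size first."""
    n = len(idx)
    if n == 0:
        return None  # empty group: nothing to aggregate
    if n == 1:
        # Fast path: the only element is every quantile, so skip gather and sort.
        return float(values[idx[0]])
    # General path: gather the group's rows, sort, and interpolate.
    ordered = sorted(values[i] for i in idx)
    pos = q * (n - 1)
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    return float(ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo))
```

The single-element branch returns the same value regardless of `q`, which is why it can bypass the sort entirely.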

assert result["group"].to_list() == ["a", "b"]


def test_quantile_varying_by_group_float32() -> None:
Member

I feel like these are way too many individual tests. You can search around for pytest.mark.parametrize to see how to parametrize the tests. You can verify against a naive Python implementation which groups into lists then manually calls quantile on each list with a specific quantile.

Contributor Author

Consolidated into test_group_by_varying_quantile, parametrized over dtype and method, with numpy reference verification.
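
For readers who want to see the kind of naive reference the review asks for, here is a dependency-free sketch: it groups rows into lists, takes each group's first quantile value, and computes the quantile per list. The names `naive_quantile` and `reference_group_quantiles` are hypothetical, and the method definitions follow common conventions that should be checked against Polars before relying on them. The "nearest" method is left out because its tie-breaking is implementation-specific.

```python
import math
from collections import defaultdict


def naive_quantile(values, q, method):
    """Quantile of a non-empty list under one of four simple methods."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    if method == "lower":
        return float(ordered[lo])
    if method == "higher":
        return float(ordered[hi])
    if method == "midpoint":
        return (ordered[lo] + ordered[hi]) / 2
    if method == "linear":
        return float(ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo))
    raise ValueError(f"unsupported method: {method}")


def reference_group_quantiles(keys, values, quantiles, method):
    """Group rows by key, then apply each group's own (first) quantile."""
    grouped = defaultdict(list)
    group_q = {}
    for key, value, q in zip(keys, values, quantiles):
        grouped[key].append(value)
        group_q.setdefault(key, q)  # mimics first() on the quantile column
    return {k: naive_quantile(v, group_q[k], method) for k, v in grouped.items()}


keys = ["a", "a", "a", "b", "b", "b"]
values = [1, 2, 3, 10, 20, 30]
quantiles = [0.25, 0.25, 0.25, 0.75, 0.75, 0.75]
results = {
    method: reference_group_quantiles(keys, values, quantiles, method)
    for method in ("lower", "higher", "midpoint", "linear")
}
```

In an actual test, the loop over methods (and a dtype axis) would typically become `pytest.mark.parametrize` arguments, with the group_by result compared against this reference.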

@wtn wtn force-pushed the quantile branch 3 times, most recently from 0c543cc to cf61dd8 Compare December 30, 2025 21:19
@wtn
Contributor Author

wtn commented Dec 30, 2025

OK, I've pushed my changes. 🏓

@wtn wtn force-pushed the quantile branch 7 times, most recently from eb2a7b3 to 5f80e32 Compare January 16, 2026 17:36
@wtn wtn force-pushed the quantile branch 7 times, most recently from da3b78e to e8a3b81 Compare February 6, 2026 20:52
@wtn wtn marked this pull request as draft February 9, 2026 03:37
@wtn wtn force-pushed the quantile branch 2 times, most recently from 3df53e3 to 2027e97 Compare February 9, 2026 04:20
@wtn wtn marked this pull request as ready for review February 9, 2026 04:20
@wtn wtn force-pushed the quantile branch 3 times, most recently from 65d114c to 32cd2d3 Compare February 12, 2026 17:36
wtn added a commit to wtn/polars that referenced this pull request Feb 20, 2026
wtn added a commit to wtn/polars that referenced this pull request Feb 26, 2026
@wtn wtn force-pushed the quantile branch 2 times, most recently from c40ea73 to fc86da2 Compare February 26, 2026 19:28
Labels

fix Bug fix python Related to Python Polars rust Related to Rust Polars

Development

Successfully merging this pull request may close these issues.

Varying quantile by group is broken

4 participants