Support multiple ordered `array_agg` aggregations #16625

findepi · 2025-06-30T13:51:59Z

Which issue does this PR close?

None. #8582 is related.

Rationale for this change

Before the change, array_agg with ordering would depend on input being
ordered. As a result, it was impossible to do two or more array_agg(x ORDER BY ...) with incompatible ordering.

What changes are included in this PR?

This change moves ordering
responsibility into OrderSensitiveArrayAggAccumulator. When input is
pre-ordered (beneficial ordering), no additional work is done. However,
when it's not, array_agg accumulator will order the data on its own.
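
A minimal sketch of the idea, not the actual `OrderSensitiveArrayAggAccumulator` code: a toy accumulator keeps (sort key, value) pairs and sorts them at evaluation time only when the input was not already ordered. The type `OrderedArrayAgg`, its methods, and the plain `i64` sort key are assumptions made for illustration.

```rust
/// Toy stand-in for an order-sensitive array_agg accumulator (hypothetical type).
/// Each row pairs an ORDER BY key with the aggregated value.
struct OrderedArrayAgg {
    is_input_pre_ordered: bool,
    rows: Vec<(i64, String)>,
}

impl OrderedArrayAgg {
    fn new(is_input_pre_ordered: bool) -> Self {
        Self { is_input_pre_ordered, rows: Vec::new() }
    }

    /// Buffers one input row.
    fn update(&mut self, key: i64, value: &str) {
        self.rows.push((key, value.to_string()));
    }

    /// Returns the values in ascending key order.
    fn evaluate(mut self) -> Vec<String> {
        if !self.is_input_pre_ordered {
            // The input gave no ordering guarantee, so sort here.
            // `sort_by_key` is stable: equal keys keep arrival order.
            self.rows.sort_by_key(|(key, _)| *key);
        }
        // When the input is pre-ordered, no additional work is done.
        self.rows.into_iter().map(|(_, value)| value).collect()
    }
}
```

Because every accumulator sorts its own buffer, two such accumulators with incompatible orderings can run over the same unsorted input.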

Are these changes tested?

Yes

Are there any user-facing changes?

Yes

Due to `..` in the pattern, the `OrderSensitiveArrayAggAccumulator::merge_batch` did not validate it's not receiving additional states columns it ignores. Update the code to check number of inputs.
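
To illustrate the pitfall: a slice pattern with a trailing `..` matches any input that has at least the named elements, so surplus state columns are silently dropped. A minimal sketch, using string slices in place of the real state arrays; both function names are hypothetical and are not the ones in `array_agg.rs`.

```rust
/// Lenient: the trailing `..` swallows any extra state columns without complaint.
fn split_states_lenient<'a>(states: &[&'a str]) -> Option<(&'a str, &'a str)> {
    match states {
        [values, orderings, ..] => Some((*values, *orderings)),
        _ => None,
    }
}

/// Strict: without `..` the pattern only matches a slice of exactly two elements,
/// so an unexpected third column is reported instead of ignored.
fn split_states_strict<'a>(states: &[&'a str]) -> Result<(&'a str, &'a str), String> {
    match states {
        [values, orderings] => Ok((*values, *orderings)),
        _ => Err(format!("expected 2 state columns, got {}", states.len())),
    }
}
```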

ozankabak

A few comments:

FIRST_VALUE and LAST_VALUE implementations use requirement_satisfied and with_requirement_satisfied names for this thing, IMO it would be a good idea to follow suit here for consistency/readability purposes.
There are some changes where the planner was able to get away with appending a key to an already-existing sort, and operate the accumulator in an efficient mode. This PR loses that. Good news is that we already have the mechanism to address this in place (i.e. OrderingRequirements) and the solution is not hard. Aggregations involving functions that benefit from existing ordering should return OrderingRequirements::Soft, and the enforce_sorting rule should leverage any already existing sort (if present).

findepi · 2025-06-30T15:26:37Z

thanks for your review @ozankabak

FIRST_VALUE and LAST_VALUE implementations use requirement_satisfied and with_requirement_satisfied names for this thing, IMO it would be a good idea to follow suit here for consistency/readability purposes.

will align for array_agg

i am all for consistency. do you think we could update the AggregateUDFImpl::with_beneficial_ordering(beneficial_ordering bool) signature to follow the desired naming?

There are some changes where the planner was able to get away with appending a key to an already-existing sort, and operate the accumulator in an efficient mode. This PR loses that.

Yes, i noticed the PR drops some redundant sorts, which is likely quite a big improvement for aggregations with GROUP BY. Am i reading this correctly? I agree that in some rare situations those sorts are still the optimal way to go. However, I don't see a way for UDAF to declare OrderingRequirements::Soft. Is this a hard blocker for this PR, or can we avoid scope creep?

findepi · 2025-06-30T15:38:34Z

will align for array_agg

Sorry, with requirement_satisfied i am afraid it's easy not to know which exact requirement is guaranteed to be satisfied. I propose that we change the FIRST_VALUE and LAST_VALUE implementations. #16631

ozankabak · 2025-06-30T15:58:14Z

Sorry, with requirement_satisfied i am afraid it's easy not to know which exact requirement is guaranteed to be satisfied. I propose that we change the FIRST_VALUE and LAST_VALUE implementations. #16631

I don't think it matters and I think the name was already kind of obvious, but since you already opened a PR for it I went ahead and approved it.

Regarding your question about plan changes, I don't think we have enough information to call it scope creep -- the PR adds a new capability that we were working towards for a while, but also loses something that already exists. That was actually the main reason why we didn't take the plunge just yet as we worked on this over the past few months.

If getting this over the finish line in the short term is important to you, I think a reasonable step forward is to take a look at what solving it entails: There are basically two steps: (1) AggregateExec needs to consult the UDAF definition as it forms its required input ordering (this is the easy step), (2) The enforce sorting rule needs to address the case when there is already a sort with a prefix of a soft requirement, and just extend the sort keys (this is the harder step).

I think the solution will ultimately be small in terms of LOC changes, but the second step will require some thinking. If making an attempt at solving reveals that the problem has challenges that we don't foresee now, we can reconsider whether we want to accept the plan changes and open an issue to track the work that I described in the above paragraph.
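
The harder second step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the `enforce_sorting` rule itself: `SortKey`, `key`, and `extend_existing_sort` are hypothetical names, and a sort is reduced to a list of (column, direction) pairs.

```rust
/// One sort key: column name plus direction (hypothetical, simplified).
#[derive(Clone, Debug, PartialEq)]
struct SortKey {
    column: String,
    descending: bool,
}

/// Convenience constructor for the sketch.
fn key(column: &str, descending: bool) -> SortKey {
    SortKey { column: column.to_string(), descending }
}

/// If the keys of an existing sort form a non-empty prefix of the soft requirement,
/// returns the extended key list that the existing sort could use instead.
/// Returns `None` when there is nothing to extend, leaving the plan unchanged.
fn extend_existing_sort(
    existing: &[SortKey],
    soft_requirement: &[SortKey],
) -> Option<Vec<SortKey>> {
    if !existing.is_empty() && soft_requirement.starts_with(existing) {
        Some(soft_requirement.to_vec())
    } else {
        None
    }
}
```

For example, an existing sort on `a ASC` and a soft requirement of `a ASC, b DESC` yield the extended keys `a ASC, b DESC`, while an existing sort on `c ASC` is left alone.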

Thanks

findepi · 2025-06-30T18:43:26Z

I think global array_agg is not a very interesting scenario and a grouped array_agg no longer requires global sorting.
Can we agree the latter is an improvement? Perhaps a big one, since global sorting requires single-threadedness.

alamb

Thanks @findepi -- the idea makes sense to me

My only concern is switching to sorting ScalarValues rather than arrays (as in SortExec), as it might be quite a bit slower

There is a current related outstanding PR from @sfluor :

#16519

I think @rluvaton is familiar with this code too. Perhaps he has some time to review as well?

datafusion/functions-aggregate/src/array_agg.rs

datafusion/sqllogictest/test_files/aggregate.slt

alamb · 2025-06-30T22:46:23Z

datafusion/sqllogictest/test_files/aggregate.slt

-04)------SortExec: expr=[c2@1 DESC, c3@2 ASC NULLS LAST], preserve_partitioning=[true]
-05)--------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
-06)----------DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/core/tests/data/aggregate_agg_multi_order.csv]]}, projection=[c1, c2, c3], file_type=csv, has_header=true
+04)------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1


I wonder if we should benchmark this -- sorting by ScalarValue is likely a lot less efficient than using the fast array sort / etc that is done in SortExec

for global aggregation -- agreed. but then, a global array_agg aggregation cannot feasibly operate on large amounts of data, can it? (or rather: it can, but that's unlikely a common scenario)

@alamb i pushed some changes following @ozankabak 's suggestions and the plans have (expectedly) changed. Can you take another look?

The sorting consideration before aggregations did respect only ordered aggregation functions with `AggregateOrderSensitivity::HardRequirement`. This change includes sorting expectations from `AggregateOrderSensitivity::Beneficial` functions. When beneficial ordered function requirements are not satisfied, no error is raised, they are considered in the second pass only.
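
One way to picture the two passes, as a simplified sketch rather than the code in `datafusion/physical-plan/src/aggregates/mod.rs`: the same fold runs once over hard requirements only and once with soft requirements included. The flag name `include_soft_requirement` is taken from the commit titles of this PR; the types, the function name, and the prefix-based compatibility check are assumptions.

```rust
#[derive(Clone, Copy, PartialEq)]
enum Sensitivity {
    HardRequirement,
    SoftRequirement,
}

/// Hypothetical, simplified view of one ordered aggregate.
struct AggRequirement {
    sensitivity: Sensitivity,
    ordering: Vec<&'static str>, // ORDER BY columns, most significant first
}

/// Folds the aggregates' orderings into one required input ordering.
/// Soft requirements are skipped unless `include_soft_requirement` is set,
/// and a conflicting soft requirement is dropped instead of raising an error.
fn common_requirement(
    aggs: &[AggRequirement],
    include_soft_requirement: bool,
) -> Result<Vec<&'static str>, String> {
    let mut required: Vec<&'static str> = Vec::new();
    for agg in aggs {
        let soft = agg.sensitivity == Sensitivity::SoftRequirement;
        if soft && !include_soft_requirement {
            continue;
        }
        if agg.ordering.starts_with(&required) {
            // This aggregate asks for a finer ordering; adopt it.
            required = agg.ordering.clone();
        } else if required.starts_with(&agg.ordering) {
            // Already covered by the ordering required so far.
        } else if soft {
            // Conflicting soft requirement: ignore it, no error is raised.
        } else {
            return Err(format!("conflicting hard requirement: {:?}", agg.ordering));
        }
    }
    Ok(required)
}
```

Calling it with `false` gives the ordering that must hold; calling it with `true` gives the ordering worth sorting for when soft requirements are taken into account.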

findepi · 2025-07-01T09:01:56Z

datafusion/sqllogictest/test_files/group_by.slt

+DataFusion error: Internal error: Input field name last_value(sales_global.amount) ORDER BY [sales_global.amount DESC NULLS FIRST] does not match with the projection expression first_value(sales_global.amount) ORDER BY [sales_global.amount ASC NULLS LAST].
+This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker


Unblocking beneficial ordering triggered an error around reversing first_value / last_value.
Fix coming.

Upon reversing, a schema and field mismatch would happen.

ozankabak · 2025-07-01T09:28:22Z

datafusion/physical-expr/src/aggregate.rs

            ReversedUDAF::NotSupported => None,
            ReversedUDAF::Identical => Some(self.clone()),
            ReversedUDAF::Reversed(reverse_udf) => {
-                let mut name = self.name().to_string();


I think removing this may have other unintended effects. I will request some more eyes on this

Thanks!
I agree this code was deliberate & nice. I hope we don't parse those names though.
If there is a better solution to agg reverse causing failures (#16625 (comment)), let me know. I can also drop this fix, I don't like it either.

Alternatively to the fix, I can block reversing for beneficial functions and thus hide the problem for now. Would it be preferred for this PR?

alamb

I took another quick look --- thank you @findepi

I am worried about the unintended consequences of this change (mostly because I don't understand the code well enough to know what the invariants are / if we are breaking them).

Some of the plan changes definitely look wrong to me -- I am not sure if it is just a column naming thing or if the expressions are actually wrong now 🤔

alamb · 2025-07-01T21:01:20Z

datafusion/sqllogictest/test_files/group_by.slt

 physical_plan
 01)ProjectionExec: expr=[country@0 as country, array_agg(sales_global.amount) ORDER BY [sales_global.amount DESC NULLS FIRST]@1 as amounts, first_value(sales_global.amount) ORDER BY [sales_global.amount ASC NULLS LAST]@2 as fv1, last_value(sales_global.amount) ORDER BY [sales_global.amount DESC NULLS FIRST]@3 as fv2]
-02)--AggregateExec: mode=Single, gby=[country@0 as country], aggr=[array_agg(sales_global.amount) ORDER BY [sales_global.amount DESC NULLS FIRST], last_value(sales_global.amount) ORDER BY [sales_global.amount DESC NULLS FIRST], last_value(sales_global.amount) ORDER BY [sales_global.amount DESC NULLS FIRST]]
+02)--AggregateExec: mode=Single, gby=[country@0 as country], aggr=[array_agg(sales_global.amount) ORDER BY [sales_global.amount DESC NULLS FIRST], first_value(sales_global.amount) ORDER BY [sales_global.amount ASC NULLS LAST], last_value(sales_global.amount) ORDER BY [sales_global.amount DESC NULLS FIRST]]


that certainly seems like an improvement

alamb · 2025-07-01T21:02:13Z

datafusion/sqllogictest/test_files/group_by.slt

 01)ProjectionExec: expr=[country@0 as country, first_value(sales_global.amount) ORDER BY [sales_global.ts DESC NULLS FIRST]@1 as fv1, last_value(sales_global.amount) ORDER BY [sales_global.ts DESC NULLS FIRST]@2 as lv1, sum(sales_global.amount) ORDER BY [sales_global.ts DESC NULLS FIRST]@3 as sum1]
 02)--AggregateExec: mode=Single, gby=[country@0 as country], aggr=[first_value(sales_global.amount) ORDER BY [sales_global.ts DESC NULLS FIRST], last_value(sales_global.amount) ORDER BY [sales_global.ts DESC NULLS FIRST], sum(sales_global.amount) ORDER BY [sales_global.ts DESC NULLS FIRST]]
-03)----DataSourceExec: partitions=1, partition_sizes=[1]
+03)----SortExec: expr=[ts@1 DESC], preserve_partitioning=[false]


It is weird -- the comments say this plan should have a SortExec but the plan that is checked in does not have one

alamb · 2025-07-01T21:05:05Z

datafusion/sqllogictest/test_files/aggregate.slt

 01)AggregateExec: mode=Final, gby=[], aggr=[first_value(convert_first_last_table.c1) ORDER BY [convert_first_last_table.c3 DESC NULLS FIRST]]
 02)--CoalescePartitionsExec
-03)----AggregateExec: mode=Partial, gby=[], aggr=[last_value(convert_first_last_table.c1) ORDER BY [convert_first_last_table.c3 ASC NULLS LAST]]
+03)----AggregateExec: mode=Partial, gby=[], aggr=[first_value(convert_first_last_table.c1) ORDER BY [convert_first_last_table.c3 DESC NULLS FIRST]]


Now this makes it look like the plan is wrong:

the query calls for first_value(c1 ORDER BY c3 desc) but the table is sorted by c3 ASC

I think internally the optimizer has rewritten first_value(c1 ORDER BY c3 desc) to last_value(c1 ORDER BY c3 ASC)

However this plan makes it look like that didn't happen
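
For reference, the rewrite mentioned above relies on `first_value(... ORDER BY k DESC)` and `last_value(... ORDER BY k ASC)` picking the same row. A small sketch over an in-memory table, assuming distinct sort keys (with ties the two forms may pick different rows); all helper names are hypothetical.

```rust
/// Returns a copy of the rows sorted by key, ascending or descending.
fn sort_rows<'a>(rows: &[(i64, &'a str)], descending: bool) -> Vec<(i64, &'a str)> {
    let mut out = rows.to_vec();
    out.sort_by_key(|row| row.0);
    if descending {
        out.reverse();
    }
    out
}

/// first_value over rows that are already in the requested order.
fn first_value<'a>(ordered: &[(i64, &'a str)]) -> Option<&'a str> {
    ordered.first().map(|row| row.1)
}

/// last_value over rows that are already in the requested order.
fn last_value<'a>(ordered: &[(i64, &'a str)]) -> Option<&'a str> {
    ordered.last().map(|row| row.1)
}
```

With keys 2, 3 and 1, both `first_value` over the descending order and `last_value` over the ascending order return the value stored under key 3.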

There was renaming to address this, but the renaming did not exactly work -- #16625 (comment).

ozankabak · 2025-07-01T21:10:32Z

Agreed, I think this will need some time to brew. As I said previously, I hope to get some more eyes on this in the short term (maybe early next week)

ozankabak · 2025-07-03T14:59:07Z

I think we will find a solution that avoids this redundancy. Expect some feedback from me (or someone on my team) in a few days

findepi · 2025-07-16T19:27:28Z

@ozankabak @alamb can you please help me understand where you would want to go with this?

or maybe DF doesn't need to support ordered array_aggs (more than one in a query)?

alamb · 2025-07-18T20:41:46Z

@ozankabak @alamb can you please help me understand where you would want to go with this?

I think supporting multiple ordered array_agg aggregations makes sense to me; I have not had a chance to review this PR recently.

Is it ready for another review?

alamb

Thanks for pushing this forward @findepi

I personally think supporting multiple sorted aggregates is a useful feature and we should work on it. My only real concern with the code currently in this PR is that it may break the FFI API (which is ideally supposed to be stable)

I re-read the comments on this PR and I wonder if you tried implementing the solution suggested by @ozankabak in #16625 (comment):

If getting this over the finish line in the short term is important to you, I think a reasonable step forward is to take a look at what solving it entails: There are basically two steps: (1) AggregateExec needs to consult the UDAF definition as it forms its required input ordering (this is the easy step), (2) The enforce sorting rule needs to address the case when there is already a sort with a prefix of a soft requirement, and just extend the sort keys (this is the harder step).

I think the solution will ultimately be small in terms of LOC changes, but the second step will require some thinking.

This PR seems similar except that it adds the SoftRequirement stage as well. If we could avoid the need for SoftRequirement I think this PR would be pretty great

alamb · 2025-07-19T11:35:32Z

datafusion/ffi/src/udaf/mod.rs

 pub enum FFI_AggregateOrderSensitivity {
    Insensitive,
    HardRequirement,
+    SoftRequirement,


Technically speaking this is an FFI API change -- I am not sure what implication that has (note this would not be released until DataFusion 50 anyways).

cc @timsaucer -- I wonder if we should gather up the FFI breaking changes into their own PR / more carefully schedule such breakages

This new thing doesn't need to be supported in the FFI.
However, i didn't know how to avoid adding this.
When looking at impl From<AggregateOrderSensitivity> for FFI_AggregateOrderSensitivity i am under the impression that this particular part of the FFI API is tightly coupled with the datafusion core, so in this particular place it cannot deliver API stability without inhibiting datafusion core progress. The necessary solution might be replacing this From with TryFrom, and likewise for impl From<FFI_AggregateOrderSensitivity> for AggregateOrderSensitivity.

My understanding is that, by tightly coupling AggregateOrderSensitivity and FFI_AggregateOrderSensitivity, the code author chose to let these enums naturally evolve over time, considering this not a breaking change, or an acceptable breaking change.
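
A sketch of what the `TryFrom` alternative could look like. The enums below are simplified stand-ins, not the real `AggregateOrderSensitivity` / `FFI_AggregateOrderSensitivity` definitions, and `FfiOrderSensitivity` is a hypothetical name: the FFI-side enum keeps its existing variants, and a core variant it does not know is turned into an error instead of forcing a new FFI variant.

```rust
use std::convert::TryFrom;

/// Core-side enum (simplified stand-in), free to grow new variants.
#[derive(Debug, PartialEq)]
enum AggregateOrderSensitivity {
    Insensitive,
    HardRequirement,
    SoftRequirement,
    Beneficial,
}

/// FFI-side enum (hypothetical name) whose variant set stays fixed.
#[repr(C)]
#[derive(Debug, PartialEq)]
enum FfiOrderSensitivity {
    Insensitive,
    HardRequirement,
    Beneficial,
}

impl TryFrom<AggregateOrderSensitivity> for FfiOrderSensitivity {
    type Error = String;

    fn try_from(value: AggregateOrderSensitivity) -> Result<Self, Self::Error> {
        match value {
            AggregateOrderSensitivity::Insensitive => Ok(Self::Insensitive),
            AggregateOrderSensitivity::HardRequirement => Ok(Self::HardRequirement),
            AggregateOrderSensitivity::Beneficial => Ok(Self::Beneficial),
            // Variants unknown to the FFI layer surface as an error.
            other => Err(format!("{:?} is not supported over FFI", other)),
        }
    }
}
```

The trade-off is that callers on the FFI boundary then have to handle the error case, which the infallible `From` never required.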

datafusion/functions-aggregate-common/src/order.rs

findepi · 2025-07-20T07:13:40Z

I re-read the comments on this PR and I wonder if you tried implementing the solution suggested by @ozankabak in #16625 (comment):

If getting this over the finish line in the short term is important to you, I think a reasonable step forward is to take a look at what solving it entails: There are basically two steps: (1) AggregateExec needs to consult the UDAF definition as it forms its required input ordering (this is the easy step), (2) The enforce sorting rule needs to address the case when there is already a sort with a prefix of a soft requirement, and just extend the sort keys (this is the harder step).
I think the solution will ultimately be small in terms of LOC changes, but the second step will require some thinking.

This PR seems similar except that it adds the SoftRequirement stage as well. If we could avoid the need for SoftRequirement I think this PR would be pretty great

#16625 (comment)
#16625 (comment)

I am not convinced it's actually desired to make the existing Beneficial behave like the new SoftRequirement does though.

Consider first_value / top_1 function, which is O(n). It benefits from the input being sorted, becoming O(1) in such case. It does not, however, want to impose input sorting, as that would be O(n log n).
This is different from e.g. sorted array_agg, which needs to sort either way.

One can argue that sorting for "first_value order by a, b" can be added only if there already is some sorting on a. It's not a bad argument, but note that within the group of least a value, it's still O(group size) -> O(group size * log group size) change.
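
The cost argument can be made concrete with a toy `first_value` over (key, value) rows. This is a sketch with hypothetical names, not the DataFusion accumulator: with pre-ordered input the answer is the first row, otherwise a single linear scan finds it, and in neither case is a sort needed.

```rust
/// first_value(value ORDER BY key ASC) over in-memory rows (hypothetical helper).
fn first_value<'a>(rows: &[(i64, &'a str)], is_input_pre_ordered: bool) -> Option<&'a str> {
    if is_input_pre_ordered {
        // O(1): the first row already carries the smallest key.
        rows.first().map(|(_, value)| *value)
    } else {
        // O(n): one pass to find the smallest key, no O(n log n) sort.
        rows.iter().min_by_key(|(key, _)| *key).map(|(_, value)| *value)
    }
}
```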

Thus, it seems optimal to be able to distinguish functions that

benefit from ordering, e.g. first_value (Beneficial)
those which are simply better if input can be ordered, e.g. ordered array_agg in this PR (SoftRequirement being added in this PR)
those which cannot execute if input is not pre-ordered, e.g. ordered array_agg before this PR (HardRequirement)
those which do not care about input ordering (Insensitive)

Thus it makes sense for the AggregateOrderSensitivity to have 4 options.
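
The four cases can be summarised as a small decision function. This is a sketch of how the thread describes the intended behaviour, not planner code: the enum mirrors the variant names used in this PR, while `PlannerAction`, `decide`, and the two boolean inputs are hypothetical simplifications (`can_add_sort` stands for "sorting would not conflict with another aggregate's requirement").

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum AggregateOrderSensitivity {
    Insensitive,
    HardRequirement,
    SoftRequirement,
    Beneficial,
}

#[derive(Debug, PartialEq)]
enum PlannerAction {
    NoOrderingNeeded,
    UseExistingOrder,
    AddSort,
    RunUnsorted,
    PlanningError,
}

/// Decides what to do for one aggregate, given the state of its input.
fn decide(
    sensitivity: AggregateOrderSensitivity,
    input_is_ordered: bool,
    can_add_sort: bool,
) -> PlannerAction {
    use AggregateOrderSensitivity::*;
    use PlannerAction::*;
    match (sensitivity, input_is_ordered, can_add_sort) {
        (Insensitive, _, _) => NoOrderingNeeded,
        (_, true, _) => UseExistingOrder,
        // Both requirement kinds prefer a sorted input when one can be had.
        (HardRequirement, false, true) | (SoftRequirement, false, true) => AddSort,
        // Only the hard requirement cannot run without it.
        (HardRequirement, false, false) => PlanningError,
        // Soft falls back to sorting inside the accumulator;
        // Beneficial never asks for a sort in the first place.
        (SoftRequirement, false, false) | (Beneficial, false, _) => RunUnsorted,
    }
}
```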

alamb · 2025-07-24T12:35:58Z

Thus, it seems optimal to be able to distinguish functions that

benefit from ordering, e.g. first_value (Beneficial)

those which are simply better if input can be ordered, e.g. ordered array_agg in this PR (SoftRequirement being added in this PR)

those which cannot execute if input is not pre-ordered, e.g. ordered array_agg before this PR (HardRequirement)

those which do not care about input ordering (Insensitive)

I see -- I think I understand the theoretical difference between

"needs the ordering to correctly run" (HardRequirement)
"can take advantage of the ordering" (Beneficial)
"is always better to use sorting" (SoftRequirement)

What I think I am confused about is what the practical difference between HardRequirement and SoftRequirement is -- specifically, what different plan / decision will be made.

I believe the result is that DataFusion will attempt to sort the input according to the requirement, but if it can not (because it will cause a conflict with another aggregate function's requirements, for example) then the aggregate can still be run with the different ordering

alamb · 2025-07-24T12:49:55Z

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing findepi/two-ordered-array-aggs (5f00ec4) to 6870cc1 diff
BENCH_NAME=array_agg
BENCH_COMMAND=cargo bench --bench array_agg
BENCH_FILTER=
BENCH_BRANCH_NAME=findepi_two-ordered-array-aggs
Results will be posted here when complete

alamb

I went through this PR carefully and it makes sense to me -- thank you @findepi

I still don't really understand the implications of the FFI change, but maybe they are ok.

I also kicked off a benchmark for array_agg to ensure this PR doesn't introduce regressions, and as long as that is good I think this PR is good to merge

datafusion/functions-aggregate-common/src/order.rs

datafusion/physical-plan/src/aggregates/mod.rs

alamb · 2025-07-24T12:58:37Z

🤖: Benchmark completed

Details

group                                                                              findepi_two-ordered-array-aggs         main
-----                                                                              ------------------------------         ----
array_agg i64 merge_batch 30% nulls, 0% of nulls point to a zero length array      1.04    568.6±0.80µs        ? ?/sec    1.00    545.6±1.22µs        ? ?/sec
array_agg i64 merge_batch 30% nulls, 100% of nulls point to a zero length array    1.00      6.0±0.02µs        ? ?/sec    1.00      6.0±0.02µs        ? ?/sec
array_agg i64 merge_batch 30% nulls, 50% of nulls point to a zero length array     1.01    548.0±1.38µs        ? ?/sec    1.00    545.2±1.12µs        ? ?/sec
array_agg i64 merge_batch 30% nulls, 90% of nulls point to a zero length array     1.00    547.4±1.17µs        ? ?/sec    1.00    547.1±1.18µs        ? ?/sec
array_agg i64 merge_batch 30% nulls, 99% of nulls point to a zero length array     1.00    547.0±0.96µs        ? ?/sec    1.00    545.3±0.78µs        ? ?/sec
array_agg i64 merge_batch 70% nulls, 0% of nulls point to a zero length array      1.00    243.2±1.72µs        ? ?/sec    1.00    242.6±0.33µs        ? ?/sec
array_agg i64 merge_batch 70% nulls, 100% of nulls point to a zero length array    1.01      5.9±0.02µs        ? ?/sec    1.00      5.9±0.03µs        ? ?/sec
array_agg i64 merge_batch 70% nulls, 50% of nulls point to a zero length array     1.01    245.0±0.42µs        ? ?/sec    1.00    243.5±0.33µs        ? ?/sec
array_agg i64 merge_batch 70% nulls, 90% of nulls point to a zero length array     1.00    242.8±2.88µs        ? ?/sec    1.00    243.6±0.35µs        ? ?/sec
array_agg i64 merge_batch 70% nulls, 99% of nulls point to a zero length array     1.00    243.0±1.72µs        ? ?/sec    1.00    243.8±0.42µs        ? ?/sec
array_agg i64 merge_batch all nulls, 100% of nulls point to a zero length array    1.04     90.5±0.12ns        ? ?/sec    1.00     87.4±0.22ns        ? ?/sec
array_agg i64 merge_batch all nulls, 90% of nulls point to a zero length array     1.00     86.9±0.10ns        ? ?/sec    1.08     94.1±1.21ns        ? ?/sec
array_agg i64 merge_batch no nulls                                                 1.04    105.3±0.18ns        ? ?/sec    1.00    101.0±0.09ns        ? ?/sec

ozankabak · 2025-07-24T13:03:03Z

I didn't have time to dig deeper on this, so we can go ahead with the merge. We can unify Beneficial and SoftRequirement later in the future if we find a good way to do so.

alamb · 2025-07-24T15:55:05Z

🤖: Benchmark completed

In my opinion, the benchmark results show no meaningful difference

findepi · 2025-07-24T18:00:23Z

What I think I am confused about is what the practical difference between HardRequirement and SoftRequirement is -- specifically, what different plan / decision will be made.

I believe the result is that DataFusion will attempt to sort the input according to the requirement, but if it can not (because it will cause a conflict with another aggregate function's requirements, for example) then the aggregate can still be run with the different ordering

HardRequirement -- if the input sort requirement is not satisfied, query planning fails with an error. This is the case when aggregate functions want the input to be sorted, but lack the capability to sort it themselves.
- ideally this should never happen. We might want to work towards removing HardRequirement in the future.
SoftRequirement -- if the input is sorted, great; if it's not, that is also fine

Co-authored-by: Andrew Lamb <[email protected]>

findepi · 2025-07-24T18:07:29Z

Thank you @alamb for review and @ozankabak for feedback.
Updated per Andrew's review comments.

alamb · 2025-07-28T19:14:40Z

Onwards. Thanks @findepi and @ozankabak

* Validate states shape in merge_batch

  Due to `..` in the pattern, the `OrderSensitiveArrayAggAccumulator::merge_batch` did not validate it's not receiving additional states columns it ignores. Update the code to check number of inputs.

* Support multiple ordered array_agg

  Before the change, `array_agg` with ordering would depend on input being ordered. As a result, it was impossible to do two or more `array_agg(x ORDER BY ...)` with incompatible ordering. This change moves ordering responsibility into `OrderSensitiveArrayAggAccumulator`. When input is pre-ordered (beneficial ordering), no additional work is done. However, when it's not, `array_agg` accumulator will order the data on its own.

* Generate sorts based on aggregations soft requirements

  The sorting consideration before aggregations did respect only ordered aggregation functions with `AggregateOrderSensitivity::HardRequirement`. This change includes sorting expectations from `AggregateOrderSensitivity::Beneficial` functions. When beneficial ordered function requirements are not satisfied, no error is raised, they are considered in the second pass only.

* Fix reversing first_value, last_value

  Upon reversing, a schema and field mismatch would happen.

* Revert "Fix reversing first_value, last_value"

  This reverts commit 9b7e94d.

* sort array_agg input the old way whenever possible

* revert some now unnecessary change

* Improve doc for SoftRequiement

  Co-authored-by: Andrew Lamb <[email protected]>

* Add comment for include_soft_requirement

  Co-authored-by: Andrew Lamb <[email protected]>

* Document include_soft_requirement param

* fmt

* doc fix

---------

Co-authored-by: Andrew Lamb <[email protected]>

findepi · 2025-08-07T09:29:47Z

datafusion/functions-aggregate/src/array_agg.rs

 /// ARRAY_AGG aggregate expression
 pub struct ArrayAgg {
    signature: Signature,
+    is_input_pre_ordered: bool,


Adding this new field should trigger adding equals/hash_value implementations.
Being fixed in #17065
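
A sketch of why the new field matters for equality. It assumes, as an illustration only, a comparison that looks at the function name alone; `ArrayAggSketch`, `equals_by_name_only`, and `hash_of` are hypothetical and the struct is not the real `ArrayAgg`. Deriving `PartialEq` and `Hash` over all fields makes the flag part of both.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Simplified stand-in for a UDAF struct that carries a configuration flag.
#[derive(Debug, PartialEq, Eq, Hash)]
struct ArrayAggSketch {
    name: String,
    is_input_pre_ordered: bool,
}

/// A comparison that ignores the flag (assumed here for illustration).
fn equals_by_name_only(a: &ArrayAggSketch, b: &ArrayAggSketch) -> bool {
    a.name == b.name
}

/// Hashes a value with the standard library's default hasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

Two instances that differ only in `is_input_pre_ordered` compare equal under `equals_by_name_only` but not under the derived `==`, which is the distinction the comment above asks for.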

Validate states shape in merge_batch

c32eb2f

Due to `..` in the pattern, the `OrderSensitiveArrayAggAccumulator::merge_batch` did not validate it's not receiving additional states columns it ignores. Update the code to check number of inputs.

github-actions bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels Jun 30, 2025

findepi force-pushed the findepi/two-ordered-array-aggs branch from 1264858 to cf4d8ae Compare June 30, 2025 14:06

github-actions bot added the physical-plan Changes to the physical-plan crate label Jun 30, 2025

findepi added enhancement New feature or request and removed physical-plan Changes to the physical-plan crate labels Jun 30, 2025

findepi changed the title ~~Support multiple ordered array_agg~~ Support multiple ordered array_agg aggregations Jun 30, 2025

ozankabak reviewed Jun 30, 2025

findepi mentioned this pull request Jun 30, 2025

Improve field naming in first_value, last_value implementation #16631

Merged

alamb reviewed Jun 30, 2025

github-actions bot added physical-expr Changes to the physical-expr crates physical-plan Changes to the physical-plan crate labels Jul 1, 2025

findepi force-pushed the findepi/two-ordered-array-aggs branch from 8f2eab7 to 4cd992c Compare July 1, 2025 08:46

github-actions bot removed the physical-expr Changes to the physical-expr crates label Jul 1, 2025

findepi commented Jul 1, 2025

Fix reversing first_value, last_value

9b7e94d

Upon reversing, a schema and field mismatch would happen.

github-actions bot added the physical-expr Changes to the physical-expr crates label Jul 1, 2025

ozankabak reviewed Jul 1, 2025

alamb reviewed Jul 1, 2025

adriangb mentioned this pull request Jul 1, 2025

Fix parquet filter_pushdown: respect parquet filter pushdown config in scan #16646

Merged

findepi mentioned this pull request Jul 2, 2025

cherry pick the array agg fix to support multiple sorted aggregations sdf-labs/datafusion#97

Merged

Merge remote-tracking branch 'upstream/main' into findepi/two-ordered-array-aggs

a551e7d

alamb added the api change Changes the API exposed to users of the crate label Jul 19, 2025

alamb reviewed Jul 19, 2025

alamb mentioned this pull request Jul 24, 2025

Discussion: DataFusion Improvement Proposal (DIPs) Process? #16886

Open

alamb changed the title ~~Support multiple ordered array_agg aggregations~~ Support multiple ordered array_agg aggregations Jul 24, 2025

alamb approved these changes Jul 24, 2025

findepi and others added 4 commits July 24, 2025 20:01

Improve doc for SoftRequiement

134da5a

Co-authored-by: Andrew Lamb <[email protected]>

Add comment for include_soft_requirement

1420f8d

Co-authored-by: Andrew Lamb <[email protected]>

Document include_soft_requirement param

5c1bce9

fmt

8a1abe8

doc fix

a1031e0

alamb merged commit e1a5cdf into apache:main Jul 28, 2025
27 checks passed

nuno-faria mentioned this pull request Aug 1, 2025

string_agg does not respect ORDER BY on 49.0.0 #17011

Closed

findepi deleted the findepi/two-ordered-array-aggs branch August 4, 2025 10:27

nuno-faria mentioned this pull request Aug 6, 2025

[branch-49] fix: string_agg not respecting ORDER BY #17058

Merged

findepi mentioned this pull request Aug 7, 2025

[EPIC] ScalarUDFImpl::equals default implementation is error-prone #16677

Closed

findepi commented Aug 7, 2025

alamb mentioned this pull request Sep 29, 2025

A collection of array_agg improvements and issues #17829

Open

6 tasks
