@findepi
Member

@findepi findepi commented Jun 30, 2025

Which issue does this PR close?

None. #8582 is related.

Rationale for this change

Before the change, array_agg with ordering would depend on the input being
ordered. As a result, it was impossible to do two or more array_agg(x ORDER BY ...) with incompatible orderings.

What changes are included in this PR?

This change moves ordering
responsibility into OrderSensitiveArrayAggAccumulator. When the input is
pre-ordered (beneficial ordering), no additional work is done. However,
when it is not, the array_agg accumulator sorts the data itself.
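To make the idea concrete, here is a minimal sketch of an accumulator that buffers rows and sorts them only when the input was not already ordered. It is not DataFusion's actual `OrderSensitiveArrayAggAccumulator`: the type `OrderedArrayAgg`, its methods, and the use of plain `i64` keys and values are assumptions made for illustration.

```rust
/// Hypothetical, simplified model of an order-sensitive `array_agg`
/// accumulator (not DataFusion's real type). Keys and values are `i64`.
pub struct OrderedArrayAgg {
    /// True when the planner guarantees the input already arrives sorted by key.
    input_pre_ordered: bool,
    /// Buffered (sort key, value) pairs, in arrival order.
    rows: Vec<(i64, i64)>,
}

impl OrderedArrayAgg {
    pub fn new(input_pre_ordered: bool) -> Self {
        Self { input_pre_ordered, rows: Vec::new() }
    }

    /// Buffer one input row together with its ordering key.
    pub fn update(&mut self, key: i64, value: i64) {
        self.rows.push((key, value));
    }

    /// Return the aggregated values in key order. No extra work is done
    /// when the input was pre-ordered; otherwise the buffer is sorted here.
    pub fn evaluate(&mut self) -> Vec<i64> {
        if !self.input_pre_ordered {
            // `sort_by_key` is stable, so rows with equal keys keep arrival order.
            self.rows.sort_by_key(|&(key, _)| key);
        }
        self.rows.iter().map(|&(_, value)| value).collect()
    }
}
```

Because each accumulator sorts its own buffer, two such accumulators with incompatible orderings (for example ascending by `k`, and "descending" emulated by feeding `-k` as the key) can consume the same unordered input side by side.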

Are these changes tested?

Yes

Are there any user-facing changes?

Yes

Due to `..` in the pattern,
`OrderSensitiveArrayAggAccumulator::merge_batch` did not validate that it
was not receiving additional state columns that it would ignore. Update the code to
check the number of inputs.
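The pitfall can be reproduced with plain Rust slice patterns. In this hypothetical sketch (the functions `merge_lenient` / `merge_strict` and the two-column state layout are assumptions, not DataFusion's code), the `..` arm silently accepts extra state columns, while an exact-length pattern rejects them:

```rust
/// Lenient: `..` matches two *or more* state columns, silently ignoring extras.
/// Returns the number of merged values.
pub fn merge_lenient(states: &[Vec<i64>]) -> Result<usize, String> {
    match states {
        [values, _orderings, ..] => Ok(values.len()),
        _ => Err(format!("expected at least 2 state columns, got {}", states.len())),
    }
}

/// Strict: the pattern matches exactly two columns, so extras are an error.
/// Returns the number of merged values.
pub fn merge_strict(states: &[Vec<i64>]) -> Result<usize, String> {
    match states {
        [values, _orderings] => Ok(values.len()),
        _ => Err(format!("expected exactly 2 state columns, got {}", states.len())),
    }
}
```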
@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels Jun 30, 2025
Before the change, `array_agg` with ordering would depend on input being
ordered. As a result, it was impossible to do two or more `array_agg(x
ORDER BY ...)` with incompatible ordering. This change moves ordering
responsibility into `OrderSensitiveArrayAggAccumulator`. When input is
pre-ordered (beneficial ordering), no additional work is done. However,
when it's not, `array_agg` accumulator will order the data on its own.
@findepi findepi force-pushed the findepi/two-ordered-array-aggs branch from 1264858 to cf4d8ae Compare June 30, 2025 14:06
@github-actions github-actions bot added the physical-plan Changes to the physical-plan crate label Jun 30, 2025
@findepi findepi added enhancement New feature or request and removed physical-plan Changes to the physical-plan crate labels Jun 30, 2025
@findepi findepi changed the title Support multiple ordered array_agg Support multiple ordered array_agg aggregations Jun 30, 2025
Contributor

@ozankabak ozankabak left a comment

A few comments:

  • FIRST_VALUE and LAST_VALUE implementations use the names requirement_satisfied and with_requirement_satisfied for this concept; IMO it would be a good idea to follow suit here for consistency/readability purposes.
  • There are some changes where the planner was able to get away with appending a key to an already-existing sort, and operate the accumulator in an efficient mode. This PR loses that. Good news is that we already have the mechanism to address this in place (i.e. OrderingRequirements) and the solution is not hard. Aggregations involving functions that benefit from existing ordering should return OrderingRequirements::Soft, and the enforce_sorting rule should leverage any already existing sort (if present).

@findepi
Member Author

findepi commented Jun 30, 2025

thanks for your review @ozankabak

  • FIRST_VALUE and LAST_VALUE implementations use the names requirement_satisfied and with_requirement_satisfied for this concept; IMO it would be a good idea to follow suit here for consistency/readability purposes.

will align for array_agg

I am all for consistency. Do you think we could update the AggregateUDFImpl::with_beneficial_ordering(beneficial_ordering: bool) signature to follow the desired naming?

  • There are some changes where the planner was able to get away with appending a key to an already-existing sort, and operate the accumulator in an efficient mode. This PR loses that.

Yes, I noticed the PR drops some redundant sorts, which is likely quite a big improvement for aggregations with GROUP BY. Am I reading this correctly? I agree that in some rare situations those sorts are still the optimal way to go. However, I don't see a way for a UDAF to declare OrderingRequirements::Soft. Is this a hard blocker for this PR, or can we avoid scope creep?

@findepi
Member Author

findepi commented Jun 30, 2025

will align for array_agg

Sorry, with requirement_satisfied I am afraid it's easy not to know which exact requirement is guaranteed to be satisfied. I propose that we change the FIRST_VALUE and LAST_VALUE implementations. #16631

@ozankabak
Contributor

Sorry, with requirement_satisfied I am afraid it's easy not to know which exact requirement is guaranteed to be satisfied. I propose that we change the FIRST_VALUE and LAST_VALUE implementations. #16631

I don't think it matters and I think the name was already kind of obvious, but since you already opened a PR for it I went ahead and approved it.

Regarding your question about plan changes, I don't think we have enough information to call it scope creep -- the PR adds a new capability that we were working towards for a while, but also loses something that already exists. That was actually the main reason why we didn't take the plunge just yet as we worked on this over the past few months.

If getting this over the finish line in the short term is important to you, I think a reasonable step forward is to take a look at what solving it entails: There are basically two steps: (1) AggregateExec needs to consult the UDAF definition as it forms its required input ordering (this is the easy step), (2) The enforce sorting rule needs to address the case when there is already a sort with a prefix of a soft requirement, and just extend the sort keys (this is the harder step).

I think the solution will ultimately be small in terms of LOC changes, but the second step will require some thinking. If making an attempt at solving reveals that the problem has challenges that we don't foresee now, we can reconsider whether we want to accept the plan changes and open an issue to track the work that I described in the above paragraph.
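A minimal sketch of the prefix check behind step (2), assuming sort keys are modeled as hypothetical `(column, descending)` pairs rather than DataFusion's physical sort expressions, and that `extend_existing_sort` is a made-up helper name: if an existing sort is a prefix of the soft requirement, its keys can simply be extended; if the requirement is already a prefix of the existing sort, nothing needs to change.

```rust
/// Hypothetical sort key: (column name, descending?).
pub type SortKey = (&'static str, bool);

/// Try to satisfy a soft ordering requirement by reusing an existing sort.
/// - no existing sort: `None` (nothing to leverage);
/// - existing sort is a prefix of the requirement: extend its keys;
/// - requirement is a prefix of the existing sort: already satisfied;
/// - otherwise: `None`, i.e. keep the existing sort and drop the soft requirement.
pub fn extend_existing_sort(existing: &[SortKey], soft: &[SortKey]) -> Option<Vec<SortKey>> {
    if existing.is_empty() {
        return None;
    }
    if soft.starts_with(existing) {
        Some(soft.to_vec())
    } else if existing.starts_with(soft) {
        Some(existing.to_vec())
    } else {
        None
    }
}
```

For example, an existing sort on `c2 DESC` can be extended to `c2 DESC, c3 ASC`, similar in shape to the `SortExec` orderings that appear in the plans discussed in this PR.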

Thanks

@findepi
Member Author

findepi commented Jun 30, 2025

I think global array_agg is not a very interesting scenario, and a grouped array_agg no longer requires global sorting.
Can we agree the latter is an improvement? Perhaps a big one, since global sorting requires single-threadedness.

Contributor

@alamb alamb left a comment

Thanks @findepi -- the idea makes sense to me

My only concern is switching to sorting ScalarValues rather than arrays (as SortExec does), as it might be considerably slower.

There is a related outstanding PR from @sfluor:

I think @rluvaton is familiar with this code too. Perhaps he has some time to review as well?

04)------SortExec: expr=[c2@1 DESC, c3@2 ASC NULLS LAST], preserve_partitioning=[true]
05)--------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
06)----------DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/core/tests/data/aggregate_agg_multi_order.csv]]}, projection=[c1, c2, c3], file_type=csv, has_header=true
04)------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
Contributor

I wonder if we should benchmark this -- sorting by ScalarValue is likely a lot less efficient than using the fast array sort / etc that is done in SortExec

Member Author

For global aggregation -- agreed. But then, a global array_agg aggregation cannot feasibly operate on large amounts of data, can it? (Or rather: it can, but that's unlikely to be a common scenario.)

Member Author

@alamb I pushed some changes following @ozankabak's suggestions and the plans have (expectedly) changed. Can you take another look?

Sort planning before aggregations previously respected only ordered
aggregation functions with `AggregateOrderSensitivity::HardRequirement`.
This change also includes sorting expectations from
`AggregateOrderSensitivity::Beneficial` functions. When the ordering
requirements of beneficial functions are not satisfied, no error is
raised; they are considered only in the second pass.
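The two-pass behavior described in this commit can be sketched roughly as follows. This is a hypothetical model, not DataFusion's planner code: orderings are lists of `(column, descending)` pairs, two orderings are treated as compatible when one is a prefix of the other, and the function names are made up for illustration.

```rust
/// Hypothetical sort key: (column name, descending?).
pub type SortKey = (&'static str, bool);

/// Two orderings are compatible when one is a prefix of the other;
/// the longer one then satisfies both.
fn merge_compatible(a: &[SortKey], b: &[SortKey]) -> Option<Vec<SortKey>> {
    if a.starts_with(b) {
        Some(a.to_vec())
    } else if b.starts_with(a) {
        Some(b.to_vec())
    } else {
        None
    }
}

/// Pass 1: every hard requirement must be compatible, otherwise planning fails.
/// Pass 2: soft requirements extend the ordering only when compatible;
/// incompatible ones are skipped without raising an error.
pub fn choose_ordering(
    hard: &[Vec<SortKey>],
    soft: &[Vec<SortKey>],
) -> Result<Vec<SortKey>, String> {
    let mut chosen: Vec<SortKey> = Vec::new();
    for req in hard {
        let merged = merge_compatible(&chosen, req)
            .ok_or_else(|| format!("conflicting hard requirements: {:?} vs {:?}", chosen, req))?;
        chosen = merged;
    }
    for req in soft {
        if let Some(merged) = merge_compatible(&chosen, req) {
            chosen = merged;
        }
    }
    Ok(chosen)
}
```

In this model a soft requirement that conflicts with the chosen ordering is simply dropped, so the corresponding aggregate has to cope with unordered input on its own.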
@github-actions github-actions bot added physical-expr Changes to the physical-expr crates physical-plan Changes to the physical-plan crate labels Jul 1, 2025
@findepi findepi force-pushed the findepi/two-ordered-array-aggs branch from 8f2eab7 to 4cd992c Compare July 1, 2025 08:46
@github-actions github-actions bot removed the physical-expr Changes to the physical-expr crates label Jul 1, 2025
Comment on lines 2759 to 2760
DataFusion error: Internal error: Input field name last_value(sales_global.amount) ORDER BY [sales_global.amount DESC NULLS FIRST] does not match with the projection expression first_value(sales_global.amount) ORDER BY [sales_global.amount ASC NULLS LAST].
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
Member Author

Unblocking beneficial ordering triggered an error around reversing first_value / last_value.
Fix coming.

Upon reversing, a schema and field mismatch would happen.
@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Jul 1, 2025
ReversedUDAF::NotSupported => None,
ReversedUDAF::Identical => Some(self.clone()),
ReversedUDAF::Reversed(reverse_udf) => {
let mut name = self.name().to_string();
Contributor

I think removing this may have other unintended effects. I will request some more eyes on this

Member Author

Thanks!
I agree this code was deliberate & nice. I hope we don't parse those names though.
If there is a better solution to aggregate reversal causing failures (#16625 (comment)), let me know. I can also drop this fix; I don't like it either.

As an alternative to the fix, I can block reversing for beneficial functions and thus hide the problem for now. Would that be preferred for this PR?

Contributor

@alamb alamb left a comment

I took another quick look --- thank you @findepi

I am worried about the unintended consequences of this change (mostly because I don't understand the code well enough to know what the invariants are / if we are breaking them).

Some of the plan changes definitely look wrong to me -- I am not sure whether it is just a column naming issue or if the expressions are actually wrong now 🤔

physical_plan
01)ProjectionExec: expr=[country@0 as country, array_agg(sales_global.amount) ORDER BY [sales_global.amount DESC NULLS FIRST]@1 as amounts, first_value(sales_global.amount) ORDER BY [sales_global.amount ASC NULLS LAST]@2 as fv1, last_value(sales_global.amount) ORDER BY [sales_global.amount DESC NULLS FIRST]@3 as fv2]
02)--AggregateExec: mode=Single, gby=[country@0 as country], aggr=[array_agg(sales_global.amount) ORDER BY [sales_global.amount DESC NULLS FIRST], last_value(sales_global.amount) ORDER BY [sales_global.amount DESC NULLS FIRST], last_value(sales_global.amount) ORDER BY [sales_global.amount DESC NULLS FIRST]]
02)--AggregateExec: mode=Single, gby=[country@0 as country], aggr=[array_agg(sales_global.amount) ORDER BY [sales_global.amount DESC NULLS FIRST], first_value(sales_global.amount) ORDER BY [sales_global.amount ASC NULLS LAST], last_value(sales_global.amount) ORDER BY [sales_global.amount DESC NULLS FIRST]]
Contributor

that certainly seems like an improvement

01)ProjectionExec: expr=[country@0 as country, first_value(sales_global.amount) ORDER BY [sales_global.ts DESC NULLS FIRST]@1 as fv1, last_value(sales_global.amount) ORDER BY [sales_global.ts DESC NULLS FIRST]@2 as lv1, sum(sales_global.amount) ORDER BY [sales_global.ts DESC NULLS FIRST]@3 as sum1]
02)--AggregateExec: mode=Single, gby=[country@0 as country], aggr=[first_value(sales_global.amount) ORDER BY [sales_global.ts DESC NULLS FIRST], last_value(sales_global.amount) ORDER BY [sales_global.ts DESC NULLS FIRST], sum(sales_global.amount) ORDER BY [sales_global.ts DESC NULLS FIRST]]
03)----DataSourceExec: partitions=1, partition_sizes=[1]
03)----SortExec: expr=[ts@1 DESC], preserve_partitioning=[false]
Contributor

It is weird -- the comments say this plan should have a SortExec but the plan that is checked in does not have one

01)AggregateExec: mode=Final, gby=[], aggr=[first_value(convert_first_last_table.c1) ORDER BY [convert_first_last_table.c3 DESC NULLS FIRST]]
02)--CoalescePartitionsExec
03)----AggregateExec: mode=Partial, gby=[], aggr=[last_value(convert_first_last_table.c1) ORDER BY [convert_first_last_table.c3 ASC NULLS LAST]]
03)----AggregateExec: mode=Partial, gby=[], aggr=[first_value(convert_first_last_table.c1) ORDER BY [convert_first_last_table.c3 DESC NULLS FIRST]]
Contributor

Now this makes it look like the plan is wrong:

  1. the query calls for first_value(c1 ORDER BY c3 desc) but the table is sorted by c3 ASC

I think internally the optimizer has rewritten first_value(c1 ORDER BY c3 desc) to last_value(c1 ORDER BY c3 ASC)

However this plan makes it look like that didn't happen
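The rewrite relies on a simple equivalence: for non-null, distinct keys, the first value in descending key order is the last value in ascending key order. A minimal sketch, assuming `(key, value)` pairs of `i64`, ignoring NULL handling, and using hypothetical function names:

```rust
/// first_value(value ORDER BY key DESC): value of the first row after a
/// stable descending sort by key.
pub fn first_value_desc(rows: &[(i64, i64)]) -> Option<i64> {
    let mut sorted = rows.to_vec();
    sorted.sort_by(|a, b| b.0.cmp(&a.0)); // descending by key; stable
    sorted.first().map(|&(_, value)| value)
}

/// last_value(value ORDER BY key ASC): value of the last row after a
/// stable ascending sort by key.
pub fn last_value_asc(rows: &[(i64, i64)]) -> Option<i64> {
    let mut sorted = rows.to_vec();
    sorted.sort_by_key(|&(key, _)| key);
    sorted.last().map(|&(_, value)| value)
}
```

With duplicate keys the two functions may pick different rows among the ties, which is acceptable because ORDER BY does not determine which tied row comes first.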

Member Author

There was renaming to address this, but the renaming did not exactly work -- #16625 (comment).

@ozankabak
Contributor

Agreed, I think this will need some time to brew. As I said previously, I hope to get some more eyes on this in the short term (maybe early next week)

@ozankabak
Contributor

I think we will find a solution that avoids this redundancy. Expect some feedback from me (or someone on my team) in a few days

@findepi
Member Author

findepi commented Jul 16, 2025

@ozankabak @alamb can you please help me understand where you would want to go with this?

Or maybe DF doesn't need to support more than one ordered array_agg in a query?

@alamb
Contributor

alamb commented Jul 18, 2025

@ozankabak @alamb can you please help me understand where you would want to go with this?

I think supporting multiple ordered array_agg aggregations makes sense to me; I have not had a chance to review this PR recently.

Is it ready for another review?

@alamb alamb added the api change Changes the API exposed to users of the crate label Jul 19, 2025
Contributor

@alamb alamb left a comment

Thanks for pushing this forward @findepi

I personally think supporting multiple sorted aggregates is a useful feature and we should work on it. My only real concern with the code currently in this PR is that it may break the FFI API (which is ideally supposed to be stable).

I re-read the comments on this PR and I wonder if you tried implementing the solution suggested by @ozankabak in #16625 (comment):

If getting this over the finish line in the short term is important to you, I think a reasonable step forward is to take a look at what solving it entails: There are basically two steps: (1) AggregateExec needs to consult the UDAF definition as it forms its required input ordering (this is the easy step), (2) The enforce sorting rule needs to address the case when there is already a sort with a prefix of a soft requirement, and just extend the sort keys (this is the harder step).

I think the solution will ultimately be small in terms of LOC changes, but the second step will require some thinking.

This PR seems similar except that it adds the SoftRequirement stage as well. If we could avoid the need for SoftRequirement I think this PR would be pretty great

pub enum FFI_AggregateOrderSensitivity {
Insensitive,
HardRequirement,
SoftRequirement,
Contributor

Technically speaking this is an FFI API change -- I am not sure what implication that has (note this would not be released until DataFusion 50 anyways).

cc @timsaucer -- I wonder if we should gather up the FFI breaking changes into their own PR / more carefully schedule such breakages

Member Author

This new variant doesn't need to be supported in the FFI.
However, I didn't know how to avoid adding it.
Looking at impl From<AggregateOrderSensitivity> for FFI_AggregateOrderSensitivity, I am under the impression that this particular part of the FFI API is tightly coupled with the DataFusion core, so in this particular place it cannot deliver API stability without inhibiting DataFusion core progress. The necessary fix might be replacing this From with TryFrom, and likewise for impl From<FFI_AggregateOrderSensitivity> for AggregateOrderSensitivity.

My understanding is that, by tightly coupling AggregateOrderSensitivity and FFI_AggregateOrderSensitivity, the code author chose to let these enums evolve naturally over time, considering this either not a breaking change or an acceptable one.
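A sketch of the From-to-TryFrom idea, using two hypothetical, simplified enums: `CoreSensitivity` and `FfiSensitivity` stand in for `AggregateOrderSensitivity` and `FFI_AggregateOrderSensitivity`, and their variant lists here are assumptions. The FFI enum keeps its existing layout, and a core variant it cannot represent produces an error instead of forcing an FFI change:

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for the core enum, which gained a new variant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CoreSensitivity {
    Insensitive,
    HardRequirement,
    SoftRequirement,
    Beneficial,
}

/// Hypothetical stand-in for the FFI enum, deliberately left unchanged.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FfiSensitivity {
    Insensitive,
    HardRequirement,
    Beneficial,
}

/// Core -> FFI is fallible: variants the FFI enum cannot represent become errors.
impl TryFrom<CoreSensitivity> for FfiSensitivity {
    type Error = String;

    fn try_from(value: CoreSensitivity) -> Result<Self, Self::Error> {
        match value {
            CoreSensitivity::Insensitive => Ok(FfiSensitivity::Insensitive),
            CoreSensitivity::HardRequirement => Ok(FfiSensitivity::HardRequirement),
            CoreSensitivity::Beneficial => Ok(FfiSensitivity::Beneficial),
            other => Err(format!("{:?} is not supported across FFI", other)),
        }
    }
}

/// FFI -> core stays infallible: every FFI variant has a core counterpart.
impl From<FfiSensitivity> for CoreSensitivity {
    fn from(value: FfiSensitivity) -> Self {
        match value {
            FfiSensitivity::Insensitive => CoreSensitivity::Insensitive,
            FfiSensitivity::HardRequirement => CoreSensitivity::HardRequirement,
            FfiSensitivity::Beneficial => CoreSensitivity::Beneficial,
        }
    }
}
```

Whether an unrepresentable variant should be an error, or mapped to a conservative neighbor, is a separate design choice; the sketch only shows where that decision would live.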

@findepi
Member Author

findepi commented Jul 20, 2025

I re-read the comments on this PR and I wonder if you tried implementing the solution suggested by @ozankabak in #16625 (comment):

If getting this over the finish line in the short term is important to you, I think a reasonable step forward is to take a look at what solving it entails: There are basically two steps: (1) AggregateExec needs to consult the UDAF definition as it forms its required input ordering (this is the easy step), (2) The enforce sorting rule needs to address the case when there is already a sort with a prefix of a soft requirement, and just extend the sort keys (this is the harder step).
I think the solution will ultimately be small in terms of LOC changes, but the second step will require some thinking.

This PR seems similar except that it adds the SoftRequirement stage as well. If we could avoid the need for SoftRequirement I think this PR would be pretty great

#16625 (comment)
#16625 (comment)

I am not convinced it's actually desired to make the existing Beneficial behave like the new SoftRequirement does though.

Consider the first_value / top_1 function, which is O(n). It benefits from the input being sorted, becoming O(1) in that case. It does not, however, want to impose input sorting, as that would be O(n log n).
This is different from, e.g., sorted array_agg, which needs to sort either way.

One can argue that sorting for "first_value order by a, b" can be added only if there already is some sorting on a. It's not a bad argument, but note that within the group of rows sharing the smallest a value, it is still an O(group size) -> O(group size * log group size) change.
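The complexity argument can be illustrated with a minimal sketch (hypothetical function names, `(key, value)` pairs of `i64`): without any input ordering, `first_value(value ORDER BY key)` is a single linear scan; with input known to be sorted by key, it is just the first row. Sorting first would cost O(n log n), which is worse than the scan.

```rust
/// Unordered input: one O(n) pass keeping the row with the smallest key.
/// `min_by_key` returns the first of several equal minima.
pub fn first_value_scan(rows: &[(i64, i64)]) -> Option<i64> {
    rows.iter().min_by_key(|&&(key, _)| key).map(|&(_, value)| value)
}

/// Input known to be sorted by key ascending: the answer is the first row, O(1).
pub fn first_value_presorted(rows: &[(i64, i64)]) -> Option<i64> {
    rows.first().map(|&(_, value)| value)
}
```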

Thus, it seems optimal to be able to distinguish functions that

  • benefit from ordering, e.g. first_value (Beneficial)
  • those that are simply better if the input can be ordered, e.g. ordered array_agg in this PR (SoftRequirement being added in this PR)
  • those which cannot execute if input is not pre-ordered, e.g. ordered array_agg before this PR (HardRequirement)
  • those which do not care about input ordering (Insensitive)

Thus it makes sense for the AggregateOrderSensitivity to have 4 options.
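One way to read the four categories is as independent planner questions: does the function use an existing ordering, should a sort be added for it, and is it an error if the ordering cannot be provided? The sketch below encodes that reading with a hypothetical `OrderSensitivity` enum; it is not DataFusion's `AggregateOrderSensitivity`, whose exact semantics may differ.

```rust
/// Hypothetical mirror of the four categories above (not DataFusion's enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OrderSensitivity {
    /// Result does not depend on input order.
    Insensitive,
    /// Can exploit an existing ordering (e.g. first_value) but should not force a sort.
    Beneficial,
    /// Better with sorted input, but sorts on its own if needed (ordered array_agg in this PR).
    SoftRequirement,
    /// Cannot produce a correct result unless the input is pre-ordered.
    HardRequirement,
}

impl OrderSensitivity {
    /// Does the function take advantage of an ordering that is already present?
    pub fn uses_existing_order(self) -> bool {
        !matches!(self, Self::Insensitive)
    }

    /// Should the planner try to add a sort for this function?
    pub fn requests_sort(self) -> bool {
        matches!(self, Self::SoftRequirement | Self::HardRequirement)
    }

    /// Is it a planning error when the requested ordering cannot be provided?
    pub fn unsatisfied_is_error(self) -> bool {
        matches!(self, Self::HardRequirement)
    }
}
```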

@alamb
Contributor

alamb commented Jul 24, 2025

Thus, it seems optimal to be able to distinguish functions that

  • benefit from ordering, e.g. first_value (Beneficial)
  • those that are simply better if the input can be ordered, e.g. ordered array_agg in this PR (SoftRequirement being added in this PR)
  • those which cannot execute if input is not pre-ordered, e.g. ordered array_agg before this PR (HardRequirement)
  • those which do not care about input ordering (Insensitive)

I see -- I understand the theoretical difference between

  1. "needs the ordering to correctly run" (HardRequirement)
  2. "can take advantage of the ordering" (Beneficial)
  3. "is always better to use sorting" (SoftRequirement)

What I think I am confused about is the practical difference between HardRequirement and SoftRequirement -- specifically, what different plan / decision will be made.

I believe the result is that DataFusion will attempt to sort the input according to the requirement, but if it can not (because it will cause a conflict with another aggregate function's requirements, for example) then the aggregate can still be run with the different ordering

@alamb alamb changed the title Support multiple ordered array_agg aggregations Support multiple ordered array_agg aggregations Jul 24, 2025
@alamb
Contributor

alamb commented Jul 24, 2025

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing findepi/two-ordered-array-aggs (5f00ec4) to 6870cc1 diff
BENCH_NAME=array_agg
BENCH_COMMAND=cargo bench --bench array_agg
BENCH_FILTER=
BENCH_BRANCH_NAME=findepi_two-ordered-array-aggs
Results will be posted here when complete

Contributor

@alamb alamb left a comment

I went through this PR carefully and it makes sense to me -- thank you @findepi

I still don't really understand the implications of the FFI change, but maybe they are ok.

I also kicked off a benchmark for array_agg to ensure this PR doesn't introduce regressions, and as long as that is good I think this PR is good to merge

@alamb
Contributor

alamb commented Jul 24, 2025

🤖: Benchmark completed


group                                                                              findepi_two-ordered-array-aggs         main
-----                                                                              ------------------------------         ----
array_agg i64 merge_batch 30% nulls, 0% of nulls point to a zero length array      1.04    568.6±0.80µs        ? ?/sec    1.00    545.6±1.22µs        ? ?/sec
array_agg i64 merge_batch 30% nulls, 100% of nulls point to a zero length array    1.00      6.0±0.02µs        ? ?/sec    1.00      6.0±0.02µs        ? ?/sec
array_agg i64 merge_batch 30% nulls, 50% of nulls point to a zero length array     1.01    548.0±1.38µs        ? ?/sec    1.00    545.2±1.12µs        ? ?/sec
array_agg i64 merge_batch 30% nulls, 90% of nulls point to a zero length array     1.00    547.4±1.17µs        ? ?/sec    1.00    547.1±1.18µs        ? ?/sec
array_agg i64 merge_batch 30% nulls, 99% of nulls point to a zero length array     1.00    547.0±0.96µs        ? ?/sec    1.00    545.3±0.78µs        ? ?/sec
array_agg i64 merge_batch 70% nulls, 0% of nulls point to a zero length array      1.00    243.2±1.72µs        ? ?/sec    1.00    242.6±0.33µs        ? ?/sec
array_agg i64 merge_batch 70% nulls, 100% of nulls point to a zero length array    1.01      5.9±0.02µs        ? ?/sec    1.00      5.9±0.03µs        ? ?/sec
array_agg i64 merge_batch 70% nulls, 50% of nulls point to a zero length array     1.01    245.0±0.42µs        ? ?/sec    1.00    243.5±0.33µs        ? ?/sec
array_agg i64 merge_batch 70% nulls, 90% of nulls point to a zero length array     1.00    242.8±2.88µs        ? ?/sec    1.00    243.6±0.35µs        ? ?/sec
array_agg i64 merge_batch 70% nulls, 99% of nulls point to a zero length array     1.00    243.0±1.72µs        ? ?/sec    1.00    243.8±0.42µs        ? ?/sec
array_agg i64 merge_batch all nulls, 100% of nulls point to a zero length array    1.04     90.5±0.12ns        ? ?/sec    1.00     87.4±0.22ns        ? ?/sec
array_agg i64 merge_batch all nulls, 90% of nulls point to a zero length array     1.00     86.9±0.10ns        ? ?/sec    1.08     94.1±1.21ns        ? ?/sec
array_agg i64 merge_batch no nulls                                                 1.04    105.3±0.18ns        ? ?/sec    1.00    101.0±0.09ns        ? ?/sec

@ozankabak
Contributor

I didn't have time to dig deeper on this, so we can go ahead with the merge. We can unify Beneficial and SoftRequirement later in the future if we find a good way to do so.

@alamb
Contributor

alamb commented Jul 24, 2025

🤖: Benchmark completed

In my opinion, the benchmark results show no meaningful difference.

@findepi
Member Author

findepi commented Jul 24, 2025

What I think I am confused about is the practical difference between HardRequirement and SoftRequirement -- specifically, what different plan / decision will be made.

I believe the result is that DataFusion will attempt to sort the input according to the requirement, but if it can not (because it will cause a conflict with another aggregate function's requirements, for example) then the aggregate can still be run with the different ordering

  • HardRequirement -- if the input sort requirement is not satisfied, query planning fails with an error. This is the case when aggregate functions want the input to be sorted but lack the capability to sort it themselves.
    • ideally this should never happen. We might want to work towards removing HardRequirement in the future.
  • SoftRequirement -- if the input is sorted, great; if it's not, that's also fine.

@findepi
Member Author

findepi commented Jul 24, 2025

Thank you @alamb for review and @ozankabak for feedback.
Updated per Andrew's review comments.

@alamb alamb merged commit e1a5cdf into apache:main Jul 28, 2025
27 checks passed
@alamb
Contributor

alamb commented Jul 28, 2025

Onwards. Thanks @findepi and @ozankabak

@findepi findepi deleted the findepi/two-ordered-array-aggs branch August 4, 2025 10:27
Standing-Man pushed a commit to Standing-Man/datafusion that referenced this pull request Aug 4, 2025
* Validate states shape in merge_batch

Due to `..` in the pattern,
`OrderSensitiveArrayAggAccumulator::merge_batch` did not validate that it
was not receiving additional state columns that it would ignore. Update the code to
check the number of inputs.

* Support multiple ordered array_agg

Before the change, `array_agg` with ordering would depend on input being
ordered. As a result, it was impossible to do two or more `array_agg(x
ORDER BY ...)` with incompatible ordering. This change moves ordering
responsibility into `OrderSensitiveArrayAggAccumulator`. When input is
pre-ordered (beneficial ordering), no additional work is done. However,
when it's not, `array_agg` accumulator will order the data on its own.

* Generate sorts based on aggregations soft requirements

Sort planning before aggregations previously respected only ordered
aggregation functions with `AggregateOrderSensitivity::HardRequirement`.
This change also includes sorting expectations from
`AggregateOrderSensitivity::Beneficial` functions. When the ordering
requirements of beneficial functions are not satisfied, no error is
raised; they are considered only in the second pass.

* Fix reversing first_value, last_value

Upon reversing, a schema and field mismatch would happen.

* Revert "Fix reversing first_value, last_value"

This reverts commit 9b7e94d.

* sort array_agg input the old way whenever possible

* revert some now unnecessary change

* Improve doc for SoftRequiement

Co-authored-by: Andrew Lamb <[email protected]>

* Add comment for include_soft_requirement

Co-authored-by: Andrew Lamb <[email protected]>

* Document include_soft_requirement param

* fmt

* doc fix

---------

Co-authored-by: Andrew Lamb <[email protected]>
/// ARRAY_AGG aggregate expression
pub struct ArrayAgg {
signature: Signature,
is_input_pre_ordered: bool,
Member Author

Adding this new field should trigger adding equals/hash_value implementations.
Being fixed in #17065


Labels

api change Changes the API exposed to users of the crate enhancement New feature or request ffi Changes to the ffi crate functions Changes to functions implementation physical-plan Changes to the physical-plan crate sqllogictest SQL Logic Tests (.slt)


3 participants