Skip to content

Conversation

@mertak-synnada
Copy link

Which issue does this PR close?

Closes #14446.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added physical-expr Changes to the physical-expr crates optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Feb 5, 2025
03)----Aggregate: groupBy=[[aggregate_test_100.c3]], aggr=[[min(aggregate_test_100.c1)]]
04)------TableScan: aggregate_test_100 projection=[c1, c3]
physical_plan
01)GlobalLimitExec: skip=0, fetch=5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that SKIP is missing.

Copy link
Contributor

@zhuqi-lucas zhuqi-lucas Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the SKIP setting is not supported for all with_fetch operators, for example:

SortPreservingMergeExec also support fetch but not support setting skip, it's default to 0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We default to use skip=0 for the SortPreservingMergeExec:

 1 => match self.fetch {
                Some(fetch) => {
                    let stream = self.input.execute(0, context)?;
                    debug!("Done getting stream for SortPreservingMergeExec::execute with 1 input with {fetch}");
                    Ok(Box::pin(LimitStream::new(
                        stream,
                        0,
                        Some(fetch),
                        BaselineMetrics::new(&self.metrics, partition),
                    )))
                }
impl LimitStream {
    pub fn new(
        input: SendableRecordBatchStream,
        skip: usize,
        fetch: Option<usize>,
        baseline_metrics: BaselineMetrics,
    ) -> Self {
        let schema = input.schema();
        Self {
            skip,
            fetch: fetch.unwrap_or(usize::MAX),
            input: Some(input),
            schema,
            baseline_metrics,
        }
    }

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as @zhuqi-lucas mentioned, only Limit operators support skip, and limit_pushdown is adding a Limit operator if skip exists, so this is only affecting plans without skip.

Here's the query result with skip:

query TT
EXPLAIN SELECT DISTINCT c3, min(c1) FROM aggregate_test_100 group by c3 limit 5 offset 3;
----
logical_plan
01)Limit: skip=3, fetch=5
02)--Aggregate: groupBy=[[aggregate_test_100.c3, min(aggregate_test_100.c1)]], aggr=[[]]
03)----Aggregate: groupBy=[[aggregate_test_100.c3]], aggr=[[min(aggregate_test_100.c1)]]
04)------TableScan: aggregate_test_100 projection=[c1, c3]
physical_plan
01)GlobalLimitExec: skip=3, fetch=5
02)--CoalescePartitionsExec: fetch=8
03)----AggregateExec: mode=FinalPartitioned, gby=[c3@0 as c3, min(aggregate_test_100.c1)@1 as min(aggregate_test_100.c1)], aggr=[], lim=[8]
04)------CoalesceBatchesExec: target_batch_size=8192
05)--------RepartitionExec: partitioning=Hash([c3@0, min(aggregate_test_100.c1)@1], 4), input_partitions=4
06)----------AggregateExec: mode=Partial, gby=[c3@0 as c3, min(aggregate_test_100.c1)@1 as min(aggregate_test_100.c1)], aggr=[], lim=[8]
07)------------AggregateExec: mode=FinalPartitioned, gby=[c3@0 as c3], aggr=[min(aggregate_test_100.c1)]
08)--------------CoalesceBatchesExec: target_batch_size=8192
09)----------------RepartitionExec: partitioning=Hash([c3@0], 4), input_partitions=4
10)------------------AggregateExec: mode=Partial, gby=[c3@1 as c3], aggr=[min(aggregate_test_100.c1)]
11)--------------------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
12)----------------------CsvExec: file_groups={1 group: [[WORKSPACE_ROOT/testing/data/csv/aggregate_test_100.csv]]}, projection=[c1, c3], has_header=true

cx: &mut Context<'_>,
) -> Poll<Option<Self::Item>> {
let poll = self.inner.poll_next_unpin(cx);
let poll = self.limit_reached(poll);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

let Some(fetch) = self.fetch else { return poll };

May be we can move the following to above, only call limit_reached when

let Some(fetch) = self.fetch {
    poll = self.limit_reached(poll);
}

Copy link
Contributor

@zhuqi-lucas zhuqi-lucas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@ozankabak ozankabak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you

@berkaysynnada berkaysynnada merged commit 55730dc into apache:main Feb 6, 2025
25 checks passed
joroKr21 pushed a commit to coralogix/arrow-datafusion that referenced this pull request Jun 19, 2025
* add fetch info to CoalescePartitionsExec

* use Statistics with_fetch API on CoalescePartitionsExec

* check limit_reached only if fetch is assigned
joroKr21 pushed a commit to coralogix/arrow-datafusion that referenced this pull request Jun 19, 2025
* add fetch info to CoalescePartitionsExec

* use Statistics with_fetch API on CoalescePartitionsExec

* check limit_reached only if fetch is assigned
joroKr21 added a commit to coralogix/arrow-datafusion that referenced this pull request Jun 19, 2025
* add fetch info to CoalescePartitionsExec

* use Statistics with_fetch API on CoalescePartitionsExec

* check limit_reached only if fetch is assigned

Co-authored-by: mertak-synnada <[email protected]>
avantgardnerio pushed a commit to coralogix/arrow-datafusion that referenced this pull request Sep 23, 2025
* add fetch info to CoalescePartitionsExec

* use Statistics with_fetch API on CoalescePartitionsExec

* check limit_reached only if fetch is assigned

Co-authored-by: mertak-synnada <[email protected]>
avantgardnerio pushed a commit to coralogix/arrow-datafusion that referenced this pull request Oct 1, 2025
* add fetch info to CoalescePartitionsExec

* use Statistics with_fetch API on CoalescePartitionsExec

* check limit_reached only if fetch is assigned

Co-authored-by: mertak-synnada <[email protected]>
avantgardnerio pushed a commit to coralogix/arrow-datafusion that referenced this pull request Oct 15, 2025
* add fetch info to CoalescePartitionsExec

* use Statistics with_fetch API on CoalescePartitionsExec

* check limit_reached only if fetch is assigned

Co-authored-by: mertak-synnada <[email protected]>
Dandandan added a commit to coralogix/arrow-datafusion that referenced this pull request Nov 17, 2025
* Get working build

* Add pool_size method to MemoryPool (#218) (#230)

* Add pool_size method to MemoryPool

* Fix

* Fmt

Co-authored-by: Daniël Heres <[email protected]>

* Respect `IGNORE NULLS` flag in `ARRAY_AGG` (#260) (apache#15544) v48

* Hook for doing distributed `CollectLeft` joins (#269)

* Ignore writer shutdown error (#271)

* ignore writer shutdown error

* cargo check

* Fix  bug in `swap_hash_join` (#278)

* Try and fix swap_hash_join

* Only swap projections when join does not have projections

* just backport upstream fix

* remove println

* Support Duration in min/max agg functions (#283) (apache#15310) v47

* Support Duration in min/max agg functions

* Attempt to fix build

* Attempt to fix build - Fix chrono version

* Revert "Attempt to fix build - Fix chrono version"

This reverts commit fd76fe6.

* Revert "Attempt to fix build"

This reverts commit 9114b86.

---------

Co-authored-by: svranesevic <[email protected]>

* Fix panics in array_union (#287) (apache#15149) v48

* Drop rust-toolchain

* Fix panics in array_union

* Fix the chrono

* Backport `GroupsAccumulator` for Duration min/max agg (#288) (apache#15322) v47

* Fix array_sort for empty record batch (#290) (apache#15149) v48

* fix: rewrite fetch, skip of the Limit node in correct order (apache#14496) v46

* fix: rewrite fetch, skip of the Limit node in correct order

* style: fix clippy

* Support aliases in ConstEvaluator (apache#14734) (#281) v46

* Support aliases in ConstEvaluator (apache#14734)

Not sure why they are not supported. It seems that if we're not careful,
some transformations can introduce aliases nested inside other expressions.

* Format Cargo.toml

* Preserve the name of grouping sets in SimplifyExpressions (#282) (apache#14888) v46

Whenever we use `recompute_schema` or `with_exprs_and_inputs`,
this ensures that we obtain the same schema.

* Support Duration in min/max agg functions (#284) (apache#15310) v47

Co-authored-by: svranesevic <[email protected]>

* fix case_column_or_null with nullable when conditions (apache#13886) v45

* fix case_column_or_null with nullable when conditions

* improve sqllogictests for case_column_or_null

---------

Co-authored-by: zhangli20 <[email protected]>

* Fix casewhen (apache#14156) v45

* Cherry-pick topk limit pushdown fix (apache#14192) v45

* fix: FULL OUTER JOIN and LIMIT produces wrong results (apache#14338) v45

* fix: FULL OUTER JOIN and LIMIT produces wrong results

* Fix minor slt testing

* fix test

(cherry picked from commit ecc5694)

* Cherry-pick global limit fix (apache#14245) v45

* fix: Limits are not applied correctly (apache#14418) v46

* fix: Limits are not applied correctly

* Add easy fix

* Add fix

* Add slt testing

* Address comments

* Disable grouping set in CSE

* Fix spm + limit (apache#14569) v46

* prost 0.13 / fix parquet dep

* Delete unreliable checks

* Segfault in ByteGroupValueBuilder (#294) (apache#15968) v50

* test to demonstrate segfault in ByteGroupValueBuilder

* check for offset overflow

* clippy

(cherry picked from commit 5bdaeaf)

* Update arrow dependency to include rowid (#295)

* Update arrow version

* Feat: Add fetch to CoalescePartitionsExec (apache#14499) (#298) v46

* add fetch info to CoalescePartitionsExec

* use Statistics with_fetch API on CoalescePartitionsExec

* check limit_reached only if fetch is assigned

Co-authored-by: mertak-synnada <[email protected]>

* Fix `CoalescePartitionsExec` proto serialization (apache#15824) (#299) v48

* add fetch to CoalescePartitionsExecNode

* gen proto code

* Add test

* fix

* fix build

* Fix test build

* remove comments

Co-authored-by: 张林伟 <[email protected]>

* Add JoinContext with JoinLeftData to TaskContext in HashJoinExec (#300)

* Add JoinContext with JoinLeftData to TaskContext in HashJoinExec

* Expose random state as const

* re-export ahash::RandomState

* JoinContext default impl

* Add debug log when setting join left data

* Update arrow version for not preserving dict_id (#303)

* Use partial aggregation schema for spilling to avoid column mismatch in GroupedHashAggregateStream (apache#13995) (#302) v45

* Refactor spill handling in GroupedHashAggregateStream to use partial aggregate schema

* Implement aggregate functions with spill handling in tests

* Add tests for aggregate functions with and without spill handling

* Move test related imports into mod test

* Rename spill pool test functions for clarity and consistency

* Refactor aggregate function imports to use fully qualified paths

* Remove outdated comments regarding input batch schema for spilling in GroupedHashAggregateStream

* Update aggregate test to use AVG instead of MAX

* assert spill count

* Refactor partial aggregate schema creation to use create_schema function

* Refactor partial aggregation schema creation and remove redundant function

* Remove unused import of Schema from arrow::datatypes in row_hash.rs

* move spill pool testing for aggregate functions to physical-plan/src/aggregates

* Use Arc::clone for schema references in aggregate functions

(cherry picked from commit 81b50c4)

Co-authored-by: kosiew <[email protected]>

* Update tag

* Push limits past windows (#337) (apache#17347) v50

* Restore old method for DQE

* feat(optimizer): Enable filter pushdown on window functions (apache#14026) v45

* Avoid Aliased Window Expr Enter Unreachable Code (apache#14109) v45

(cherry picked from commit fda500a)

* Use `Expr::qualified_name()` and `Column::new()` to extract partition keys from window and aggregate operators (#355) (apache#17757) v51

* Update PR template to be relevant to our fork

* Make limit pushdown work for SortPreservingMergeExec (apache#17893) (#361)

* re-publicise functions DQE relies on

* Handle columns in with_new_exprs with a Join (apache#15055) (#384)

apache#15055

* handle columns in with_new_exprs with Join

* test doesn't return result

* take join from result

* clippy

* make test fallible

* accept any pair of expression for new_on in with_new_exprs for Join

* use with_capacity

Co-authored-by: delamarch3 <[email protected]>

---------

Co-authored-by: Georgi Krastev <[email protected]>
Co-authored-by: Daniël Heres <[email protected]>
Co-authored-by: Dan Harris <[email protected]>
Co-authored-by: Faiaz Sanaulla <[email protected]>
Co-authored-by: Sava Vranešević <[email protected]>
Co-authored-by: svranesevic <[email protected]>
Co-authored-by: Yingwen <[email protected]>
Co-authored-by: Zhang Li <[email protected]>
Co-authored-by: zhangli20 <[email protected]>
Co-authored-by: Aleksey Kirilishin <[email protected]>
Co-authored-by: xudong.w <[email protected]>
Co-authored-by: Qi Zhu <[email protected]>
Co-authored-by: Martins Purins <[email protected]>
Co-authored-by: mertak-synnada <[email protected]>
Co-authored-by: 张林伟 <[email protected]>
Co-authored-by: kosiew <[email protected]>
Co-authored-by: nuno-faria <[email protected]>
Co-authored-by: Berkay Şahin <[email protected]>
Co-authored-by: Mason Hall <[email protected]>
Co-authored-by: delamarch3 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate optimizer Optimizer rules physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add CoalescePartitionsExec fetch (limit) support

5 participants