-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Fix push_down_projection through a distinct #4849
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM -- thank you @Jefffrey
| )?; | ||
| from_plan(plan, &plan.expressions(), &[child]) | ||
| } | ||
| // at a distinct, all columns are required |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
| let table_scan = test_table_scan()?; | ||
|
|
||
| let plan = LogicalPlanBuilder::from(table_scan) | ||
| .project(vec![col("a"), col("b")])? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
cc @jackwener |
jackwener
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense to me, thanks @Jefffrey
|
Benchmark runs are scheduled for baseline = ceff6cb and contender = c4f4dff. c4f4dff is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #4667
Rationale for this change
During PushDownProjection optimization, it does not properly account for distinct, as it will attempt to push directly through the distinct, from input plan:
Will generate output plan:
Where it has inadvertently removed the
data.dataprojection during the push down and hence leading to incorrect distinct behaviour.What changes are included in this PR?
Properly handle distincts during recursion in PushDownProjection, to 'restart' the traversal with all required columns (derived from schema of the distinct) rather than using the input required columns.
After fix, the above plan output will instead be:
Are these changes tested?
New unit test in push_down_projection
Are there any user-facing changes?