CSE shorthand alias #10868
Closed
Commits
11 commits
a3001ff initial change (MohamedAbdeen21)
0a1e1ce test renaming (MohamedAbdeen21)
7184263 use counter instead of indexmap (MohamedAbdeen21)
a4fceb5 order slt tests (MohamedAbdeen21)
00e5a05 change cse tests (MohamedAbdeen21)
ae5e8b4 restore slt tests (MohamedAbdeen21)
19d69e2 fix slt test (MohamedAbdeen21)
eef86f9 formatting (MohamedAbdeen21)
72e16a4 ensure no alias collision (MohamedAbdeen21)
2ee4d9a keep original alias numbers for collision (MohamedAbdeen21)
f76087c ensure no collision in aggregate cse (MohamedAbdeen21)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
@@ -171,9 +171,13 @@ impl CommonSubexprEliminate { | |
|   input.schema().iter().for_each(|(qualifier, field)| { | ||
|   let name = field.name(); | ||
|   if name.starts_with('#') { | ||
| - let index = name.trim_start_matches('#').parse::<usize>().unwrap_or(1); | ||
| - let expr = Expr::from((qualifier, field)); | ||
| - common_exprs.insert(name.clone(), (expr, index)); | ||
| + match name.trim_start_matches('#').parse::<usize>() { | ||
| + Ok(index) => { | ||
| + let expr = Expr::from((qualifier, field)); | ||
| + common_exprs.insert(name.clone(), (expr, index)); | ||
| + } | ||
| + Err(_) => (), // probably user-assigned alias, skip if not numeric | ||
| + } | ||
|   } | ||
|   }); | ||
@@ -342,7 +346,7 @@ impl CommonSubexprEliminate { | |
|   Aggregate::try_new(Arc::new(new_input), new_group_expr, new_aggr_expr) | ||
|   .map(LogicalPlan::Aggregate) | ||
|   } else { | ||
| - let mut expr_number = common_exprs.len(); | ||
| + let mut expr_number = common_exprs.values().map(|t| t.1).max().unwrap_or(0); | ||
Contributor
Author
This needs a test case
| let mut agg_exprs = common_exprs | ||
| .into_iter() | ||
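The second hunk replaces `common_exprs.len()` with the largest numeric index already present. A minimal standalone sketch of why that can matter, assuming aliases have the form `#N` and that the next alias is formed by adding one to the starting number. The helper names below are hypothetical and are not part of DataFusion:

```rust
/// Collect the numeric indices of `#N`-style names, skipping names that
/// merely start with `#` (for example a user-assigned alias like `#total`).
fn numeric_alias_indices(names: &[&str]) -> Vec<usize> {
    names
        .iter()
        .filter(|name| name.starts_with('#'))
        .filter_map(|name| name.trim_start_matches('#').parse::<usize>().ok())
        .collect()
}

/// Starting number derived from how many numeric aliases exist.
fn start_from_len(indices: &[usize]) -> usize {
    indices.len()
}

/// Starting number derived from the largest numeric alias seen so far.
fn start_from_max(indices: &[usize]) -> usize {
    indices.iter().copied().max().unwrap_or(0)
}

/// Assumption made for this sketch only: the next alias is `start + 1`.
fn next_alias(start: usize) -> String {
    format!("#{}", start + 1)
}
```

With input names `a`, `#2`, and `#total`, the length-based start produces `#2` again, while the max-based start produces `#3`.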
I don't think we should suppose that all extracted aliases remain in the plan:
`build_common_expr_project_plan()` are out of sync...
IMO the best thing we can do is to choose unique aliases for common expressions in `CommonSubexprRewriter` when we find the expression, and store the alias in `common_exprs` together with the expression. In that case we don't need to deal with index sync issues and don't get plans with unnecessary aliases like here: https://github.com/apache/datafusion/pull/10868/files#diff-351499880963d6a383c92e156e75019cd9ce33107724a9635853d7d4cd1898d0R1403
Neither issue affects correctness.
One thing I'd like to point out is that adding unused columns (all of the input's columns) to the intermediate projection is the behavior of the current CSE; it's not introduced in this PR. You can try copying the new test and running it against main. You'll get this output.
Extra projections are removed by other rules, so the final plan doesn't contain these projections.
Also, you may have noticed that extra projections make the aliases "out-of-sync". To be honest, I don't mind the `#2` instead of `#1` (as long as it's not something ridiculous like `#1023`), and I don't see a way to fix that without patching in some hacky global state/counter or asking other rules to reuse aliases when removing the extra projections.
No, what I meant by "indexes go out of sync" is that if your modified CSE rule runs on a plan that we got in the 3rd step (i.e. there is no `#1` in the plan), e.g.:
then it produces an incorrect plan:
This is because you inject `#2` into `common_exprs`, but you don't inject it into `expr_stats` (and others).
IMO modifying `common_exprs` is hacky if you don't do it in `CommonSubexprRewriter`; that's why I suggested the solution in my previous comment.
Removed index usage; now we keep the original alias inside `common_exprs`.
Now this starts to look like the suggested approach, because we assign the unique aliases in `CommonSubexprRewriter` and store them in `common_exprs` together with the common expression.
But why do you still inject previous `#` aliases into `common_exprs`? I think you just need to find the biggest one here, pass that number to `CommonSubexprRewriter`, and simply start assigning new `#` aliases in `f_down()` from that number + 1.
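The approach suggested here could look roughly like the following. This is a sketch only: `AliasGenerator`, `CommonExprs`, and `alias_for` are hypothetical stand-ins, expressions are represented as plain strings, and nothing here reflects the actual `CommonSubexprRewriter` API.

```rust
use std::collections::HashMap;

/// Hypothetical generator that hands out `#N` aliases from a counter.
struct AliasGenerator {
    next_index: usize,
}

impl AliasGenerator {
    /// Seed the counter with the largest `#N` name already in the plan,
    /// so that the first new alias is `#(max + 1)`.
    fn starting_after(existing_names: &[&str]) -> Self {
        let max_index = existing_names
            .iter()
            .filter_map(|name| name.strip_prefix('#'))
            .filter_map(|digits| digits.parse::<usize>().ok())
            .max()
            .unwrap_or(0);
        Self { next_index: max_index + 1 }
    }

    /// Return the next unused alias and advance the counter.
    fn next_alias(&mut self) -> String {
        let alias = format!("#{}", self.next_index);
        self.next_index += 1;
        alias
    }
}

/// Hypothetical stand-in for `common_exprs`: expression text -> alias.
type CommonExprs = HashMap<String, String>;

/// Assign an alias the first time an expression is seen; reuse it afterwards.
fn alias_for(
    expr: &str,
    generator: &mut AliasGenerator,
    common_exprs: &mut CommonExprs,
) -> String {
    common_exprs
        .entry(expr.to_string())
        .or_insert_with(|| generator.next_alias())
        .clone()
}
```

Because the alias is stored next to the expression when it is first found, no previously existing `#` aliases need to be inserted into the map in this sketch; only the starting number is carried over.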
We need a solution that can produce a unique alias fast. There is no problem with having gaps if we can do it in constant time (vs. no gaps, but in time linear in the number of common expressions).
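The tradeoff being described can be sketched as two hypothetical helpers, assuming the set of `#N` indices already in use is known. Neither function comes from DataFusion.

```rust
use std::collections::HashSet;

/// Constant-time choice: one past the largest index handed out so far.
/// Gaps left by removed aliases are never reused.
fn next_index_with_gaps(largest_used: usize) -> usize {
    largest_used + 1
}

/// Gap-filling choice: the smallest positive index that is not in use.
/// The loop body runs at most `used.len()` times.
fn next_index_filling_gaps(used: &HashSet<usize>) -> usize {
    let mut candidate = 1;
    while used.contains(&candidate) {
        candidate += 1;
    }
    candidate
}
```

With `#2` and `#3` in use, the first helper picks index 4 and the second picks index 1.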
n is usually really small. I don't think this is a big performance hit, so filling the gaps is a good tradeoff IMO.
But why do you want to fill the gaps? These are artificial aliases, so having consecutive numbers has no use; all that matters is that they are short, unique and easy to read. Also, if you don't inject anything into `common_exprs`, then the pointless `#1 AS #1` aliases won't get added.
TBH I don't think I'll be able to do that anytime soon.
If that's the only remaining issue, I can mark the PR as ready and a maintainer can push that change.
Please do it and let me try to open a PR with the fix to your PR tomorrow or during the weekend.