-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Avoid Expr copies OptimizeProjection, 12% faster planning, encapsulate indicies
#10216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f120d80 to
9bd4636
Compare
| /// See [Self::index_of_column] for a version that returns an error if the | ||
| /// column is not found | ||
| pub fn maybe_index_of_column(&self, col: &Column) -> Option<usize> { | ||
| self.index_of_column_by_name(col.relation.as_ref(), &col.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This hung me up for a while -- the original code uses flat_map which discards the Err if the column is not found, was not obvious to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is much more clear, Thanks!
OptimizeProjectionExpr copies, encapsulate the handling of child indicies in OptimizeProjection
2434484 to
4d4c2bd
Compare
Expr copies, encapsulate the handling of child indicies in OptimizeProjectionExpr copies OptimizeProjection, 12% faster planning, encapsulate indicies
| ) -> Result<Option<LogicalPlan>> { | ||
| // `child_required_indices` stores |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this documentation moved to RequiredIndices
| @@ -339,31 +318,35 @@ fn optimize_projections( | |||
| let left_len = join.left.schema().fields().len(); | |||
| let (left_req_indices, right_req_indices) = | |||
| split_join_requirements(left_len, indices, &join.join_type); | |||
| let exprs = plan.expressions(); | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This call copies all expressions in the plan
| @@ -123,12 +120,13 @@ fn optimize_projections( | |||
| // that appear in this plan's expressions to its child. All these | |||
| // operators benefit from "small" inputs, so the projection_beneficial | |||
| // flag is `true`. | |||
| let exprs = plan.expressions(); | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise, this was also copying all expressions
| @@ -137,13 +135,9 @@ fn optimize_projections( | |||
| // that appear in this plan's expressions to its child. These operators | |||
| // do not benefit from "small" inputs, so the projection_beneficial | |||
| // flag is `false`. | |||
| let exprs = plan.expressions(); | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And another copy removed
| @@ -178,16 +170,12 @@ fn optimize_projections( | |||
| Make sure `.necessary_children_exprs` implementation of the `UserDefinedLogicalNode` is \ | |||
| consistent with actual children length for the node."); | |||
| } | |||
| // Expressions used by node. | |||
| let exprs = plan.expressions(); | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another copy
| @@ -408,26 +392,6 @@ fn optimize_projections( | |||
| } | |||
| } | |||
|
|
|||
| /// This function applies the given function `f` to the projection indices | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of those code was moved into RequiredIndices as methods both to encapsulate the logic a bit more as well as to allow creation without copying
mustafasrepo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @alamb. This PR is LGTM!. As well as improving performance, it improves maintainability also.
|
Thanks for the reiew @mustafasrepo -- I am slowly working through how to refactor this pass to avoid copies. I think a few more PRs and I'll have it |
Which issue does this PR close?
Part of #10209
Rationale for this change
This is part 1 of improving the performance of
OptimizeProjection(by less copying) in #10209.While trying to rework how
OptimizeProjection works, I found that the populationof child indices pretty much required calling
LogicalPlan::expressions()which results in an owed copy of all the plansExprsRemoving these copies themselves improves the situation (and I think may make the code easier to reasona about), and sets us up to stop copying the
LogicalPlannodes as a follow on PRWhat changes are included in this PR?
This PR encapsulates that logic for managing required indices into a new struct
RequiredIndicieswhich avoids some Expr copies (and sets up the rewrite to avoidLogicalPlancopies)Are these changes tested?
Covered by existing tests
Are there any user-facing changes?
no functional changes
performance benchmarks show an overall 14% improvements for tpch_all and tpcds_all, which is pretty neat
Details