-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Generic constant expression evaluation #1153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
2fdadd9 to
55b6c1e
Compare
| .query_execution_start_time | ||
| .timestamp_nanos(), | ||
| ))), | ||
| Expr::ScalarFunction { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rewrite of to_timestamp no longer needs to be special cased -- it now handled, along with other functions, by ConstEvaluator
| impl<'a> ExprRewriter for Simplifier<'a> { | ||
| /// rewrite the expression simplifying any constant expressions | ||
| fn mutate(&mut self, expr: Expr) -> Result<Expr> { | ||
| let new_expr = match expr { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code to handle true != true type expressions should also be handled by ConstEvaluator but it turns out there is a missing support for such degenerate expressions in the Expr evaluation code. I will fix this but plan to do so as a follow on PR to keep this PR reasonable to review
| .build() | ||
| .unwrap(); | ||
|
|
||
| let expected = "Projection: totimestamp(Utf8(\"I\'M NOT A TIMESTAMP\"))\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously such plans would error at runtime, now they error at plan time which I think is a reasonable change in behavior (earlier errors are better errors?)
It might have unfortunate edge cases where a plan like SELECT to_timestamp('FOOO') from table_with_no_rows previously would error but now it will error.
However, I don't think this is an important difference
| } | ||
|
|
||
| #[test] | ||
| fn to_timestamp_expr_no_arg() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously this plan would panic at runtime (because to_timestamp expects a single argument). Now it panics at plan time. Since to_timestamp() is illformed I didn't think this was a valuable test so I removed it
| } | ||
|
|
||
| impl ExprRewriter for ConstEvaluator { | ||
| fn pre_visit(&mut self, expr: &Expr) -> Result<RewriteRecursion> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the guts of the logic for expression evaluation. I am quite pleased with how concise it is
datafusion/src/optimizer/utils.rs
Outdated
| let ctx_state = ExecutionContextState::new(); | ||
| let input_schema = DFSchema::empty(); | ||
|
|
||
| // The dummy column name uis used doesn't matter as only scalar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comes directly from @pjmore ❤️
datafusion/src/optimizer/utils.rs
Outdated
| Expr::Alias(..) => false, | ||
| Expr::AggregateFunction { .. } => false, | ||
| Expr::AggregateUDF { .. } => false, | ||
| // TODO handle in constantant propagator pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will file follow on ticket
| // parenthesization matters: can't rewrite | ||
| // (rand() + 1) + 2 --> (rand() + 1) + 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting one -- and something perhaps the Simplifer should do (use associative rules to rewrite rand() + 1) + 2 into rand() + (1 + 2) and then let the evaluator constant fold it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tracked this improvement in #1162
| }; | ||
| use datafusion::{execution::context::ExecutionContext, physical_plan::displayable}; | ||
|
|
||
| /// A macro to assert that one string is contained within another with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved into test_utils
|
|
||
| // TODO constant folder hould be able to run again and fold | ||
| // this whole thing down | ||
| // TODO add ticket |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // TODO add ticket | |
| // https://github.com/apache/arrow-datafusion/issues/1160 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we commit this issue into the PR branch?
|
FYI @rdettai @pjmore @Dandandan @houqp this PR is ready for review |
|
@rdettai made a good point on my initial implementation about how stable functions should be evaluatable in a constant context. Maybe the constant folding optimizer should have a flag to allow for evaluating stable functions in a constant context if provided execution properties? This might be useful even if it's just to make the option available to the partition pruning logic. Doing this would allow the special case for now in the expression simplifier to be removed. While this doesn't matter much at the moment because it is the only stable function if more stable built-ins are added this might start mattering. Besides the question of constant evaluating stable functions this looks good to me, |
| fn volatility_ok(volatility: Volatility) -> bool { | ||
| match volatility { | ||
| Volatility::Immutable => true, | ||
| // To evaluate stable functions, need ExecutionProps, see |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason not to provide ExecutionProps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My rationale was to minimize the changes to datafusion/src/optimizer/constant_folding.rs and keep this PR smaller to review. I agree it makes sense to do now() constant folding in the simplifier.
Filed #1175 to track
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In #1176 (built on this PR)
| let plan = LogicalPlanBuilder::from(table_scan) | ||
| .filter( | ||
| now_expr() | ||
| .lt(cast_to_int64_expr(to_timestamp_expr(ts_string)) + lit(50000)), | ||
| ) | ||
| .unwrap() | ||
| .build() | ||
| .unwrap(); | ||
|
|
||
| // Note that constant folder should be able to run again and fold | ||
| // this whole expression down to a single constant; | ||
| // https://github.com/apache/arrow-datafusion/issues/1160 | ||
| let expected = "Filter: TimestampNanosecond(1599566400000000000) < CAST(totimestamp(Utf8(\"2020-09-08T12:05:00+00:00\")) AS Int64) + Int32(50000)\ | ||
| \n TableScan: test projection=None"; | ||
| let actual = get_optimized_plan_formatted(&plan, &time); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the ConstEvaluator supposed to solve this in one run as the expression in constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
never mind, let's have this conversation in #1160 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conversation in #1160 (comment) 👍
Which issue does this PR close?
Closes #1070.
Rationale for this change
See #1070
This basically lets DataFusion rewrite expressions like the following at plan time rather than once per row
5 < 10-->truesubstr('foobar', 1 2)-->'oo'This is useful when predicates are created by some other system (e.g. a GUI) or other optimization passes
It works for all expression types that can be evaluated, including simple (e.g.
+), complex (e.g.CASE) , and user defned functions.Check out some examples within the datafusion-cli:
Note there is also prior work from @pjmore from #1128 which is partially included in this pr (see #1128 (comment))
What changes are included in this PR?
ConstEvalatorexpression rewriterConstEvaluatorintoConstantFoldingoptimizer passFollow on work
boolean == booleanandboolean != booleanoperators #1159Are there any user-facing changes?
ConstEvalatorin optimizer utilities