[WIP][Spark-SQL] Optimize the Constant Folding for Expression #482
Changes from 3 commits
2645d4f
3c045c7
536c005
9cf0396
543ef9d
9ccefdb
b28e03a
27ea3d7
80f9f18
50444cc
29c8166
68b9fad
2f14b50
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -91,10 +91,53 @@ object ColumnPruning extends Rule[LogicalPlan] { | |
| */ | ||
| object ConstantFolding extends Rule[LogicalPlan] { | ||
| def apply(plan: LogicalPlan): LogicalPlan = plan transform { | ||
| case q: LogicalPlan => q transformExpressionsDown { | ||
| case q: LogicalPlan => q transformExpressionsUp { | ||
| // Skip redundant folding of literals. | ||
| case l: Literal => l | ||
|
Contributor
There is no need to skip literals since none of the conditions below can ever match a raw literal.
Contributor
Author
I was thinking that putting the literal match at the beginning might help avoid running the rest of the pattern matches. It is just a tiny performance optimization for Literal.
Contributor
By that logic it would be an optimization to skip any class that won't match the cases below. Why is Literal a special case?
Contributor
Author
As with the ConstantFolding rule, NullPropagation won't do any transformation for Literal.
Contributor
Yes, but in the case of In |
||
| // if it's foldable | ||
|
Contributor
I think we can omit this comment since the line beneath reads |
||
| case e if e.foldable => Literal(e.eval(null), e.dataType) | ||
| case e @ Count(Literal(null, _)) => Literal(null, e.dataType) | ||
|
Contributor
I think |
||
| case e @ Sum(Literal(null, _)) => Literal(null, e.dataType) | ||
| case e @ Average(Literal(null, _)) => Literal(null, e.dataType) | ||
| case e @ IsNull(Literal(null, _)) => Literal(true, BooleanType) | ||
|
Contributor
Some of these already fold correctly.
scala> sql("SELECT null IS NULL")
res4: org.apache.spark.sql.SchemaRDD =
SchemaRDD[0] at RDD at SchemaRDD.scala:96
== Query Plan ==
Project [true AS c0#0]
Maybe we should write tests for each case, before adding the rule, to make sure it is broken.
Contributor
Author
Yes, that is correct when all of the operands are literals, and it's covered by the rule |
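The point in this thread, that an all-literal expression such as `null IS NULL` is already handled by the generic `foldable` case, can be checked with a small model before adding a dedicated rule. The following is a minimal sketch only: `Expr`, `Lit`, `Attr`, `IsNullExpr`, and `FoldSketch` are hypothetical stand-ins invented for illustration and are not the Catalyst classes from the diff.

```scala
// Toy model, NOT Catalyst: all names here are hypothetical.
sealed trait Expr {
  def children: Seq[Expr]
  // Foldable when every child can be evaluated without an input row.
  def foldable: Boolean = children.nonEmpty && children.forall(_.foldable)
  def eval(): Any
}

case class Lit(value: Any) extends Expr {
  val children: Seq[Expr] = Nil
  override def foldable: Boolean = true
  def eval(): Any = value
}

case class Attr(name: String) extends Expr {
  val children: Seq[Expr] = Nil
  override def foldable: Boolean = false
  def eval(): Any = sys.error(s"unbound attribute $name")
}

case class IsNullExpr(child: Expr) extends Expr {
  val children: Seq[Expr] = Seq(child)
  def eval(): Any = child.eval() == null
}

object FoldSketch {
  // The generic rule: replace any foldable expression with its value.
  def fold(e: Expr): Expr = e match {
    case l: Lit                  => l
    case other if other.foldable => Lit(other.eval())
    case other                   => other
  }
}
```

In this model, `FoldSketch.fold(IsNullExpr(Lit(null)))` folds through the generic case alone, while `IsNullExpr(Attr("a"))` is left untouched. A test per candidate pattern, written along these lines, would show which of the proposed cases are actually missing.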
||
| case e @ IsNull(Literal(_, _)) => Literal(false, BooleanType) | ||
| case e @ IsNull(c @ Rand) => Literal(false, BooleanType) | ||
|
Contributor
Is this more generally stated as |
||
| case e @ IsNotNull(Literal(null, _)) => Literal(false, BooleanType) | ||
| case e @ IsNotNull(Literal(_, _)) => Literal(true, BooleanType) | ||
| case e @ IsNotNull(c @ Rand) => Literal(true, BooleanType) | ||
| case e @ GetItem(Literal(null, _), _) => Literal(null, e.dataType) | ||
| case e @ GetItem(_, Literal(null, _)) => Literal(null, e.dataType) | ||
| case e @ GetField(Literal(null, _), _) => Literal(null, e.dataType) | ||
| case e @ Coalesce(children) => { | ||
| val newChildren = children.filter(c => c match { | ||
| case Literal(null, _) => false | ||
| case _ => true | ||
| }) | ||
| if(newChildren.length == null) { | ||
|
Contributor
Can |
||
| Literal(null, e.dataType) | ||
| } else if(newChildren.length == children.length){ | ||
| e | ||
| } else { | ||
| Coalesce(newChildren) | ||
| } | ||
| } | ||
| case e @ If(Literal(v, _), trueValue, falseValue) => if(v == true) trueValue else falseValue | ||
| case e @ In(Literal(v, _), list) if(list.exists(c => c match { | ||
| case Literal(candidate, _) if(candidate == v) => true | ||
| case _ => false | ||
| })) => Literal(true, BooleanType) | ||
|
|
||
| case e @ SortOrder(_, _) => e | ||
| // put exceptional cases(Unary & Binary Expression) before here. | ||
| case e: UnaryExpression => e.child match { | ||
|
Contributor
I think it's reasonable to enforce this nullability semantic on unary and binary nodes, but we should add something to their scaladoc. Maybe also just make
Contributor
Author
Yes, I've updated SortOrder, which now inherits from UnaryNode instead of UnaryExpression. |
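The nullability semantic discussed here (a unary or binary expression with a null literal operand is itself null) and the alternative to special-casing `SortOrder` can be sketched on a toy hierarchy. Everything below is an assumption made for illustration: `PExpr`, `PUnary`, `PBinary`, `Neg`, `Plus`, `SortKey`, and `NullPropagationSketch` are hypothetical names, not the types in the diff.

```scala
// Toy model, NOT Catalyst: all names here are hypothetical.
sealed trait PExpr
case class PLit(value: Any) extends PExpr
case class PCol(name: String) extends PExpr

// Marker traits that carry the "null in, null out" contract.
sealed trait PUnary extends PExpr { def child: PExpr }
sealed trait PBinary extends PExpr { def left: PExpr; def right: PExpr }

case class Neg(child: PExpr) extends PUnary
case class Plus(left: PExpr, right: PExpr) extends PBinary

// Deliberately NOT a PUnary: ordering by a null literal is still an ordering,
// so it must not be rewritten to a null literal.
case class SortKey(child: PExpr, ascending: Boolean) extends PExpr

object NullPropagationSketch {
  private val NullLit = PLit(null)

  def propagate(e: PExpr): PExpr = e match {
    case u: PUnary if u.child == NullLit                         => NullLit
    case b: PBinary if b.left == NullLit || b.right == NullLit   => NullLit
    case other                                                   => other
  }
}
```

Because `SortKey` sits outside the unary trait, no explicit exception case is needed ahead of the generic unary match. That is the design trade-off this thread points at: the contract lives in the type hierarchy (and should be documented there) rather than in the ordering of the pattern cases.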
||
| case Literal(null, _) => Literal(null, e.dataType) | ||
| case _ => e | ||
| } | ||
| case e: BinaryExpression => e.children match { | ||
| case Literal(null, _) :: right :: Nil => Literal(null, e.dataType) | ||
| case left :: Literal(null, _) :: Nil => Literal(null, e.dataType) | ||
| case _ => e | ||
| } | ||
| } | ||
| } | ||
| } | ||
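The `Coalesce` case in the diff drops null literals and then branches on how many children remain. A minimal sketch of that simplification follows, using hypothetical stand-in types (`CExpr`, `CLit`, `CCol`, `CoalesceOf`, `CoalesceSketch`) rather than the Catalyst classes. One deliberate difference from the diff: the "nothing left" branch tests for emptiness, since a length is an `Int` and can never be null.

```scala
// Toy model, NOT Catalyst: all names here are hypothetical.
sealed trait CExpr
case class CLit(value: Any) extends CExpr
case class CCol(name: String) extends CExpr
case class CoalesceOf(children: Seq[CExpr]) extends CExpr

object CoalesceSketch {
  def simplify(e: CExpr): CExpr = e match {
    case c @ CoalesceOf(children) =>
      // Null literals can never be the first non-null value, so drop them.
      val kept = children.filterNot(_ == CLit(null))
      if (kept.isEmpty) CLit(null)                   // every child was a null literal
      else if (kept.length == children.length) c     // nothing removed, keep the node
      else CoalesceOf(kept)                          // rebuild with fewer children
    case other => other
  }
}
```

For example, `CoalesceSketch.simplify(CoalesceOf(Seq(CLit(null), CCol("a"))))` yields `CoalesceOf(Seq(CCol("a")))`, and a coalesce of only null literals collapses to `CLit(null)`.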
|
|
||
Do we want Up? I believe this means we are going to call evaluate on each foldable node working up instead of just calling it once at the top.

Checking expression.foldable works fine when traversing from top to bottom, while null propagation needs the opposite direction. I've put them into different rule objects (ConstantFolding & NullPropagation).
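The concern about traversal direction can be made concrete by counting how often the folding rule itself triggers an evaluation. This is a sketch under stated assumptions: `TExpr`, `Num`, `Ref`, `Add`, and `TraversalSketch` are hypothetical, and `foldDown` / `foldUp` only imitate the pre-order and post-order behaviour that the names `transformExpressionsDown` and `transformExpressionsUp` suggest.

```scala
// Toy model, NOT Catalyst: all names here are hypothetical.
sealed trait TExpr { def foldable: Boolean; def eval(): Int }
case class Num(v: Int) extends TExpr { val foldable = true; def eval(): Int = v }
case class Ref(name: String) extends TExpr {
  val foldable = false
  def eval(): Int = sys.error(s"unbound reference $name")
}
case class Add(l: TExpr, r: TExpr) extends TExpr {
  def foldable: Boolean = l.foldable && r.foldable
  def eval(): Int = l.eval() + r.eval()
}

object TraversalSketch {
  // Counts evaluations started directly by the folding rule.
  var ruleEvalCalls = 0

  private def foldNode(e: TExpr): TExpr = e match {
    case n: Num => n
    case x if x.foldable =>
      ruleEvalCalls += 1
      Num(x.eval())
    case x => x
  }

  // Pre-order: apply the rule to a node first, then recurse into what is left.
  def foldDown(e: TExpr): TExpr = foldNode(e) match {
    case Add(l, r) => Add(foldDown(l), foldDown(r))
    case other     => other
  }

  // Post-order: rewrite the children first, then apply the rule to the rebuilt node.
  def foldUp(e: TExpr): TExpr = e match {
    case Add(l, r) => foldNode(Add(foldUp(l), foldUp(r)))
    case other     => foldNode(other)
  }
}
```

For `Add(Add(Num(1), Num(2)), Num(3))` both directions produce `Num(6)`, but in this model the top-down pass triggers one rule-level evaluation while the bottom-up pass triggers two, one per foldable `Add`. Null propagation is the reverse situation: a null produced deep in the tree has to reach its parents, which is why the two concerns are easier to handle as separate rules with separate traversal orders.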