Skip to content
Closed
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,6 @@ abstract class Expression extends TreeNode[Expression] {
* - A [[expressions.Cast Cast]] or [[expressions.UnaryMinus UnaryMinus]] is foldable if its
* child is foldable.
*/
// TODO: Supporting more foldable expressions. For example, deterministic Hive UDFs.
def foldable: Boolean = false
def nullable: Boolean
def references: Set[Attribute]
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ case class GetItem(child: Expression, ordinal: Expression) extends Expression {
val children = child :: ordinal :: Nil
/** `Null` is returned for invalid ordinals. */
override def nullable = true
override def foldable = child.foldable && ordinal.foldable
override def references = children.flatMap(_.references).toSet
def dataType = child.dataType match {
case ArrayType(dt) => dt
Expand Down Expand Up @@ -69,7 +70,8 @@ case class GetField(child: Expression, fieldName: String) extends UnaryExpressio
type EvaluatedType = Any

def dataType = field.dataType
def nullable = field.nullable
override def nullable = field.nullable
override def foldable = child.foldable

protected def structType = child.dataType match {
case s: StructType => s
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -65,8 +65,7 @@ abstract class BinaryPredicate extends BinaryExpression with Predicate {
def nullable = left.nullable || right.nullable
}

case class Not(child: Expression) extends Predicate with trees.UnaryNode[Expression] {
def references = child.references
case class Not(child: Expression) extends UnaryExpression with Predicate {
override def foldable = child.foldable
def nullable = child.nullable
override def toString = s"NOT $child"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -91,10 +91,53 @@ object ColumnPruning extends Rule[LogicalPlan] {
*/
object ConstantFolding extends Rule[LogicalPlan] {
def apply(plan: LogicalPlan): LogicalPlan = plan transform {
case q: LogicalPlan => q transformExpressionsDown {
case q: LogicalPlan => q transformExpressionsUp {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want Up? I believe this means we are going to call evaluate on each foldable node working up instead of just calling it once at the top.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expression.foldable is ok by traveling from top to bottom, while null propagation is opposite. I've put them into different rule objects (ConstantFolding & NullPropagation).

// Skip redundant folding of literals.
case l: Literal => l
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to skip literals since none of the conditions below can ever match a raw literal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking if put the literal matching in the beginning, maybe helpful avoid the further pattern matching of the rest rules. Just a tiny performance optimization for Literal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By that logic it would be an optimization to skip any class that won't match the cases below. Why is Literal a special case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same as the rule ConstantFolding, NullPropagation won't do any transformation for Literal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but in the case of ConstantFolding the subsequent pattern will match Literal, since a Literal is technically foldable. Matching the next pattern causes the rule to invoke the expression evaluator and create an identical, wasted object.

In NullPropogation, a Literal will not match any of the later rules. So in essence you are second guessing the code generated by the pattern matcher. While there may be extreme cases where that is required for performance, I don't think this is one of them.

// if it's foldable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can omit this comment since the line beneath reads if e.foldable.

case e if e.foldable => Literal(e.eval(null), e.dataType)
case e @ Count(Literal(null, _)) => Literal(null, e.dataType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think COUNT(null) actually evaluates to 0.

case e @ Sum(Literal(null, _)) => Literal(null, e.dataType)
case e @ Average(Literal(null, _)) => Literal(null, e.dataType)
case e @ IsNull(Literal(null, _)) => Literal(true, BooleanType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these already fold correctly.

scala> sql("SELECT null IS NULL")
res4: org.apache.spark.sql.SchemaRDD = 
SchemaRDD[0] at RDD at SchemaRDD.scala:96
== Query Plan ==
Project [true AS c0#0]

Maybe we should write tests for each case, before adding the rule, to make sure it is broken.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, correctly, if all of the operands are literal, and it's covered by the rule expression.foldable in most of cases. I removed the overlapped cases from the rule NullPropagation

case e @ IsNull(Literal(_, _)) => Literal(false, BooleanType)
case e @ IsNull(c @ Rand) => Literal(false, BooleanType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this more generally stated as case IsNull(e) if e.nullable == false => Literal(false)? Similarly below.

case e @ IsNotNull(Literal(null, _)) => Literal(false, BooleanType)
case e @ IsNotNull(Literal(_, _)) => Literal(true, BooleanType)
case e @ IsNotNull(c @ Rand) => Literal(true, BooleanType)
case e @ GetItem(Literal(null, _), _) => Literal(null, e.dataType)
case e @ GetItem(_, Literal(null, _)) => Literal(null, e.dataType)
case e @ GetField(Literal(null, _), _) => Literal(null, e.dataType)
case e @ Coalesce(children) => {
val newChildren = children.filter(c => c match {
case Literal(null, _) => false
case _ => true
})
if(newChildren.length == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can .length == null? Also, transform already does equality checking to minimize GC churn, so I think its okay to just return a filtered version of the children.

Literal(null, e.dataType)
} else if(newChildren.length == children.length){
e
} else {
Coalesce(newChildren)
}
}
case e @ If(Literal(v, _), trueValue, falseValue) => if(v == true) trueValue else falseValue
case e @ In(Literal(v, _), list) if(list.exists(c => c match {
case Literal(candidate, _) if(candidate == v) => true
case _ => false
})) => Literal(true, BooleanType)

case e @ SortOrder(_, _) => e
// put exceptional cases(Unary & Binary Expression) before here.
case e: UnaryExpression => e.child match {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its reasonable to enforce this nullability semantic on unary and binary nodes, but we should add something to their scaladoc. Maybe also just make SortOrder not a UnaryNode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I've update the SortOrder which inherits from UnaryNode instead of UnaryExpression.

case Literal(null, _) => Literal(null, e.dataType)
case _ => e
}
case e: BinaryExpression => e.children match {
case Literal(null, _) :: right :: Nil => Literal(null, e.dataType)
case left :: Literal(null, _) :: Nil => Literal(null, e.dataType)
case _ => e
}
}
}
}
Expand Down
11 changes: 11 additions & 0 deletions sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUdfs.scala
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ import scala.collection.mutable.ArrayBuffer
import org.apache.hadoop.hive.common.`type`.HiveDecimal
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.hive.ql.exec.{FunctionInfo, FunctionRegistry}
import org.apache.hadoop.hive.ql.udf.{UDFType => HiveUDFType}
import org.apache.hadoop.hive.ql.udf.generic._
import org.apache.hadoop.hive.serde2.objectinspector._
import org.apache.hadoop.hive.serde2.objectinspector.primitive._
Expand Down Expand Up @@ -213,6 +214,16 @@ private[hive] case class HiveGenericUdf(name: String, children: Seq[Expression])

@transient
protected lazy val returnInspector = function.initialize(argumentInspectors.toArray)

@transient
protected lazy val isUDFDeterministic = {
val udfType = function.getClass().getAnnotation(classOf[HiveUDFType])
(udfType != null && udfType.deterministic())
}

override def foldable = {
isUDFDeterministic && children.foldLeft(true)((prev, n) => prev && n.foldable)
}

val dataType: DataType = inspectorToDataType(returnInspector)

Expand Down