Closed
Changes from 51 commits
106 commits
17b76e2
test aggregate filter
beliefer Nov 7, 2019
3f0583f
Fix scalastyle.
beliefer Nov 7, 2019
f64d14c
Resolve build issue.
beliefer Nov 7, 2019
d521be1
Resolve build issue.
beliefer Nov 7, 2019
8e342da
Add UT and check
beliefer Nov 11, 2019
0e56d03
Add UT and check
beliefer Nov 11, 2019
fd6461f
Add UT and check
beliefer Nov 11, 2019
f32ac4d
Add UT and check
beliefer Nov 11, 2019
5d33dab
uncomment test case.
beliefer Nov 11, 2019
9ea4736
Add test case output
beliefer Nov 11, 2019
4dcd0d3
Optimize code.
beliefer Nov 15, 2019
060d3d4
Optimize code.
beliefer Nov 15, 2019
4443883
Optimize code.
beliefer Nov 15, 2019
8beff8a
Optimize code.
beliefer Nov 15, 2019
4d0c3aa
Merge branch 'master' into support-aggregate-filter
beliefer Nov 18, 2019
14f2b21
Add FILTER to non reserved.
beliefer Nov 18, 2019
895f6ac
Remove filter.
beliefer Nov 19, 2019
675dca9
Optimize code.
beliefer Nov 20, 2019
1c1cf52
Optimize code.
beliefer Nov 20, 2019
fb8f477
Support sub query.
beliefer Nov 20, 2019
b677268
Add test case for sub query.
beliefer Nov 20, 2019
b831855
Fix bug
beliefer Nov 20, 2019
4c644ca
Fix bug
beliefer Nov 20, 2019
255650a
Merge branch 'master' into support-aggregate-filter
beliefer Nov 21, 2019
518aa4f
Optimize code base SPARK-29968
beliefer Nov 21, 2019
d979509
inputs -> arguments
beliefer Nov 25, 2019
6082e57
Add test cases in SQLQueryTestSuite.
beliefer Nov 25, 2019
ed80517
Add test cases in SQLQueryTestSuite.
beliefer Nov 25, 2019
3d37370
Delete variable filterExpressions
beliefer Nov 25, 2019
392c18d
Move predicates inside generateProcessRow
beliefer Nov 25, 2019
9a127e4
Move predicates inside generateProcessRow
beliefer Nov 25, 2019
967b135
Optimize variable isFinalOrMerge
beliefer Nov 25, 2019
07f774a
Reduce overkilling overhead
beliefer Nov 25, 2019
c86b691
Optimize code.
beliefer Nov 26, 2019
81c9482
Optimize code.
beliefer Nov 26, 2019
747b3ab
Optimize code.
beliefer Nov 26, 2019
f66c180
Optimize code
beliefer Nov 26, 2019
3652aef
Optimize code
beliefer Nov 26, 2019
8bfff6f
Optimize code
beliefer Nov 26, 2019
0911a76
test date
beliefer Nov 26, 2019
61bf6fd
Add test cases for date.
beliefer Nov 27, 2019
ea472aa
Add test cases for date.
beliefer Nov 27, 2019
583d51f
Add test cases for date.
beliefer Nov 27, 2019
030a9dc
Add test cases for to_date and to_timestamp
beliefer Nov 28, 2019
df643ba
Add test cases for to_date and to_timestamp
beliefer Nov 28, 2019
14daee6
Add test cases for to_date and to_timestamp
beliefer Nov 28, 2019
ce51461
Support distinct aggregate with filter
beliefer Nov 29, 2019
ce53930
Support distinct aggregate with filter
beliefer Nov 29, 2019
cb31eea
Support distinct aggregate with filter
beliefer Nov 29, 2019
f154622
Add comment for aggregate with DISTINCT.
beliefer Dec 2, 2019
4d1413f
Uncomment test cases for filter
beliefer Dec 3, 2019
0d20561
Optimize code.
beliefer Dec 4, 2019
bc2ad92
use canonicalized filter.
beliefer Dec 4, 2019
1297e03
use canonicalized filter.
beliefer Dec 4, 2019
f56400a
Rewrite aggregate with filter
beliefer Dec 9, 2019
eb856df
Adjust test cases
beliefer Dec 9, 2019
cffe318
Adjust test cases
beliefer Dec 9, 2019
4523616
Add comment for rewrite multi distinct aggregates
beliefer Dec 9, 2019
1cb0725
Fix scala style.
beliefer Dec 9, 2019
33d2b5b
Optimize comment.
beliefer Dec 9, 2019
c3e0f6a
Add test cases for multiple distinct aggregate
beliefer Dec 10, 2019
6c878d3
Add test cases for multiple distinct aggregate
beliefer Dec 10, 2019
40e31be
Add test cases for multiple distinct aggregate
beliefer Dec 10, 2019
affb6c0
Add test cases for multiple distinct aggregate
beliefer Dec 10, 2019
de11c4d
Optimize code
beliefer Dec 10, 2019
7c40292
Expand filter first
beliefer Dec 11, 2019
258a6c6
Expand filter first
beliefer Dec 11, 2019
4a494ae
Expand filter first
beliefer Dec 11, 2019
8cdd92d
Expand filter first
beliefer Dec 11, 2019
46c4980
Optimize code
beliefer Dec 11, 2019
d3f38f2
Optimize code
beliefer Dec 11, 2019
d40dd9f
File new jira and update comment
beliefer Dec 11, 2019
2518692
Merge branch 'master' into support-aggregate-clause-test
beliefer Dec 12, 2019
94a4a06
Update test result
beliefer Dec 12, 2019
9adfd2d
Update comment
beliefer Dec 12, 2019
0a4a5a2
Restore to distinct agg children
beliefer Dec 12, 2019
66ceeca
Revert to support FILTER without DISTINCT
beliefer Dec 17, 2019
3a350cb
Revert to support FILTER without DISTINCT
beliefer Dec 17, 2019
01f306e
Revert to support FILTER without DISTINCT
beliefer Dec 17, 2019
87697ec
Revert to support FILTER without DISTINCT
beliefer Dec 17, 2019
4dd7527
Revert to support FILTER without DISTINCT
beliefer Dec 17, 2019
53a6f2a
Revert to support FILTER without DISTINCT
beliefer Dec 17, 2019
b29ef0f
Revert to support FILTER without DISTINCT
beliefer Dec 17, 2019
c15389d
Adjust code
beliefer Dec 17, 2019
0cabcb6
Adjust code
beliefer Dec 17, 2019
b58c126
Adjust code
beliefer Dec 17, 2019
a3bb997
Adjust code
beliefer Dec 17, 2019
e520938
Adjust code
beliefer Dec 18, 2019
a2a79d5
Adjust code
beliefer Dec 18, 2019
ff1147f
Merge branch 'support-aggregate-clause-test' into support-aggregate-c…
beliefer Dec 18, 2019
b3584c8
Remove filter from TableIdentifierParserSuite
beliefer Dec 18, 2019
e03a959
Optimize code
beliefer Dec 18, 2019
998929c
Optimize code
beliefer Dec 18, 2019
a3d71f4
Optimize code
beliefer Dec 18, 2019
91fe90e
Merge branch 'support-aggregate-clause-test' into support-aggregate-c…
beliefer Dec 18, 2019
eb96463
Optimize code
beliefer Dec 19, 2019
97bb440
Optimize code
beliefer Dec 19, 2019
bb14439
Merge branch 'support-aggregate-clause-test' into support-aggregate-c…
beliefer Dec 19, 2019
c1dbb27
not support non-deterministic Filter expression
beliefer Dec 19, 2019
000ae72
not support non-deterministic Filter expression
beliefer Dec 19, 2019
27ad46e
Optimize code
beliefer Dec 20, 2019
436b1c0
Optimize code
beliefer Dec 23, 2019
d5aa8ee
Optimize code
beliefer Dec 23, 2019
5b2a1b9
Optimize code
beliefer Dec 23, 2019
6f9e839
Optimize tests
beliefer Dec 24, 2019
d98ea41
Optimize tests
beliefer Dec 24, 2019
1 change: 1 addition & 0 deletions docs/sql-keywords.md
Original file line number Diff line number Diff line change
Expand Up @@ -116,6 +116,7 @@ Below is a list of all the keywords in Spark SQL.
<tr><td>FALSE</td><td>reserved</td><td>non-reserved</td><td>reserved</td></tr>
<tr><td>FETCH</td><td>reserved</td><td>non-reserved</td><td>reserved</td></tr>
<tr><td>FIELDS</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
<tr><td>FILTER</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
Contributor

Let's make it reserved under ANSI mode, i.e. don't put the keyword in ansiNonReserved.

Contributor Author

@beliefer beliefer Dec 17, 2019

@cloud-fan
There is an issue, as shown below:

no viable alternative at input 'filter'(line 1, pos 7)
== SQL ==
select filter(ys, y -> y > 30) as v from nested
-------^^^

filter is a function.
It seems we can't put the keyword in ansiNonReserved.

Contributor

Ah, I see. We should put it in functionName, which means it's a reserved keyword but can still be used as a function name.

Contributor Author

@beliefer beliefer Dec 18, 2019

I put it in functionName, but the issue still occurs.

<tr><td>FILEFORMAT</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
<tr><td>FIRST</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
<tr><td>FIRST_VALUE</td><td>reserved</td><td>non-reserved</td><td>reserved</td></tr>
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -745,7 +745,7 @@ primaryExpression
| '(' namedExpression (',' namedExpression)+ ')' #rowConstructor
| '(' query ')' #subqueryExpression
| functionName '(' (setQuantifier? argument+=expression (',' argument+=expression)*)? ')'
(OVER windowSpec)? #functionCall
(FILTER '(' WHERE where=booleanExpression ')')? (OVER windowSpec)? #functionCall
| identifier '->' expression #lambda
| '(' identifier (',' identifier)+ ')' '->' expression #lambda
| value=primaryExpression '[' index=valueExpression ']' #subscript
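Note: the grammar change above attaches an optional `FILTER (WHERE ...)` clause to a function call. A minimal sketch of what that clause means for an aggregate, written as plain Scala collections rather than Spark code; the `Row` type, the column names, and both helper functions are hypothetical and exist only for this illustration:

```scala
object FilterClauseSketch {
  // Hypothetical row type used only for this illustration.
  final case class Row(key: String, value: Int, flag: Int)

  // Models SUM(value) FILTER (WHERE predicate):
  // only rows that pass the predicate are fed to the aggregate.
  def sumWithFilter(rows: Seq[Row])(predicate: Row => Boolean): Int =
    rows.filter(predicate).map(_.value).sum

  // Models SUM(CASE WHEN predicate THEN value END):
  // rows that fail the predicate contribute nothing, like a NULL ignored by SUM.
  def sumWithCaseWhen(rows: Seq[Row])(predicate: Row => Boolean): Int =
    rows.collect { case r if predicate(r) => r.value }.sum

  // Caveat: on empty input this sketch returns 0, whereas SQL SUM returns NULL.
}
```

Both helpers give the same total for the same predicate, which is the usual way to read the clause: it restricts the rows seen by one aggregate without affecting the others in the same query.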
Expand Down Expand Up @@ -1023,6 +1023,7 @@ ansiNonReserved
| EXTERNAL
| EXTRACT
| FIELDS
| FILTER
| FILEFORMAT
| FIRST
| FOLLOWING
Expand Down Expand Up @@ -1262,6 +1263,7 @@ nonReserved
| EXTRACT
| FALSE
| FETCH
| FILTER
| FIELDS
| FILEFORMAT
| FIRST
Expand Down Expand Up @@ -1524,6 +1526,7 @@ EXTRACT: 'EXTRACT';
FALSE: 'FALSE';
FETCH: 'FETCH';
FIELDS: 'FIELDS';
FILTER: 'FILTER';
FILEFORMAT: 'FILEFORMAT';
FIRST: 'FIRST';
FIRST_VALUE: 'FIRST_VALUE';
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1283,8 +1283,8 @@ class Analyzer(
*/
def expandStarExpression(expr: Expression, child: LogicalPlan): Expression = {
expr.transformUp {
case f1: UnresolvedFunction if containsStar(f1.children) =>
f1.copy(children = f1.children.flatMap {
case f1: UnresolvedFunction if containsStar(f1.arguments) =>
f1.copy(arguments = f1.arguments.flatMap {
case s: Star => s.expand(child, resolver)
case o => o :: Nil
})
Expand Down Expand Up @@ -1636,26 +1636,33 @@ class Analyzer(
s"its class is ${other.getClass.getCanonicalName}, which is not a generator.")
}
}
case u @ UnresolvedFunction(funcId, children, isDistinct) =>
case u @ UnresolvedFunction(funcId, arguments, isDistinct, filter) =>
withPosition(u) {
v1SessionCatalog.lookupFunction(funcId, children) match {
v1SessionCatalog.lookupFunction(funcId, arguments) match {
// AggregateWindowFunctions are AggregateFunctions that can only be evaluated within
// the context of a Window clause. They do not need to be wrapped in an
// AggregateExpression.
case wf: AggregateWindowFunction =>
if (isDistinct) {
failAnalysis(
s"DISTINCT specified, but ${wf.prettyName} is not an aggregate function")
} else if (filter.isDefined) {
Member

If both keywords are specified, only DISTINCT is shown in the error message, so how about this?

                  val notSupportedWords = (if (isDistinct) "DISTINCT" :: Nil else Nil) ++
                    (if (filter.isDefined) "FILTER" :: Nil else Nil)
                  if (notSupportedWords.nonEmpty) {
                    failAnalysis(
                      s"${notSupportedWords.mkString(" and ")} specified, but ${wf.prettyName} " +
                        "is not an aggregate function")
                  } else {
                    wf
                  }

Contributor Author

OK.
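
Note: the suggestion above can be tried outside the analyzer. The following is a minimal standalone sketch of the same message-building logic; the object name, the `check` helper, and its `Option[String]` return type are assumptions made for illustration (the real code calls `failAnalysis` instead of returning a value):

```scala
object UnsupportedClauseMessage {
  // Returns Some(message) when DISTINCT and/or FILTER is attached to a
  // function that is not an aggregate, and None when neither is present.
  def check(isDistinct: Boolean, hasFilter: Boolean, prettyName: String): Option[String] = {
    val notSupportedWords =
      (if (isDistinct) "DISTINCT" :: Nil else Nil) ++
        (if (hasFilter) "FILTER" :: Nil else Nil)
    if (notSupportedWords.nonEmpty) {
      val joined = notSupportedWords.mkString(" and ")
      Some(s"$joined specified, but $prettyName is not an aggregate function")
    } else {
      None
    }
  }
}
```

With both flags set, the message names both keywords, which is the case the reviewer was concerned about. The diff lines around this comment use the wording "FILTER predicate specified", so the exact text here follows the suggestion rather than those lines.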

failAnalysis("FILTER predicate specified, " +
s"but ${wf.prettyName} is not an aggregate function")
} else {
wf
}
// We get an aggregate function, we need to wrap it in an AggregateExpression.
case agg: AggregateFunction => AggregateExpression(agg, Complete, isDistinct)
case agg: AggregateFunction =>
AggregateExpression(agg, Complete, isDistinct, filter)
// This function is not an aggregate function, just return the resolved one.
case other =>
if (isDistinct) {
failAnalysis(
s"DISTINCT specified, but ${other.prettyName} is not an aggregate function")
} else if (filter.isDefined) {
Member

ditto

Contributor Author

OK

failAnalysis("FILTER predicate specified, " +
s"but ${other.prettyName} is not an aggregate function")
} else {
other
}
Expand Down Expand Up @@ -2253,15 +2260,15 @@ class Analyzer(

// Extract Windowed AggregateExpression
case we @ WindowExpression(
ae @ AggregateExpression(function, _, _, _),
ae @ AggregateExpression(function, _, _, _, _),
spec: WindowSpecDefinition) =>
val newChildren = function.children.map(extractExpr)
val newFunction = function.withNewChildren(newChildren).asInstanceOf[AggregateFunction]
val newAgg = ae.copy(aggregateFunction = newFunction)
seenWindowAggregates += newAgg
WindowExpression(newAgg, spec)

case AggregateExpression(aggFunc, _, _, _) if hasWindowFunction(aggFunc.children) =>
case AggregateExpression(aggFunc, _, _, _, _) if hasWindowFunction(aggFunc.children) =>
failAnalysis("It is not allowed to use a window function inside an aggregate " +
"function. Please use the inner window function in a sub-query.")

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -154,7 +154,7 @@ trait CheckAnalysis extends PredicateHelper {
case g: GroupingID =>
failAnalysis("grouping_id() can only be used with GroupingSets/Cube/Rollup")

case w @ WindowExpression(AggregateExpression(_, _, true, _), _) =>
case w @ WindowExpression(AggregateExpression(_, _, true, _, _), _) =>
failAnalysis(s"Distinct window functions are not supported: $w")

case w @ WindowExpression(_: OffsetWindowFunction,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -33,11 +33,14 @@ import org.apache.spark.sql.types.DataType
case class ResolveHigherOrderFunctions(catalog: SessionCatalog) extends Rule[LogicalPlan] {

override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveExpressions {
case u @ UnresolvedFunction(fn, children, false)
case u @ UnresolvedFunction(fn, children, false, filter)
if hasLambdaAndResolvedArguments(children) =>
withPosition(u) {
catalog.lookupFunction(fn, children) match {
case func: HigherOrderFunction => func
case func: HigherOrderFunction =>
filter.foreach(_.failAnalysis("FILTER predicate specified, " +
s"but ${func.prettyName} is not an aggregate function"))
Member

Can you add tests for this path?

Contributor Author

OK

func
case other => other.failAnalysis(
"A lambda function should only be used in a higher order function. However, " +
s"its class is ${other.getClass.getCanonicalName}, which is not a " +
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -243,10 +243,16 @@ case class UnresolvedGenerator(name: FunctionIdentifier, children: Seq[Expressio

case class UnresolvedFunction(
name: FunctionIdentifier,
children: Seq[Expression],
isDistinct: Boolean)
arguments: Seq[Expression],
isDistinct: Boolean,
filter: Option[Expression] = None)
extends Expression with Unevaluable {

override def children: Seq[Expression] = filter match {
Contributor

nit: arguments ++ filter.toSeq

case Some(expr) => arguments :+ expr
case _ => arguments
}

override def dataType: DataType = throw new UnresolvedException(this, "dataType")
override def foldable: Boolean = throw new UnresolvedException(this, "foldable")
override def nullable: Boolean = throw new UnresolvedException(this, "nullable")
Expand All @@ -257,8 +263,8 @@ case class UnresolvedFunction(
}

object UnresolvedFunction {
def apply(name: String, children: Seq[Expression], isDistinct: Boolean): UnresolvedFunction = {
UnresolvedFunction(FunctionIdentifier(name, None), children, isDistinct)
def apply(name: String, arguments: Seq[Expression], isDistinct: Boolean): UnresolvedFunction = {
UnresolvedFunction(FunctionIdentifier(name, None), arguments, isDistinct)
}
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -71,23 +71,27 @@ object AggregateExpression {
def apply(
aggregateFunction: AggregateFunction,
mode: AggregateMode,
isDistinct: Boolean): AggregateExpression = {
isDistinct: Boolean,
filter: Option[Expression] = None): AggregateExpression = {
AggregateExpression(
aggregateFunction,
mode,
isDistinct,
filter,
NamedExpression.newExprId)
}
}

/**
* A container for an [[AggregateFunction]] with its [[AggregateMode]] and a field
* (`isDistinct`) indicating if DISTINCT keyword is specified for this function.
* (`isDistinct`) indicating if DISTINCT keyword is specified for this function and
* a field (`filter`) indicating if filter clause is specified for this function.
*/
case class AggregateExpression(
aggregateFunction: AggregateFunction,
mode: AggregateMode,
isDistinct: Boolean,
filter: Option[Expression],
resultId: ExprId)
extends Expression
with Unevaluable {
Expand All @@ -104,6 +108,8 @@ case class AggregateExpression(
UnresolvedAttribute(aggregateFunction.toString)
}

lazy val filterAttributes: AttributeSet = filter.map(_.references).getOrElse(AttributeSet.empty)

// We compute the same thing regardless of our final result.
override lazy val canonicalized: Expression = {
val normalizedAggFunc = mode match {
Expand All @@ -119,18 +125,24 @@ case class AggregateExpression(
normalizedAggFunc.canonicalized.asInstanceOf[AggregateFunction],
mode,
isDistinct,
filter,
Member

Should we also canonicalize the filter?

Contributor Author

I don't quite understand the meaning of canonicalized filter.

Contributor

filter.map(_.canonicalized), like what we do for normalizedAggFunc

Contributor Author

OK. I will change it.

ExprId(0))
}
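
Note: the question of canonicalizing the filter can be illustrated with a toy expression tree. This is a sketch under the assumption that canonicalization erases cosmetic details (here, an attribute's display name) so that semantically equal expressions compare equal; every type below is hypothetical and only loosely modelled on Catalyst:

```scala
object CanonicalizeSketch {
  sealed trait Expr { def canonicalized: Expr }

  // An attribute is identified by `id`; `name` is treated as cosmetic.
  final case class Attr(name: String, id: Long) extends Expr {
    def canonicalized: Expr = Attr("none", id)
  }

  final case class Lit(value: Int) extends Expr {
    def canonicalized: Expr = this
  }

  final case class GreaterThan(left: Expr, right: Expr) extends Expr {
    def canonicalized: Expr = GreaterThan(left.canonicalized, right.canonicalized)
  }

  // Toy aggregate: canonicalizing it also canonicalizes the optional filter,
  // mirroring the reviewer's filter.map(_.canonicalized).
  final case class Agg(function: String, filter: Option[Expr]) {
    def canonicalized: Agg = copy(filter = filter.map(_.canonicalized))
  }
}
```

In this model, two aggregates whose filters differ only in the spelling of an attribute name are unequal as written but equal once canonicalized. If the filter were passed through unchanged, the canonical forms would still differ.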

override def children: Seq[Expression] = aggregateFunction :: Nil
override def children: Seq[Expression] = filter match {
Contributor

ditto

Contributor Author

@beliefer beliefer Dec 4, 2019

override def children: Seq[Expression] = aggregateFunction ++ filter.toSeq
This does not compile.

Contributor

aggregateFunction +: filter.toSeq

Contributor Author

OK

case Some(expr) => aggregateFunction :: expr :: Nil
case _ => aggregateFunction :: Nil
}
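
Note: the forms discussed in this thread are equivalent. A minimal sketch, using a hypothetical `Expr` stand-in for Catalyst's `Expression`, showing why `++` fails for a single element while `+:` works:

```scala
object ChildrenSketch {
  // Stand-in for Catalyst's Expression; hypothetical, for illustration only.
  final case class Expr(name: String)

  // Pattern-match form, as written in the diff.
  def childrenByMatch(aggregateFunction: Expr, filter: Option[Expr]): Seq[Expr] =
    filter match {
      case Some(expr) => aggregateFunction :: expr :: Nil
      case _ => aggregateFunction :: Nil
    }

  // Reviewer's form: `+:` prepends one element to a sequence.
  // `aggregateFunction ++ filter.toSeq` does not compile because `++`
  // concatenates two collections and `aggregateFunction` is a single value.
  def childrenByPrepend(aggregateFunction: Expr, filter: Option[Expr]): Seq[Expr] =
    aggregateFunction +: filter.toSeq

  // When the left side is already a Seq (the UnresolvedFunction case), `++` is fine.
  def argumentsWithFilter(arguments: Seq[Expr], filter: Option[Expr]): Seq[Expr] =
    arguments ++ filter.toSeq
}
```

`Option.toSeq` yields an empty sequence for `None` and a one-element sequence for `Some`, so both the present and absent cases are covered without a match.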

override def dataType: DataType = aggregateFunction.dataType
override def foldable: Boolean = false
override def nullable: Boolean = aggregateFunction.nullable

@transient
override lazy val references: AttributeSet = {
mode match {
case Partial | Complete => aggregateFunction.references
case Partial | Complete =>
aggregateFunction.references ++ filter.map(_.references).getOrElse(AttributeSet.empty)
Contributor

nit: ++ filterAttributes

case PartialMerge | Final => AttributeSet(aggregateFunction.aggBufferAttributes)
}
}
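
Note: a toy model of the mode-dependent `references` logic above, with attribute sets reduced to `Set[String]`. The `Agg` class and its fields are hypothetical simplifications; only the shape of the `mode match` follows the diff:

```scala
object ReferencesSketch {
  sealed trait Mode
  case object Partial extends Mode
  case object PartialMerge extends Mode
  case object Final extends Mode
  case object Complete extends Mode

  final case class Agg(
      functionRefs: Set[String],       // attributes read by the aggregate function
      bufferAttrs: Set[String],        // aggregation buffer attributes
      filterRefs: Option[Set[String]], // attributes read by the optional FILTER predicate
      mode: Mode) {

    // Counterpart of `filterAttributes`: empty when no filter is present.
    lazy val filterAttributes: Set[String] = filterRefs.getOrElse(Set.empty)

    // Modes that read input rows also need the filter's attributes;
    // merging modes only read the aggregation buffer.
    def references: Set[String] = mode match {
      case Partial | Complete => functionRefs ++ filterAttributes
      case PartialMerge | Final => bufferAttrs
    }
  }
}
```

The filter only matters in the modes that consume original input rows, which is presumably why the merging modes are left unchanged in the diff.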
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1459,7 +1459,7 @@ object DecimalAggregates extends Rule[LogicalPlan] {

def apply(plan: LogicalPlan): LogicalPlan = plan transform {
case q: LogicalPlan => q transformExpressionsDown {
case we @ WindowExpression(ae @ AggregateExpression(af, _, _, _), _) => af match {
case we @ WindowExpression(ae @ AggregateExpression(af, _, _, _, _), _) => af match {
case Sum(e @ DecimalType.Expression(prec, scale)) if prec + 10 <= MAX_LONG_DIGITS =>
MakeDecimal(we.copy(windowFunction = ae.copy(aggregateFunction = Sum(UnscaledValue(e)))),
prec + 10, scale)
Expand All @@ -1473,7 +1473,7 @@ object DecimalAggregates extends Rule[LogicalPlan] {

case _ => we
}
case ae @ AggregateExpression(af, _, _, _) => af match {
case ae @ AggregateExpression(af, _, _, _, _) => af match {
case Sum(e @ DecimalType.Expression(prec, scale)) if prec + 10 <= MAX_LONG_DIGITS =>
MakeDecimal(ae.copy(aggregateFunction = Sum(UnscaledValue(e))), prec + 10, scale)

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -526,9 +526,9 @@ object NullPropagation extends Rule[LogicalPlan] {
case q: LogicalPlan => q transformExpressionsUp {
case e @ WindowExpression(Cast(Literal(0L, _), _, _), _) =>
Cast(Literal(0L), e.dataType, Option(SQLConf.get.sessionLocalTimeZone))
case e @ AggregateExpression(Count(exprs), _, _, _) if exprs.forall(isNullLiteral) =>
case e @ AggregateExpression(Count(exprs), _, _, _, _) if exprs.forall(isNullLiteral) =>
Cast(Literal(0L), e.dataType, Option(SQLConf.get.sessionLocalTimeZone))
case ae @ AggregateExpression(Count(exprs), _, false, _) if !exprs.exists(_.nullable) =>
case ae @ AggregateExpression(Count(exprs), _, false, _, _) if !exprs.exists(_.nullable) =>
// This rule should be only triggered when isDistinct field is false.
ae.copy(aggregateFunction = Count(Literal(1)))

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -368,7 +368,7 @@ object RewriteCorrelatedScalarSubquery extends Rule[LogicalPlan] {
// in the expression with the value they would return for zero input tuples.
// Also replace attribute refs (for example, for grouping columns) with NULL.
val rewrittenExpr = expr transform {
case a @ AggregateExpression(aggFunc, _, _, resultId) =>
case a @ AggregateExpression(aggFunc, _, _, resultId, _) =>
aggFunc.defaultResult.getOrElse(Literal.default(NullType))

case _: AttributeReference => Literal.default(NullType)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1591,8 +1591,9 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
case expressions =>
expressions
}
val filter = Option(ctx.where).map(expression(_))
val function = UnresolvedFunction(
getFunctionIdentifier(ctx.functionName), arguments, isDistinct)
getFunctionIdentifier(ctx.functionName), arguments, isDistinct, filter)
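
Note: `Option(ctx.where)` relies on `Option.apply` turning `null` into `None`, so an absent FILTER clause becomes `None` without an explicit null check. A minimal sketch with a hypothetical context class standing in for the generated parser context, and `trim` standing in for `expression(_)`:

```scala
object OptionalClauseSketch {
  // Hypothetical stand-in for a parser context whose `where` field is
  // null when the optional FILTER clause is absent.
  final class FunctionCallContext(val where: String)

  // Option(null) is None; Option(x) is Some(x) for any non-null x.
  def filterOf(ctx: FunctionCallContext): Option[String] =
    Option(ctx.where).map(_.trim)
}
```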

// Check if the function is evaluated in a windowed context.
ctx.windowSpec match {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -169,11 +169,21 @@ class AnalysisErrorSuite extends AnalysisTest {
CatalystSqlParser.parsePlan("SELECT hex(DISTINCT a) FROM TaBlE"),
"DISTINCT specified, but hex is not an aggregate function" :: Nil)

errorTest(
"non aggregate function with filter predicate",
CatalystSqlParser.parsePlan("SELECT hex(a) filter (where c = 1) FROM TaBlE2"),
Member

@dongjoon-hyun dongjoon-hyun Dec 11, 2019

filter -> FILTER and where -> WHERE would be better.

Contributor Author

OK

"FILTER predicate specified, but hex is not an aggregate function" :: Nil)

errorTest(
"distinct window function",
CatalystSqlParser.parsePlan("SELECT percent_rank(DISTINCT a) over () FROM TaBlE"),
"DISTINCT specified, but percent_rank is not an aggregate function" :: Nil)

errorTest(
"window function with filter predicate",
CatalystSqlParser.parsePlan("SELECT percent_rank(a) filter (where c > 1) over () FROM TaBlE2"),
"FILTER predicate specified, but percent_rank is not an aggregate function" :: Nil)

errorTest(
"nested aggregate functions",
testRelation.groupBy('a)(
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -380,6 +380,7 @@ class TableIdentifierParserSuite extends SparkFunSuite with SQLHelper {
"fetch",
"fields",
"fileformat",
"filter",
"first",
"first_value",
"following",
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -219,14 +219,14 @@ object AggregatingAccumulator {
val typedImperatives = mutable.Buffer.empty[TypedImperativeAggregate[_]]
val inputAttributeSeq: AttributeSeq = inputAttributes
val resultExpressions = functions.map(_.transform {
case AggregateExpression(agg: DeclarativeAggregate, _, _, _) =>
case AggregateExpression(agg: DeclarativeAggregate, _, _, _, _) =>
aggBufferAttributes ++= agg.aggBufferAttributes
inputAggBufferAttributes ++= agg.inputAggBufferAttributes
initialValues ++= agg.initialValues
updateExpressions ++= agg.updateExpressions
mergeExpressions ++= agg.mergeExpressions
agg.evaluateExpression
case AggregateExpression(agg: ImperativeAggregate, _, _, _) =>
case AggregateExpression(agg: ImperativeAggregate, _, _, _, _) =>
val imperative = BindReferences.bindReference(agg
.withNewMutableAggBufferOffset(aggBufferAttributes.size)
.withNewInputAggBufferOffset(inputAggBufferAttributes.size),
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -135,19 +135,27 @@ object AggUtils {
}
val distinctAttributes = namedDistinctExpressions.map(_.toAttribute)
val groupingAttributes = groupingExpressions.map(_.toAttribute)
val filterWithDistinctAttributes = functionsWithDistinct.flatMap(_.filterAttributes.toSeq)

// 1. Create an Aggregate Operator for partial aggregations.
val partialAggregate: SparkPlan = {
val aggregateExpressions = functionsWithoutDistinct.map(_.copy(mode = Partial))
val aggregateAttributes = aggregateExpressions.map(_.resultAttribute)
// We will group by the original grouping expression, plus an additional expression for the
// DISTINCT column. For example, for AVG(DISTINCT value) GROUP BY key, the grouping
// expressions will be [key, value].
// DISTINCT column and the expression in the FILTER clause associated with each aggregate
Contributor

the expression -> the referred attributes

Contributor Author

OK.

// function. For example:
// 1.for the AVG (DISTINCT value) GROUP BY key, the grouping expression will be [key, value];
// 2.for the AVG (value) Filter (WHERE value2> 20) GROUP BY key, the grouping expression
// will be [key, value2];
Contributor

Why?

Member

Same question. It seems you do not even use the additional grouping exprs?

Contributor Author

I'm sorry, I wrote an incorrect comment. I will change it.

// 3.for AVG (DISTINCT value) Filter (WHERE value2> 20) GROUP BY key, the grouping expression
Member

Do we add an additional grouping expression for the filter expression or for its attributes? It looks like you add only the attributes, not the filter expression itself?

Contributor Author

@beliefer beliefer Dec 4, 2019

Yes, I only added the referenced attributes of the filter expression.

// will be [key, value, value2].
createAggregate(
groupingExpressions = groupingExpressions ++ namedDistinctExpressions,
groupingExpressions = groupingExpressions ++ namedDistinctExpressions ++
filterWithDistinctAttributes,
aggregateExpressions = aggregateExpressions,
aggregateAttributes = aggregateAttributes,
resultExpressions = groupingAttributes ++ distinctAttributes ++
filterWithDistinctAttributes ++
aggregateExpressions.flatMap(_.aggregateFunction.inputAggBufferAttributes),
child = child)
}
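
Note: a toy model of why the filter's attributes are carried as extra grouping keys when DISTINCT and FILTER are combined, as in case 3 of the code comment above. This is not Spark's planner; the `Row` type, the column names, and the `value2 > 20` predicate are assumptions taken from that comment for illustration:

```scala
object DistinctFilterSketch {
  // Hypothetical row type: `value` is aggregated, `value2` is read by the filter.
  final case class Row(key: String, value: Int, value2: Int)

  // Models AVG(DISTINCT value) FILTER (WHERE value2 > 20) GROUP BY key.
  def avgDistinctWithFilter(rows: Seq[Row]): Map[String, Double] = {
    // Step 1: de-duplicate on (key, value, value2). Keeping value2 in the
    // grouping key is what lets the filter still be evaluated afterwards.
    val deduped = rows.distinct

    // Step 2: apply the filter, then de-duplicate `value` per key and average.
    deduped
      .filter(_.value2 > 20)
      .groupBy(_.key)
      .map { case (key, group) =>
        val values = group.map(_.value).distinct
        key -> values.sum.toDouble / values.size
      }
  }
}
```

If step 1 grouped only by `(key, value)`, the `value2` column would no longer be available in step 2, and rows with the same `value` but different `value2` could not be told apart by the predicate.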
Expand All @@ -159,11 +167,13 @@ object AggUtils {
createAggregate(
requiredChildDistributionExpressions =
Some(groupingAttributes ++ distinctAttributes),
groupingExpressions = groupingAttributes ++ distinctAttributes,
groupingExpressions = groupingAttributes ++ distinctAttributes ++
filterWithDistinctAttributes,
aggregateExpressions = aggregateExpressions,
aggregateAttributes = aggregateAttributes,
initialInputBufferOffset = (groupingAttributes ++ distinctAttributes).length,
resultExpressions = groupingAttributes ++ distinctAttributes ++
filterWithDistinctAttributes ++
aggregateExpressions.flatMap(_.aggregateFunction.inputAggBufferAttributes),
child = partialAggregate)
}
Expand All @@ -174,7 +184,7 @@ object AggUtils {
// Children of an AggregateFunction with DISTINCT keyword has already
// been evaluated. At here, we need to replace original children
// to AttributeReferences.
case agg @ AggregateExpression(aggregateFunction, mode, true, _) =>
case agg @ AggregateExpression(aggregateFunction, mode, true, _, _) =>
aggregateFunction.transformDown(distinctColumnAttributeLookup)
.asInstanceOf[AggregateFunction]
case agg =>
Expand All @@ -194,7 +204,8 @@ object AggUtils {
// its input will have distinct arguments.
// We just keep the isDistinct setting to true, so when users look at the query plan,
// they still can see distinct aggregations.
val expr = AggregateExpression(func, Partial, isDistinct = true)
val filter = functionsWithDistinct(i).filter
val expr = AggregateExpression(func, Partial, isDistinct = true, filter)
// Use original AggregationFunction to lookup attributes, which is used to build
// aggregateFunctionToAttribute
val attr = functionsWithDistinct(i).resultAttribute
Expand Down