Conversation

Contributor

@cloud-fan cloud-fan commented May 7, 2021

What changes were proposed in this pull request?

Currently, ResolveAggregateFunctions is a complicated rule that recursively calls the entire analyzer to resolve aggregate functions in the parent nodes of an aggregate. This is somewhat necessary, because identifying an aggregate function and pushing it down to the aggregate node involves many steps: resolving columns as if they were in the aggregate node, resolving functions, applying type coercion, etc. However, the current approach is overly complicated and it's hard to fully understand how the resolution is done. It also leads to hacks such as the char/varchar hack, the subquery hack, the grouping function hack, etc.

This PR simplifies the ResolveAggregateFunctions rule and clarifies the resolution logic. To resolve aggregate functions/grouping columns in HAVING, ORDER BY and df.where, we expand the aggregate node below to output the required aggregate functions/grouping columns (a sketch of the end result follows the list below). In detail, when resolving an expression from the parent of an aggregate node:

  1. try to resolve columns with agg.child and wrap the result with TempResolvedColumn.
  2. try to resolve subqueries with agg.child.
  3. if the expression is not resolved, return it and wait for other rules to resolve it, such as resolving functions, type coercion, etc.
  4. if the expression is resolved, we transform it and push aggregate functions/grouping columns into the aggregate node below.
    4.1 if the expression is already present in agg.aggregateExpressions, we can simply replace it with an attribute reference.
    4.2 if a TempResolvedColumn is neither inside an aggregate function nor wraps a grouping column, turn it back into an UnresolvedAttribute.
  5. after the main resolution batch, remove all remaining TempResolvedColumns and turn them back into UnresolvedAttributes.
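To make the end result concrete, here is a minimal sketch (using a hypothetical table t(k, v)) of how a HAVING clause that references an aggregate function not in the SELECT list ends up being answered by an expanded aggregate:

```sql
-- sum(v) appears only in HAVING, not in the aggregate list below it.
SELECT k, max(v) AS m
FROM t
GROUP BY k
HAVING sum(v) > 10;

-- Conceptually, the rule resolves sum(v) against the aggregate's child,
-- pushes it into the Aggregate node, filters on it, and restores the
-- original output with a Project on top, roughly equivalent to:
SELECT k, m
FROM (
  SELECT k, max(v) AS m, sum(v) AS s
  FROM t
  GROUP BY k
) tmp
WHERE s > 10;
```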

Why are the changes needed?

Code cleanup

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests.

@github-actions github-actions bot added the SQL label May 7, 2021

SparkQA commented May 7, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42783/


SparkQA commented May 7, 2021

Test build #138261 has finished for PR 32470 at commit a8c708f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented May 8, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42803/


SparkQA commented May 8, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42803/


SparkQA commented May 8, 2021

Test build #138281 has finished for PR 32470 at commit c67a801.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented May 10, 2021

Test build #138326 has finished for PR 32470 at commit dbc00c8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented May 10, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42848/


SparkQA commented May 10, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42848/


SparkQA commented May 10, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42858/


SparkQA commented May 10, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42858/


SparkQA commented May 10, 2021

Test build #138336 has finished for PR 32470 at commit ab31ab1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented May 11, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42880/


SparkQA commented May 11, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42880/


SparkQA commented May 11, 2021

Test build #138358 has finished for PR 32470 at commit c0bb807.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jun 10, 2021

Test build #139609 has finished for PR 32470 at commit e33c0a8.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan cloud-fan force-pushed the agg2 branch 2 times, most recently from 48e01a6 to d08d8e4 on June 10, 2021 03:47

SparkQA commented Jun 10, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44136/


SparkQA commented Jun 10, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44136/


SparkQA commented Jun 10, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44137/


SparkQA commented Jun 10, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44137/


SparkQA commented Jun 10, 2021

Test build #139612 has finished for PR 32470 at commit d08d8e4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor Author

the only change is the removed alias


SparkQA commented Jun 12, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44242/


SparkQA commented Jun 12, 2021

Test build #139716 has finished for PR 32470 at commit d5f2259.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jun 12, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44242/

Contributor

The children of the resolved subquery that are not aggregate expressions should also be wrapped in TempResolvedColumn (it seems the current logic can't handle this case with a proper error message either; this can be a follow-up in the future):

```sql
select c1 from t1 group by c1 having (select sum(c2) from t2 where t1.c2 = t2.c1) > 0
-- org.apache.spark.sql.AnalysisException: Resolved attribute(s) c2#10 missing from c1#9 in operator
-- !Filter (scalar-subquery#8 [c2#10] > cast(0 as bigint)).;
```


SparkQA commented Jun 15, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44335/


SparkQA commented Jun 15, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44335/


SparkQA commented Jun 15, 2021

Test build #139808 has finished for PR 32470 at commit b362a09.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

@viirya @maropu Can you take one more look?

@cloud-fan
Contributor Author

thanks for the review, merging to master!

@cloud-fan cloud-fan closed this in a2961dd Jun 16, 2021
Member

maropu commented Jun 16, 2021

late lgtm.

peter-toth added a commit that referenced this pull request Aug 5, 2025
### What changes were proposed in this pull request?

This is an alternative PR to #51810 to fix a regression introduced in Spark 3.2 with #32470.
This PR defers the resolution of not fully resolved `UnresolvedHaving` nodes from `ResolveGroupingAnalytics`:
```
=== Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGroupingAnalytics ===
 'Sort ['s DESC NULLS LAST], true                                                                                               'Sort ['s DESC NULLS LAST], true
!+- 'UnresolvedHaving ('count('product) > 2)                                                                                    +- 'UnresolvedHaving ('count(tempresolvedcolumn(product#261, product, false)) > 2)
!   +- 'Aggregate [cube(Vector(0), Vector(1), product#261, region#262)], [product#261, region#262, sum(amount#263) AS s#264L]      +- Aggregate [product#269, region#270, spark_grouping_id#268L], [product#269, region#270, sum(amount#263) AS s#264L]
!      +- SubqueryAlias t                                                                                                             +- Expand [[product#261, region#262, amount#263, product#266, region#267, 0], [product#261, region#262, amount#263, product#266, null, 1], [product#261, region#262, amount#263, null, region#267, 2], [product#261, region#262, amount#263, null, null, 3]], [product#261, region#262, amount#263, product#269, region#270, spark_grouping_id#268L]
!         +- LocalRelation [product#261, region#262, amount#263]                                                                         +- Project [product#261, region#262, amount#263, product#261 AS product#266, region#262 AS region#267]
!                                                                                                                                           +- SubqueryAlias t
!                                                                                                                                              +- LocalRelation [product#261, region#262, amount#263]
```
to `ResolveAggregateFunctions` to add the correct aggregate expressions (`count(product#261)`):
```
=== Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions ===
 'Sort ['s DESC NULLS LAST], true                                                                                                                                                                                                                                                                                                                             'Sort ['s DESC NULLS LAST], true
!+- 'UnresolvedHaving (count(tempresolvedcolumn(product#261, product, false)) > cast(2 as bigint))                                                                                                                                                                                                                                                            +- Project [product#269, region#270, s#264L]
!   +- Aggregate [product#269, region#270, spark_grouping_id#268L], [product#269, region#270, sum(amount#263) AS s#264L]                                                                                                                                                                                                                                         +- Filter (count(product)#272L > cast(2 as bigint))
!      +- Expand [[product#261, region#262, amount#263, product#266, region#267, 0], [product#261, region#262, amount#263, product#266, null, 1], [product#261, region#262, amount#263, null, region#267, 2], [product#261, region#262, amount#263, null, null, 3]], [product#261, region#262, amount#263, product#269, region#270, spark_grouping_id#268L]         +- Aggregate [product#269, region#270, spark_grouping_id#268L], [product#269, region#270, sum(amount#263) AS s#264L, count(product#261) AS count(product)#272L]
!         +- Project [product#261, region#262, amount#263, product#261 AS product#266, region#262 AS region#267]                                                                                                                                                                                                                                                       +- Expand [[product#261, region#262, amount#263, product#266, region#267, 0], [product#261, region#262, amount#263, product#266, null, 1], [product#261, region#262, amount#263, null, region#267, 2], [product#261, region#262, amount#263, null, null, 3]], [product#261, region#262, amount#263, product#269, region#270, spark_grouping_id#268L]
!            +- SubqueryAlias t                                                                                                                                                                                                                                                                                                                                           +- Project [product#261, region#262, amount#263, product#261 AS product#266, region#262 AS region#267]
!               +- LocalRelation [product#261, region#262, amount#263]                                                                                                                                                                                                                                                                                                       +- SubqueryAlias t
!                                                                                                                                                                                                                                                                                                                                                                               +- LocalRelation [product#261, region#262, amount#263]
```
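For reference, a query of roughly this shape reproduces the plans above; the table and column names are inferred from the LocalRelation, so treat it as an approximation rather than the exact test case:

```sql
-- Approximate reconstruction of the query behind the plans above,
-- assuming a table t(product, region, amount).
SELECT product, region, SUM(amount) AS s
FROM t
GROUP BY CUBE(product, region)
HAVING COUNT(product) > 2
ORDER BY s DESC;
```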

### Why are the changes needed?

Fix a correctness issue described in #51810.

### Does this PR introduce _any_ user-facing change?

Yes, it fixes a correctness issue.

### How was this patch tested?

Added new UT from #51810.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #51820 from peter-toth/SPARK-53094-fix-cube-having.

Lead-authored-by: Peter Toth <[email protected]>
Co-authored-by: harris233 <[email protected]>
Signed-off-by: Peter Toth <[email protected]>
peter-toth added a commit to peter-toth/spark that referenced this pull request Aug 5, 2025
…uses

Closes apache#51820 from peter-toth/SPARK-53094-fix-cube-having.

Lead-authored-by: Peter Toth <[email protected]>
Co-authored-by: harris233 <[email protected]>
Signed-off-by: Peter Toth <[email protected]>
peter-toth added a commit to peter-toth/spark that referenced this pull request Aug 5, 2025
…uses

Closes apache#51820 from peter-toth/SPARK-53094-fix-cube-having.

Lead-authored-by: Peter Toth <[email protected]>
Co-authored-by: harris233 <[email protected]>
Signed-off-by: Peter Toth <[email protected]>
peter-toth added a commit that referenced this pull request Aug 6, 2025
…uses

Closes #51854 from peter-toth/SPARK-53094-fix-cube-having-4.0.

Authored-by: Peter Toth <[email protected]>
Signed-off-by: Peter Toth <[email protected]>
peter-toth added a commit that referenced this pull request Aug 6, 2025
…uses

Closes #51855 from peter-toth/SPARK-53094-fix-cube-having-3.5.

Authored-by: Peter Toth <[email protected]>
Signed-off-by: Peter Toth <[email protected]>
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 14, 2025
…uses

Closes apache#51854 from peter-toth/SPARK-53094-fix-cube-having-4.0.

Authored-by: Peter Toth <[email protected]>
Signed-off-by: Peter Toth <[email protected]>