Conversation

Contributor

@cloud-fan cloud-fan commented May 7, 2021

What changes were proposed in this pull request?

Currently, ResolveAggregateFunctions is a complicated rule that recursively calls the entire analyzer to resolve aggregate functions in the parent nodes of an aggregate. This is somewhat necessary, because identifying an aggregate function and pushing it down to the aggregate node involves many steps: resolving columns as if they were in the aggregate node, resolving functions, applying type coercion, etc. However, the current approach is overly complicated and it's hard to fully understand how the resolution is done. It also leads to hacks such as the char/varchar hack, the subquery hack, the grouping function hack, etc.

This PR simplifies the ResolveAggregateFunctions rule and clarifies the resolution logic. To resolve aggregate functions/grouping columns in HAVING, ORDER BY and df.where, we expand the aggregate node below to output the required aggregate functions/grouping columns (a sketch of the end result follows the list below). In detail, when resolving an expression from the parent of an aggregate node:

  1. try to resolve columns with agg.child and wrap the result with TempResolvedColumn.
  2. try to resolve subqueries with agg.child.
  3. if the expression is not resolved, return it and wait for other rules to resolve it, such as resolving functions, type coercion, etc.
  4. if the expression is resolved, we transform it and push aggregate functions/grouping columns into the aggregate node below.
    4.1 if the expression is already present in agg.aggregateExpressions, we can simply replace it with an attribute reference.
    4.2 if a TempResolvedColumn is neither inside an aggregate function nor wraps a grouping column, turn it back into an UnresolvedAttribute.
  5. after the main resolution batch, remove all remaining TempResolvedColumns and turn them back into UnresolvedAttributes.
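To make the end result concrete, here is a minimal sketch (using a hypothetical table t(k, v)) of how a HAVING clause that references an aggregate function not in the SELECT list ends up being answered by an expanded aggregate:

```sql
-- sum(v) appears only in HAVING, not in the aggregate list below it.
SELECT k, max(v) AS m
FROM t
GROUP BY k
HAVING sum(v) > 10;

-- Conceptually, the rule resolves sum(v) against the aggregate's child,
-- pushes it into the Aggregate node, filters on it, and restores the
-- original output with a Project on top, roughly equivalent to:
SELECT k, m
FROM (
  SELECT k, max(v) AS m, sum(v) AS s
  FROM t
  GROUP BY k
) tmp
WHERE s > 10;
```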

Why are the changes needed?

Code cleanup

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests.

@github-actions github-actions bot added the SQL label May 7, 2021

SparkQA commented May 7, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42783/


SparkQA commented May 7, 2021

Test build #138261 has finished for PR 32470 at commit a8c708f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented May 8, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42803/


SparkQA commented May 8, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42803/


SparkQA commented May 8, 2021

Test build #138281 has finished for PR 32470 at commit c67a801.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented May 10, 2021

Test build #138326 has finished for PR 32470 at commit dbc00c8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented May 10, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42848/


SparkQA commented May 10, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42848/


SparkQA commented May 10, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42858/


SparkQA commented May 10, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42858/


SparkQA commented May 10, 2021

Test build #138336 has finished for PR 32470 at commit ab31ab1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented May 11, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42880/


SparkQA commented May 11, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42880/


SparkQA commented May 11, 2021

Test build #138358 has finished for PR 32470 at commit c0bb807.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jun 10, 2021

Test build #139609 has finished for PR 32470 at commit e33c0a8.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan cloud-fan force-pushed the agg2 branch 2 times, most recently from 48e01a6 to d08d8e4 on June 10, 2021 03:47

SparkQA commented Jun 10, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44136/


SparkQA commented Jun 10, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44136/


SparkQA commented Jun 10, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44137/


SparkQA commented Jun 10, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44137/


SparkQA commented Jun 10, 2021

Test build #139612 has finished for PR 32470 at commit d08d8e4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor Author

the only change is the removed alias


SparkQA commented Jun 12, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44242/


SparkQA commented Jun 12, 2021

Test build #139716 has finished for PR 32470 at commit d5f2259.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jun 12, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44242/

Contributor

The children of the resolved subquery that are not aggregate expressions should also be wrapped in TempResolvedColumn (it seems the current logic can't handle this case with a proper error message either; this can be a follow-up in the future):

```sql
select c1 from t1 group by c1 having (select sum(c2) from t2 where t1.c2 = t2.c1) > 0
-- org.apache.spark.sql.AnalysisException: Resolved attribute(s) c2#10 missing from c1#9 in operator
-- !Filter (scalar-subquery#8 [c2#10] > cast(0 as bigint)).;
```


SparkQA commented Jun 15, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44335/


SparkQA commented Jun 15, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44335/


SparkQA commented Jun 15, 2021

Test build #139808 has finished for PR 32470 at commit b362a09.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

@viirya @maropu Can you take one more look?

@cloud-fan
Contributor Author

thanks for the review, merging to master!

@cloud-fan cloud-fan closed this in a2961dd Jun 16, 2021
Member

maropu commented Jun 16, 2021

late lgtm.

peter-toth added a commit that referenced this pull request Aug 5, 2025
### What changes were proposed in this pull request?

This is an alternative PR to #51810 to fix a regression introduced in Spark 3.2 with #32470.
This PR defers the resolution of not fully resolved `UnresolvedHaving` nodes from `ResolveGroupingAnalytics`:
```
=== Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGroupingAnalytics ===
 'Sort ['s DESC NULLS LAST], true                                                                                               'Sort ['s DESC NULLS LAST], true
!+- 'UnresolvedHaving ('count('product) > 2)                                                                                    +- 'UnresolvedHaving ('count(tempresolvedcolumn(product#261, product, false)) > 2)
!   +- 'Aggregate [cube(Vector(0), Vector(1), product#261, region#262)], [product#261, region#262, sum(amount#263) AS s#264L]      +- Aggregate [product#269, region#270, spark_grouping_id#268L], [product#269, region#270, sum(amount#263) AS s#264L]
!      +- SubqueryAlias t                                                                                                             +- Expand [[product#261, region#262, amount#263, product#266, region#267, 0], [product#261, region#262, amount#263, product#266, null, 1], [product#261, region#262, amount#263, null, region#267, 2], [product#261, region#262, amount#263, null, null, 3]], [product#261, region#262, amount#263, product#269, region#270, spark_grouping_id#268L]
!         +- LocalRelation [product#261, region#262, amount#263]                                                                         +- Project [product#261, region#262, amount#263, product#261 AS product#266, region#262 AS region#267]
!                                                                                                                                           +- SubqueryAlias t
!                                                                                                                                              +- LocalRelation [product#261, region#262, amount#263]
```
to `ResolveAggregateFunctions` to add the correct aggregate expressions (`count(product#261)`):
```
=== Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions ===
 'Sort ['s DESC NULLS LAST], true                                                                                                                                                                                                                                                                                                                             'Sort ['s DESC NULLS LAST], true
!+- 'UnresolvedHaving (count(tempresolvedcolumn(product#261, product, false)) > cast(2 as bigint))                                                                                                                                                                                                                                                            +- Project [product#269, region#270, s#264L]
!   +- Aggregate [product#269, region#270, spark_grouping_id#268L], [product#269, region#270, sum(amount#263) AS s#264L]                                                                                                                                                                                                                                         +- Filter (count(product)#272L > cast(2 as bigint))
!      +- Expand [[product#261, region#262, amount#263, product#266, region#267, 0], [product#261, region#262, amount#263, product#266, null, 1], [product#261, region#262, amount#263, null, region#267, 2], [product#261, region#262, amount#263, null, null, 3]], [product#261, region#262, amount#263, product#269, region#270, spark_grouping_id#268L]         +- Aggregate [product#269, region#270, spark_grouping_id#268L], [product#269, region#270, sum(amount#263) AS s#264L, count(product#261) AS count(product)#272L]
!         +- Project [product#261, region#262, amount#263, product#261 AS product#266, region#262 AS region#267]                                                                                                                                                                                                                                                       +- Expand [[product#261, region#262, amount#263, product#266, region#267, 0], [product#261, region#262, amount#263, product#266, null, 1], [product#261, region#262, amount#263, null, region#267, 2], [product#261, region#262, amount#263, null, null, 3]], [product#261, region#262, amount#263, product#269, region#270, spark_grouping_id#268L]
!            +- SubqueryAlias t                                                                                                                                                                                                                                                                                                                                           +- Project [product#261, region#262, amount#263, product#261 AS product#266, region#262 AS region#267]
!               +- LocalRelation [product#261, region#262, amount#263]                                                                                                                                                                                                                                                                                                       +- SubqueryAlias t
!                                                                                                                                                                                                                                                                                                                                                                               +- LocalRelation [product#261, region#262, amount#263]
```
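For reference, a query of roughly this shape reproduces the plans above; the table and column names are inferred from the LocalRelation, so treat it as an approximation rather than the exact test case:

```sql
-- Approximate reconstruction of the query behind the plans above,
-- assuming a table t(product, region, amount).
SELECT product, region, SUM(amount) AS s
FROM t
GROUP BY CUBE(product, region)
HAVING COUNT(product) > 2
ORDER BY s DESC;
```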

### Why are the changes needed?

Fix a correctness issue described in #51810.

### Does this PR introduce _any_ user-facing change?

Yes, it fixes a correctness issue.

### How was this patch tested?

Added new UT from #51810.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #51820 from peter-toth/SPARK-53094-fix-cube-having.

Lead-authored-by: Peter Toth <[email protected]>
Co-authored-by: harris233 <[email protected]>
Signed-off-by: Peter Toth <[email protected]>
peter-toth added a commit to peter-toth/spark that referenced this pull request Aug 5, 2025
…uses

Closes apache#51820 from peter-toth/SPARK-53094-fix-cube-having.

Lead-authored-by: Peter Toth <[email protected]>
Co-authored-by: harris233 <[email protected]>
Signed-off-by: Peter Toth <[email protected]>
peter-toth added a commit to peter-toth/spark that referenced this pull request Aug 5, 2025
…uses

Closes apache#51820 from peter-toth/SPARK-53094-fix-cube-having.

Lead-authored-by: Peter Toth <[email protected]>
Co-authored-by: harris233 <[email protected]>
Signed-off-by: Peter Toth <[email protected]>
peter-toth added a commit that referenced this pull request Aug 6, 2025
…uses

Closes #51854 from peter-toth/SPARK-53094-fix-cube-having-4.0.

Authored-by: Peter Toth <[email protected]>
Signed-off-by: Peter Toth <[email protected]>
peter-toth added a commit that referenced this pull request Aug 6, 2025
…uses

Closes #51855 from peter-toth/SPARK-53094-fix-cube-having-3.5.

Authored-by: Peter Toth <[email protected]>
Signed-off-by: Peter Toth <[email protected]>
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 14, 2025
…uses

Closes apache#51854 from peter-toth/SPARK-53094-fix-cube-having-4.0.

Authored-by: Peter Toth <[email protected]>
Signed-off-by: Peter Toth <[email protected]>