Skip to content

Conversation

@linhongliu-db
Copy link
Contributor

What changes were proposed in this pull request?

For queries with multiple foldable distinct columns, since they will be eliminated during
execution, it's not mandatory to let RewriteDistinctAggregates handle this case. And
in the current code, RewriteDistinctAggregates dose miss some "aggregating with
multiple foldable distinct expressions" cases.
For example: select count(distinct 2), count(distinct 2, 3) will be missed.

But in the planner, this will trigger an error that "multiple distinct expressions" are not allowed.
As the foldable distinct columns can be eliminated finally, we can allow this in the aggregation
planner check.

Why are the changes needed?

bug fix

Does this PR introduce any user-facing change?

No

How was this patch tested?

added test case

@linhongliu-db linhongliu-db changed the title Allow aggregating multiple foldable distinct expressions [SPARK-32761][SQL] Allow aggregating multiple foldable distinct expressions Sep 1, 2020
Copy link
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch!

@HyukjinKwon
Copy link
Member

add to whitelist

@SparkQA
Copy link

SparkQA commented Sep 1, 2020

Test build #128133 has finished for PR 29607 at commit 8ab8955.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

cloud-fan commented Sep 1, 2020

thanks, merging to master!

@cloud-fan cloud-fan closed this in a410658 Sep 1, 2020
@linhongliu-db linhongliu-db deleted the SPARK-32761 branch September 1, 2020 13:05
linhongliu-db added a commit to linhongliu-db/spark that referenced this pull request Oct 15, 2020
…ssions

### What changes were proposed in this pull request?
For queries with multiple foldable distinct columns, since they will be eliminated during
execution, it's not mandatory to let `RewriteDistinctAggregates` handle this case. And
in the current code, `RewriteDistinctAggregates` *dose* miss some "aggregating with
multiple foldable distinct expressions" cases.
For example: `select count(distinct 2), count(distinct 2, 3)` will be missed.

But in the planner, this will trigger an error that "multiple distinct expressions" are not allowed.
As the foldable distinct columns can be eliminated finally, we can allow this in the aggregation
planner check.

### Why are the changes needed?
bug fix

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
added test case

Closes apache#29607 from linhongliu-db/SPARK-32761.

Authored-by: Linhong Liu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit a410658)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants