-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-32816][SQL][3.0] Fix analyzer bug when aggregating multiple distinct DECIMAL columns #30053
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #129826 has finished for PR 30053 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
This PR depends on #30052 to pass tests |
…t DECIMAL columns This PR fixes a conflict between `RewriteDistinctAggregates` and `DecimalAggregates`. In some cases, `DecimalAggregates` will wrap the decimal column to `UnscaledValue` using different rules for different aggregates. This means, same distinct column with different aggregates will change to different distinct columns after `DecimalAggregates`. For example: `avg(distinct decimal_col), sum(distinct decimal_col)` may change to `avg(distinct UnscaledValue(decimal_col)), sum(distinct decimal_col)` We assume after `RewriteDistinctAggregates`, there will be at most one distinct column in aggregates, but `DecimalAggregates` breaks this assumption. To fix this, we have to switch the order of these two rules. bug fix no added test cases Closes apache#29673 from linhongliu-db/SPARK-32816. Authored-by: Linhong Liu <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit 40ef5c9)
b043f94 to
2634588
Compare
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #129876 has finished for PR 30053 at commit
|
|
retest this please |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #129881 has finished for PR 30053 at commit
|
|
Test build #129882 has finished for PR 30053 at commit
|
|
@cloud-fan This PR depends on another 2 fixes in the master branch. Should we cherry-pick them? |
|
ah, that two are hard to backport as they change streaming state store. I guess we can't backport this fix either. |
|
got it. Let's close it for now |
What changes were proposed in this pull request?
This PR fixes a conflict between
RewriteDistinctAggregatesandDecimalAggregates.In some cases,
DecimalAggregateswill wrap the decimal column toUnscaledValueusingdifferent rules for different aggregates.
This means, same distinct column with different aggregates will change to different distinct columns
after
DecimalAggregates. For example:avg(distinct decimal_col), sum(distinct decimal_col)may change toavg(distinct UnscaledValue(decimal_col)), sum(distinct decimal_col)We assume after
RewriteDistinctAggregates, there will be at most one distinct column in aggregates,but
DecimalAggregatesbreaks this assumption. To fix this, we have to switch the order of these tworules.
Why are the changes needed?
bug fix
Does this PR introduce any user-facing change?
no
How was this patch tested?
added test cases
Authored-by: Linhong Liu [email protected]
Signed-off-by: Wenchen Fan [email protected]
(cherry picked from commit 40ef5c9)