-
Notifications
You must be signed in to change notification settings - Fork 554
[GLUTEN-3559][CORE] Fix output partitioning of Hash join #4342
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GLUTEN-3559][CORE] Fix output partitioning of Hash join #4342
Conversation
|
Run Gluten Clickhouse CI |
|
Run Gluten Clickhouse CI |
|
@JkSelf @zhli1142015 The build is green. Kindly review. |
| case BuildLeft => | ||
| joinType match { | ||
| case _: InnerLike | RightOuter => right.outputPartitioning | ||
| case _: InnerLike | RightOuter => expandPartitioning(right.outputPartitioning) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HashJoinLikeExecTransformer is also extended by ShuffledHashJoinExecTransformer. Do we need this fix for SHJ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was looking into that, this change will help it there also so I think we can keep it for SHJ also.
|
it seems the code is difference since Spark3.4, shall we move this to shim module ? |
| } | ||
|
|
||
| // https://issues.apache.org/jira/browse/SPARK-31869 | ||
| // ToDo: https://issues.apache.org/jira/browse/SPARK-45882 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a ToDO here, for this https://issues.apache.org/jira/browse/SPARK-45882 we would need to add in shim. Shall that be taken in separate PR?
This PR make the changes which were added in spark 3.1 apache/spark#28676
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ulysses-you Shall we move this to shim module as part of this in separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to solve this todo in this pr, as we need to move the whole code to shim module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ulysses-you I will update the PR
84f5dfe to
90aec14
Compare
|
Run Gluten Clickhouse CI |
|
Run Gluten Clickhouse CI |
|
Run Gluten Clickhouse CI |
ulysses-you
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you @ayushi-agarwal , lgtm. cc @rui-mo if you have other comments
|
LGTM |
|
Thank you @zzcclp @ulysses-you @zhli1142015 for reviewing. |
What changes were proposed in this pull request?
As part of this ticket https://issues.apache.org/jira/browse/SPARK-31869 an improvement was added which can decrease the number of exchanges. As output partitioning is overridden in gluten, the test was failing as extra exchange was coming.
(Fixes: #3559)
How was this patch tested?
Ran UT locally.