-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-34708][SQL] Code-gen for left semi/anti broadcast nested loop join (build right side) #31874
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @cloud-fan and @maropu could you help take a look when you have time, thanks. |
|
retest this please |
|
Closed & reopened the PR to trigger test rerun. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| |boolean $findMatchedRow = false; | |
| |boolean $foundMatch = false; |
Maybe we can be consistent with non-codegen naming in BroadcastNestedLoopJoinExec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good, updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: (LeftSemi | LeftAnti, BuildRight)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan - updated.
|
Just FYI - the unit test failure here is legit, and I found there's a bug around LIMIT codegen - https://issues.apache.org/jira/browse/SPARK-34796 . Will submit a PR to fix that first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this case, we don't need to add a mutable state for buildRowArrayTerm ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's already added inside prepareBroadcast
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, my qestion is wrong. I meant, is Ah, nvm, it looks fine.buildRowArrayTerm referenced in this case?
c21
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed comments and the PR is ready to review again, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan - updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good, updated.
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #136300 has finished for PR 31874 at commit
|
|
Refer to this link for build results (access rights to CI server needed): |
|
thanks, merging to master! |
|
Thank you @cloud-fan, @maropu and @linzebing for review! |
What changes were proposed in this pull request?
This PR is to add code-gen support for left semi / left anti BroadcastNestedLoopJoin (build side is right side). The execution code path for build left side cannot fit into whole stage code-gen framework, so only add the code-gen for build right side here.
Reference: the iterator (non-code-gen) code path is
BroadcastNestedLoopJoinExec.leftExistenceJoin()withBuildRight.Why are the changes needed?
Improve query CPU performance.
Tested with a simple query:
Seeing 5x run time improvement:
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Changed existing unit test in
ExistenceJoinSuite.scalato cover all code paths:Added unit test in
WholeStageCodegenSuite.scalato make sure code-gen for broadcast nested loop join is taking effect, and test for multiple join case as well.Example query:
Example generated code (
bnlj_doConsume_0method):This is for left semi join. The generated code for left anti join is mostly to be same as here, except L55 to be
if (bnlj_findMatchedRow_0 == false) {.