-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-32282][SQL] Improve EnsureRquirement.reorderJoinKeys to handle more scenarios such as PartitioningCollection #29074
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| leftKeys, rightKeys, UnknownPartitioning(0), rightPartitioning)) | ||
| case (_, HashPartitioning(rightExpressions, _)) => | ||
| reorder(leftKeys.toIndexedSeq, rightKeys.toIndexedSeq, rightExpressions, rightKeys) | ||
| .orElse(reorderJoinKeysRecursively( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be also implemented by looking at left partitioning first then move to the right partitionoing:
(leftPartitioning, rightPartitioning) match {
case (HashPartitioning(leftExpressions, _), _) =>
reorder(leftKeys.toIndexedSeq, rightKeys.toIndexedSeq, leftExpressions, leftKeys)
.orElse(reorderJoinKeysRecursively(
leftKeys, rightKeys, UnknownPartitioning(0), rightPartitioning))
case (PartitioningCollection(partitionings), _) =>
partitionings.foreach { p =>
reorderJoinKeysRecursively(leftKeys, rightKeys, p, rightPartitioning).map { k =>
return Some(k)
}
}
reorderJoinKeysRecursively(leftKeys, rightKeys, UnknownPartitioning(0), rightPartitioning)
case (_, HashPartitioning(rightExpressions, _)) =>
reorder(leftKeys.toIndexedSeq, rightKeys.toIndexedSeq, rightExpressions, rightKeys)
case (_, PartitioningCollection(partitionings)) =>
partitionings.foreach { p =>
reorderJoinKeysRecursively(leftKeys, rightKeys, leftPartitioning, p).map { k =>
return Some(k)
}
}
None
case _ =>
None
}However, I chose this way so that the behavior remains the same. If you have leftPartitioning = PartitioningCollection and rightPartitioning = HashPartitioning, it will match the rightPartitioning first, which is the existing behavior.
|
Test build #125704 has finished for PR 29074 at commit
|
|
retest this please |
|
Test build #125722 has finished for PR 29074 at commit
|
|
cc: @maropu @cloud-fan |
|
also cc: @viirya |
|
Test build #125890 has finished for PR 29074 at commit
|
|
Test build #125917 has finished for PR 29074 at commit
|
|
Gentle ping @cloud-fan / @maropu / @viirya |
|
Test build #126928 has finished for PR 29074 at commit
|
sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala
Outdated
Show resolved
Hide resolved
|
Test build #127164 has finished for PR 29074 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks okay except for the existing minor comments. @cloud-fan @viirya
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/execution/exchange/EnsureRequirementsSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/execution/exchange/EnsureRequirementsSuite.scala
Outdated
Show resolved
Hide resolved
|
Test build #127213 has finished for PR 29074 at commit
|
|
retest this please |
|
Test build #127775 has finished for PR 29074 at commit
|
c21
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @imback82 for adding this. This is important for complex bucketed-related queries to avoid shuffle. LGTM, just a nit.
sql/core/src/test/scala/org/apache/spark/sql/execution/exchange/EnsureRequirementsSuite.scala
Outdated
Show resolved
Hide resolved
|
Test build #127850 has finished for PR 29074 at commit
|
|
Gentle ping. @cloud-fan, do you think this PR can move forward? Thanks in advance! |
|
retest this please |
|
Test build #128455 has finished for PR 29074 at commit
|
|
retest this please |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #129183 has finished for PR 29074 at commit
|
| case (Some(PartitioningCollection(partitionings)), _) => | ||
| partitionings.foreach { p => | ||
| reorderJoinKeysRecursively(leftKeys, rightKeys, Some(p), rightPartitioning).map { k => | ||
| return Some(k) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
partitionings.foldLeft(None) { (res, p) =>
res.orElse(reorderJoinKeysRecursively...)
}.getOrElse(reorderJoinKeysRecursively(leftKeys, rightKeys, None, rightPartitioning))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, updated.
| ShuffleExchangeExec(HashPartitioning(rightPartitioningExpressions, _), _, _), _), _) => | ||
| assert(leftKeys !== smjExec1.leftKeys) | ||
| assert(rightKeys !== smjExec1.rightKeys) | ||
| assert(leftKeys === leftPartitionings.head.asInstanceOf[HashPartitioning].expressions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we simply check leftKeys === Seq(exprA, exprB)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I simplified the checks in this test.
cloud-fan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except some minor comments
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #129483 has finished for PR 29074 at commit
|
| res.orElse(reorderJoinKeysRecursively(leftKeys, rightKeys, Some(p), rightPartitioning)) | ||
| }.orElse(reorderJoinKeysRecursively(leftKeys, rightKeys, None, rightPartitioning)) | ||
| case (_, Some(PartitioningCollection(partitionings))) => | ||
| partitionings.foreach { p => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you do the same refactor here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
| case SortMergeJoinExec(leftKeys, rightKeys, _, _, | ||
| SortExec(_, _, DummySparkPlan(_, _, _: PartitioningCollection, _, _), _), | ||
| SortExec(_, _, ShuffleExchangeExec(_: HashPartitioning, _, _), _), _) => | ||
| assert(leftKeys !== smjExec1.leftKeys) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this check needed? We already check leftKeys === Seq(exprA, exprB) and it's obvious that leftKeys !== smjExec1.leftKeys
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed, thanks!
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #129523 has finished for PR 29074 at commit
|
|
thanks, merging to master! |
What changes were proposed in this pull request?
This PR proposes to improve
EnsureRquirement.reorderJoinKeysto handle the following scenarios:HashPartitioning, consider the right-sideHashPartitioning.PartitioningCollection, which may containHashPartitioningWhy are the changes needed?
HashPartitioningor the right-sideHashPartitioning. This means that if both sides areHashPartitioning, it will try to match only the left side.The following will not consider the right-side
HashPartitioning:PartitioningCollection:Does this PR introduce any user-facing change?
Yes, now from the above examples, the shuffle/sort nodes pointed by
This can be removedare now removed:How was this patch tested?
Added tests.