-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-28746][SQL] Add partitionby hint for sql queries #25464
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
merge master
pull master
|
@gatorsmile @maryannxue We need this? |
|
I'm not against it, but is it possible to extend the existing "repartition" hint grammar to achieve this? |
|
Yea, that approach looks more reasonable to me. Could you brush up the code based on the suggestion? @ulysses-you |
|
Thanks for review this. Make hint consistent with repartition api is a good idea. |
|
retest this please |
|
ok to test |
|
Test build #109600 has finished for PR 25464 at commit
|
| RepartitionByExpression( | ||
| exprs.map(_.asInstanceOf[UnresolvedAttribute]), h.child, numPartitions) | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about the case, SELECT /*+ REPARTITION(a) */ * FROM t?
| if (errExprs.nonEmpty) throw new AnalysisException( | ||
| s"""Invalid type exprs : $errExprs | ||
| |expects UnresolvedAttribute type | ||
| """.stripMargin) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plz add tests for this exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I will add this later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't add this test yet?
|
Test build #109603 has finished for PR 25464 at commit
|
|
Test build #109615 has finished for PR 25464 at commit
|
|
retest this please |
1 similar comment
|
retest this please |
|
Test build #109720 has finished for PR 25464 at commit
|
|
retet this please |
|
retest this please |
|
oh.. typo.. |
|
retest this please |
|
Test build #112637 has finished for PR 25464 at commit
|
|
retest this please |
|
Test build #112639 has finished for PR 25464 at commit
|
|
Test build #112646 has finished for PR 25464 at commit
|
|
retest this please |
maropu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Anyone could do final checks? @cloud-fan @HyukjinKwon
|
Test build #112749 has finished for PR 25464 at commit
|
| throw new AnalysisException(s"$hintName Hint parameter should include columns, but " + | ||
| s"${invalidParams.mkString(", ")} found") | ||
| } | ||
| RepartitionByExpression( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we then consistently throw an exception like Dataset.repartition?
val sortOrders = partitionExprs.filter(_.expr.isInstanceOf[SortOrder])
if (sortOrders.nonEmpty) throw new IllegalArgumentException(
s"""Invalid partitionExprs specified: $sortOrders
|For range partitioning use repartitionByRange(...) instead.
""".stripMargin)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, add an IllegalArgumentException check.
HyukjinKwon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM too otherwise.
|
Test build #112849 has finished for PR 25464 at commit
|
|
Thanks for the contribution, @ulysses-you ! Merged to master. |
|
FYI: I added @ulysses-you in the Spark contributor list. |
|
Thanks for great help ! @maropu @HyukjinKwon @cloud-fan @maryannxue |
| createRepartition(shuffle = false, hint) | ||
| case "REPARTITION_BY_RANGE" => | ||
| createRepartitionByRange(hint) | ||
| case _ => plan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we return hint here? This will cause stack overflow once the hint is not the root node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes. I will make a followup. Thanks for catching this.
…unknown hint resolution ### What changes were proposed in this pull request? This is rather a followup of #25464 (see https://github.com/apache/spark/pull/25464/files#r349543286) It will cause an infinite recursion via mapping children - we should return the hint rather than its parent plan in unknown hint resolution. ### Why are the changes needed? Prevent Stack over flow during hint resolution. ### Does this PR introduce any user-facing change? Yes, it avoids stack overflow exception It was caused by #25464 and this is only in the master. No behaviour changes to end users as it happened only in the master. ### How was this patch tested? Unittest was added. Closes #26642 from HyukjinKwon/SPARK-30003. Authored-by: HyukjinKwon <[email protected]> Signed-off-by: HyukjinKwon <[email protected]>
| s"""Invalid partitionExprs specified: $sortOrders | ||
| |For range partitioning use REPARTITION_BY_RANGE instead. | ||
| """.stripMargin) | ||
| val invalidParams = partitionExprs.filter(!_.isInstanceOf[UnresolvedAttribute]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this check breaks the old API, in Spark 2.4 it is possible to use an expression here. I think we need to back this out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK 2.4 only supports something like REPARTITION(5), the parameters here means anything after the partition number parameter, e.g. REPARTITION(5, para1, para2, ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I think so, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok, you are right. Never the less I think we should support expressions for REPARTITION here.
| def createRepartitionByExpression( | ||
| numPartitions: Int, partitionExprs: Seq[Any]): RepartitionByExpression = { | ||
| val sortOrders = partitionExprs.filter(_.isInstanceOf[SortOrder]) | ||
| if (sortOrders.nonEmpty) throw new IllegalArgumentException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style, please put this inside curly braces and on a new line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ulysses-you Could you do follow-up for the two comments from @hvanhovell ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it might be better to just handle this later when we happen to touch this codes given that we don't usually make followups for minor styles issues.
|
|
||
| def createRepartitionByExpression( | ||
| numPartitions: Int, partitionExprs: Seq[Any]): RepartitionByExpression = { | ||
| val invalidParams = partitionExprs.filter(!_.isInstanceOf[UnresolvedAttribute]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above.
What changes were proposed in this pull request?
Now,
RepartitionByExpressionis allowed at Dataset methodDataset.repartition(). But in spark sql, we do not have an equivalent functionality.In hive, we can use
distribute by, so it's worth to add a hint to support such function.Similar jira SPARK-24940
Why are the changes needed?
Make repartition hints consistent with repartition api .
Does this PR introduce any user-facing change?
This pr intends to support quries below;
How was this patch tested?
UT