[SPARK-38768][SQL] Remove Limit from plan if complete push down limit to data source.
#36043
…ve one partition, DS V2 should not do limit again
Limit from plan if complete push down limit to data source.
ping @huaxingao cc @cloud-fan
      globalLimit.child.asInstanceOf[LocalLimit].withNewChildren(Seq(newChild))
    globalLimit.withNewChildren(Seq(newLocalLimit))
  } else {
    newChild
I think there is a problem here. If `isPartiallyPushed` is false, the code assumes the Limit was completely pushed down, so Spark no longer applies the Limit itself. However, a false `isPartiallyPushed` can also come from the default case in `PushDownUtils.pushLimit`, where nothing was pushed at all:
  def pushLimit(scanBuilder: ScanBuilder, limit: Int): (Boolean, Boolean) = {
    scanBuilder match {
      case s: SupportsPushDownLimit if s.pushLimit(limit) =>
        (true, s.isPartiallyPushed)
      case _ => (false, false)
    }
  }
In that case, Spark wrongly removes its own Limit even though the data source never applied one.
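To make the concern concrete, here is a minimal, self-contained sketch. It is not the code merged in this PR. `ScanBuilder` and `SupportsPushDownLimit` are simplified stand-ins for the DS V2 interfaces, and `NoLimitSupport`, `FakeLimitBuilder`, `flawedCanRemoveLimit` and `canRemoveLimit` are hypothetical names introduced only for illustration. It shows how checking `isPartiallyPushed` alone misfires on the default case, and how also checking the first element of the tuple avoids that.

```scala
object LimitPushDownSketch {
  // Simplified stand-ins for the DS V2 interfaces (assumption: the real
  // interfaces in Spark have more members than shown here).
  trait ScanBuilder
  trait SupportsPushDownLimit extends ScanBuilder {
    def pushLimit(limit: Int): Boolean
    def isPartiallyPushed: Boolean = true
  }

  // Hypothetical builders used only to exercise the logic.
  object NoLimitSupport extends ScanBuilder
  class FakeLimitBuilder(accepts: Boolean, partial: Boolean)
      extends SupportsPushDownLimit {
    override def pushLimit(limit: Int): Boolean = accepts
    override def isPartiallyPushed: Boolean = partial
  }

  // Same shape as the helper quoted above: (isPushed, isPartiallyPushed).
  def pushLimit(scanBuilder: ScanBuilder, limit: Int): (Boolean, Boolean) =
    scanBuilder match {
      case s: SupportsPushDownLimit if s.pushLimit(limit) =>
        (true, s.isPartiallyPushed)
      case _ => (false, false)
    }

  // Flawed check: looks only at isPartiallyPushed, so the default
  // (false, false) result is mistaken for a complete push-down.
  def flawedCanRemoveLimit(scanBuilder: ScanBuilder, limit: Int): Boolean =
    !pushLimit(scanBuilder, limit)._2

  // Guarded check: the limit must have been pushed AND fully applied.
  def canRemoveLimit(scanBuilder: ScanBuilder, limit: Int): Boolean = {
    val (isPushed, isPartiallyPushed) = pushLimit(scanBuilder, limit)
    isPushed && !isPartiallyPushed
  }
}
```

With a builder that does not support limit push-down, `flawedCanRemoveLimit` returns true while `canRemoveLimit` returns false, which is the behaviour the comment above asks for.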
Thank you for the reminder.
thanks, merging to master!
@huaxingao @cloud-fan Thank you for the review.
…it to data source

### What changes were proposed in this pull request?
Currently, Spark supports pushing a limit down to the data source. If the limit can be pushed down and the data source has only one partition, DS V2 still applies the limit again. This PR removes `Limit` from the plan when the limit is completely pushed down to the data source.

### Why are the changes needed?
Improve performance.

### Does this PR introduce _any_ user-facing change?
'No'. New feature.

### How was this patch tested?
Tests updated.

Closes apache#36043 from beliefer/SPARK-38768.

Authored-by: Jiaan Geng <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
… to data source (#474)

* [SPARK-38768][SQL] Remove `Limit` from plan if complete push down limit to data source

### What changes were proposed in this pull request?
Currently, Spark supports pushing a limit down to the data source. If the limit can be pushed down and the data source has only one partition, DS V2 still applies the limit again. This PR removes `Limit` from the plan when the limit is completely pushed down to the data source.

### Why are the changes needed?
Improve performance.

### Does this PR introduce _any_ user-facing change?
'No'. New feature.

### How was this patch tested?
Tests updated.

Closes apache#36043 from beliefer/SPARK-38768.

Authored-by: Jiaan Geng <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

* [SPARK-38391][SPARK-38768][SQL][FOLLOWUP] Add comments for `pushLimit` and `pushTopN` of `PushDownUtils`

### What changes were proposed in this pull request?
`pushLimit` and `pushTopN` of `PushDownUtils` return a tuple of booleans. It is worth explaining what each boolean value represents.

### Why are the changes needed?
Make the DS V2 API friendlier to developers.

### Does this PR introduce _any_ user-facing change?
'No'. Just update comments.

### How was this patch tested?
N/A

Closes apache#36092 from beliefer/SPARK-38391_SPARK-38768_followup.

Authored-by: Jiaan Geng <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

* [SPARK-37960][SQL][FOLLOWUP] Make the testing CASE WHEN query more reasonable

### What changes were proposed in this pull request?
Some testing CASE WHEN queries are not carefully written and do not make sense. In the future, the optimizer may get smarter and eliminate the CASE WHEN completely, and then we would lose test coverage. This PR updates some CASE WHEN queries to make them more reasonable.

### Why are the changes needed?
Future-proof test coverage.

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
N/A

Closes apache#36125 from beliefer/SPARK-37960_followup3.

Authored-by: Jiaan Geng <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

* update spark version

Co-authored-by: Jiaan Geng <[email protected]>
What changes were proposed in this pull request?
Currently, Spark supports pushing a limit down to the data source.
If the limit can be pushed down and the data source has only one partition, DS V2 still applies the limit again.
This PR removes `Limit` from the plan when the limit is completely pushed down to the data source.
Why are the changes needed?
Improve performance: Spark no longer runs a redundant `Limit` when the data source has already applied it.
Does this PR introduce any user-facing change?
'No'.
New feature.
How was this patch tested?
Tests updated.
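As a rough illustration of the rewrite described above, the sketch below uses a toy plan tree rather than Spark's actual `LogicalPlan` classes. `Plan`, `Scan`, `LocalLimit`, `GlobalLimit` and `removeLimitIfFullyPushed` are hypothetical stand-ins defined here, and the two boolean flags are assumed to be the `(isPushed, isPartiallyPushed)` result of the push-down attempt.

```scala
object LimitRemovalSketch {
  // Toy plan nodes (assumption: Spark's real plan classes are far richer).
  sealed trait Plan
  case class Scan(name: String) extends Plan
  case class LocalLimit(n: Int, child: Plan) extends Plan
  case class GlobalLimit(n: Int, child: Plan) extends Plan

  // Drop the GlobalLimit/LocalLimit pair only when the data source accepted
  // the limit and applies it completely; otherwise keep the plan unchanged.
  def removeLimitIfFullyPushed(
      plan: Plan,
      isPushed: Boolean,
      isPartiallyPushed: Boolean): Plan = plan match {
    case GlobalLimit(_, LocalLimit(_, scan: Scan))
        if isPushed && !isPartiallyPushed =>
      scan
    case other => other
  }
}
```

For example, `removeLimitIfFullyPushed(GlobalLimit(5, LocalLimit(5, Scan("t"))), true, false)` yields the bare `Scan("t")`, while passing `isPartiallyPushed = true` leaves both limit nodes in place.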