
Conversation

@beliefer (Contributor) commented Apr 2, 2022

What changes were proposed in this pull request?

Currently, Spark supports pushing a limit down to the data source.
Even when the limit can be pushed down and the data source has only one partition, DS V2 still applies the limit again.
This PR removes `Limit` from the plan when the limit is completely pushed down to the data source.
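
A minimal, self-contained sketch of the idea (toy plan classes, not Spark's real `V2ScanRelationPushDown` API): if the scan has fully absorbed the limit, for example a single-partition source, the `Limit` node above it is redundant and can be dropped.

object LimitPushdownSketch {
  // Toy model of a logical plan, only for illustration.
  sealed trait Plan
  case class Scan(pushedLimit: Option[Int], partitions: Int) extends Plan
  case class Limit(n: Int, child: Plan) extends Plan

  // If the source already returns at most `n` rows (limit fully pushed and a
  // single partition), the Limit node is a no-op and can be removed.
  def removeRedundantLimit(plan: Plan): Plan = plan match {
    case Limit(n, s @ Scan(Some(pushed), 1)) if pushed <= n => s
    case other => other
  }

  def main(args: Array[String]): Unit = {
    val before = Limit(10, Scan(pushedLimit = Some(10), partitions = 1))
    println(removeRedundantLimit(before)) // Scan(Some(10),1) – no Limit node left
  }
}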

Why are the changes needed?

Improve performance.

Does this PR introduce any user-facing change?

'No'.
New feature.

How was this patch tested?

Tests updated.

…ve one partition, DS V2 should not do limit again
The github-actions bot added the SQL label Apr 2, 2022
@beliefer changed the title to [SPARK-38768][SQL] Remove Limit from plan if complete push down limit to data source Apr 2, 2022
@beliefer (Contributor Author) commented Apr 2, 2022

ping @huaxingao cc @cloud-fan

        globalLimit.child.asInstanceOf[LocalLimit].withNewChildren(Seq(newChild))
      globalLimit.withNewChildren(Seq(newLocalLimit))
    } else {
      newChild
Contributor:
I think there is a problem here. If isPartiallyPushed is false, it is assumed that the limit has been completely pushed down, so Spark doesn't apply Limit any more. However, a false isPartiallyPushed could also come from the default case in PushDownUtils.pushLimit:

  def pushLimit(scanBuilder: ScanBuilder, limit: Int): (Boolean, Boolean) = {
    scanBuilder match {
      case s: SupportsPushDownLimit if s.pushLimit(limit) =>
        (true, s.isPartiallyPushed)
      case _ => (false, false)
    }
  }

In that case, the Limit on the Spark side is wrongly removed.
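
A self-contained sketch of the distinction (hypothetical helper names, not the actual Spark code): the Limit should only be dropped when the limit was pushed at all and was not merely partially pushed, so the default (false, false) case keeps the Limit.

object PushLimitCheck {
  // Models the pair returned by PushDownUtils.pushLimit:
  //   isPushed          – the ScanBuilder accepted the limit
  //   isPartiallyPushed – the source may still return more rows than the limit
  // The default case yields (false, false): not pushed, so the Limit must stay.
  def canRemoveLimit(isPushed: Boolean, isPartiallyPushed: Boolean): Boolean =
    isPushed && !isPartiallyPushed

  def main(args: Array[String]): Unit = {
    println(canRemoveLimit(true, false))   // true  – fully pushed, Limit can be removed
    println(canRemoveLimit(true, true))    // false – partially pushed, keep Limit
    println(canRemoveLimit(false, false))  // false – default case, keep Limit
  }
}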

@beliefer (Contributor Author):

Thank you for the reminder.

@cloud-fan (Contributor) commented:
thanks, merging to master!

@cloud-fan closed this in adff37a Apr 6, 2022
@beliefer (Contributor Author) commented Apr 7, 2022

@huaxingao @cloud-fan Thank you for the review.

chenzhx pushed a commit to chenzhx/spark that referenced this pull request Jun 1, 2022
…it to data source

### What changes were proposed in this pull request?
Currently, Spark supports pushing a limit down to the data source.
Even when the limit can be pushed down and the data source has only one partition, DS V2 still applies the limit again.
This PR removes `Limit` from the plan when the limit is completely pushed down to the data source.

### Why are the changes needed?
Improve performance.

### Does this PR introduce _any_ user-facing change?
'No'.
New feature.

### How was this patch tested?
Tests updated.

Closes apache#36043 from beliefer/SPARK-38768.

Authored-by: Jiaan Geng <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
chenzhx added a commit to Kyligence/spark that referenced this pull request Jun 2, 2022
… to data source (#474)

* [SPARK-38768][SQL] Remove `Limit` from plan if complete push down limit to data source

### What changes were proposed in this pull request?
Currently, Spark supports pushing a limit down to the data source.
Even when the limit can be pushed down and the data source has only one partition, DS V2 still applies the limit again.
This PR removes `Limit` from the plan when the limit is completely pushed down to the data source.

### Why are the changes needed?
Improve performance.

### Does this PR introduce _any_ user-facing change?
'No'.
New feature.

### How was this patch tested?
Tests updated.

Closes apache#36043 from beliefer/SPARK-38768.

Authored-by: Jiaan Geng <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

* [SPARK-38391][SPARK-38768][SQL][FOLLOWUP] Add comments for `pushLimit` and `pushTopN` of `PushDownUtils`

### What changes were proposed in this pull request?
`pushLimit` and `pushTopN` of `PushDownUtils` return a tuple of booleans. It would be good to explain what each boolean value represents.
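
For instance, the added comments might read roughly as follows (paraphrased, not the exact wording merged into PushDownUtils):

  /**
   * Pushes down LIMIT to the data source via SupportsPushDownLimit.
   *
   * Returns a pair of booleans:
   *   - the first is true if the limit was pushed down to the data source at all;
   *   - the second is true if the limit was only partially pushed down, i.e. the
   *     source may still return more rows than the limit, so Spark must apply
   *     LIMIT again on its side.
   */
  def pushLimit(scanBuilder: ScanBuilder, limit: Int): (Boolean, Boolean) = {
    scanBuilder match {
      case s: SupportsPushDownLimit if s.pushLimit(limit) =>
        (true, s.isPartiallyPushed)
      case _ => (false, false)
    }
  }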

### Why are the changes needed?
Make DS V2 API more friendly to developers.

### Does this PR introduce _any_ user-facing change?
'No'.
Just update comments.

### How was this patch tested?
N/A

Closes apache#36092 from beliefer/SPARK-38391_SPARK-38768_followup.

Authored-by: Jiaan Geng <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

* [SPARK-37960][SQL][FOLLOWUP] Make the testing CASE WHEN query more reasonable

### What changes were proposed in this pull request?
Some of the testing CASE WHEN queries are not carefully written and do not make sense. In the future, the optimizer may get smarter and get rid of the CASE WHEN completely, and then we would lose test coverage.

This PR updates some CASE WHEN queries to make them more reasonable.

### Why are the changes needed?
future-proof test coverage.

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
N/A

Closes apache#36125 from beliefer/SPARK-37960_followup3.

Authored-by: Jiaan Geng <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

* update spark version

Co-authored-by: Jiaan Geng <[email protected]>
leejaywei pushed a commit to Kyligence/spark that referenced this pull request Jul 14, 2022
… to data source (#474)

RolatZhang pushed a commit to Kyligence/spark that referenced this pull request Aug 29, 2023
… to data source (#474)
