-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-12503] [SQL] Pushing Limit Through Union All #10451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 10 commits
01e4cdf
6835704
9180687
b38a21e
d2b84af
fda8025
ac0dccd
6e0018b
0546772
b37a64f
661260b
2dfa0fd
d929d9b
4070d2f
38dcfb2
cb3fc83
8dbacc7
41b9172
56fd782
b5ac8d7
77105e3
7f25d91
ae59f42
3ccf3bd
004ed66
6998ec9
09a5672
358d62e
2823a57
10d570c
cfbeea7
ca5c104
56f0c16
62d5cbe
7cf955f
7899312
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -86,8 +86,8 @@ object SamplePushDown extends Rule[LogicalPlan] { | |
| * Operations that are safe to pushdown are listed as follows. | ||
| * Union: | ||
| * Right now, Union means UNION ALL, which does not de-duplicate rows. So, it is | ||
| * safe to pushdown Filters and Projections through it. Once we add UNION DISTINCT, | ||
| * we will not be able to pushdown Projections. | ||
| * safe to pushdown Filters, Projections and Limits through it. Once we add UNION DISTINCT, | ||
| * we will not be able to pushdown Projections and Limits. | ||
| * | ||
| * Intersect: | ||
| * It is not safe to pushdown Projections through it because we need to get the | ||
|
|
@@ -153,6 +153,17 @@ object SetOperationPushDown extends Rule[LogicalPlan] with PredicateHelper { | |
| ) | ||
| ) | ||
|
|
||
| // Adding extra Limit below UNION ALL iff both left and right childs are not Limit and no Limit | ||
| // was pushed down before. This heuristic is valid assuming there does not exist any Limit | ||
| // push-down rule that is unable to infer the value of maxRows. Any operator that a Limit can | ||
| // be pushed passed should override this function. | ||
| case Limit(exp, Union(left, right)) | ||
| if left.maxRows.isEmpty || right.maxRows.isEmpty => | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there a reason to not check left and right separately?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Below is the example. If one side has a limit child/descendant, we still can push it down to reduce the number of returned rows. |
||
| Limit(exp, | ||
| Union( | ||
| Limit(exp, left), | ||
| Limit(exp, right))) | ||
|
|
||
| // Push down deterministic projection through UNION ALL | ||
| case p @ Project(projectList, u @ Union(left, right)) => | ||
| if (projectList.forall(_.deterministic)) { | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -90,6 +90,13 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging { | |
| Statistics(sizeInBytes = children.map(_.statistics.sizeInBytes).product) | ||
| } | ||
|
|
||
| /** | ||
| * Returns the limited number of rows to be returned. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Specify that any operator that a
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And, thus, we should fix
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fixed. Thanks!
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Actually we will push down
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, we need to override the function in both directions. Let me update the comments. Thanks! The value of |
||
| * | ||
| * Any operator that a Limit can be pushed passed should override this function. | ||
| */ | ||
| def maxRows: Option[Expression] = None | ||
|
|
||
| /** | ||
| * Returns true if this expression and all its children have been resolved to a specific schema | ||
| * and false if it still contains any unresolved placeholders. Implementations of LogicalPlan | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is
left.maxRows.isEmptyequal to!left.isInstanceOf[Limit]?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I think this branch is safe without this check, did I miss something here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal is to avoid double pushdown even if the limit has been pushed past another operator (i.e. a project).