Skip to content

Conversation

@heary-cao
Copy link
Contributor

What changes were proposed in this pull request?

Currently, PushProjectionThroughUnion optimizer only supports pushdown project operator to both sides of a Union operator when expression is deterministic , in fact, we can be like pushdown filters, also support pushdown project operator to both sides of a Union operator when expression is non-deterministic , this PR description fix this problem。now the explain looks like:

=== Applying Rule org.apache.spark.sql.catalyst.optimizer.PushProjectionThroughUnion ===
Input LogicalPlan:
Project [a#0, rand(10) AS rnd#9]
+- Union
   :- LocalRelation <empty>, [a#0, b#1, c#2]
   :- LocalRelation <empty>, [d#3, e#4, f#5]
   +- LocalRelation <empty>, [g#6, h#7, i#8]

Output LogicalPlan:
Project [a#0, rand(10) AS rnd#9]
+- Union
   :- Project [a#0]
   :  +- LocalRelation <empty>, [a#0, b#1, c#2]
   :- Project [d#3]
   :  +- LocalRelation <empty>, [d#3, e#4, f#5]
   +- Project [g#6]
      +- LocalRelation <empty>, [g#6, h#7, i#8]

How was this patch tested?

add new test cases

@heary-cao
Copy link
Contributor Author

@gatorsmile ,@cloud-fan Can you help me to review it. thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we push a + 1 to union children? or just a?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we push a + 1 to union, just a + 1.
seems this lack of test cases. let's add this test cases. thanks.

@cloud-fan
Copy link
Contributor

ok to test

@SparkQA
Copy link

SparkQA commented Feb 8, 2018

Test build #87210 has finished for PR 20541 at commit 36dbc9c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@heary-cao heary-cao force-pushed the PushProjectionThroughUnion branch from 36dbc9c to 4f5d46b Compare February 9, 2018 01:42
@cloud-fan
Copy link
Contributor

I'm confused about why we need PushProjectionThroughUnion. Generally we only need to push down required columns, not entire project list, as there is no benifit of doing this. I think we just need to handle Union in the ColumnPruning rule, but I may miss something. cc @gatorsmile

@SparkQA
Copy link

SparkQA commented Feb 9, 2018

Test build #87237 has finished for PR 20541 at commit 4f5d46b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

I think the use case is, by pushing projects into Union, we are more likely to combine adjacent Unions. So I don't think we need to improve it to push part of the project list and still leave a project above Union.

@heary-cao
Copy link
Contributor Author

heary-cao commented Mar 22, 2018

in my opinion, this is considered that PushProjectionThroughUnion optimizes rules when there are multiple columns of union in data sources, while projection requires only a few columns, and the performance of file operation is better. thanks.

@cloud-fan
Copy link
Contributor

ColumnPruning rule handles Union already.

@heary-cao
Copy link
Contributor Author

oh ,yeah, there is a little difference, for a + 1 and a + b.
for a + 1:

`PushProjectionThroughUnion `rule handles:
Union
:- Project [(a#0 + 1) AS aa#10]
:  +- LocalRelation <empty>, [a#0, b#1, c#2]
:- Project [(d#3 + 1) AS aa#11]
:  +- LocalRelation <empty>, [d#3, e#4, f#5]
+- Project [(g#6 + 1) AS aa#12]
   +- LocalRelation <empty>, [g#6, h#7, i#8]

`ColumnPruning `rule handles:
Project [(a#0 + 1) AS aa#9]
Union
:- Project [a#0]
:  +- LocalRelation <empty>, [a#0, b#1, c#2]
:- Project [d#3]
:  +- LocalRelation <empty>, [d#3, e#4, f#5]
+- Project [g#6]
   +- LocalRelation <empty>, [g#6, h#7, i#8]

for a + b:

`PushProjectionThroughUnion `rule handles:
Union
:- Project [(a#0 + b#1) AS ab#10]
:  +- LocalRelation <empty>, [a#0, b#1, c#2]
:- Project [(d#3 + e#4) AS ab#11]
:  +- LocalRelation <empty>, [d#3, e#4, f#5]
+- Project [(g#6 + h#7) AS ab#12]
   +- LocalRelation <empty>, [g#6, h#7, i#8]	

`ColumnPruning `rule handles:
Project [(a#0 + b#1) AS ab#9]
Union
:- Project [a#0, b#1]
:  +- LocalRelation <empty>, [a#0, b#1, c#2]
:- Project [d#3, e#4]
:  +- LocalRelation <empty>, [d#3, e#4, f#5]
+- Project [g#6, h#7]
   +- LocalRelation <empty>, [g#6, h#7, i#8]

So I think this may be the reason for the need to add the pushprojectionthroughunion rules. and to non-deterministic expression.

@cloud-fan
Copy link
Contributor

I don't agree. a + 1/a + b are evaluated the same number of time, no matter you push in through Union or not. I don't see any performance benefit by doing this, except you can eliminate the entire project above Union.

@heary-cao
Copy link
Contributor Author

oh, I see, I fallback to the modification of the non-deterministic expression, and to keep the newly added test cases for a+1 and a+b, can you?

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

Copy link
Member

@gatorsmile gatorsmile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@heary-cao How about closing this PR and opening a new PR for adding new test cases?

@heary-cao
Copy link
Contributor Author

@gatorsmile, OK, I will do it.

@heary-cao heary-cao closed this Nov 26, 2018
asfgit pushed a commit that referenced this pull request Nov 27, 2018
…in SetOperationSuite

## What changes were proposed in this pull request?

The purpose of this PR is supplement new test cases for a + 1,a + b and Rand in SetOperationSuite.
It comes from the comment of closed PR:#20541, thanks.

## How was this patch tested?

add new test cases

Closes #23138 from heary-cao/UnionPushTestCases.

Authored-by: caoxuewen <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…in SetOperationSuite

## What changes were proposed in this pull request?

The purpose of this PR is supplement new test cases for a + 1,a + b and Rand in SetOperationSuite.
It comes from the comment of closed PR:apache#20541, thanks.

## How was this patch tested?

add new test cases

Closes apache#23138 from heary-cao/UnionPushTestCases.

Authored-by: caoxuewen <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants