Skip to content

Conversation

@gatorsmile
Copy link
Member

More predicates in join conditions/filters can be inferred and pushed down via predicate transitivity. More predicate pushdown could greatly improve the join performance.

For example, we can infer the extra predicate upperCaseData.N = 3 in the following query:

"SELECT * 
 FROM upperCaseData JOIN lowerCaseData 
 WHERE lowerCaseData.n = upperCaseData.N AND lowerCaseData.n = 3"

Before the improvement, the optimized logical plan is

== Optimized Logical Plan ==
Project [N#16,L#17,n#18,l#19]
+- Join Inner, Some((n#18 = N#16))
   :- LogicalRDD [N#16,L#17], MapPartitionsRDD[17] at beforeAll at BeforeAndAfterAll.scala:187
   +- Filter (n#18 = 3)
      +- LogicalRDD [n#18,l#19], MapPartitionsRDD[19] at beforeAll at BeforeAndAfterAll.scala:187

After the improvement, the optimized logical plan should be like

== Optimized Logical Plan ==
Project [N#16,L#17,n#18,l#19]
+- Join Inner, Some((n#18 = N#16))
   :- Filter (N#16 = 3)
   :  +- LogicalRDD [N#16,L#17], MapPartitionsRDD[17] at beforeAll at BeforeAndAfterAll.scala:187
   +- Filter (n#18 = 3)
      +- LogicalRDD [n#18,l#19], MapPartitionsRDD[19] at beforeAll at BeforeAndAfterAll.scala:187

@SparkQA
Copy link

SparkQA commented Dec 27, 2015

Test build #48355 has finished for PR 10490 at commit 918ea2c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member Author

Thanks, close it now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants