[SPARK-15616] [SQL] Metastore relation should fallback to HDFS size of partitions that are involved in Query for JoinSelection. #13373
@lianhuiwang, I understand it is painful to keep the PR up to date. However, shouldn't the Jenkins build at least pass on the last commit, even if the PR has conflicts?
Sorry, I think this is no longer valid after we have
@cloud-fan I do not think the PruneFileSourcePartitions rule applies to Hive's CatalogRelation: the example in this PR does not produce the expected result on the master branch. So I will update the PR against the latest code.
@HyukjinKwon @cloud-fan I will close this PR and create a new PR, #18193, for it. Thanks.
What changes were proposed in this pull request?
Currently, if only some partitions of a partitioned table are used in a join, we rely on the table size returned by the Metastore to decide whether the join can be converted to a broadcast join.
If a Filter can prune some partitions, Hive prunes them before deciding whether to use a broadcast join, basing that decision on the HDFS size of only the partitions involved in the query. Spark SQL should do the same, which can improve join performance for partitioned tables.
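A minimal sketch of the size estimate described above, written in plain Scala with no Spark or Hadoop dependencies. `PartitionInfo`, `PrunedSizeEstimate`, `prunedSizeInBytes`, and `canBroadcast` are hypothetical names for illustration, not this patch's code or a Spark API, and per-partition sizes are assumed to have already been read from HDFS. The threshold check mirrors the comparison `JoinSelection` makes against `spark.sql.autoBroadcastJoinThreshold` (10 MB by default).

```scala
// Hypothetical model of one Hive partition: its partition values and the
// total size in bytes of its files on HDFS (assumed already computed).
final case class PartitionInfo(spec: Map[String, String], sizeInBytes: Long)

object PrunedSizeEstimate {
  // Sum the on-disk sizes of only the partitions that survive the
  // partition filter, instead of using the table-level Metastore size.
  def prunedSizeInBytes(
      partitions: Seq[PartitionInfo],
      keep: Map[String, String] => Boolean): Long =
    partitions.filter(p => keep(p.spec)).map(_.sizeInBytes).sum

  // Broadcast only when the estimate is known (non-negative) and does not
  // exceed the threshold, e.g. spark.sql.autoBroadcastJoinThreshold.
  def canBroadcast(sizeInBytes: Long, thresholdBytes: Long): Boolean =
    sizeInBytes >= 0 && sizeInBytes <= thresholdBytes
}
```

For example, with three 8 MB partitions the whole-table size is 24 MB, which exceeds a 10 MB threshold; if the filter keeps only one partition, the pruned estimate is 8 MB and the join becomes a broadcast candidate.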
How was this patch tested?
Integration tests.