[SPARK-20194] Add support for partition pruning to in-memory catalog #17510
Conversation
ok to test

        _.references.map(_.name).toSet.subsetOf(partitionColumnNames)
      }
      if (nonPartitionPruningPredicates.nonEmpty) {
        sys.error("Expected only partition pruning predicates: " + nonPartitionPruningPredicates)
Nit: Throwing an AnalysisException is preferred.
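A minimal sketch of that suggestion, reusing the variable names from the quoted diff (the exact message wording is illustrative):

```scala
import org.apache.spark.sql.AnalysisException

// Reject predicates that reference non-partition columns up front, surfacing a
// user-facing AnalysisException instead of sys.error.
if (nonPartitionPruningPredicates.nonEmpty) {
  throw new AnalysisException(
    s"Expected only partition pruning predicates: $nonPartitionPruningPredicates")
}
```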
Could you add negative test cases to your newly added tests?
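One possible shape for such a negative case, assuming the error surfaces as an AnalysisException per the comment above and assuming ScalaTest's intercept inside the existing suite (the table, column name, and tz value are stand-ins):

```scala
import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.dsl.expressions._

// A predicate over a column that is not a partition column should be rejected
// rather than silently returning partitions.
intercept[AnalysisException] {
  catalog.listPartitionsByFilter("db2", "tbl2", Seq('unknownCol.int === 1), tz)
}
```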
| test("list partitions by filter") { | ||
| val tz = TimeZone.getDefault().getID() |
Nit: val tz = TimeZone.getDefault.getID
Not related to this PR. It sounds like we have a bug in
Test build #75461 has finished for PR 17510 at commit

Test build #75464 has finished for PR 17510 at commit

LGTM cc @cloud-fan @hvanhovell @ericl
if (predicates.nonEmpty) {
  val clientPrunedPartitions = client.getPartitionsByFilter(rawTable, predicates).map { part =>
val clientPrunedPartitions =
  client.getPartitionsByFilter(rawTable, predicates).map { part =>
If predicates.isEmpty, the previous code would run client.getPartitions. Can you double-check that there is no performance regression?
A similar optimization is done inside client.getPartitionsByFilter() itself, and the Hive Shim_v0_12 path delegates to getAllPartitions anyway.
I've now made the only piece of non-trivial code along that path lazy, so I think we're good.
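The "lazy" remark refers to deferring the one non-trivial step so that callers with an empty predicate list never pay for it. A generic, illustrative sketch of that pattern (the names below are made up and do not mirror the actual HiveShim code):

```scala
object LazyFilterSketch {
  def getAllPartitions(table: String): Seq[String] =
    Seq(s"$table/ds=2017-04-01", s"$table/ds=2017-04-02")

  def getPartitionsByFilter(table: String, predicates: Seq[String]): Seq[String] = {
    // Stands in for the costly predicate-to-metastore-filter conversion; it is
    // only evaluated when at least one predicate is present.
    lazy val compiled = predicates.map(_.trim.toLowerCase)
    if (predicates.isEmpty) getAllPartitions(table)
    else getAllPartitions(table).filter(p => compiled.forall(p.toLowerCase.contains))
  }
}
```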
    val catalog = newBasicCatalog()

    def checkAnswer(table: CatalogTable, filters: Seq[Expression],
        expected: Set[CatalogTablePartition]): Unit = {
Nit: code style:

  def checkAnswer(
      param1: XX, param2: XX, ...
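Applied to the helper above, the suggested layout would look roughly like this (parameters on their own continuation lines, indented four spaces, per the usual Spark Scala style):

```scala
import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogTablePartition}
import org.apache.spark.sql.catalyst.expressions.Expression

// Same helper, reformatted per the review comment; the assertion body is unchanged.
def checkAnswer(
    table: CatalogTable,
    filters: Seq[Expression],
    expected: Set[CatalogTablePartition]): Unit = {
  // ...
}
```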
LGTM except some minor comments

Test build #75479 has finished for PR 17510 at commit

Thanks! Merging to master.

Thanks for moving so fast!
What changes were proposed in this pull request?
This patch implements listPartitionsByFilter() for InMemoryCatalog and thus resolves an outstanding TODO causing the PruneFileSourcePartitions optimizer rule not to apply when "spark.sql.catalogImplementation" is set to "in-memory" (which is the default).

The change is straightforward: it extracts the code for further filtering of the list of partitions returned by the metastore's getPartitionsByFilter() out from HiveExternalCatalog into ExternalCatalogUtils and calls this new function from InMemoryCatalog on the whole list of partitions.

Now that this method is implemented, we can always pass the CatalogTable to the DataSource in FindDataSourceTable, so that the latter is resolved to a relation with a CatalogFileIndex, which is what the PruneFileSourcePartitions rule matches for.
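A rough sketch of the resulting InMemoryCatalog method, with the signature and the ExternalCatalogUtils helper approximated from the description above (treat the exact names as assumptions):

```scala
import org.apache.spark.sql.catalyst.catalog.{CatalogTablePartition, ExternalCatalogUtils}
import org.apache.spark.sql.catalyst.expressions.Expression

// Sketch: list every partition of the table, then reuse the pruning logic that
// was factored out of HiveExternalCatalog into ExternalCatalogUtils.
override def listPartitionsByFilter(
    db: String,
    table: String,
    predicates: Seq[Expression],
    defaultTimeZoneId: String): Seq[CatalogTablePartition] = {
  val catalogTable = getTable(db, table)
  val allPartitions = listPartitions(db, table)
  ExternalCatalogUtils.prunePartitionsByFilter(
    catalogTable, allPartitions, predicates, defaultTimeZoneId)
}
```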
CatalogTableto theDataSourceinFindDataSourceTable, so that the latter is resolved to a relation with aCatalogFileIndex, which is what thePruneFileSourcePartitionsrule matches for.How was this patch tested?
Ran existing tests and added a new test for listPartitionsByFilter in ExternalCatalogSuite, which is subclassed by both InMemoryCatalogSuite and HiveExternalCatalogSuite.
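A condensed sketch of what the shared test could look like, reusing the checkAnswer helper shape from the quoted diff; fixture names such as newBasicCatalog, part1, part2 and the partition column 'a' are assumptions about the existing suite:

```scala
import java.util.TimeZone

import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogTablePartition}
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.expressions.Expression

test("list partitions by filter") {
  val tz = TimeZone.getDefault.getID
  val catalog = newBasicCatalog()

  def checkAnswer(
      table: CatalogTable,
      filters: Seq[Expression],
      expected: Set[CatalogTablePartition]): Unit = {
    assert(
      catalog.listPartitionsByFilter(table.database, table.identifier.table, filters, tz).toSet ==
        expected)
  }

  val tbl2 = catalog.getTable("db2", "tbl2")
  checkAnswer(tbl2, Seq.empty, Set(part1, part2))   // no filter: all partitions returned
  checkAnswer(tbl2, Seq('a.int === 1), Set(part1))  // prune on partition column 'a'
}
```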