-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-16879][SQL] unify logical plans for CREATE TABLE and CTAS #14482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #63176 has finished for PR 14482 at commit
|
|
is it a MIMA issue? Although I moved the |
|
SaveMode is a public API. We cannot move it to catalyst. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this somewhere we can reuse it.
| // * partition, bucket and sort column names must exist in table definition. | ||
| // * partition, bucket and sort column names can't be duplicated. | ||
| // * can't use all table columns as partition columns. | ||
| // * partition columns' type must be AtomicType. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @gatorsmile , I think all this checks are general and can be applied to hive serde tables too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, these checking are general to both types. The only issue is here:
partition, bucket and sort column names must exist in table definition.
However, the schema value from the parser combines both. Thus, Hive table creation pass the checks
val schema = StructType(dataCols ++ partitionCols)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because hive CREATE TABLE syntax is special, it will never hit this exception partition column names must exist in table definition. Instead, hive may specify partition column names that exist in table definition, and will hit column names in table definition can't be duplicated.
|
cc @gatorsmile , IIRC, you have some PRs about error handling. After this PR, we can have a centre place for basic error handling, is it good enough for all the error cases you found? |
| // Since we are saving table metadata to metastore, we should make sure the table name | ||
| // and database name don't break some common restrictions, e.g. special chars except | ||
| // underscore are not allowed. | ||
| val pattern = Pattern.compile("[\\w_]+") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @hvanhovell , I think this is the only place that we need this check, as CreateTable is the only plan that can save a table metadata into metastore. what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compiling a pattern can be expensive and the Pattern produced is reusable; that is all.
|
Test build #63209 has finished for PR 14482 at commit
|
|
Test build #63211 has finished for PR 14482 at commit
|
| tableType = CatalogTableType.MANAGED, | ||
| storage = CatalogStorageFormat.empty.copy(properties = options), | ||
| schema = schema.getOrElse(new StructType), | ||
| provider = Some(provider), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it still possible that a provide is None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, just look at the codes around, it's possible that provider is null. However, if we also look at the antlr file, the provider must be specified. The previous code doesn't check the null either, and will throw NPE somewhere if it's null. So I think we should use Some here to follow this behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was a more general question which you have answered, sorry about that.
In this case the provider cannot be null: the grammar requires a provider and the call ctx.tableProvider.qualifiedName.getText would result in a NPE if it were null.
|
Test build #63218 has finished for PR 14482 at commit
|
|
Test build #63219 has finished for PR 14482 at commit
|
|
Test build #63221 has finished for PR 14482 at commit
|
|
Test build #63226 has finished for PR 14482 at commit
|
|
Test build #63227 has finished for PR 14482 at commit
|
| schema = schema, | ||
| provider = Some(source) | ||
| ) | ||
| val plan = CreateTable(tableDesc, SaveMode.ErrorIfExists, None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change will have an external impact. The value of source is input from users. That means it could be "hive". If so, users can create a Hive serde Table. If this is desired, we need to document it and test it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If so, we need to populate the location of storage. Maybe we just disable the option hive and issue an exception when users do it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch! yea we should disable hive for now.
|
@cloud-fan I just went over my related PRs. Two of them are contained, and thus I closed them. The other three are not covered. Please feel free to let me know what are the best way to deal with the remaining three. Thanks! |
|
@gatorsmile how about keep working on them after this PR is merged? |
|
@cloud-fan Sure, will do. Thanks! |
|
Test build #63256 has finished for PR 14482 at commit
|
|
LGTM - merging to master. Thanks! |
…ake CatalogTable ## What changes were proposed in this pull request? This is kind of a follow-up of #14482 . As we put `CatalogTable` in the logical plan directly, it makes sense to let physical plans take `CatalogTable` directly, instead of extracting some fields of `CatalogTable` in planner and then construct a new `CatalogTable` in physical plan. ## How was this patch tested? existing tests. Author: Wenchen Fan <[email protected]> Closes #14823 from cloud-fan/create-table.
What changes were proposed in this pull request?
we have various logical plans for CREATE TABLE and CTAS:
CreateTableUsing,CreateTableUsingAsSelect,CreateHiveTableAsSelectLogicalPlan. This PR unifies them to reduce the complexity and centralize the error handling.How was this patch tested?
existing tests