Skip to content

Conversation

@cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

we have various logical plans for CREATE TABLE and CTAS: CreateTableUsing, CreateTableUsingAsSelect, CreateHiveTableAsSelectLogicalPlan. This PR unifies them to reduce the complexity and centralize the error handling.

How was this patch tested?

existing tests

@cloud-fan
Copy link
Contributor Author

cc @liancheng @gatorsmile @yhuai @rxin

@SparkQA
Copy link

SparkQA commented Aug 3, 2016

Test build #63176 has finished for PR 14482 at commit a11d122.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

cloud-fan commented Aug 3, 2016

is it a MIMA issue? Although I moved the SaveMode class, it's still in the same package.

@yhuai
Copy link
Contributor

yhuai commented Aug 3, 2016

SaveMode is a public API. We cannot move it to catalyst.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this somewhere we can reuse it.

// * partition, bucket and sort column names must exist in table definition.
// * partition, bucket and sort column names can't be duplicated.
// * can't use all table columns as partition columns.
// * partition columns' type must be AtomicType.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @gatorsmile , I think all this checks are general and can be applied to hive serde tables too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, these checking are general to both types. The only issue is here:

partition, bucket and sort column names must exist in table definition.

However, the schema value from the parser combines both. Thus, Hive table creation pass the checks

val schema = StructType(dataCols ++ partitionCols)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because hive CREATE TABLE syntax is special, it will never hit this exception partition column names must exist in table definition. Instead, hive may specify partition column names that exist in table definition, and will hit column names in table definition can't be duplicated.

@cloud-fan
Copy link
Contributor Author

cc @gatorsmile , IIRC, you have some PRs about error handling. After this PR, we can have a centre place for basic error handling, is it good enough for all the error cases you found?

// Since we are saving table metadata to metastore, we should make sure the table name
// and database name don't break some common restrictions, e.g. special chars except
// underscore are not allowed.
val pattern = Pattern.compile("[\\w_]+")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @hvanhovell , I think this is the only place that we need this check, as CreateTable is the only plan that can save a table metadata into metastore. what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compiling a pattern can be expensive and the Pattern produced is reusable; that is all.

@SparkQA
Copy link

SparkQA commented Aug 4, 2016

Test build #63209 has finished for PR 14482 at commit 108e385.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class CreateTable(tableDesc: CatalogTable, mode: SaveMode, query: Option[LogicalPlan])
    • case class PreprocessDDL(conf: SQLConf) extends Rule[LogicalPlan]

@SparkQA
Copy link

SparkQA commented Aug 4, 2016

Test build #63211 has finished for PR 14482 at commit ec47911.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class CreateTable(tableDesc: CatalogTable, mode: SaveMode, query: Option[LogicalPlan])
    • case class PreprocessDDL(conf: SQLConf) extends Rule[LogicalPlan]

tableType = CatalogTableType.MANAGED,
storage = CatalogStorageFormat.empty.copy(properties = options),
schema = schema.getOrElse(new StructType),
provider = Some(provider),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it still possible that a provide is None?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, just look at the codes around, it's possible that provider is null. However, if we also look at the antlr file, the provider must be specified. The previous code doesn't check the null either, and will throw NPE somewhere if it's null. So I think we should use Some here to follow this behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a more general question which you have answered, sorry about that.

In this case the provider cannot be null: the grammar requires a provider and the call ctx.tableProvider.qualifiedName.getText would result in a NPE if it were null.

@SparkQA
Copy link

SparkQA commented Aug 4, 2016

Test build #63218 has finished for PR 14482 at commit 7c670b7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 4, 2016

Test build #63219 has finished for PR 14482 at commit 2662bf1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 4, 2016

Test build #63221 has finished for PR 14482 at commit 8db79e4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 4, 2016

Test build #63226 has finished for PR 14482 at commit 8f53cf2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 4, 2016

Test build #63227 has finished for PR 14482 at commit 2933c58.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

schema = schema,
provider = Some(source)
)
val plan = CreateTable(tableDesc, SaveMode.ErrorIfExists, None)
Copy link
Member

@gatorsmile gatorsmile Aug 4, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change will have an external impact. The value of source is input from users. That means it could be "hive". If so, users can create a Hive serde Table. If this is desired, we need to document it and test it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so, we need to populate the location of storage. Maybe we just disable the option hive and issue an exception when users do it?

Copy link
Contributor Author

@cloud-fan cloud-fan Aug 5, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! yea we should disable hive for now.

@gatorsmile
Copy link
Member

@cloud-fan I just went over my related PRs. Two of them are contained, and thus I closed them. The other three are not covered. Please feel free to let me know what are the best way to deal with the remaining three. Thanks!

@cloud-fan
Copy link
Contributor Author

@gatorsmile how about keep working on them after this PR is merged?

@gatorsmile
Copy link
Member

@cloud-fan Sure, will do. Thanks!

@SparkQA
Copy link

SparkQA commented Aug 5, 2016

Test build #63256 has finished for PR 14482 at commit f49ef22.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Copy link
Contributor

LGTM - merging to master. Thanks!

@asfgit asfgit closed this in 5effc01 Aug 5, 2016
asfgit pushed a commit that referenced this pull request Sep 1, 2016
…ake CatalogTable

## What changes were proposed in this pull request?

This is kind of a follow-up of #14482 . As we put `CatalogTable` in the logical plan directly, it makes sense to let physical plans take `CatalogTable` directly, instead of extracting some fields of `CatalogTable` in planner and then construct a new `CatalogTable` in physical plan.

## How was this patch tested?

existing tests.

Author: Wenchen Fan <[email protected]>

Closes #14823 from cloud-fan/create-table.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants