
Conversation

@gatorsmile (Member) commented Jun 24, 2016

What changes were proposed in this pull request?

When Hive support is not enabled, the following query produces a confusing assertion error from the planner:

sql("CREATE TABLE t2 SELECT a, b from t1")

java.lang.AssertionError: assertion failed: No plan for CreateTable CatalogTable(
    Table: `t2`
    Created: Tue Aug 09 23:45:32 PDT 2016
    Last Access: Wed Dec 31 15:59:59 PST 1969
    Type: MANAGED
    Provider: hive
    Storage(InputFormat: org.apache.hadoop.mapred.TextInputFormat, OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat)), ErrorIfExists
+- Relation[a#19L,b#20L] parquet

This PR issues a clearer error message instead:

Hive support is required to use CREATE Hive TABLE AS SELECT

How was this patch tested?

Added test cases in DDLSuite.scala
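
For illustration, the shape of the fix can be sketched as a small standalone check. All names, types, and the exception class below are simplified stand-ins, not Spark's actual implementation:

```scala
// Simplified, self-contained model of the pre-check this PR adds.
// Spark's real rule matches its CreateTable logical plan node; here we
// mimic it with plain case classes so the snippet runs on its own.
case class CatalogTable(name: String, provider: Option[String])
case class CreateTablePlan(desc: CatalogTable, hasQuery: Boolean)

def checkHiveSupport(plan: CreateTablePlan, catalogImpl: String): Unit = {
  // When the provider is "hive" but the session catalog is not backed by
  // Hive, fail fast with a readable message instead of letting the planner
  // die with "assertion failed: No plan for CreateTable ...".
  if (plan.desc.provider.contains("hive") && catalogImpl != "hive") {
    val cmd =
      if (plan.hasQuery) "CREATE Hive TABLE AS SELECT" else "CREATE Hive TABLE"
    throw new IllegalStateException(s"Hive support is required to use $cmd")
  }
}
```

The key design point is that the check runs before planning, so the user sees the actionable message rather than an internal assertion failure.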

@SparkQA commented Jun 24, 2016

Test build #61159 has finished for PR 13886 at commit db3d44c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jun 24, 2016

Test build #61158 has finished for PR 13886 at commit 8947445.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jun 24, 2016

Test build #61180 has finished for PR 13886 at commit e4cc35d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member, Author)
cc @cloud-fan @hvanhovell could you please review this simple change? Thanks!

@cloud-fan (Contributor)
Hmmm, why can't we support CREATE TABLE AS SELECT without Hive support?

@gatorsmile (Member, Author)
CREATE TABLE AS SELECT can be converted to a data source table creation when the statement specifies neither a file format nor a row format. (Actually, when the file format is Parquet we can still convert it; I will try to submit a PR for that case.)

However, the default value of the internal conf spark.sql.hive.convertCTAS is false, so we do not convert these statements even when it is possible. Maybe we can add a rule to do it when users do not enable Hive support?
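
The conversion condition described above can be illustrated with a small sketch. The field and flag names here are illustrative stand-ins, not Spark's parser-context API:

```scala
// Illustrative sketch of the convertCTAS decision described above: a Hive
// CTAS may be rewritten to a data-source table only when the user supplied
// neither a file format nor a row format, and the conf flag is on.
case class CtasStatement(hasFileFormat: Boolean, hasRowFormat: Boolean)

def canConvertToDataSource(stmt: CtasStatement, convertCTAS: Boolean): Boolean = {
  val hasStorageProperties = stmt.hasFileFormat || stmt.hasRowFormat
  convertCTAS && !hasStorageProperties
}
```

Since convertCTAS defaults to false, no statement is converted out of the box, which is the gap discussed here.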

@gatorsmile (Member, Author) commented Jun 26, 2016
@cloud-fan Submitted a PR (#13907) for converting CTAS in parquet to data source tables without hive support. Could you also review that PR? Thanks!

@gatorsmile (Member, Author)
retest this please

@SparkQA commented Jun 29, 2016

Test build #61450 has finished for PR 13886 at commit e4cc35d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member, Author)
Could you please review this PR again? @cloud-fan Thanks!

@gatorsmile (Member, Author)
retest this please

@SparkQA commented Jul 5, 2016

Test build #61771 has finished for PR 13886 at commit e4cc35d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 25, 2016

Test build #62827 has finished for PR 13886 at commit 472896f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member, Author)
cc @cloud-fan This is not contained in #14482. Should I leave it open, or will you merge it into that PR?

    s"$numStaticPartitions partition column(s) having constant value(s).")
    }

    case c if c.getClass.getName ==
Contributor:

After #14482, we can have a less hacky way to handle it. Can you update?

@gatorsmile (Member, Author)

Sure, will fix it soon. Thanks!

@SparkQA
Copy link

SparkQA commented Aug 8, 2016

Test build #63365 has finished for PR 13886 at commit 8a4d2b2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

    def apply(plan: LogicalPlan): Unit = {
      plan.foreach {
        case CreateTable(tableDesc, _, Some(_))
            if tableDesc.provider.get == "hive" &&
              sparkConf.get(CATALOG_IMPLEMENTATION) != "hive" =>
Contributor:
Sorry, I just realized this can't be backported to 2.0.

For master, I'd like to implement it directly; then we can think about how to deal with 2.0.

@gatorsmile (Member, Author)

I see. That sounds reasonable to me. Do you think we can create a data source table in this specific case? If so, I can submit another PR for it.

Contributor:

I think we already do? The problem is APPEND mode: we need to read the schema of the existing Hive table.

@gatorsmile (Member, Author)

Yeah, you are right. We do it for a few cases, but we are not able to handle all of them. See the following code:

    val hasStorageProperties = (ctx.createFileFormat != null) || (ctx.rowFormat != null)
    if (conf.convertCTAS && !hasStorageProperties) {
      // At here, both rowStorage.serdeProperties and fileStorage.serdeProperties
      // are empty Maps.
      val optionsWithPath = if (location.isDefined) {
        Map("path" -> location.get)
      } else {
        Map.empty[String, String]
      }
      val newTableDesc = tableDesc.copy(
        storage = CatalogStorageFormat.empty.copy(properties = optionsWithPath),
        provider = Some(conf.defaultDataSourceName)
      )

We can extend it to support more cases, as in another PR: #13907. However, it seems we are unable to support all cases when Hive is not enabled.

Regarding the issue you mentioned above (reading the schema from the existing Hive table and then appending to another table), there are two cases:

Sorry, this also looks very confusing to me. Hopefully, my description is a little clearer.

Contributor:
Ah, sorry, I misread this PR. When Hive support is not enabled, we definitely need to throw an exception for Hive tables in CTAS.

@cloud-fan (Contributor)
We can create a new rule for this kind of check and put it only in the sql module, i.e. in SessionState.
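
Roughly, the suggestion above (a standalone check kept per-session) could be sketched like this; LogicalPlan, CreateTable, SessionState, and the wiring below are all simplified stand-ins, not Spark's actual classes:

```scala
// Rough sketch of a per-session plan check, modeled with plain Scala types.
sealed trait LogicalPlan
case class CreateTable(provider: String, hasQuery: Boolean) extends LogicalPlan
case class OtherPlan(name: String) extends LogicalPlan

class SessionState(catalogImplementation: String) {
  // A check is a function from plan to Unit that throws on violation.
  // Keeping the list here means the Hive-backed session can simply
  // install a different (or empty) set of checks.
  val planChecks: Seq[LogicalPlan => Unit] = Seq(hiveOnlyCheck)

  private def hiveOnlyCheck(plan: LogicalPlan): Unit = plan match {
    case CreateTable("hive", _) if catalogImplementation != "hive" =>
      throw new IllegalStateException(
        "Hive support is required to create Hive tables")
    case _ => // non-Hive plans pass through
  }

  def validate(plan: LogicalPlan): Unit = planChecks.foreach(_(plan))
}
```

Putting the rule in the sql module's session state (rather than the planner) keeps the failure early and the message user-facing.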

@SparkQA commented Aug 10, 2016

Test build #63495 has finished for PR 13886 at commit 79c21b6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

     */
    object HiveOnlyCheck extends (LogicalPlan => Unit) {
      def apply(plan: LogicalPlan): Unit = {

Contributor:

Put it in the class, not in the method.

      "WITH SERDEPROPERTIES ('spark.sql.sources.me'='anything')")
    }

    test("Create Cataloged Table As Select") {
Contributor:

hive table?

@cloud-fan (Contributor)
Can you update the PR description? I think the exception was thrown in the planner before your PR.

@gatorsmile (Member, Author)
Sure, let me fix them now. Thanks!

@cloud-fan (Contributor)
LGTM, pending Jenkins. Thanks for working on it!

@SparkQA commented Aug 10, 2016

Test build #63510 has finished for PR 13886 at commit 037388a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

    }

    test("Create Hive Table As Select") {
      import testImplicits._
@viirya (Member) commented Aug 10, 2016

Nit: import testImplicits._ is used in many test cases in DDLSuite; we can import it only once.

@asfgit asfgit closed this in 2b10ebe Aug 10, 2016
@cloud-fan (Contributor)
Thanks, merging to master!
