Conversation

@dilipbiswal (Contributor) commented Sep 21, 2016

What changes were proposed in this pull request?

Make sure hive.default.fileformat is used when creating the storage format metadata.

Output

scala> spark.sql("SET hive.default.fileformat=orc")
res1: org.apache.spark.sql.DataFrame = [key: string, value: string]

scala> spark.sql("CREATE TABLE tmp_default(id INT)")
res2: org.apache.spark.sql.DataFrame = []

Before

scala> spark.sql("DESC FORMATTED tmp_default").collect.foreach(println)
..
[# Storage Information,,]
[SerDe Library:,org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,]
[InputFormat:,org.apache.hadoop.hive.ql.io.orc.OrcInputFormat,]
[OutputFormat:,org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat,]
[Compressed:,No,]
[Storage Desc Parameters:,,]
[  serialization.format,1,]

After

scala> spark.sql("DESC FORMATTED tmp_default").collect.foreach(println)
..
[# Storage Information,,]
[SerDe Library:,org.apache.hadoop.hive.ql.io.orc.OrcSerde,]
[InputFormat:,org.apache.hadoop.hive.ql.io.orc.OrcInputFormat,]
[OutputFormat:,org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat,]
[Compressed:,No,]
[Storage Desc Parameters:,,]
[  serialization.format,1,]
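
The gist of the change, sketched as a minimal, self-contained Scala example (the HiveSerDe case class and the sourceToSerDe lookup below are illustrative stand-ins, not the exact Spark internals):

```scala
object DefaultFileFormatSketch {
  // Stand-in for the (inputFormat, outputFormat, serde) triple that Spark
  // derives from a file format name.
  case class HiveSerDe(
      inputFormat: Option[String] = None,
      outputFormat: Option[String] = None,
      serde: Option[String] = None)

  // Hypothetical lookup keyed by hive.default.fileformat; only "orc" shown.
  def sourceToSerDe(source: String): Option[HiveSerDe] = source.toLowerCase match {
    case "orc" => Some(HiveSerDe(
      inputFormat = Some("org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"),
      outputFormat = Some("org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat"),
      serde = Some("org.apache.hadoop.hive.ql.io.orc.OrcSerde")))
    case _ => None
  }

  def main(args: Array[String]): Unit = {
    val defaultHiveSerde = sourceToSerDe("orc") // as if hive.default.fileformat=orc
    // Before the fix, only the input/output formats came from this lookup and
    // the serde stayed unset, so DESC showed LazySimpleSerDe next to ORC formats.
    // After the fix, the serde comes from the lookup too, with the text serde
    // as the fallback when the lookup misses.
    val serde = defaultHiveSerde.flatMap(_.serde)
      .orElse(Some("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"))
    println(serde) // Some(org.apache.hadoop.hive.ql.io.orc.OrcSerde)
  }
}
```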

How was this patch tested?

Added new tests to HiveDDLCommandSuite

@SparkQA commented Sep 22, 2016

Test build #65742 has finished for PR 15190 at commit 042a94e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

assert(partition2.get.apply("c") == "1" && partition2.get.apply("d") == "2")
}

test("Test default fileformat") {

Member:

Please update the test case name to Test the default fileformat for Hive-serde tables

s"""
|CREATE TABLE IF NOT EXISTS fileformat_test (id int)
""".stripMargin
val (desc, exists) = extractTableDesc(s1)

Member:

val (desc, exists) = extractTableDesc("CREATE TABLE IF NOT EXISTS fileformat_test (id int)")

s"""
|CREATE TABLE IF NOT EXISTS fileformat_test (id int)
""".stripMargin
val (desc, exists) = extractTableDesc(s1)

Member:

The same here.

@dilipbiswal (Author):

@gatorsmile Thanks! I have updated as per your comments.

@gatorsmile (Member):

Please update the PR description. This is not for orc only.

@SparkQA commented Sep 22, 2016

Test build #65744 has finished for PR 15190 at commit f60e760.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Sep 22, 2016

Test build #65752 has finished for PR 15190 at commit f2b93de.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal dilipbiswal changed the title [SPARK-17620][SQL] hive.default.fileformat=orc does not set OrcSerde [SPARK-17620][SQL] Use the storage format specified by hive.default.fileformat when creating hive serde tables. Sep 22, 2016
    .orElse(Some("org.apache.hadoop.mapred.TextInputFormat")),
  outputFormat = defaultHiveSerde.flatMap(_.outputFormat)
    .orElse(Some("org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat")),
  // Note: Keep this unspecified because we use the presence of the serde to decide

Member:

I think this is kept as unspecified because it is intended to write the table with the Hive write path. If we specify the serde here, it will be converted to a datasource table. Is that ok? cc @cloud-fan

Contributor:

cc @yhuai to confirm

@gatorsmile (Member), Sep 22, 2016:

The comment is not valid now. It was removed by PR #13386 (see the code changes made in HiveMetastoreCatalog.scala).

Member:

The current checking conditions are based on ctx.createFileFormat and ctx.rowFormat. Thus, I think this PR looks ok. : )

@dilipbiswal (Author):

@viirya @cloud-fan Actually I am not sure if the above comment is in sync with the code. When we had this comment, we used CreateTableAsSelectLogicalPlan to represent the CTAS case, and we checked for the serde's presence to determine whether or not to convert it to a data source table, like the following:

if (sessionState.convertCTAS && table.storage.serde.isEmpty) {
  // Do the conversion when spark.sql.hive.convertCTAS is true and the query
  // does not specify any storage format (file format and storage handler).
  if (table.identifier.database.isDefined) {
    throw new AnalysisException(
      "Cannot specify database name in a CTAS statement " +
        "when spark.sql.hive.convertCTAS is set to true.")
  }

  val mode = if (allowExisting) SaveMode.Ignore else SaveMode.ErrorIfExists
  CreateTableUsingAsSelect(
    TableIdentifier(desc.identifier.table),
    conf.defaultDataSourceName,
    temporary = false,
    Array.empty[String],
    bucketSpec = None,
    mode,
    options = Map.empty[String, String],
    child
  )
} else {
  val desc = if (table.storage.serde.isEmpty) {
    // add default serde
    table.withNewStorage(
      serde = Some("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"))
  } else {
    table
  }

I think this code has changed and moved to SparkSqlParser?

Member:

Yeah, looks ok now.

@gatorsmile (Member):

PR title should be
Determine Serde by hive.default.fileformat when Creating Hive Serde Tables

@dilipbiswal dilipbiswal changed the title [SPARK-17620][SQL] Use the storage format specified by hive.default.fileformat when creating hive serde tables. [SPARK-17620][SQL] Determine Serde by hive.default.fileformat when Creating Hive Serde Tables Sep 22, 2016
- // Note: Keep this unspecified because we use the presence of the serde to decide
- // whether to convert a table created by CTAS to a datasource table.
- serde = None,
+ serde = defaultHiveSerde.flatMap(_.serde),

@viirya (Member):

In the DataSinks strategy, we set a default serde for CreateTable if tableDesc.storage.serde.isEmpty. I think we should also remove it and add .orElse(Some("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe")) here.
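
For illustration, a minimal sketch of that fallback (HiveSerDe here is a one-field stand-in type, not Spark's):

```scala
case class HiveSerDe(serde: Option[String] = None) // illustrative stand-in
val defaultHiveSerde: Option[HiveSerDe] = None     // e.g. the fileformat lookup missed
val serde = defaultHiveSerde.flatMap(_.serde)
  .orElse(Some("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"))
// serde == Some("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe")
```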

@dilipbiswal (Author):

@viirya Please see my comment below.

assert(partition2.get.apply("c") == "1" && partition2.get.apply("d") == "2")
}

test("Test the default fileformat for Hive-serde tables") {

Member:

Let's add a test for withSQLConf("hive.default.fileformat" -> "")?
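
For example, something along these lines (a sketch only; withSQLConf and extractTableDesc are assumed to be the suite's existing helpers, and the expected fallback is the hard-coded text format):

```scala
withSQLConf("hive.default.fileformat" -> "") {
  val (desc, _) = extractTableDesc("CREATE TABLE IF NOT EXISTS fileformat_test (id int)")
  // With an empty fileformat the serde lookup misses, so the parser should
  // fall back to the plain-text defaults.
  assert(desc.storage.inputFormat == Some("org.apache.hadoop.mapred.TextInputFormat"))
}
```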

@viirya (Member) commented Sep 22, 2016

As we set the default serde/inputFormat/outputFormat now, I think we don't need to set default values for them in CreateHiveTableAsSelectCommand again. What do you think?

@dilipbiswal (Author) commented Sep 22, 2016

@viirya I think we can get here from multiple code paths, like visitCreateTableUsing, and reach DataSinks's CreateTable case without the serde being set. Let me know what you think.

@viirya (Member) commented Sep 22, 2016

Oh, right. Should we set the default serde in visitCreateTableUsing?

@viirya (Member) commented Sep 22, 2016

Currently the default serde is set in scattered places. I think it is better to avoid that.

@dilipbiswal (Author):

@viirya In my understanding, that's the datasource table code path. I am not sure we should look at the hive.default.fileformat property to set the default storage for data source tables. In my opinion, it's probably better to leave it the way it is, to make sure we always have a default setting in the common path so we don't miss it.

@viirya (Member) commented Sep 22, 2016

Hmmm, as we prohibit CREATE TABLE USING with hive serde tables, can we reach DataSinks's CreateTable from the visitCreateTableUsing path?

@viirya (Member) commented Sep 22, 2016

@dilipbiswal Yeah, I see. I still think setting the default serde in different places doesn't look good, especially since in other places it is hard-coded.

@viirya (Member) commented Sep 22, 2016

@dilipbiswal nvm. It is not related to this change.

@yhuai (Contributor) commented Sep 22, 2016

Can you try CREATE TABLE tmp_default(id INT) as select .... and see if the table will be converted to parquet format?

@dilipbiswal (Author):

@yhuai Hi Yin,

create table ... as select ... would respect the setting of hive.default.fileformat.

scala> spark.sql("SET hive.default.fileformat=parquet")
res14: org.apache.spark.sql.DataFrame = [key: string, value: string]

scala> spark.sql("CREATE TABLE tmp_default4 select * from tmp_default")
res11: org.apache.spark.sql.DataFrame = []

scala> spark.sql("DESC FORMATTED tmp_default4").collect.foreach(println)
...
[# Storage Information,,]
[SerDe Library:,org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe,]
[InputFormat:,org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat,]
[OutputFormat:,org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat,]
...

scala> spark.sql("SET hive.default.fileformat=orc")
res14: org.apache.spark.sql.DataFrame = [key: string, value: string]

scala> spark.sql("CREATE TABLE tmp_default5 select * from tmp_default")
res15: org.apache.spark.sql.DataFrame = []

scala> spark.sql("DESC FORMATTED tmp_default5").collect.foreach(println)
...
[# Storage Information,,]
[SerDe Library:,org.apache.hadoop.hive.ql.io.orc.OrcSerde,]
[InputFormat:,org.apache.hadoop.hive.ql.io.orc.OrcInputFormat,]
[OutputFormat:,org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat,]
...

@gatorsmile (Member) commented Sep 22, 2016

@dilipbiswal Are they converted to data source tables?

@dilipbiswal (Author) commented Sep 22, 2016

@gatorsmile Thanks, I didn't realize we wanted to find out the CTAS behaviour. Here is the result:
When convertCTAS is set to true, we create a data source table with parquet format.

scala> spark.sql("SET spark.sql.hive.convertCTAS=true")
res23: org.apache.spark.sql.DataFrame = [key: string, value: string]

scala> spark.sql("CREATE TABLE tmp_default6 as select * from tmp_default")
res24: org.apache.spark.sql.DataFrame = []

scala> spark.sql("DESC extended tmp_default6").collect.foreach(println)
[# Detailed Table Information,CatalogTable(
    Provider: parquet
    Storage(Location: file:/home/mygit/apache/spark/bin/spark-warehouse/tmp_default6, InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat, OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat, Serde: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, Properties: [path=file:/user/hive/warehouse/tmp_default6, serialization.format=1])),]

@gatorsmile (Member):

@dilipbiswal Based on your tests and the source code, I think your fix does not break anything.

@viirya (Member) commented Sep 23, 2016

Looks good.

@gatorsmile (Member):

cc @yhuai @cloud-fan Based on the above PR discussion, it sounds like this PR is ok to merge. What do you think? Thank you!

@yhuai (Contributor) commented Oct 11, 2016

If we have spark.sql.hive.convertCTAS=true and hive.default.fileformat=orc, what format will we use when we create a table through a CTAS statement?

@dilipbiswal (Author):

@yhuai We will use the Parquet format in your example. We look at the spark.sql.sources.default configuration to decide on the format to use.

Here is the output for your perusal.

spark-sql> set spark.sql.hive.convertCTAS=true;
spark.sql.hive.convertCTAS  true
Time taken: 3.309 seconds, Fetched 1 row(s)
spark-sql> set hive.default.fileformat=orc;
hive.default.fileformat orc
Time taken: 0.053 seconds, Fetched 1 row(s)
spark-sql> CREATE TABLE IF NOT EXISTS test select 1 from foo;
spark-sql> describe formatted test;
...
# Storage Information       
SerDe Library:  org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe 
InputFormat:    org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat   
OutputFormat:   org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat  

Now change spark.sql.sources.default=orc

spark-sql> set spark.sql.sources.default=orc;
spark.sql.sources.default   orc
spark-sql> CREATE TABLE IF NOT EXISTS test2 select 1 from foo;
Time taken: 0.451 seconds
spark-sql> describe formatted test2;
...
# Storage Information       
SerDe Library:  org.apache.hadoop.hive.ql.io.orc.OrcSerde   
InputFormat:    org.apache.hadoop.hive.ql.io.orc.OrcInputFormat 
OutputFormat:   org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat    

Please let me know if you have any further questions.
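
A sketch of how these cases could be captured in a test (withSQLConf and sql are assumed suite helpers; this is not the exact committed test):

```scala
withSQLConf(
    "spark.sql.hive.convertCTAS" -> "true",
    "spark.sql.sources.default" -> "orc",
    "hive.default.fileformat" -> "parquet") {
  sql("CREATE TABLE IF NOT EXISTS ctas_default AS SELECT 1 AS id")
  // With convertCTAS=true the result is a data source table, so the format
  // should follow spark.sql.sources.default (orc), not hive.default.fileformat.
}
```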

@gatorsmile (Member):

Please add them to your test cases. : )

@SparkQA commented Oct 14, 2016

Test build #66919 has finished for PR 15190 at commit f32fe21.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member):

retest this please

@SparkQA commented Oct 14, 2016

Test build #66944 has finished for PR 15190 at commit f32fe21.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal (Author):

retest this please

@SparkQA commented Oct 14, 2016

Test build #66949 has finished for PR 15190 at commit f32fe21.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal (Author):

@gatorsmile @yhuai I have added a new test. Can we please take a look at this again?

@yhuai (Contributor) commented Oct 14, 2016

@dilipbiswal That makes sense. Thank you for testing that. I do not have any other question.

@gatorsmile (Member):

LGTM

@gatorsmile (Member):

Merging to master! Thanks!

@asfgit closed this in 7ab8624 on Oct 14, 2016
@dilipbiswal (Author):

Thank you @yhuai @gatorsmile @cloud-fan @viirya @dafrista

@yhuai (Contributor) commented Oct 14, 2016

I am reverting this patch. Sorry.

@yhuai (Contributor) commented Oct 14, 2016

Reverted

@dilipbiswal (Author):

@yhuai Very sorry, Yin. Let me look at what happened here. Is there a way to reopen this pull request, or do I need to open a new one?

@gatorsmile (Member):

It sounds like the other merged PRs made a change that impacts your code. Please resolve it. Thanks!

@yhuai (Contributor) commented Oct 14, 2016

Yea. Looks like so. No worries. Let's get it tested again.

@dilipbiswal (Author):

@gatorsmile @yhuai It's due to a difference between the Scala 2.10 and 2.11 compilers in the way they deal with named parameters. Looks like 2.10 is less forgiving :-). I have opened #15495 with the fix.

asfgit pushed a commit that referenced this pull request Oct 18, 2016
…eating Hive Serde Tables

## What changes were proposed in this pull request?
Reopens the closed PR #15190
(Please refer to the above link for review comments on the PR)

Make sure hive.default.fileformat is used when creating the storage format metadata.

## How was this patch tested?
Added new tests to HiveDDLCommandSuite, SQLQuerySuite

Author: Dilip Biswal <[email protected]>

Closes #15495 from dilipbiswal/orc2.
robert3005 pushed a commit to palantir/spark that referenced this pull request Nov 1, 2016 (closes apache#15190 from dilipbiswal/orc).

robert3005 pushed a commit to palantir/spark that referenced this pull request Nov 1, 2016 (closes apache#15495 from dilipbiswal/orc2).

uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017 (closes apache#15190 from dilipbiswal/orc).

uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017 (closes apache#15495 from dilipbiswal/orc2).