[SPARK-5498][SQL][FOLLOW] add schema to table partition #20846
Conversation
     parameters: Map[String, String] = Map.empty,
-    stats: Option[CatalogStatistics] = None) {
+    stats: Option[CatalogStatistics] = None,
+    schema: Option[StructType] = None) {
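To make the shape of the change concrete, here is a rough, self-contained sketch. PartitionMeta and effectiveSchema are hypothetical stand-ins, not Spark's actual CatalogTablePartition (which carries more fields, such as the partition spec and storage format); the idea is that a partition may optionally record its own schema, and callers can fall back to the table schema when it is absent.

```scala
import org.apache.spark.sql.types.StructType

// Simplified stand-in for the change above: a partition may record its own schema.
case class PartitionMeta(
    parameters: Map[String, String] = Map.empty,
    schema: Option[StructType] = None)

// One plausible use of the new field: fall back to the table schema
// when the partition did not record a schema of its own.
def effectiveSchema(tableSchema: StructType, partition: PartitionMeta): StructType =
  partition.schema.getOrElse(tableSchema)
```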
The partition schema is stored in CatalogTable. It is not clear to me what exception you got.
@dongjoon-hyun Could you help @liutang123 investigate the issue?
Sure, @gatorsmile. I'll take a look over the weekend.
Sometimes a partition's schema is different from the table's.
@liutang123, Spark should not do this kind of risky thing. Hive 2.3.2 also disallows incompatible schema changes like the following:

hive> CREATE TABLE test_par(a string) PARTITIONED BY (b bigint) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
OK
Time taken: 0.262 seconds
hive> ALTER TABLE test_par CHANGE a a bigint RESTRICT;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. The following columns have types incompatible with the existing columns in their respective positions :
a
hive> SELECT VERSION();
OK
2.3.2 r857a9fd8ad725a53bd95c1b2d6612f9b1155f44d
Time taken: 0.711 seconds, Fetched: 1 row(s)

cc @gatorsmile
@dongjoon-hyun, thanks for reviewing.
We do not allow users to change a table column's type. Currently, only column comments may be changed when the command is issued through Spark. However, users can still change the type through Hive. Thus, there is nothing we can do from the Spark side, right?
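As an illustration of that restriction, here is a minimal sketch of a Spark SQL session. It is hypothetical, not code from this PR: the test_par table and column names are taken from the examples elsewhere in this thread, and the comment text is made up.

```scala
// Allowed through Spark: the column keeps its name and type; only the comment changes.
spark.sql("ALTER TABLE test_par CHANGE COLUMN a a string COMMENT 'updated comment'")

// Rejected by Spark's DDL check: changing the column type.
// The same change still succeeds when issued directly in Hive, which is how a
// partition's schema can end up differing from the table's schema.
spark.sql("ALTER TABLE test_par CHANGE COLUMN a a bigint")
```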
Right, @gatorsmile.
Can one of the admins verify this patch?
What JIRA was this really about?
Closes apache#21766
Closes apache#21679
Closes apache#21161
Closes apache#20846
Closes apache#19434
Closes apache#18080
Closes apache#17648
Closes apache#17169

Add:
Closes apache#22813
Closes apache#21994
Closes apache#22005
Closes apache#22463

Add:
Closes apache#15899

Add:
Closes apache#22539
Closes apache#21868
Closes apache#21514
Closes apache#21402
Closes apache#21322
Closes apache#21257
Closes apache#20163
Closes apache#19691
Closes apache#18697
Closes apache#18636
Closes apache#17176

Closes apache#23001 from wangyum/CloseStalePRs.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: hyukjinkwon <[email protected]>
What changes were proposed in this pull request?

When querying an ORC table in which some partition schemas differ from the table schema, a ClassCastException occurs.

Reproduction:

create table test_par(a string) PARTITIONED BY (b bigint) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
ALTER TABLE test_par CHANGE a a bigint restrict; -- in hive
select * from test_par;

How was this patch tested?

Manual test.
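For reference, a hedged sketch of the same reproduction driven from a Hive-enabled Spark shell. Assumptions beyond the steps above: a row is inserted so the partition actually contains ORC data written with the original string schema, and the type change is issued from Hive because Spark rejects it, as discussed earlier in the thread.

```scala
// Create the ORC table and write one partition while column `a` is still a string.
spark.sql(
  """CREATE TABLE test_par(a string) PARTITIONED BY (b bigint)
    |ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    |STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
    |OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'""".stripMargin)
spark.sql("INSERT INTO test_par PARTITION (b = 1) VALUES ('x')")

// Issued from Hive, not Spark:
//   ALTER TABLE test_par CHANGE a a bigint RESTRICT;

// The table schema now says `a bigint`, but the existing partition's ORC files
// were written with `a string`; scanning them is what raises the ClassCastException.
spark.sql("SELECT * FROM test_par").show()
```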