[SPARK-16209] [SQL] Convert Hive Tables in PARQUET/ORC to Data Source Tables for CREATE TABLE AS SELECT #13907
Conversation
Test build #61243 has finished for PR 13907 at commit
Test build #61253 has finished for PR 13907 at commit
With your PR, if users specify …
Nope. If users do not specify the input and output formats, we will use the default. I am not sure whether we should still convert it. Please let me know if you think we should still convert them. Thanks! BTW, I also confirmed Spark SQL and Hive have the same default input and output formats.
cc @cloud-fan This is not contained in #14482. Should I leave it open? Or should I fix the conflict after #14482 is merged?
I don't think it's a very useful feature, and we may surprise users, as they do use Hive syntax to specify the row format. For advanced users, they can easily use …
I see. Let me close it.
What changes were proposed in this pull request?
Currently, the tables created by the following statements will be Hive tables: when users create a table as a query with STORED AS or ROW FORMAT and spark.sql.hive.convertCTAS is set to true, we will not convert them to data source tables. Actually, for the Parquet and ORC formats, we can still convert them to data source tables even if the users use STORED AS or ROW FORMAT.
How was this patch tested?
Added test cases for both ORC and PARQUET
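As an illustrative sketch of the behavior this PR targets (the table names and query below are assumptions for illustration, not taken from the PR's actual test cases), the conversion applies to CTAS statements like:

```sql
-- Hypothetical example. With spark.sql.hive.convertCTAS=true, CTAS statements
-- that use STORED AS PARQUET/ORC are converted to data source tables by this
-- change, instead of remaining Hive tables because a storage format was given.
SET spark.sql.hive.convertCTAS=true;

CREATE TABLE t_parquet STORED AS PARQUET AS SELECT 1 AS id;
CREATE TABLE t_orc     STORED AS ORC     AS SELECT 1 AS id;
```

Formats other than Parquet and ORC specified via STORED AS or ROW FORMAT would still produce Hive tables, since only these two formats have native data source implementations in Spark SQL.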