Support atomic CTAS and RTAS with SparkSessionCatalog #1183

rdblue · 2020-07-08T17:27:16Z

This adds support for atomic CTAS and RTAS commands when using SparkSessionCatalog in Spark 3.

If a TableCatalog in Spark 3 implements StagingTableCatalog, then all CTAS/RTAS operations use the staging table methods, on the assumption that all tables in the catalog support the same capabilities. Iceberg tables support atomic operations, but tables loaded by the wrapped session catalog do not. The workaround for those tables is to mimic Spark's non-atomic behavior: create the table immediately, use it for the write, and roll back by dropping the table if the write fails.
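The rollback-by-drop workaround can be illustrated with a small, language-neutral model. This is only a sketch, not the code in this PR and not the Spark API: `InMemoryCatalog`, `RollbackStagedTable`, and `create_table_as_select` are hypothetical names, and the commit/abort method names only loosely echo Spark's `StagedTable` interface.

```python
class InMemoryCatalog:
    """Hypothetical stand-in for a catalog without atomic table operations."""

    def __init__(self):
        self.tables = {}

    def create_table(self, name):
        if name in self.tables:
            raise ValueError(f"table already exists: {name}")
        self.tables[name] = []
        return self.tables[name]

    def drop_table(self, name):
        # Returns True if a table was actually removed.
        return self.tables.pop(name, None) is not None


class RollbackStagedTable:
    """Hypothetical staged table: created eagerly, dropped on abort."""

    def __init__(self, catalog, name):
        self.catalog = catalog
        self.name = name
        # Non-atomic: the table is visible in the catalog right away.
        self.rows = catalog.create_table(name)

    def write(self, rows):
        self.rows.extend(rows)

    def commit_staged_changes(self):
        # Nothing to do: the table already exists and holds the data.
        pass

    def abort_staged_changes(self):
        # Roll back by dropping the table that was created up front.
        self.catalog.drop_table(self.name)


def create_table_as_select(catalog, name, select):
    """Run a CTAS-like operation; `select` is a callable producing rows."""
    staged = RollbackStagedTable(catalog, name)
    try:
        staged.write(select())
        staged.commit_staged_changes()
    except Exception:
        staged.abort_staged_changes()
        raise
```

In this model a concurrent reader could observe the table between creation and the final write, which is exactly what makes the fallback non-atomic; an Iceberg table would instead stage its changes and publish them in a single commit.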

This PR doesn't contain new tests because the session catalog in Spark 3 does not work with v2 tables: it always returns a V1Table. Because a v1 table is always returned, there are no code paths that will load non-Iceberg tables using the session catalog. When the provider for a table is not a v2 provider, Spark bypasses the v2 plugin. A plugin can define and load v2 tables, but v2 will never be used for tables loaded by the wrapped session catalog.

danielcweeks · 2020-07-08T23:25:26Z

+1 LGTM

rdblue · 2020-07-09T00:25:46Z

I ran tests locally because CI is running behind. Everything looks good, so I'll merge this.

Thanks for reviewing, @danielcweeks!

eubnara · 2024-10-06T04:32:40Z

Hello @rdblue! According to this PR, CTAS and RTAS are also atomic with SparkSessionCatalog. However, the documentation says "CTAS is supported, but is not atomic when using SparkSessionCatalog."

https://iceberg.apache.org/docs/latest/spark-ddl/#create-table-as-select

Which one is correct?

rdblue mentioned this pull request Jul 8, 2020

Add Spark 3 SQL tests #1156

Merged

rdblue added this to the Spark 3 milestone Jul 8, 2020

rdblue added 2 commits July 8, 2020 16:23

Support atomic CTAS and RTAS with SparkSessionCatalog.

f1cc460

Fix checkstyle.

2a53e80

rdblue force-pushed the atomic-ctas-rtas branch from 9e89091 to 2a53e80 Compare July 8, 2020 23:24

Update CTAS and RTAS tests for session catalog.

518eb27

rdblue merged commit a81ba17 into apache:master Jul 9, 2020

cmathiesen pushed a commit to ExpediaGroup/iceberg that referenced this pull request Aug 19, 2020

Support atomic CTAS and RTAS with SparkSessionCatalog (apache#1183)

22750be
