@rdblue
Contributor

@rdblue rdblue commented Jul 8, 2020

This adds support for atomic CTAS and RTAS commands when using SparkSessionCatalog in Spark 3.

If a TableCatalog in Spark 3 implements StagingTableCatalog, Spark uses the staging table methods for all CTAS/RTAS operations, on the assumption that every table in the catalog supports the same capabilities. Iceberg tables support atomic operations, but tables loaded by the wrapped session catalog do not. The workaround is to mimic Spark's non-atomic behavior for those tables: create the table immediately, use it for the write, and roll back by dropping the table.
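The create, write, and drop-on-failure flow can be sketched roughly as below. This is a minimal, self-contained illustration and not the code in this PR: `InMemoryCatalog` and `RollbackStagedTable` are hypothetical stand-ins, and only the method names `commitStagedChanges` and `abortStagedChanges` are borrowed from Spark's `StagedTable` interface for readability.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; these types are not Spark's or Iceberg's real classes. */
final class RollbackStagedTableSketch {

  /** Stand-in for a non-atomic catalog, such as the wrapped session catalog. */
  static final class InMemoryCatalog {
    private final Map<String, String> tables = new HashMap<>();

    void createTable(String name) {
      if (tables.putIfAbsent(name, "") != null) {
        throw new IllegalStateException("Table already exists: " + name);
      }
    }

    void write(String name, String data) {
      if (tables.replace(name, data) == null) {
        throw new IllegalStateException("No such table: " + name);
      }
    }

    boolean dropTable(String name) {
      return tables.remove(name) != null;
    }

    boolean tableExists(String name) {
      return tables.containsKey(name);
    }
  }

  /** Staged-table stand-in that gives non-atomic behavior with rollback. */
  static final class RollbackStagedTable {
    private final InMemoryCatalog catalog;
    private final String name;

    RollbackStagedTable(InMemoryCatalog catalog, String name) {
      this.catalog = catalog;
      this.name = name;
      // The table is created right away, so it is visible before the write finishes.
      catalog.createTable(name);
    }

    void write(String data) {
      catalog.write(name, data);
    }

    void commitStagedChanges() {
      // Nothing to do: the table already exists and the write went directly to it.
    }

    void abortStagedChanges() {
      // Roll back by dropping the table that was created up front.
      catalog.dropTable(name);
    }
  }

  public static void main(String[] args) {
    InMemoryCatalog catalog = new InMemoryCatalog();

    // Successful CTAS: create, write, commit (a no-op).
    RollbackStagedTable ok = new RollbackStagedTable(catalog, "db.ok");
    ok.write("rows");
    ok.commitStagedChanges();
    System.out.println(catalog.tableExists("db.ok"));

    // Failed CTAS: the table exists mid-write and is dropped on abort.
    RollbackStagedTable failed = new RollbackStagedTable(catalog, "db.failed");
    System.out.println(catalog.tableExists("db.failed"));
    failed.abortStagedChanges();
    System.out.println(catalog.tableExists("db.failed"));
  }
}
```

The sketch also shows why this path is not atomic: the table is visible to other readers between creation and commit, and a crash before `abortStagedChanges` runs would leave it behind.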

This PR doesn't contain new tests because the session catalog in Spark 3 does not work with v2 tables. It will always return a V1Table. Because a v1 table is always returned, there are no code paths that will load non-Iceberg tables using the session catalog. When the provider for a table is not a v2 provider, Spark will bypass the v2 plugin. A plugin can define and load v2 tables, but v2 will never be used for tables loaded by the wrapped session catalog.

@rdblue rdblue mentioned this pull request Jul 8, 2020
@rdblue rdblue added this to the Spark 3 milestone Jul 8, 2020
@rdblue rdblue force-pushed the atomic-ctas-rtas branch from 9e89091 to 2a53e80 Compare July 8, 2020 23:24
@danielcweeks
Contributor

+1 LGTM

@rdblue
Contributor Author

rdblue commented Jul 9, 2020

I ran tests locally because CI is running behind. Everything looks good, so I'll merge this.

Thanks for reviewing, @danielcweeks!

@rdblue rdblue merged commit a81ba17 into apache:master Jul 9, 2020
cmathiesen pushed a commit to ExpediaGroup/iceberg that referenced this pull request Aug 19, 2020
@eubnara

eubnara commented Oct 6, 2024

Hello @rdblue! According to this PR, SparkSessionCatalog also supports atomic CTAS and RTAS. However, the documentation says "CTAS is supported, but is not atomic when using SparkSessionCatalog."


https://iceberg.apache.org/docs/latest/spark-ddl/#create-table-as-select

Which one is correct?
