
@amaliujia
Contributor

@amaliujia amaliujia commented Aug 12, 2024

### What changes were proposed in this pull request?

V2SessionCatalog should use V2Command when possible.

### Why are the changes needed?

The session catalog can be overridden by a custom implementation, so that catalog should receive v2 commands; otherwise, the v1 command still calls the Hive metastore or the built-in session catalog.

### Does this PR introduce _any_ user-facing change?

NO

### How was this patch tested?

Existing tests.

### Was this patch authored or co-authored using generative AI tooling?

NO


Closes apache#47660 from amaliujia/create_table_v2.

Authored-by: Rui Wang
Signed-off-by: Wenchen Fan
@github-actions github-actions bot added the SQL label Aug 12, 2024
@amaliujia
Contributor Author

@cloud-fan


case ShowTableExtended(
DatabaseInSessionCatalog(db),
ResolvedV1Database(db),

Suggested change
ResolvedV1Database(db),
ResolvedV1Database(db),

@cloud-fan
Contributor

thanks, merging to 3.5!

cloud-fan added a commit that referenced this pull request Aug 13, 2024
### What changes were proposed in this pull request?

V2SessionCatalog should use V2Command when possible.

### Why are the changes needed?

The session catalog can be overridden by a custom implementation, so that catalog should receive v2 commands; otherwise, the v1 command still calls the Hive metastore or the built-in session catalog.

### Does this PR introduce _any_ user-facing change?

NO

### How was this patch tested?

Existing tests.

### Was this patch authored or co-authored using generative AI tooling?

NO

Closes #47724 from amaliujia/branch-3.5.

Lead-authored-by: Rui Wang
Co-authored-by: Wenchen Fan
Signed-off-by: Wenchen Fan
@cloud-fan cloud-fan closed this Aug 13, 2024

object ResolvedV1Identifier {
  def unapply(resolved: LogicalPlan): Option[TableIdentifier] = resolved match {
    case ResolvedIdentifier(catalog, ident) if supportsV1Command(catalog) =>

@amaliujia @cloud-fan
This change appears to have broken creating a V1 table through a custom `V2_SESSION_CATALOG_IMPLEMENTATION` such as Iceberg's `SparkSessionCatalog`.

    // For CREATE TABLE [AS SELECT], we should use the v1 command if the catalog is resolved to the
    // session catalog and the table provider is not v2.
    case c @ CreateTable(ResolvedV1Identifier(ident), _, _, tableSpec: TableSpec, _) =>
      val (storageFormat, provider) = getStorageFormatAndProvider(
        c.tableSpec.provider, tableSpec.options, c.tableSpec.location, c.tableSpec.serde,
        ctas = false)
      if (!isV2Provider(provider)) {
        constructV1TableCmd(None, c.tableSpec, ident, c.tableSchema, c.partitioning,
          c.ignoreIfExists, storageFormat, provider)
      } else {
        c
      }
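The rule quoted above depends on the `ResolvedV1Identifier` extractor: the `case` only fires when `supportsV1Command` accepts the catalog, and only then does the provider check choose between a v1 command and the v2 `CreateTable` plan. Below is a minimal, self-contained Scala sketch of that routing. It assumes simplified stand-ins (`Catalog`, `ResolvedIdentifier`, `TableIdentifier`, a hypothetical `my_v2_format` provider, and a `supportsV1Command` that accepts only the built-in session catalog) rather than Spark's real classes.

```scala
object CreateTableRoutingSketch {
  // Hypothetical, simplified stand-ins for Spark's catalog and plan types.
  final case class Catalog(name: String, isBuiltInSessionCatalog: Boolean)
  final case class ResolvedIdentifier(catalog: Catalog, namespace: Seq[String], name: String)
  final case class TableIdentifier(table: String, database: Option[String])

  // Assumed rule for this sketch: only the built-in session catalog accepts v1 commands.
  def supportsV1Command(catalog: Catalog): Boolean = catalog.isBuiltInSessionCatalog

  // Extractor in the style of ResolvedV1Identifier: it only matches identifiers
  // (with a single-part namespace) whose catalog should receive v1 commands.
  object ResolvedV1Identifier {
    def unapply(resolved: ResolvedIdentifier): Option[TableIdentifier] = resolved match {
      case ResolvedIdentifier(catalog, Seq(db), name) if supportsV1Command(catalog) =>
        Some(TableIdentifier(name, Some(db)))
      case _ => None
    }
  }

  // Hypothetical set of providers that have a v2 implementation in this sketch.
  private val v2Providers = Set("my_v2_format")
  def isV2Provider(provider: String): Boolean = v2Providers.contains(provider.toLowerCase)

  // Mirrors the CREATE TABLE rule: a v1 command only when the identifier is a
  // v1 identifier AND the provider is not v2; otherwise keep the v2 plan.
  def planCreateTable(ident: ResolvedIdentifier, provider: String): String = ident match {
    case ResolvedV1Identifier(t) if !isV2Provider(provider) =>
      s"v1 CreateTable ${t.database.getOrElse("")}.${t.table}"
    case _ => "v2 CreateTable"
  }

  def main(args: Array[String]): Unit = {
    val builtIn = Catalog("spark_catalog", isBuiltInSessionCatalog = true)
    val custom = Catalog("spark_catalog", isBuiltInSessionCatalog = false)
    println(planCreateTable(ResolvedIdentifier(builtIn, Seq("db"), "t"), "parquet")) // v1 CreateTable db.t
    println(planCreateTable(ResolvedIdentifier(custom, Seq("db"), "t"), "parquet"))  // v2 CreateTable
  }
}
```

In this model, a custom session catalog sends even a provider outside the sketch's v2 set (here `parquet`) down the v2 path, which is the kind of behavior change reported above.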


Does Iceberg catalog extend DelegatingCatalogExtension?


We do want to use v2 commands for custom catalogs that do not extend DelegatingCatalogExtension
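The policy described here can be modeled as a predicate. This is a sketch under stated assumptions: the catalog classes are hypothetical stand-ins, `customImplConfigured` stands for `V2_SESSION_CATALOG_IMPLEMENTATION` being set, and the session catalog is recognized by its name `spark_catalog`. It is not Spark's actual `supportsV1Command`.

```scala
object SupportsV1CommandSketch {
  // Hypothetical, simplified stand-ins for Spark's catalog types.
  trait CatalogPlugin { def name: String }
  class DelegatingCatalogExtension extends CatalogPlugin { def name: String = "spark_catalog" }
  class BuiltInSessionCatalog extends CatalogPlugin { def name: String = "spark_catalog" }
  class CustomSessionCatalog extends CatalogPlugin { def name: String = "spark_catalog" }
  class MyDelegatingCatalog extends DelegatingCatalogExtension
  class OtherCatalog extends CatalogPlugin { def name: String = "my_catalog" }

  // v1 commands only for the session catalog and, when a custom implementation
  // is configured, only if that implementation opts in via DelegatingCatalogExtension.
  def supportsV1Command(catalog: CatalogPlugin, customImplConfigured: Boolean): Boolean =
    catalog.name.equalsIgnoreCase("spark_catalog") &&
      (!customImplConfigured || catalog.isInstanceOf[DelegatingCatalogExtension])

  def main(args: Array[String]): Unit = {
    println(supportsV1Command(new BuiltInSessionCatalog, customImplConfigured = false)) // true
    println(supportsV1Command(new CustomSessionCatalog, customImplConfigured = true))   // false
    println(supportsV1Command(new MyDelegatingCatalog, customImplConfigured = true))    // true
    println(supportsV1Command(new OtherCatalog, customImplConfigured = false))          // false
  }
}
```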

@manuzhang manuzhang Sep 19, 2024

> Does Iceberg catalog extend DelegatingCatalogExtension?

Nope.

> We do want to use v2 commands for custom catalogs that do not extend DelegatingCatalogExtension

Even so, is it the right time to introduce such a behavior change in a bug fix release?


We can consider it a bug. People implementing the DS v2 catalog APIs expect v2 commands so that they can customize table behavior. And there is a backdoor: `DelegatingCatalogExtension`.

For Iceberg, shouldn't it be easy to work around this by extending `DelegatingCatalogExtension`? The Iceberg catalog can keep all its methods unchanged and simply not use the delegate.
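The suggested workaround is that a catalog can extend `DelegatingCatalogExtension` purely as an opt-in marker and override every method itself, so the delegate is never consulted. A minimal sketch, assuming a hypothetical one-method `TableCatalog` and a simplified `DelegatingCatalogExtension` stand-in (not Spark's real class or signatures):

```scala
object OverrideWithoutDelegateSketch {
  // Hypothetical one-method catalog interface.
  trait TableCatalog {
    def loadTable(name: String): String
  }

  // Simplified stand-in: forwards to the delegate unless a subclass overrides.
  class DelegatingCatalogExtension extends TableCatalog {
    private var delegate: Option[TableCatalog] = None
    def setDelegateCatalog(d: TableCatalog): Unit = { delegate = Some(d) }
    override def loadTable(name: String): String =
      delegate.getOrElse(throw new IllegalStateException("delegate not set")).loadTable(name)
  }

  // Hypothetical built-in catalog used as the delegate.
  class BuiltInCatalog extends TableCatalog {
    def loadTable(name: String): String = s"builtin:$name"
  }

  // Custom catalog: extends the base only to opt in, and never calls the delegate.
  class SelfContainedCatalog extends DelegatingCatalogExtension {
    override def loadTable(name: String): String = s"custom:$name"
  }

  def main(args: Array[String]): Unit = {
    val custom = new SelfContainedCatalog
    custom.setDelegateCatalog(new BuiltInCatalog)
    println(custom.loadTable("t")) // custom:t
  }
}
```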


Iceberg's SparkSessionCatalog already extends a base class. There's no easy way to extend DelegatingCatalogExtension without a major refactoring.


We need to make either the Iceberg `BaseCatalog` or the Spark `DelegatingCatalogExtension` an interface. It looks easier to make `BaseCatalog` an interface?
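The constraint behind this suggestion is single inheritance: a JVM class has one superclass but may implement many interfaces (traits in Scala). A minimal sketch with hypothetical stand-ins for both sides (`sharedHelper` and `delegatesToSessionCatalog` are invented for illustration) shows that once `BaseCatalog` is an interface, a session catalog can extend `DelegatingCatalogExtension` and still reuse it:

```scala
object InterfaceRefactorSketch {
  // Hypothetical stand-in for Spark's class; it occupies the single superclass slot.
  abstract class DelegatingCatalogExtension {
    def delegatesToSessionCatalog: Boolean = true
  }

  // If BaseCatalog is a class, the following does not compile, because only
  // traits can be mixed in with `with`:
  //   class SessionCatalog extends BaseCatalogClass with DelegatingCatalogExtension
  abstract class BaseCatalogClass {
    def sharedHelper: String = "shared"
  }

  // Turning BaseCatalog into a trait (a Java interface with default methods
  // would work the same way) frees the superclass slot.
  trait BaseCatalog {
    def sharedHelper: String = "shared"
  }

  class SessionCatalog extends DelegatingCatalogExtension with BaseCatalog

  def main(args: Array[String]): Unit = {
    val c = new SessionCatalog
    println(c.delegatesToSessionCatalog) // true
    println(c.sharedHelper)              // shared
  }
}
```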

dongjoon-hyun pushed a commit that referenced this pull request Sep 26, 2024
### What changes were proposed in this pull request?

This PR updates `DelegatingCatalogExtension` to make it more extensible:
- `initialize` is no longer final, so subclasses can override it
- `delegate` becomes `protected`, so subclasses can access it

In addition, this PR fixes a mistake: `DelegatingCatalogExtension` is just a convenient default implementation; it is actually the `CatalogExtension` interface that indicates a catalog implementation will delegate requests to the Spark session catalog. #47724 should use `CatalogExtension` instead.

### Why are the changes needed?

Unblock the Iceberg extension.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing tests

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #48257 from cloud-fan/catalog.

Lead-authored-by: Wenchen Fan
Co-authored-by: Wenchen Fan
Signed-off-by: Dongjoon Hyun
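The two API changes and the corrected check described in this commit can be sketched together. This is a simplified model under stated assumptions: the traits and classes are stand-ins, `initialize` takes a plain `Map` instead of Spark's options type, `BaseCatalog` and `IcebergLikeSessionCatalog` are hypothetical, and `supportsV1CommandBefore`/`supportsV1CommandAfter` are illustrative names for a check keyed on the class versus on the interface.

```scala
object CatalogExtensionSketch {
  trait CatalogPlugin { def name: String }

  // Interface signaling "this catalog delegates to the Spark session catalog".
  trait CatalogExtension extends CatalogPlugin {
    def setDelegateCatalog(delegate: CatalogPlugin): Unit
  }

  // Convenience default implementation: initialize is overridable and the
  // delegate is protected, mirroring the two changes described in the commit.
  abstract class DelegatingCatalogExtension extends CatalogExtension {
    protected var delegate: Option[CatalogPlugin] = None
    def name: String = "spark_catalog"
    def initialize(name: String, options: Map[String, String]): Unit = ()
    def setDelegateCatalog(d: CatalogPlugin): Unit = { delegate = Some(d) }
  }

  // Subclass that overrides initialize and reads the protected delegate.
  class CustomDelegating extends DelegatingCatalogExtension {
    var initialized = false
    override def initialize(name: String, options: Map[String, String]): Unit = { initialized = true }
    def hasDelegate: Boolean = delegate.isDefined
  }

  // Iceberg-like catalog that keeps its own base class and implements the interface.
  abstract class BaseCatalog extends CatalogPlugin
  class IcebergLikeSessionCatalog extends BaseCatalog with CatalogExtension {
    private var delegate: Option[CatalogPlugin] = None
    def name: String = "spark_catalog"
    def setDelegateCatalog(d: CatalogPlugin): Unit = { delegate = Some(d) }
  }

  private def isSession(c: CatalogPlugin): Boolean = c.name.equalsIgnoreCase("spark_catalog")

  // Check keyed on the concrete convenience class.
  def supportsV1CommandBefore(c: CatalogPlugin, customImplConfigured: Boolean): Boolean =
    isSession(c) && (!customImplConfigured || c.isInstanceOf[DelegatingCatalogExtension])

  // Check keyed on the CatalogExtension interface.
  def supportsV1CommandAfter(c: CatalogPlugin, customImplConfigured: Boolean): Boolean =
    isSession(c) && (!customImplConfigured || c.isInstanceOf[CatalogExtension])

  def main(args: Array[String]): Unit = {
    val iceberg = new IcebergLikeSessionCatalog
    println(supportsV1CommandBefore(iceberg, customImplConfigured = true)) // false
    println(supportsV1CommandAfter(iceberg, customImplConfigured = true))  // true
  }
}
```

Keying the check on the interface lets a catalog that already has a superclass opt in without restructuring its class hierarchy.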
dongjoon-hyun pushed a commit that referenced this pull request Sep 26, 2024
(cherry picked from commit 339dd5b)
Signed-off-by: Dongjoon Hyun
a0x8o added a commit to a0x8o/spark that referenced this pull request Sep 26, 2024