@amaliujia
Contributor

What changes were proposed in this pull request?

V2SessionCatalog should use V2Command when possible.

Why are the changes needed?

The session catalog can be overridden, so the overriding catalog should use v2 commands. Otherwise, the V1Command will still call the Hive metastore or the built-in session catalog.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests.

Was this patch authored or co-authored using generative AI tooling?

NO

@github-actions github-actions bot added the SQL label Aug 8, 2024
@amaliujia
Contributor Author

@cloud-fan

@amaliujia amaliujia changed the title [SPARK-49152][SQL] V2SessionCatalog should use V2Command when possible [SPARK-49152][SQL] V2SessionCatalog should use V2Command Aug 8, 2024
import DataSourceV2Implicits._
import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._

lazy private val hadoopConf = session.sparkContext.hadoopConfiguration

This shouldn't be a lazy val; we should call SessionState.newHadoopConf instead.
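
The comment does not spell out its reasoning. One general hazard with caching a configuration in a `lazy val` is that settings changed after the first access are never seen, while building the configuration on each call always reflects the current state. The sketch below illustrates only that general point with a plain mutable map; `sessionSettings`, `newConf`, `CachedHolder`, and `FreshHolder` are hypothetical names, not Spark APIs.

```scala
import scala.collection.mutable

object LazyConfSketch {
  // Hypothetical stand-in for session settings that can change at runtime.
  val sessionSettings: mutable.Map[String, String] =
    mutable.Map("fs.defaultFS" -> "file:///")

  // Hypothetical stand-in for a method that builds a fresh snapshot on every call.
  def newConf(): Map[String, String] = sessionSettings.toMap

  // Caches the snapshot taken at first access.
  class CachedHolder { lazy val conf: Map[String, String] = newConf() }

  // Rebuilds the snapshot each time it is asked for.
  class FreshHolder { def conf: Map[String, String] = newConf() }

  def main(args: Array[String]): Unit = {
    val cached = new CachedHolder
    val fresh = new FreshHolder
    cached.conf // force the lazy val before the setting changes
    sessionSettings("fs.defaultFS") = "hdfs://example:8020"
    println(cached.conf("fs.defaultFS")) // file:///
    println(fresh.conf("fs.defaultFS"))  // hdfs://example:8020
  }
}
```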

amaliujia and others added 2 commits August 11, 2024 19:22
@cloud-fan
Contributor

cloud-fan commented Aug 12, 2024

thanks, merging to master!

@cloud-fan cloud-fan closed this in 2465cb0 Aug 12, 2024
amaliujia added a commit to amaliujia/spark that referenced this pull request Aug 12, 2024
V2SessionCatalog should use V2Command when possible.

The session catalog can be overridden, so the overriding catalog should use v2 commands. Otherwise, the V1Command will still call the Hive metastore or the built-in session catalog.

No

Existing tests.

 NO

Closes apache#47660 from amaliujia/create_table_v2.

Authored-by: Rui Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
  private def qualifyLocInTableSpec(tableSpec: TableSpec): TableSpec = {
-   tableSpec.withNewLocation(tableSpec.location.map(makeQualifiedDBObjectPath(_)))
+   tableSpec.withNewLocation(tableSpec.location.map(loc => CatalogUtils.makeQualifiedPath(
+     CatalogUtils.stringToURI(loc), hadoopConf).toString))

We should follow the v1 command code path and call CatalogUtils.URIToString to get the path string.

fixing at #47759
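
The escaping issue raised in this thread can be sketched with `java.net.URI` alone: rendering a URI with `toString` keeps special characters percent-encoded, whereas reading back the decoded path does not. This is only an illustration of the escaping behavior under that assumption. It does not call Spark's `CatalogUtils` or Hadoop's `Path`, and it does not reproduce the exact string Spark produces; `LocationStringSketch` and its methods are hypothetical names.

```scala
import java.net.URI

object LocationStringSketch {
  // Build a URI from an unescaped path. The multi-argument constructor
  // percent-encodes characters that are illegal in a URI, such as spaces.
  def toUri(rawPath: String): URI = new URI("file", null, rawPath, null)

  // URI-style rendering: special characters stay percent-encoded.
  def escapedString(uri: URI): String = uri.toString

  // Path-style rendering: getPath returns the decoded path component.
  def decodedPath(uri: URI): String = uri.getPath

  def main(args: Array[String]): Unit = {
    val uri = toUri("/tmp/my table")
    println(escapedString(uri)) // file:/tmp/my%20table
    println(decodedPath(uri))   // /tmp/my table
  }
}
```

A location stored in the escaped form would read back differently from what the user typed, which is the kind of behavior change the review comment asks to avoid.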

dongjoon-hyun pushed a commit that referenced this pull request Aug 14, 2024
…ath string

### What changes were proposed in this pull request?

This is a follow-up to #47660 that reverts an unintentional behavior change. The table location string should be a Hadoop Path string rather than a URL string, which escapes all special characters.

### Why are the changes needed?

Revert the unintentional behavior change.

### Does this PR introduce _any_ user-facing change?

No, it's not released yet

### How was this patch tested?

new test

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #47759 from cloud-fan/fix.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
cloud-fan added a commit to cloud-fan/spark that referenced this pull request Aug 15, 2024
…ath string

cloud-fan added a commit that referenced this pull request Aug 15, 2024
…oop Path string

### What changes were proposed in this pull request?

This is a follow-up to #47660 that reverts an unintentional behavior change. The table location string should be a Hadoop Path string rather than a URL string, which escapes all special characters.

### Why are the changes needed?

Revert the unintentional behavior change.

### Does this PR introduce _any_ user-facing change?

No, it's not released yet

### How was this patch tested?

new test

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #47765 from cloud-fan/fix.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan added a commit that referenced this pull request Sep 5, 2024
…se V1 commands

### What changes were proposed in this pull request?

This is a follow-up to #47660. If users override `spark_catalog` with `DelegatingCatalogExtension`, we should still use v1 commands, because `DelegatingCatalogExtension` forwards requests to the Hive metastore (HMS) and there are still behavior differences between v1 and v2 commands targeting HMS.

This PR also forces the use of v1 commands for certain commands that do not have a v2 version.
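
The routing rule described above can be summarized as a small decision function. This is a minimal sketch of the rule as stated in the description, not Spark's actual resolution code: `CatalogKind`, its cases, and `useV1Command` are hypothetical names, and treating the built-in session catalog as always using v1 commands is an assumption inferred from the original PR description.

```scala
object CommandRoutingSketch {
  // Hypothetical classification of what is registered as `spark_catalog`.
  sealed trait CatalogKind
  case object BuiltInSessionCatalog extends CatalogKind // assumption: stays on v1
  case object DelegatingExtension extends CatalogKind   // forwards requests to HMS
  case object CustomOverride extends CatalogKind        // any other override

  // Returns true when the v1 command should be used.
  def useV1Command(kind: CatalogKind, hasV2Version: Boolean): Boolean = kind match {
    case BuiltInSessionCatalog => true
    case DelegatingExtension   => true          // keep v1 behavior for HMS-backed plugins
    case CustomOverride        => !hasV2Version // v2 when possible, v1 as a fallback
  }

  def main(args: Array[String]): Unit = {
    println(useV1Command(DelegatingExtension, hasV2Version = true)) // true
    println(useV1Command(CustomOverride, hasV2Version = true))      // false
    println(useV1Command(CustomOverride, hasV2Version = false))     // true
  }
}
```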

### Why are the changes needed?

Avoid introducing behavior changes to Spark plugins that implement `DelegatingCatalogExtension` to override `spark_catalog`.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

new test case

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #47995 from amaliujia/fix_catalog_v2.

Lead-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Rui Wang <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan added a commit that referenced this pull request Sep 5, 2024
…se V1 commands

(cherry picked from commit f7cfeb5)
Signed-off-by: Wenchen Fan <[email protected]>