[SPARK-49152][SQL] V2SessionCatalog should use V2Command #47660
Conversation
```scala
import DataSourceV2Implicits._
import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._

lazy private val hadoopConf = session.sparkContext.hadoopConfiguration
```
This shouldn't be a `lazy val`; we should call `SessionState.newHadoopConf` instead.
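A minimal sketch of the suggested change, assuming the enclosing rule has a `session: SparkSession` in scope (as `ResolveSessionCatalog` does):

```scala
import org.apache.hadoop.conf.Configuration

// Before: a lazy val captures the SparkContext-level Hadoop configuration
// once, so later session-level overrides (e.g. SET spark.hadoop.*) are missed.
//   lazy private val hadoopConf = session.sparkContext.hadoopConfiguration

// After: build a fresh Configuration from the session state on each access,
// so per-session Hadoop settings are honored.
private def hadoopConf: Configuration = session.sessionState.newHadoopConf()
```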
thanks, merging to master!
V2SessionCatalog should use V2Command when possible, because the session catalog can be overridden and the overriding catalog should use v2 commands; otherwise the v1 command will still call the Hive metastore or the built-in session catalog. No user-facing change; verified with existing tests. Closes apache#47660 from amaliujia/create_table_v2. Authored-by: Rui Wang <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
```diff
 private def qualifyLocInTableSpec(tableSpec: TableSpec): TableSpec = {
-  tableSpec.withNewLocation(tableSpec.location.map(makeQualifiedDBObjectPath(_)))
+  tableSpec.withNewLocation(tableSpec.location.map(loc => CatalogUtils.makeQualifiedPath(
+    CatalogUtils.stringToURI(loc), hadoopConf).toString))
 }
```
We should follow the v1 command code path and call `CatalogUtils.URIToString` to get the path string.
Fixing in #47759.
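A hedged sketch of what that follow-up fix might look like, reusing the names from the diff above (`qualifyLocInTableSpec` and `hadoopConf` come from the surrounding file, and `CatalogUtils` lives in `org.apache.spark.sql.catalyst.catalog`):

```scala
import org.apache.spark.sql.catalyst.catalog.CatalogUtils

// Qualify the location, then convert the resulting URI back to a Hadoop Path
// string via CatalogUtils.URIToString (matching the v1 command code path),
// instead of URI.toString, which percent-escapes special characters.
private def qualifyLocInTableSpec(tableSpec: TableSpec): TableSpec = {
  tableSpec.withNewLocation(tableSpec.location.map { loc =>
    CatalogUtils.URIToString(
      CatalogUtils.makeQualifiedPath(CatalogUtils.stringToURI(loc), hadoopConf))
  })
}
```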
…ath string

### What changes were proposed in this pull request?
This is a followup of #47660 to restore the behavior change. The table location string should be a Hadoop Path string instead of a URL string, which escapes all special chars.

### Why are the changes needed?
Restore the unintentional behavior change.

### Does this PR introduce _any_ user-facing change?
No, it's not released yet.

### How was this patch tested?
New test.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47759 from cloud-fan/fix. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…oop Path string

### What changes were proposed in this pull request?
This is a followup of #47660 to restore the behavior change. The table location string should be a Hadoop Path string instead of a URL string, which escapes all special chars.

### Why are the changes needed?
Restore the unintentional behavior change.

### Does this PR introduce _any_ user-facing change?
No, it's not released yet.

### How was this patch tested?
New test.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47765 from cloud-fan/fix. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
…se V1 commands

### What changes were proposed in this pull request?
This is a followup of #47660. If users override `spark_catalog` with `DelegatingCatalogExtension`, we should still use v1 commands, as `DelegatingCatalogExtension` forwards requests to the HMS and there are still behavior differences between v1 and v2 commands targeting the HMS. This PR also forces v1 commands for certain commands that do not have a v2 version.

### Why are the changes needed?
Avoid introducing behavior changes to Spark plugins that implement `DelegatingCatalogExtension` to override `spark_catalog`.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
New test case.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47995 from amaliujia/fix_catalog_v2. Lead-authored-by: Wenchen Fan <[email protected]> Co-authored-by: Rui Wang <[email protected]> Co-authored-by: Wenchen Fan <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit f7cfeb5)
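For illustration, a minimal `DelegatingCatalogExtension` override (class name, package, and the `println` body are hypothetical); with this follow-up, commands against such a catalog keep using the v1 code path, since the extension ultimately forwards requests to the Hive metastore:

```scala
import org.apache.spark.sql.connector.catalog.{DelegatingCatalogExtension, Identifier, Table}

// A plugin that wraps the built-in session catalog: it customizes one code
// path (here, table loading) and delegates everything else unchanged.
class AuditingSessionCatalog extends DelegatingCatalogExtension {
  override def loadTable(ident: Identifier): Table = {
    // Custom behavior (e.g. auditing) before delegating to the built-in
    // session catalog, which talks to the Hive metastore.
    println(s"loading table $ident")
    super.loadTable(ident)
  }
}
// Enabled via: spark.sql.catalog.spark_catalog=com.example.AuditingSessionCatalog
```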
What changes were proposed in this pull request?
V2SessionCatalog should use V2Command when possible.
Why are the changes needed?
This is because the session catalog can be overridden, and the overriding catalog should use v2 commands; otherwise the v1 command will still call the Hive metastore or the built-in session catalog.
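For context, the built-in session catalog can be replaced through configuration; a hypothetical example (the implementation class name is illustrative):

```
# Override spark_catalog with a custom CatalogPlugin implementation.
spark.sql.catalog.spark_catalog=com.example.MyCustomSessionCatalog
```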
Does this PR introduce any user-facing change?
No
How was this patch tested?
Existing tests.
Was this patch authored or co-authored using generative AI tooling?
No.