Skip to content

Conversation

@cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

This is a followup of #47660 to restore the behavior change. The table location string should be Hadoop Path string instead of URL string which escapes all special chars.

Why are the changes needed?

restore the unintentional behavior change.

Does this PR introduce any user-facing change?

No, it's not released yet

How was this patch tested?

new test

Was this patch authored or co-authored using generative AI tooling?

no

case SetTableLocation(ResolvedV1TableIdentifier(ident), None, location) =>
AlterTableSetLocationCommand(ident, None, location)

// V2 catalog doesn't support setting partition location yet, we must use v1 command here.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not related to table location but is also a followup of #47660 to fix a missing case.

private def qualifyLocInTableSpec(tableSpec: TableSpec): TableSpec = {
tableSpec.withNewLocation(tableSpec.location.map(loc => CatalogUtils.makeQualifiedPath(
CatalogUtils.stringToURI(loc), hadoopConf).toString))
val newLoc = tableSpec.location.map { loc =>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code here follows SessionCatalog#makeQualifiedTablePath

/**
* A reserved property to specify the location of the table. The files of the table
* should be under this location.
* should be under this location. The location is a Hadoop Path string.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for clarification.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, this looks correct to me.

c INT)
USING parquet
LOCATION 'file:///path/to/table'
LOCATION 'file:/path/to/table'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've investigated this locally. The new result is actually consistent with the production behavior (with Hive Metastore). What happens is:

  • CREATE TABLE command qualifies the table location, and keeps it as URI in CatalogTable#storage#locationUri
  • When saving the table into HMS, we have to turn URI into string, by using new Path(uri).toString. The Hadoop Path string will omit the authority component if it's not present. So file:///path becomes file:/path.
  • When reading the table back from HMS, we turn string back to URI, but the authority component won't be filled.

However, with InMemoryCatalog, we keep CatalogTable directly and there is no URI <-> string round trip. So the authority component is still there.

This actually doesn't matter, as empty authority is the same as no authority, in URI string.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the details.

@dongjoon-hyun
Copy link
Member

Merged to master.

@dongjoon-hyun
Copy link
Member

Could you make a backporting PR to branch-3.5 seperately in order to make it sure all CIs pass, @cloud-fan ?

cloud-fan added a commit to cloud-fan/spark that referenced this pull request Aug 15, 2024
…ath string

This is a followup of apache#47660 to restore the behavior change. The table location string should be Hadoop Path string instead of URL string which escapes all special chars.

restore the unintentional behavior change.

No, it's not released yet

new test

no

Closes apache#47759 from cloud-fan/fix.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants