Skip to content

Conversation

@cstavr
Copy link
Contributor

@cstavr cstavr commented Sep 3, 2024

What changes were proposed in this pull request?

Change the implementation of createTable to avoid escaping of special chars in UnresolvedTableSpec.location. This field should contain the original user-provided path option and not the URI that is constructed by the buildStorageFormatFromOptions() call.

In addition this commit extends SparkFunSuite and SQLTestUtils to allow creating temporary directories with a custom prefix. This can be used to create temporary directories with special chars.

Why are the changes needed?

Bug fix. The following code would result in the creation of a table that is stored in /tmp/test%table instead of /tmp/test table:

spark.catalog.createTable("testTable", source = "parquet", schema = new StructType().add("id", "int"), description = "", options = Map("path" -> "/tmp/test table"))

Note that this was not consistent with the SQL API, e.g. create table testTable(id int) using parquet location '/tmp/test table'

Does this PR introduce any user-facing change?

Yes. The previous behaviour would result in table path be escaped. After this change the path will not be escaped.

How was this patch tested?

Updated existing test in CatalogSuite.

Was this patch authored or co-authored using generative AI tooling?

No

@cstavr cstavr force-pushed the location-double-escaping branch 2 times, most recently from dc9997a to 95ece17 Compare September 4, 2024 09:53
@cstavr cstavr force-pushed the location-double-escaping branch from 95ece17 to 83509fe Compare September 5, 2024 11:15
@cloud-fan cloud-fan changed the title [SPARK-49501] Fix double-escaping of table location [SPARK-49501][SQL] Fix double-escaping of table location Sep 5, 2024

withTable("t") {
withTempDir { dir =>
withTempDir(prefix = "test%prefix") { dir =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we need this new util function with prefix parameter, we can just do

withTempDir { baseDir =>
  val dir = new File(baseDir, "test%prefix")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would also work, but I think is useful to have it since we should expand more on testing of paths with special chars. In the future we could also change the default value, so this helper will be needed to avoid the special chars.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's orthogonal from this bug fix, isn't it? I'd recommend to proceed that test utility contribution as an independent PR, @cstavr . We can review separately.

That would also work, but I think is useful to have it since we should expand more on testing of paths with special chars. In the future we could also change the default value, so this helper will be needed to avoid the special chars.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK. I removed the test util.

Copy link
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch!

None
}

// The location in UnresolvedTableSpec should be the original user-provided path string.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just in case, shall we add a legacy config to restore to the old behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this. I did not add a config because the change in behaviour is only for new tables. Existing tables should continue working. I am also suspecting that there is no real usage of this API for tables with special chars. But I don't have a strong opinion. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK let's keep it simple. double escaping is definitely a bug and no one should rely on it (existing tables won't be affected)

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To @cstavr , we can merge your two independent contributions if you split this into two PRs. Let's not mix the improvement and bug fixes into a single PR.


/**
* Creates a temporary directory with the provided prefix, which is then passed to `f` and will
* be deleted after `f` returns.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a new test utility PR, could you revise this PR description? The AS-IS description looks like a copy of the description of super.withTempDir. However, this has more features, doesn't it?

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @cstavr .

@cloud-fan
Copy link
Contributor

The docker test failure isn't related, thanks, merging to master/3.5!

@cloud-fan cloud-fan closed this in dc3333b Sep 9, 2024
cloud-fan pushed a commit that referenced this pull request Sep 9, 2024
Change the implementation of `createTable` to avoid escaping of special chars in `UnresolvedTableSpec.location`. This field should contain the original user-provided `path` option and not the URI that is constructed by the `buildStorageFormatFromOptions()` call.

In addition this commit extends `SparkFunSuite` and `SQLTestUtils` to allow creating temporary directories with a custom prefix. This can be used to create temporary directories with special chars.

Bug fix. The following code would result in the creation of a table that is stored in `/tmp/test%table` instead of `/tmp/test table`:
```
spark.catalog.createTable("testTable", source = "parquet", schema = new StructType().add("id", "int"), description = "", options = Map("path" -> "/tmp/test table"))
```

Note that this was not consistent with the SQL API, e.g. `create table testTable(id int) using parquet location '/tmp/test table'`

Yes. The previous behaviour would result in table path be escaped. After this change the path will not be escaped.

Updated existing test in `CatalogSuite`.

No

Closes #47976 from cstavr/location-double-escaping.

Authored-by: Christos Stavrakakis <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit dc3333b)
Signed-off-by: Wenchen Fan <[email protected]>
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
Change the implementation of `createTable` to avoid escaping of special chars in `UnresolvedTableSpec.location`. This field should contain the original user-provided `path` option and not the URI that is constructed by the `buildStorageFormatFromOptions()` call.

In addition this commit extends `SparkFunSuite` and `SQLTestUtils` to allow creating temporary directories with a custom prefix. This can be used to create temporary directories with special chars.

Bug fix. The following code would result in the creation of a table that is stored in `/tmp/test%table` instead of `/tmp/test table`:
```
spark.catalog.createTable("testTable", source = "parquet", schema = new StructType().add("id", "int"), description = "", options = Map("path" -> "/tmp/test table"))
```

Note that this was not consistent with the SQL API, e.g. `create table testTable(id int) using parquet location '/tmp/test table'`

Yes. The previous behaviour would result in table path be escaped. After this change the path will not be escaped.

Updated existing test in `CatalogSuite`.

No

Closes apache#47976 from cstavr/location-double-escaping.

Authored-by: Christos Stavrakakis <[email protected]>

(cherry picked from commit dc3333b)

Signed-off-by: Wenchen Fan <[email protected]>
Co-authored-by: Christos Stavrakakis <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants