-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-49501][SQL] Fix double-escaping of table location #47976
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
dc9997a to
95ece17
Compare
95ece17 to
83509fe
Compare
|
|
||
| withTable("t") { | ||
| withTempDir { dir => | ||
| withTempDir(prefix = "test%prefix") { dir => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we need this new util function with prefix parameter, we can just do
withTempDir { baseDir =>
val dir = new File(baseDir, "test%prefix")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would also work, but I think is useful to have it since we should expand more on testing of paths with special chars. In the future we could also change the default value, so this helper will be needed to avoid the special chars.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's orthogonal from this bug fix, isn't it? I'd recommend to proceed that test utility contribution as an independent PR, @cstavr . We can review separately.
That would also work, but I think is useful to have it since we should expand more on testing of paths with special chars. In the future we could also change the default value, so this helper will be needed to avoid the special chars.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK. I removed the test util.
cloud-fan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch!
| None | ||
| } | ||
|
|
||
| // The location in UnresolvedTableSpec should be the original user-provided path string. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just in case, shall we add a legacy config to restore to the old behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this. I did not add a config because the change in behaviour is only for new tables. Existing tables should continue working. I am also suspecting that there is no real usage of this API for tables with special chars. But I don't have a strong opinion. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK let's keep it simple. double escaping is definitely a bug and no one should rely on it (existing tables won't be affected)
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To @cstavr , we can merge your two independent contributions if you split this into two PRs. Let's not mix the improvement and bug fixes into a single PR.
|
|
||
| /** | ||
| * Creates a temporary directory with the provided prefix, which is then passed to `f` and will | ||
| * be deleted after `f` returns. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a new test utility PR, could you revise this PR description? The AS-IS description looks like a copy of the description of super.withTempDir. However, this has more features, doesn't it?
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Thank you, @cstavr .
|
The docker test failure isn't related, thanks, merging to master/3.5! |
Change the implementation of `createTable` to avoid escaping of special chars in `UnresolvedTableSpec.location`. This field should contain the original user-provided `path` option and not the URI that is constructed by the `buildStorageFormatFromOptions()` call.
In addition this commit extends `SparkFunSuite` and `SQLTestUtils` to allow creating temporary directories with a custom prefix. This can be used to create temporary directories with special chars.
Bug fix. The following code would result in the creation of a table that is stored in `/tmp/test%table` instead of `/tmp/test table`:
```
spark.catalog.createTable("testTable", source = "parquet", schema = new StructType().add("id", "int"), description = "", options = Map("path" -> "/tmp/test table"))
```
Note that this was not consistent with the SQL API, e.g. `create table testTable(id int) using parquet location '/tmp/test table'`
Yes. The previous behaviour would result in table path be escaped. After this change the path will not be escaped.
Updated existing test in `CatalogSuite`.
No
Closes #47976 from cstavr/location-double-escaping.
Authored-by: Christos Stavrakakis <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit dc3333b)
Signed-off-by: Wenchen Fan <[email protected]>
Change the implementation of `createTable` to avoid escaping of special chars in `UnresolvedTableSpec.location`. This field should contain the original user-provided `path` option and not the URI that is constructed by the `buildStorageFormatFromOptions()` call.
In addition this commit extends `SparkFunSuite` and `SQLTestUtils` to allow creating temporary directories with a custom prefix. This can be used to create temporary directories with special chars.
Bug fix. The following code would result in the creation of a table that is stored in `/tmp/test%table` instead of `/tmp/test table`:
```
spark.catalog.createTable("testTable", source = "parquet", schema = new StructType().add("id", "int"), description = "", options = Map("path" -> "/tmp/test table"))
```
Note that this was not consistent with the SQL API, e.g. `create table testTable(id int) using parquet location '/tmp/test table'`
Yes. The previous behaviour would result in table path be escaped. After this change the path will not be escaped.
Updated existing test in `CatalogSuite`.
No
Closes apache#47976 from cstavr/location-double-escaping.
Authored-by: Christos Stavrakakis <[email protected]>
(cherry picked from commit dc3333b)
Signed-off-by: Wenchen Fan <[email protected]>
Co-authored-by: Christos Stavrakakis <[email protected]>
What changes were proposed in this pull request?
Change the implementation of
createTableto avoid escaping of special chars inUnresolvedTableSpec.location. This field should contain the original user-providedpathoption and not the URI that is constructed by thebuildStorageFormatFromOptions()call.In addition this commit extends
SparkFunSuiteandSQLTestUtilsto allow creating temporary directories with a custom prefix. This can be used to create temporary directories with special chars.Why are the changes needed?
Bug fix. The following code would result in the creation of a table that is stored in
/tmp/test%tableinstead of/tmp/test table:Note that this was not consistent with the SQL API, e.g.
create table testTable(id int) using parquet location '/tmp/test table'Does this PR introduce any user-facing change?
Yes. The previous behaviour would result in table path be escaped. After this change the path will not be escaped.
How was this patch tested?
Updated existing test in
CatalogSuite.Was this patch authored or co-authored using generative AI tooling?
No