[SPARK-26071][SQL] disallow map as map key #23045
- The `ADD JAR` command previously returned a result set with the single value 0. It now returns an empty result set.
- In Spark version 2.4 and earlier, users can create map values with map type key via built-in functions like `CreateMap`, `MapFromArrays`, etc. Since Spark 3.0, it's not allowed to create map values with map type key.
I think we should also add this note:
Note that maps with a map type key can still exist, e.g. when reading from parquet files or converting from a scala/java map. Spark does not completely forbid map as map key; it only avoids creating such maps itself.
def checkForMapKeyType(keyType: DataType): TypeCheckResult = {
  if (keyType.existsRecursively(_.isInstanceOf[MapType])) {
    TypeCheckResult.TypeCheckFailure("The key of map cannot be/contains map.")
nit. contains -> contain
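For readers without the Spark source at hand, the recursive key check in the hunk above can be sketched in plain Scala. This is only a minimal model under stated assumptions: the `DataType` hierarchy, `TypeCheckSuccess`/`TypeCheckFailure`, and `TypeUtilsSketch` below are simplified, hypothetical stand-ins rather than Spark's real classes, and the error message uses the wording suggested in the nit.

```scala
// Simplified, hypothetical stand-ins for Spark's type classes (not the real API).
sealed trait DataType {
  // True if the predicate holds for this type or for any type nested inside it.
  def existsRecursively(f: DataType => Boolean): Boolean = f(this) || (this match {
    case ArrayType(elementType)      => elementType.existsRecursively(f)
    case MapType(keyType, valueType) => keyType.existsRecursively(f) || valueType.existsRecursively(f)
    case StructType(fields)          => fields.exists(_.existsRecursively(f))
    case _                           => false
  })
}
case object IntegerType extends DataType
case object StringType extends DataType
final case class ArrayType(elementType: DataType) extends DataType
final case class MapType(keyType: DataType, valueType: DataType) extends DataType
final case class StructType(fields: Seq[DataType]) extends DataType

// Minimal result type for the sketch.
sealed trait TypeCheckResult { def isFailure: Boolean }
case object TypeCheckSuccess extends TypeCheckResult { val isFailure = false }
final case class TypeCheckFailure(message: String) extends TypeCheckResult { val isFailure = true }

object TypeUtilsSketch {
  // Reject a key type that is a map, or that contains a map at any nesting depth.
  def checkForMapKeyType(keyType: DataType): TypeCheckResult =
    if (keyType.existsRecursively(_.isInstanceOf[MapType])) {
      TypeCheckFailure("The key of map cannot be/contain map.")
    } else {
      TypeCheckSuccess
    }
}
```

In this model, `checkForMapKeyType(ArrayType(MapType(StringType, IntegerType)))` fails even though the key itself is an array, because the check walks nested types rather than looking only at the top level.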
if (sameTypeCheck.isFailure) {
  sameTypeCheck
} else {
  TypeUtils.checkForMapKeyType(dataType.keyType)
I don't think we need this. The children already should not have map type keys?
see https://github.com/apache/spark/pull/23045/files#diff-3f19ec3d15dcd8cd42bb25dde1c5c1a9R20 . The child may be read from parquet files, so map of map is still possible.
oh, I see. thanks!
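The point of this thread, that the key check is still needed after the same-type check because the inputs may already carry map-typed keys, can be illustrated with a small self-contained sketch. Everything here is an assumption made for illustration: the tiny `DataType` model, the `Either`-based result, and the names `MapConcatCheckSketch`, `sameTypeCheck`, and `checkInputDataTypes` are hypothetical and much simpler than the real expression code.

```scala
// Hypothetical, simplified type model; not Spark's real classes.
sealed trait DataType
case object IntegerType extends DataType
case object StringType extends DataType
final case class MapType(keyType: DataType, valueType: DataType) extends DataType

object MapConcatCheckSketch {
  // Left(message) signals a failed check, Right(()) a successful one.
  type Check = Either[String, Unit]

  // In this tiny model a map can only appear directly as a type, so a shallow test suffices.
  private def isMap(dt: DataType): Boolean = dt.isInstanceOf[MapType]

  // Step 1: all inputs must be maps of one and the same type.
  def sameTypeCheck(inputs: Seq[DataType]): Check =
    if (inputs.forall(isMap) && inputs.distinct.size <= 1) Right(())
    else Left("inputs must all be maps of the same type")

  // Step 2: inputs that agree on their type may still have a map-typed key,
  // for example when the data was read from files, so the key is checked as well.
  def checkInputDataTypes(inputs: Seq[DataType]): Check =
    sameTypeCheck(inputs).flatMap { _ =>
      inputs.headOption match {
        case Some(MapType(keyType, _)) if isMap(keyType) =>
          Left("The key of map cannot be/contain map.")
        case _ => Right(())
      }
    }
}
```

Two identical inputs of type `MapType(MapType(IntegerType, IntegerType), StringType)` pass step 1 but are rejected by step 2, which is the case the reply above describes.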
LGTM.
thanks, merging to master!
Closes apache#23045 from cloud-fan/map-key.
Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
## What changes were proposed in this pull request?
Due to an implementation limitation, Spark currently can't compare or do equality checks between map types. As a result, map values can't appear in EQUAL or comparison expressions, can't be used as grouping keys, etc.
More importantly, map lookup needs to do an equality check on the map key, and thus can't support map as map key when looking up values from a map. Thus it's not useful to have map as map key.
This PR proposes to stop users from creating maps using map type as key. The list of expressions that are updated: `CreateMap`, `MapFromArrays`, `MapFromEntries`, `MapConcat`, `TransformKeys`. I manually checked all the places that create `MapType`, and came up with this list.
Note that maps with map type key still exist, via reading from parquet files, converting from scala/java map, etc. This PR is not to completely forbid map as map key, but to avoid creating it by Spark itself.
Motivation: when I was trying to fix the duplicate key problem, I found it's impossible to do it with map type map key. I think it's reasonable to avoid map type map key for builtin functions.
## How was this patch tested?
updated test