[SPARK-3036][SPARK-3037][SQL] Add MapType/ArrayType containing null value support to Parquet. #2032
Squashed commit of the following:

commit 3ba41f2
Merge: 4d7bae2 cd0720c
Author: Takuya UESHIN <[email protected]>
Date: Tue Aug 19 16:41:05 2014 +0900

    Merge branch 'master' into issues/SPARK-3063

commit 4d7bae2
Merge: 9321379 c77f406
Author: Takuya UESHIN <[email protected]>
Date: Mon Aug 18 14:45:25 2014 +0900

    Merge branch 'master' into issues/SPARK-3063

commit 9321379
Merge: d8a900a cc36487
Author: Takuya UESHIN <[email protected]>
Date: Sat Aug 16 09:14:04 2014 +0900

    Merge branch 'master' into issues/SPARK-3063

commit d8a900a
Author: Takuya UESHIN <[email protected]>
Date: Fri Aug 15 15:48:52 2014 +0900

    Make ExistingRdd.convertToCatalyst be able to convert Map value.
Squashed commit of the following:

commit 24f1c5c
Author: Takuya UESHIN <[email protected]>
Date: Tue Aug 12 19:41:10 2014 +0900

    Change the default value of ArrayType.containsNull to true in Python API.

commit 79f5b65
Author: Takuya UESHIN <[email protected]>
Date: Tue Aug 12 19:40:39 2014 +0900

    Change the default value of ArrayType.containsNull to true in Java API.

commit 7cd1a7a
Author: Takuya UESHIN <[email protected]>
Date: Tue Aug 12 17:10:03 2014 +0900

    Fix json test failures.

commit 2cfb862
Author: Takuya UESHIN <[email protected]>
Date: Tue Aug 12 15:13:15 2014 +0900

    Change the default value of ArrayType.containsNull to true.

commit 2f38e61
Author: Takuya UESHIN <[email protected]>
Date: Tue Aug 12 15:06:39 2014 +0900

    Revert the default value of MapTypes.valueContainsNull.

commit 9fa02f5
Author: Takuya UESHIN <[email protected]>
Date: Mon Aug 11 23:34:42 2014 +0900

    Fix a test failure.

commit 1a9a96b
Author: Takuya UESHIN <[email protected]>
Date: Mon Aug 11 19:46:24 2014 +0900

    Modify ScalaReflection to handle ArrayType.containsNull and MapType.valueContainsNull.
QA tests have started for PR 2032 at commit
QA tests have finished for PR 2032 at commit
test this please
Jenkins, test this please.
Jenkins, test this please.
Jenkins, test this please.
QA tests have started for PR 2032 at commit
QA tests have finished for PR 2032 at commit
Jenkins, test this please.
QA tests have started for PR 2032 at commit
QA tests have finished for PR 2032 at commit
Thanks! I've merged this to master and 1.1.
…alue support to Parquet.

JIRA:
- https://issues.apache.org/jira/browse/SPARK-3036
- https://issues.apache.org/jira/browse/SPARK-3037

Currently this uses the following Parquet schema for `MapType` when `valueContainsNull` is `true`:

```
message root {
  optional group a (MAP) {
    repeated group map (MAP_KEY_VALUE) {
      required int32 key;
      optional int32 value;
    }
  }
}
```

for `ArrayType` when `containsNull` is `true`:

```
message root {
  optional group a (LIST) {
    repeated group bag {
      optional int32 array;
    }
  }
}
```

We have to think about compatibility with older versions of Spark, Hive, and the other systems I mentioned in the JIRA issues.

Notice:
This PR is based on #1963 and #1889.
Please check them first.

/cc marmbrus, yhuai

Author: Takuya UESHIN <[email protected]>

Closes #2032 from ueshin/issues/SPARK-3036_3037 and squashes the following commits:

4e8e9e7 [Takuya UESHIN] Add ArrayType containing null value support to Parquet.
013c2ca [Takuya UESHIN] Add MapType containing null value support to Parquet.
62989de [Takuya UESHIN] Merge branch 'issues/SPARK-2969' into issues/SPARK-3036_3037
8e38b53 [Takuya UESHIN] Merge branch 'issues/SPARK-3063' into issues/SPARK-3036_3037

(cherry picked from commit 727cb25)
Signed-off-by: Michael Armbrust <[email protected]>
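To make the two nullable layouts quoted in the commit message easier to compare, here is a minimal Python sketch that renders them as schema text. It is not Spark's implementation: `ArrayType`, `MapType`, and `parquet_schema` below are hypothetical stand-ins written for illustration, they assume `int32` keys, values, and elements as in the quoted schemas, and they deliberately refuse the non-nullable cases because the description does not show those layouts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayType:
    """Hypothetical stand-in for an array column of int32 elements."""
    contains_null: bool = True


@dataclass(frozen=True)
class MapType:
    """Hypothetical stand-in for an int32 -> int32 map column."""
    value_contains_null: bool = True


def parquet_schema(field_name, data_type):
    """Render the nullable layouts quoted in the description as schema text."""
    if isinstance(data_type, MapType) and data_type.value_contains_null:
        # Keys stay required; only the value side is optional.
        body = [
            f"optional group {field_name} (MAP) {{",
            "  repeated group map (MAP_KEY_VALUE) {",
            "    required int32 key;",
            "    optional int32 value;",
            "  }",
            "}",
        ]
    elif isinstance(data_type, ArrayType) and data_type.contains_null:
        # The extra repeated group ("bag") lets each element be optional.
        body = [
            f"optional group {field_name} (LIST) {{",
            "  repeated group bag {",
            "    optional int32 array;",
            "  }",
            "}",
        ]
    else:
        # Assumption: layouts for the non-nullable cases are not given here.
        raise NotImplementedError("only the nullable layouts are sketched")
    lines = ["message root {"] + ["  " + line for line in body] + ["}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(parquet_schema("a", MapType()))
    print(parquet_schema("a", ArrayType()))
```

In both layouts the nullability is carried by an `optional` field nested inside a `repeated` group, which is why the array case needs the intermediate `bag` group instead of a bare repeated element.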