@ueshin
Member

@ueshin ueshin commented Aug 19, 2014

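JIRA:
- https://issues.apache.org/jira/browse/SPARK-3036
- https://issues.apache.org/jira/browse/SPARK-3037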

Currently this uses the following Parquet schema for MapType when valueContainsNull is true:

message root {
  optional group a (MAP) {
    repeated group map (MAP_KEY_VALUE) {
      required int32 key;
      optional int32 value;
    }
  }
}
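
As an illustration only (this is not code from the patch), here is a minimal Python sketch of how the `valueContainsNull` flag could select the repetition of the `value` field in the schema above. The function name `map_schema` and its parameters are hypothetical, and the `required` branch for the non-null case is an assumption that the description above does not state.

```python
def map_schema(field: str, key_type: str, value_type: str,
               value_contains_null: bool) -> str:
    """Render a Parquet message for a single map column (hypothetical helper)."""
    # Nullable values are marked `optional`. Using `required` when
    # value_contains_null is False is an assumption, not taken from the PR.
    value_repetition = "optional" if value_contains_null else "required"
    return "\n".join([
        "message root {",
        f"  optional group {field} (MAP) {{",
        "    repeated group map (MAP_KEY_VALUE) {",
        # Map keys are always `required` in the schema shown above.
        f"      required {key_type} key;",
        f"      {value_repetition} {value_type} value;",
        "    }",
        "  }",
        "}",
    ])


if __name__ == "__main__":
    print(map_schema("a", "int32", "int32", value_contains_null=True))
```

Called with `value_contains_null=True`, the sketch reproduces the schema text shown above.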

And for ArrayType when containsNull is true, it uses:

message root {
  optional group a (LIST) {
    repeated group bag {
      optional int32 array;
    }
  }
}
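
To show why the extra `bag` group with an `optional` element makes null elements representable, here is a small Python sketch of Parquet-style record shredding for the leaf column `a.bag.array` in the schema above. It is an illustration of repetition and definition levels in general, not the converter code in this patch; the function name `shred_list` is hypothetical.

```python
from typing import List, Optional, Tuple

# (repetition level, definition level, value) for the leaf a.bag.array
Triple = Tuple[int, int, Optional[int]]


def shred_list(a: Optional[List[Optional[int]]]) -> List[Triple]:
    """Shred one record's list column into level triples (hypothetical helper)."""
    if a is None:
        # The optional group `a` itself is null: nothing on the path is defined.
        return [(0, 0, None)]
    if not a:
        # `a` is defined but has no `bag` entries: an empty list.
        return [(0, 1, None)]
    triples: List[Triple] = []
    for i, element in enumerate(a):
        # The first element starts a new record (0); later ones repeat `bag` (1).
        rep = 0 if i == 0 else 1
        if element is None:
            # `bag` exists but the optional `array` field is null.
            triples.append((rep, 2, None))
        else:
            # Fully defined value: a, bag, and array are all present.
            triples.append((rep, 3, element))
    return triples


if __name__ == "__main__":
    print(shred_list([1, None, 3]))
```

With this layout a null list, an empty list, and a null element each get a distinct definition level (0, 1, and 2), which is what lets a reader tell them apart.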

We have to think about compatibility with older versions of Spark, Hive, and the other systems I mentioned in the JIRA issues.

Notice:
This PR is based on #1963 and #1889.
Please check them first.

/cc @marmbrus, @yhuai

ueshin added 4 commits August 19, 2014 17:41
Squashed commit of the following:

commit 3ba41f2
Merge: 4d7bae2 cd0720c
Author: Takuya UESHIN <[email protected]>
Date:   Tue Aug 19 16:41:05 2014 +0900

    Merge branch 'master' into issues/SPARK-3063

commit 4d7bae2
Merge: 9321379 c77f406
Author: Takuya UESHIN <[email protected]>
Date:   Mon Aug 18 14:45:25 2014 +0900

    Merge branch 'master' into issues/SPARK-3063

commit 9321379
Merge: d8a900a cc36487
Author: Takuya UESHIN <[email protected]>
Date:   Sat Aug 16 09:14:04 2014 +0900

    Merge branch 'master' into issues/SPARK-3063

commit d8a900a
Author: Takuya UESHIN <[email protected]>
Date:   Fri Aug 15 15:48:52 2014 +0900

    Make ExistingRdd.convertToCatalyst be able to convert Map value.

Squashed commit of the following:

commit 24f1c5c
Author: Takuya UESHIN <[email protected]>
Date:   Tue Aug 12 19:41:10 2014 +0900

    Change the default value of ArrayType.containsNull to true in Python API.

commit 79f5b65
Author: Takuya UESHIN <[email protected]>
Date:   Tue Aug 12 19:40:39 2014 +0900

    Change the default value of ArrayType.containsNull to true in Java API.

commit 7cd1a7a
Author: Takuya UESHIN <[email protected]>
Date:   Tue Aug 12 17:10:03 2014 +0900

    Fix json test failures.

commit 2cfb862
Author: Takuya UESHIN <[email protected]>
Date:   Tue Aug 12 15:13:15 2014 +0900

    Change the default value of ArrayType.containsNull to true.

commit 2f38e61
Author: Takuya UESHIN <[email protected]>
Date:   Tue Aug 12 15:06:39 2014 +0900

    Revert the default value of MapTypes.valueContainsNull.

commit 9fa02f5
Author: Takuya UESHIN <[email protected]>
Date:   Mon Aug 11 23:34:42 2014 +0900

    Fix a test failure.

commit 1a9a96b
Author: Takuya UESHIN <[email protected]>
Date:   Mon Aug 11 19:46:24 2014 +0900

    Modify ScalaReflection to handle ArrayType.containsNull and MapType.valueContainsNull.
@SparkQA

SparkQA commented Aug 19, 2014

QA tests have started for PR 2032 at commit 4e8e9e7.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Aug 19, 2014

QA tests have finished for PR 2032 at commit 4e8e9e7.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Contributor

test this please

@pwendell
Contributor

Jenkins, test this please.

@pwendell
Contributor

Jenkins, test this please.

@ueshin ueshin changed the title from "[WIP][SPARK-3036][SPARK-3037][SQL] Add MapType/ArrayType containing null value support to Parquet." to "[SPARK-3036][SPARK-3037][SQL] Add MapType/ArrayType containing null value support to Parquet." on Aug 21, 2014
@marmbrus
Contributor

Jenkins, test this please.

@SparkQA

SparkQA commented Aug 26, 2014

QA tests have started for PR 2032 at commit 4e8e9e7.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Aug 26, 2014

QA tests have finished for PR 2032 at commit 4e8e9e7.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Contributor

Jenkins, test this please.

@SparkQA

SparkQA commented Aug 26, 2014

QA tests have started for PR 2032 at commit 4e8e9e7.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Aug 26, 2014

QA tests have finished for PR 2032 at commit 4e8e9e7.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Contributor

Thanks! I've merged this to master and 1.1.

asfgit pushed a commit that referenced this pull request Aug 27, 2014
…alue support to Parquet.

JIRA:
- https://issues.apache.org/jira/browse/SPARK-3036
- https://issues.apache.org/jira/browse/SPARK-3037

Currently this uses the following Parquet schema for `MapType` when `valueContainsNull` is `true`:

```
message root {
  optional group a (MAP) {
    repeated group map (MAP_KEY_VALUE) {
      required int32 key;
      optional int32 value;
    }
  }
}
```

for `ArrayType` when `containsNull` is `true`:

```
message root {
  optional group a (LIST) {
    repeated group bag {
      optional int32 array;
    }
  }
}
```

We have to think about compatibility with older versions of Spark, Hive, and the other systems I mentioned in the JIRA issues.

Notice:
This PR is based on #1963 and #1889.
Please check them first.

/cc marmbrus, yhuai

Author: Takuya UESHIN <[email protected]>

Closes #2032 from ueshin/issues/SPARK-3036_3037 and squashes the following commits:

4e8e9e7 [Takuya UESHIN] Add ArrayType containing null value support to Parquet.
013c2ca [Takuya UESHIN] Add MapType containing null value support to Parquet.
62989de [Takuya UESHIN] Merge branch 'issues/SPARK-2969' into issues/SPARK-3036_3037
8e38b53 [Takuya UESHIN] Merge branch 'issues/SPARK-3063' into issues/SPARK-3036_3037

(cherry picked from commit 727cb25)
Signed-off-by: Michael Armbrust <[email protected]>
@asfgit asfgit closed this in 727cb25 Aug 27, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014