Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Aug 11, 2020

What changes were proposed in this pull request?

Fix DaysWritable by overriding parent's method def get(doesTimeMatter: Boolean): Date from DateWritable instead of Date get() because the former one uses the first one. The bug occurs because HiveOutputWriter.write() call def get(doesTimeMatter: Boolean): Date transitively with default implementation from the parent class DateWritable which doesn't respect date rebases and uses not initialized daysSinceEpoch (0 which is 1970-01-01).

Why are the changes needed?

The changes fix the bug:

spark-sql> CREATE TABLE table1 (d date);
spark-sql> INSERT INTO table1 VALUES (date '2020-08-11');
spark-sql> SELECT * FROM table1;
1970-01-01

The expected result of the last SQL statement must be 2020-08-11 but got 1970-01-01.

Does this PR introduce any user-facing change?

Yes. After the fix, INSERT work correctly:

spark-sql> SELECT * FROM table1;
2020-08-11

How was this patch tested?

Add new test to HiveSerDeReadWriteSuite

@SparkQA
Copy link

SparkQA commented Aug 12, 2020

Test build #127347 has finished for PR 29409 at commit 713f6ee.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@HyukjinKwon
Copy link
Member

Merged to master and branch-3.0.

HyukjinKwon pushed a commit that referenced this pull request Aug 12, 2020
### What changes were proposed in this pull request?
Fix `DaysWritable` by overriding parent's method `def get(doesTimeMatter: Boolean): Date` from `DateWritable` instead of `Date get()` because the former one uses the first one. The bug occurs because `HiveOutputWriter.write()` call `def get(doesTimeMatter: Boolean): Date` transitively with default implementation from the parent class  `DateWritable` which doesn't respect date rebases and uses not initialized `daysSinceEpoch` (0 which `1970-01-01`).

### Why are the changes needed?
The changes fix the bug:
```sql
spark-sql> CREATE TABLE table1 (d date);
spark-sql> INSERT INTO table1 VALUES (date '2020-08-11');
spark-sql> SELECT * FROM table1;
1970-01-01
```
The expected result of the last SQL statement must be **2020-08-11** but got **1970-01-01**.

### Does this PR introduce _any_ user-facing change?
Yes. After the fix, `INSERT` work correctly:
```sql
spark-sql> SELECT * FROM table1;
2020-08-11
```

### How was this patch tested?
Add new test to `HiveSerDeReadWriteSuite`

Closes #29409 from MaxGekk/insert-date-into-hive-table.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
(cherry picked from commit 0477d23)
Signed-off-by: HyukjinKwon <[email protected]>
HyukjinKwon pushed a commit that referenced this pull request Aug 12, 2020
…eReadWriteSuite`

### What changes were proposed in this pull request?
- Test TEXTFILE together with the PARQUET and ORC file formats in `HiveSerDeReadWriteSuite`
- Remove the "SPARK-32594: insert dates to a Hive table" added by #29409

### Why are the changes needed?
- To improve test coverage, and test other row SerDe - `org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe`.
- The removed test is not needed anymore because the bug reported in SPARK-32594 is triggered by the TEXTFILE file format too.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the modified test suite `HiveSerDeReadWriteSuite`.

Closes #29417 from MaxGekk/textfile-HiveSerDeReadWriteSuite.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
HyukjinKwon pushed a commit that referenced this pull request Aug 12, 2020
…eReadWriteSuite`

### What changes were proposed in this pull request?
- Test TEXTFILE together with the PARQUET and ORC file formats in `HiveSerDeReadWriteSuite`
- Remove the "SPARK-32594: insert dates to a Hive table" added by #29409

### Why are the changes needed?
- To improve test coverage, and test other row SerDe - `org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe`.
- The removed test is not needed anymore because the bug reported in SPARK-32594 is triggered by the TEXTFILE file format too.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the modified test suite `HiveSerDeReadWriteSuite`.

Closes #29417 from MaxGekk/textfile-HiveSerDeReadWriteSuite.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
(cherry picked from commit f664aaa)
Signed-off-by: HyukjinKwon <[email protected]>
@MaxGekk MaxGekk deleted the insert-date-into-hive-table branch December 11, 2020 20:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants