[SPARK-32594][SQL] Fix serialization of dates inserted to Hive tables #29409
Closed
|
Test build #127347 has finished for PR 29409 at commit
|
HyukjinKwon
approved these changes
Aug 12, 2020
Member
HyukjinKwon
left a comment
LGTM
Member
|
Merged to master and branch-3.0. |
HyukjinKwon
pushed a commit
that referenced
this pull request
Aug 12, 2020
### What changes were proposed in this pull request?
Fix `DaysWritable` by overriding the parent's method `def get(doesTimeMatter: Boolean): Date` from `DateWritable` instead of `Date get()`, because the latter calls the former. The bug occurs because `HiveOutputWriter.write()` transitively calls `def get(doesTimeMatter: Boolean): Date` with the default implementation from the parent class `DateWritable`, which doesn't respect date rebases and uses the uninitialized `daysSinceEpoch` (0, which is `1970-01-01`).

### Why are the changes needed?
The changes fix the bug:
```sql
spark-sql> CREATE TABLE table1 (d date);
spark-sql> INSERT INTO table1 VALUES (date '2020-08-11');
spark-sql> SELECT * FROM table1;
1970-01-01
```
The expected result of the last SQL statement is **2020-08-11**, but the actual result was **1970-01-01**.

### Does this PR introduce _any_ user-facing change?
Yes. After the fix, `INSERT` works correctly:
```sql
spark-sql> SELECT * FROM table1;
2020-08-11
```

### How was this patch tested?
Added a new test to `HiveSerDeReadWriteSuite`.

Closes #29409 from MaxGekk/insert-date-into-hive-table.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
(cherry picked from commit 0477d23)
Signed-off-by: HyukjinKwon <[email protected]>
HyukjinKwon
pushed a commit
that referenced
this pull request
Aug 12, 2020
…eReadWriteSuite`

### What changes were proposed in this pull request?
- Test TEXTFILE together with the PARQUET and ORC file formats in `HiveSerDeReadWriteSuite`
- Remove the "SPARK-32594: insert dates to a Hive table" added by #29409

### Why are the changes needed?
- To improve test coverage, and test other row SerDe - `org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe`.
- The removed test is not needed anymore because the bug reported in SPARK-32594 is triggered by the TEXTFILE file format too.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the modified test suite `HiveSerDeReadWriteSuite`.

Closes #29417 from MaxGekk/textfile-HiveSerDeReadWriteSuite.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
HyukjinKwon
pushed a commit
that referenced
this pull request
Aug 12, 2020
…eReadWriteSuite`

### What changes were proposed in this pull request?
- Test TEXTFILE together with the PARQUET and ORC file formats in `HiveSerDeReadWriteSuite`
- Remove the "SPARK-32594: insert dates to a Hive table" added by #29409

### Why are the changes needed?
- To improve test coverage, and test other row SerDe - `org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe`.
- The removed test is not needed anymore because the bug reported in SPARK-32594 is triggered by the TEXTFILE file format too.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the modified test suite `HiveSerDeReadWriteSuite`.

Closes #29417 from MaxGekk/textfile-HiveSerDeReadWriteSuite.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
(cherry picked from commit f664aaa)
Signed-off-by: HyukjinKwon <[email protected]>
What changes were proposed in this pull request?
Fix `DaysWritable` by overriding the parent's method `def get(doesTimeMatter: Boolean): Date` from `DateWritable` instead of `Date get()`, because the latter calls the former. The bug occurs because `HiveOutputWriter.write()` transitively calls `def get(doesTimeMatter: Boolean): Date` with the default implementation from the parent class `DateWritable`, which doesn't respect date rebases and uses the uninitialized `daysSinceEpoch` (0, which is `1970-01-01`).
Why are the changes needed?
The changes fix the bug: after `CREATE TABLE table1 (d date);` and `INSERT INTO table1 VALUES (date '2020-08-11');`, the query `SELECT * FROM table1;` returns `1970-01-01`.
The expected result of the last SQL statement is 2020-08-11, but the actual result was 1970-01-01.
Does this PR introduce any user-facing change?
Yes. After the fix, `INSERT` works correctly: `SELECT * FROM table1;` returns `2020-08-11`.
How was this patch tested?
Added a new test to `HiveSerDeReadWriteSuite`.
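The override pitfall described above can be illustrated with a minimal, self-contained sketch. This is not the actual Spark or Hive code: `ParentWritable`, `BuggyDays`, and `FixedDays` are hypothetical stand-ins for `DateWritable` and the two versions of `DaysWritable`, `java.time.LocalDate` replaces `java.sql.Date`, and date rebasing is left out entirely. It only assumes what the description states: the parent's one-argument `get` reads the parent's own `daysSinceEpoch`, which stays at 0 when the subclass keeps its days elsewhere.

```scala
import java.time.LocalDate

// Hypothetical stand-in for the parent class: the no-arg get() delegates to
// get(doesTimeMatter), which reads the parent's own daysSinceEpoch field.
class ParentWritable {
  protected var daysSinceEpoch: Int = 0 // never set by the subclasses below
  def get(): LocalDate = get(doesTimeMatter = true)
  def get(doesTimeMatter: Boolean): LocalDate =
    LocalDate.ofEpochDay(daysSinceEpoch.toLong)
}

// Buggy shape: only the no-arg get() is overridden, so a caller that goes
// through get(doesTimeMatter) still sees the parent's uninitialized field.
class BuggyDays(days: Int) extends ParentWritable {
  override def get(): LocalDate = LocalDate.ofEpochDay(days.toLong)
}

// Fixed shape: overriding get(doesTimeMatter) covers both entry points,
// because the parent's no-arg get() dispatches to this override.
class FixedDays(days: Int) extends ParentWritable {
  override def get(doesTimeMatter: Boolean): LocalDate =
    LocalDate.ofEpochDay(days.toLong)
}

object OverrideDemo {
  def main(args: Array[String]): Unit = {
    val days = LocalDate.of(2020, 8, 11).toEpochDay.toInt
    println(new BuggyDays(days).get(false)) // 1970-01-01
    println(new FixedDays(days).get(false)) // 2020-08-11
    println(new FixedDays(days).get())      // 2020-08-11
  }
}
```

In this model, a writer that calls `get(false)` on the buggy class reproduces the `1970-01-01` symptom, while overriding the one-argument method fixes both call paths at once.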