[HUDI-5189] Make HiveAvroSerializer compatible with hive3 #7173
Conversation
Hi @xiarixiaoyao, please review if you are available.

@hudi-bot run azure

@xicm

@xiarixiaoyao @xicm
      return new Utf8(string);
    case DATE:
-     return DateWritable.dateToDays(((DateObjectInspector)fieldOI).getPrimitiveJavaObject(structFieldData));
+     return new DateWritable((DateWritable)structFieldData).getDays();
Hive3 will return DateWritableV2 for the date type; we cannot convert DateWritableV2 to DateWritable directly.
Ohh, I compiled with hive 3.1.2 and ran TestHiveAvroSerializer, and the test passed. Let me do more tests.
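For context, here is a minimal sketch (a hypothetical helper, not the code in this PR) of a Hive-version-agnostic conversion: the Hive 2 path stays a plain cast, and the Hive 3 path resolves DateWritableV2#getDays reflectively, since that class is absent from a Hive 2 classpath at compile time.

```java
import org.apache.hadoop.hive.serde2.io.DateWritable;

// Minimal sketch, not the PR's code: epoch days from either Hive 2's DateWritable
// or Hive 3's DateWritableV2 (which cannot be referenced at compile time on Hive 2).
public final class DateDaysShim {

  private DateDaysShim() {
  }

  static int toEpochDays(Object structFieldData) {
    if (structFieldData instanceof DateWritable) {
      // Hive 2.x: the writable is a DateWritable, so call getDays() directly.
      return ((DateWritable) structFieldData).getDays();
    }
    try {
      // Hive 3.x: DateWritableV2 also exposes getDays(); resolve it at runtime.
      return (int) structFieldData.getClass().getMethod("getDays").invoke(structFieldData);
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Unsupported date writable: " + structFieldData.getClass(), e);
    }
  }
}
```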
-     Timestamp timestamp =
-         ((TimestampObjectInspector) fieldOI).getPrimitiveJavaObject(structFieldData);
-     return timestamp.getTime();
+     return new TimestampWritable((TimestampWritable) structFieldData).getTimestamp().getTime();
ditto
Yes, it would be better to handle hive2/hive3 in a unified way.
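As a sketch of that unified handling for timestamps (a hypothetical helper, not the PR's code; it assumes Hive 3's org.apache.hadoop.hive.common.type.Timestamp exposes toEpochMilli()): Hive 2's TimestampWritable wraps java.sql.Timestamp, while Hive 3's TimestampWritableV2 wraps Hive's own Timestamp type, so the Hive 3 branch goes through reflection.

```java
import org.apache.hadoop.hive.serde2.io.TimestampWritable;

// Minimal sketch, not the PR's code: epoch millis from either Hive 2's TimestampWritable
// or Hive 3's TimestampWritableV2 (assumed to return a Timestamp with toEpochMilli()).
public final class TimestampMillisShim {

  private TimestampMillisShim() {
  }

  static long toEpochMillis(Object structFieldData) {
    if (structFieldData instanceof TimestampWritable) {
      // Hive 2.x: getTimestamp() returns java.sql.Timestamp.
      return ((TimestampWritable) structFieldData).getTimestamp().getTime();
    }
    try {
      // Hive 3.x: getTimestamp() returns org.apache.hadoop.hive.common.type.Timestamp.
      Object hiveTimestamp = structFieldData.getClass().getMethod("getTimestamp").invoke(structFieldData);
      return (long) hiveTimestamp.getClass().getMethod("toEpochMilli").invoke(hiveTimestamp);
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Unsupported timestamp writable: " + structFieldData.getClass(), e);
    }
  }
}
```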
@xiarixiaoyao @cdmikechen

Hi @cdmikechen, do you mind if I copy the code from #3391 into this PR?

Hi @xiarixiaoyao, I merged #3391 and tested with the case you described in #6741.

@xiarixiaoyao @xicm
I have merged your code, but there is still a problem with Timestamp; I'm debugging.
my wechat: 1037817390
@cdmikechen could you pls help review this PR, thanks

@hudi-bot run azure

@xiarixiaoyao Can we push this PR forward again? It seems a useful fix.
 * @param realReader Parquet Reader
 */
public RealtimeUnmergedRecordReader(RealtimeSplit split, JobConf job,
    RecordReader<NullWritable, ArrayWritable> realReader) {
Do not change the original format, thanks
.withLogFilePaths(split.getDeltaLogPaths())
.withReaderSchema(getReaderSchema())
.withLatestInstantTime(split.getMaxCommitTime())
.withReadBlocksLazily(Boolean.parseBoolean(this.jobConf.get(HoodieRealtimeConfig.COMPACTION_LAZY_BLOCK_READ_ENABLED_PROP, HoodieRealtimeConfig.DEFAULT_COMPACTION_LAZY_BLOCK_READ_ENABLED)))
ditto
-     return DateWritable.dateToDays(((DateObjectInspector)fieldOI).getPrimitiveJavaObject(structFieldData));
+     try {
+       Class<?> clazz = structFieldData.getClass();
+       return clazz.getMethod("getDays").invoke(structFieldData);
This has a significant impact on performance.
Let's avoid using reflection on every iteration.
Let's do the same thing as HiveUtils.
done
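A sketch of the resolve-once pattern being asked for (hypothetical class and method names; it may differ from what HiveUtils actually does): the reflective lookup happens a single time in a static initializer, so the per-record cost is one Method.invoke rather than a getMethod plus invoke on every row.

```java
import java.lang.reflect.Method;

// Minimal sketch, not the PR's code: look up DateWritableV2#getDays once and reuse it.
public final class DateWritableV2Methods {

  private static final Method GET_DAYS; // null when DateWritableV2 is absent (Hive 2 classpath)

  static {
    Method getDays = null;
    try {
      getDays = Class.forName("org.apache.hadoop.hive.serde2.io.DateWritableV2").getMethod("getDays");
    } catch (ClassNotFoundException | NoSuchMethodException e) {
      // Running against Hive 2: callers should fall back to the DateWritable code path.
    }
    GET_DAYS = getDays;
  }

  private DateWritableV2Methods() {
  }

  static boolean isSupported() {
    return GET_DAYS != null;
  }

  static int getDays(Object dateWritableV2) {
    try {
      // Only the invoke happens per record; the method lookup above ran once.
      return (int) GET_DAYS.invoke(dateWritableV2);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Failed to read days from " + dateWritableV2.getClass(), e);
    }
  }
}
```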
public static final String DATE_WRITEABLE_V2_CLASS = "org.apache.hadoop.hive.serde2.io.DateWritableV2";
public static final boolean SUPPORT_DATE_WRITEABLE_V2;
private static final Constructor DATE_WRITEABLE_V2_CONSTRUCTOR;
public static String DATE_WRITEABLE_V2_CLASS = "org.apache.hadoop.hive.serde2.io.DateWritableV2";
final ?
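On the final question: a minimal sketch (hypothetical wiring, reusing the field names from the diff) showing that all three members can indeed be final, because the static initializer assigns each exactly once via locals. The no-arg constructor lookup here is an assumption; the PR may resolve a different signature.

```java
import java.lang.reflect.Constructor;

// Minimal sketch, not the PR's code: capability detection for Hive 3's DateWritableV2,
// with every field final because the static block assigns each one exactly once.
public final class DateWritableV2Support {

  public static final String DATE_WRITEABLE_V2_CLASS = "org.apache.hadoop.hive.serde2.io.DateWritableV2";
  public static final boolean SUPPORT_DATE_WRITEABLE_V2;
  private static final Constructor<?> DATE_WRITEABLE_V2_CONSTRUCTOR;

  static {
    boolean supported = false;
    Constructor<?> constructor = null;
    try {
      // Assumption: the no-arg Writable constructor is enough for this sketch.
      constructor = Class.forName(DATE_WRITEABLE_V2_CLASS).getConstructor();
      supported = true;
    } catch (ClassNotFoundException | NoSuchMethodException e) {
      // Hive 2 classpath: DateWritableV2 does not exist, leave the flag false.
    }
    SUPPORT_DATE_WRITEABLE_V2 = supported;
    DATE_WRITEABLE_V2_CONSTRUCTOR = constructor;
  }

  private DateWritableV2Support() {
  }

  static Constructor<?> dateWritableV2Constructor() {
    return DATE_WRITEABLE_V2_CONSTRUCTOR;
  }
}
```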
LGTM

@XuQianJin-Stars could you pls help review again

@xicm Thanks for the contribution, can we squash the commits into one? It is hard to review the code because of the merge cmd, let's use the

@hudi-bot run azure

@xicm Do you have a plan to push this PR forward again?
Yeah, I resolved the conflict, review again when you are available.

Thanks for the contribution, overall looks good now, I have reviewed and created a patch:

Awesome job on landing this. I guess this has been one of the longest pending gaps we had w/ hudi: the timestamp column not readable w/ hive3. Thank you folks.
if (Arrays.stream(constructors)
    .anyMatch(c -> c.getParameterCount() > 0 && c.getParameterTypes()[0]
        .getName().equals(ParquetInputFormat.class.getName()))) {
  supportAvroRead = true;
When will supportAvroRead be false? I added a PR for it.
@Zouxxyy You can read the comments. This is to address compatibility issues with spark
Yeah, but it is always true now.
@Zouxxyy
In my impression, hive2 and spark3 have the same processing method and constructor, but there is a difference in hive3.
You can try switching hive to version 3 for verification
We should revisit this method; ParquetInputFormat.class.getName() is the same in hive2 and hive3.
This is for compatibility with spark. Do you mean hive on spark? @cdmikechen
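For readers following this thread, here is a self-contained sketch of the constructor-probing idea from the snippet above (the wrapper class being probed is passed in as a parameter here; the concrete class Hudi inspects is not shown in this excerpt): the flag is true only when some public constructor takes Parquet's ParquetInputFormat as its first parameter, which is how the code distinguishes reader-wrapper signatures across engine versions.

```java
import java.lang.reflect.Constructor;
import java.util.Arrays;
import org.apache.parquet.hadoop.ParquetInputFormat;

// Minimal sketch, not Hudi's actual method: detect whether a record-reader wrapper class
// has a public constructor whose first parameter is ParquetInputFormat.
public final class AvroReadSupportProbe {

  private AvroReadSupportProbe() {
  }

  static boolean supportsAvroRead(Class<?> readerWrapperClass) {
    Constructor<?>[] constructors = readerWrapperClass.getConstructors();
    return Arrays.stream(constructors)
        .anyMatch(c -> c.getParameterCount() > 0
            && c.getParameterTypes()[0].getName().equals(ParquetInputFormat.class.getName()));
  }
}
```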
Change Logs
Compilation of HiveAvroSerializer fails with hive3.
Impact
none
Risk level (write none, low medium or high below)
low
If medium or high, explain what verification was done to mitigate the risks.
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change. For any user-facing change, create a Jira ticket, attach the ticket number here and follow the instruction to make changes to the website.
Contributor's checklist