-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[HUDI-2471] Add support ignoring case in merge into #3700
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@pengzhiwei2018 @xushiyan : can either of you take care of reviewing this please. |
xushiyan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. cc @YannByron
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the table field is defined in uppercase letters, does that work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@YannByron yes,it can work in column name matching.Do I need to add a test case for upper case column name definition?
However, ignoring case matching has not been implemented in condition and action. I think we should support it. Do I submit another PR or support it in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if you can solve all case matching including source/target schema, condition and action in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll try to solve it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@YannByron Hello, I have modified the code to add support for ignoring case matching including condition and action.Can you please take a look?
...ark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestMergeIntoTable2.scala
Outdated
Show resolved
Hide resolved
|
@dongkelun is it same with yours? |
Sorry, I don't know what UT is or how to use it. Is UTC time set or something else? |
I mean UT is unit test. sql is run as followed : create table h0 ( id int, name string, price double, ts long, dt string) using hudi options (primaryKey ='id', preCombineField = 'ts'); merge into h0 as t0 using (select 1 as ID, 'a1' as NAME, 1111 as TS, '2021-05-05' as DT, 111 as PRICE) as s0 on t0.id = s0.id when matched then update set * when not matched then insert *; |
I haven't encountered such an exception. I can run unit test cases in the local Windows environment. I tried your SQL and there was no problem. It can run normally |
@YannByron hello,I encountered a similar problem when I added test cases for condition and action to ignore case matching Caused by: java.lang.RuntimeException: Error in execute expression: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Double.
Expressions is: [boundreference() AS `ID` boundreference() AS `NAME` (CAST(boundreference() AS DOUBLE) + boundreference()) AS `PRICE` CAST(boundreference() AS `TS` AS BIGINT) boundreference() AS `DT`] |
|
@dongkelun Sorry, I personally think the previous solution to lowercase fields may not be the best and most correct one. We should dig into the root cause. |
In column name matching, I can't think of any better method except to lowercase fields. Can you provide some ideas? |
For
Is this solved? |
|
@hudi-bot run azure |
|
@hudi-bot run azure |
|
@YannByron Hello, I have replaced |
...datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala
Outdated
Show resolved
Hide resolved
...udi-spark/src/main/scala/org/apache/spark/sql/hudi/command/MergeIntoHoodieTableCommand.scala
Outdated
Show resolved
Hide resolved
|
LGTM. |
|
@hudi-bot azure run |
|
@hudi-bot run azure |
xushiyan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dongkelun thank you for the change. just have a small question on the test case.
Tips
What is the purpose of the pull request
Add support ignoring case when column name matches in merge into
Brief change log
(for example:)
Verify this pull request
(example:)
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.