@dongkelun
Contributor

What is the purpose of the pull request

Add support for ignoring case when matching column names in MERGE INTO.

Brief change log

  • Add support for ignoring case when matching column names in MERGE INTO

Verify this pull request

  • Added a unit test in TestMergeIntoTable2

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@nsivabalan
Contributor

@pengzhiwei2018 @xushiyan: can either of you review this, please?

Member

@xushiyan xushiyan left a comment

LGTM. cc @YannByron

Contributor

If the table's fields are defined in uppercase, does this still work?

Contributor Author

@YannByron Yes, it works for column name matching. Do I need to add a test case for uppercase column name definitions?
However, case-insensitive matching has not been implemented in the condition and action yet. I think we should support it. Should I submit another PR, or support it in this PR?

Contributor

It would be nice if you could solve all case matching, including the source/target schema, condition and action, in this PR.

Contributor Author

OK, I'll try to solve it.

Contributor Author

@YannByron Hello, I have modified the code to support case-insensitive matching in the condition and action as well. Can you please take a look?
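Case-insensitive matching in the condition and action also has to cover the table alias, not only the column name (for example, `T0.ID` should resolve to alias `t0` and column `id`). A minimal Java sketch of that idea follows; the class `QualifiedRefSketch`, its `resolve` method and the column lists are hypothetical and are not taken from the Hudi code in this PR.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Hudi's implementation: resolve a possibly
// qualified column reference from a MERGE INTO condition or action
// against table aliases and their columns, ignoring case.
class QualifiedRefSketch {
    // Returns "alias.column" spelled as declared; throws if nothing matches.
    // Unlike a real analyzer, an ambiguous unqualified name resolves to the
    // first alias that has it instead of raising an error.
    static String resolve(String ref, Map<String, List<String>> aliases) {
        String[] parts = ref.split("\\.", 2);
        String qualifier = parts.length == 2 ? parts[0] : null;
        String column = parts.length == 2 ? parts[1] : parts[0];
        for (Map.Entry<String, List<String>> entry : aliases.entrySet()) {
            // A qualified reference only looks at the alias it names.
            if (qualifier != null && !entry.getKey().equalsIgnoreCase(qualifier)) {
                continue;
            }
            for (String declared : entry.getValue()) {
                if (declared.equalsIgnoreCase(column)) {
                    return entry.getKey() + "." + declared;
                }
            }
        }
        throw new IllegalArgumentException("Cannot resolve column reference: " + ref);
    }

    public static void main(String[] args) {
        // Illustrative aliases: target t0 with lowercase columns, source s0 with uppercase ones.
        Map<String, List<String>> aliases = new LinkedHashMap<>();
        aliases.put("t0", Arrays.asList("id", "name", "price", "ts", "dt"));
        aliases.put("s0", Arrays.asList("ID", "NAME", "TS", "DT", "PRICE"));
        System.out.println(resolve("T0.ID", aliases));    // t0.id
        System.out.println(resolve("s0.price", aliases)); // s0.PRICE
    }
}
```

The resolved name keeps the spelling from the declaring side, so later steps can look the column up exactly.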

@YannByron
Contributor

@dongkelun I reproduced this using the table from the UT in my local environment, and a ClassCastException is raised. The detailed stack trace:
Caused by: java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Integer
  at org.apache.hudi.sql.payload.ExpressionPayloadEvaluator_228eb15b_f549_4127_a201_a71c459c1f61.eval(Unknown Source)
  at org.apache.spark.sql.hudi.command.payload.ExpressionPayload.evaluate(ExpressionPayload.scala:258)

Is it the same as yours?

@dongkelun
Contributor Author

Sorry, I don't know what UT means or how to use it. Does it refer to a UTC time setting, or something else?

@YannByron
Contributor

YannByron commented Oct 25, 2021

By UT I mean unit test. The SQL I ran is as follows:

create table h0 (
  id int,
  name string,
  price double,
  ts long,
  dt string
) using hudi
options (primaryKey = 'id', preCombineField = 'ts');

merge into h0 as t0
using (select 1 as ID, 'a1' as NAME, 1111 as TS, '2021-05-05' as DT, 111 as PRICE) as s0
on t0.id = s0.id
when matched then update set *
when not matched then insert *;
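In this repro the source subquery uses uppercase aliases in a different order (ID, NAME, TS, DT, PRICE) from h0's columns (id, name, price, ts, dt), so `update set *` and `insert *` have to pair columns by name, ignoring case, rather than by position. A minimal Java sketch of that pairing, assuming a simple list-of-names model; `StarAssignmentSketch` and `mapByName` are hypothetical names, and the sketch is not claimed to explain or fix the exception reported above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Hudi's implementation: for `update set *` /
// `insert *`, pair each target column with a source column by name,
// ignoring case, rather than by position.
class StarAssignmentSketch {
    // Returns, for each target column, the index of the matching source column.
    static int[] mapByName(List<String> targetCols, List<String> sourceCols) {
        int[] mapping = new int[targetCols.size()];
        for (int t = 0; t < targetCols.size(); t++) {
            mapping[t] = -1;
            for (int s = 0; s < sourceCols.size(); s++) {
                if (targetCols.get(t).equalsIgnoreCase(sourceCols.get(s))) {
                    mapping[t] = s;
                    break;
                }
            }
            // Every target column must be supplied by the source for `*` assignments.
            if (mapping[t] < 0) {
                throw new IllegalArgumentException(
                    "No source column matches target column " + targetCols.get(t));
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        // Column lists taken from the h0 table and the s0 subquery.
        List<String> target = Arrays.asList("id", "name", "price", "ts", "dt");
        List<String> source = Arrays.asList("ID", "NAME", "TS", "DT", "PRICE");
        System.out.println(Arrays.toString(mapByName(target, source))); // [0, 1, 4, 2, 3]
    }
}
```

The output shows price read from source position 4, ts from position 2 and dt from position 3; pairing by position would instead feed TS into price and DT into ts.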

@dongkelun
Contributor Author

I haven't encountered this exception. I can run the unit tests in my local Windows environment, and I tried your SQL there with no problem: it runs normally.

@dongkelun
Contributor Author

@YannByron Hello, I encountered a similar problem when I added test cases for case-insensitive matching in the condition and action:

Caused by: java.lang.RuntimeException: Error in execute expression: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Double.
Expressions is: [boundreference() AS `ID`  boundreference() AS `NAME`  (CAST(boundreference() AS DOUBLE) + boundreference()) AS `PRICE`  CAST(boundreference() AS `TS` AS BIGINT)  boundreference() AS `DT`]

@dongkelun dongkelun changed the title [HUDI-2471] Add support ignoring case when column name matches in merge into [HUDI-2471] Add support ignoring case in merge into Oct 26, 2021
@YannByron
Contributor

@dongkelun Sorry, but I think the previous solution of lowercasing fields may not be the best or most correct one. We should dig into the root cause.

@dongkelun
Contributor Author

For column name matching, I can't think of a better method than lowercasing the fields. Could you suggest some ideas?

@YannByron
Contributor

Regarding the ClassCastException I reported above, and the similar RuntimeException you hit when adding the condition and action test cases: is this solved?

@dongkelun
Contributor Author

@hudi-bot run azure

@dongkelun
Contributor Author

@hudi-bot run azure

@dongkelun
Contributor Author

@YannByron Hello, I have replaced toLowerCase with the resolver. Can you please take a look?
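Assuming "resolver" here refers to Spark's session resolver (`SQLConf.resolver`, a `(String, String) => Boolean` function that is case-insensitive unless `spark.sql.caseSensitive` is true, which is false by default), one plausible advantage over `toLowerCase` is that matching then follows the session's case-sensitivity setting and no names are rewritten. The Java sketch below only mirrors that shape with a `BiPredicate`; `ResolverSketch`, `resolver` and `findField` are hypothetical names, not Spark or Hudi APIs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.BiPredicate;

// Hypothetical sketch mirroring the shape of Spark's Resolver type,
// (String, String) => Boolean, in plain Java. Not Spark's or Hudi's code.
class ResolverSketch {
    // Analogous to Spark's caseInsensitiveResolution.
    static final BiPredicate<String, String> CASE_INSENSITIVE = String::equalsIgnoreCase;
    // Analogous to Spark's caseSensitiveResolution.
    static final BiPredicate<String, String> CASE_SENSITIVE = String::equals;

    // The flag stands in for spark.sql.caseSensitive (false by default in Spark).
    static BiPredicate<String, String> resolver(boolean caseSensitive) {
        return caseSensitive ? CASE_SENSITIVE : CASE_INSENSITIVE;
    }

    // Returns the declared field name that `name` resolves to, if any,
    // without rewriting either name to lowercase.
    static Optional<String> findField(List<String> fields, String name,
                                      BiPredicate<String, String> resolver) {
        return fields.stream().filter(field -> resolver.test(field, name)).findFirst();
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("id", "name", "price", "ts", "dt");
        System.out.println(findField(fields, "ID", resolver(false))); // Optional[id]
        System.out.println(findField(fields, "ID", resolver(true)));  // Optional.empty
    }
}
```

With the flag false, `ID` resolves to the declared `id`; with it true, the same lookup finds nothing, which a blanket `toLowerCase` could not express.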

@YannByron
Contributor

LGTM.

@nsivabalan
Contributor

@hudi-bot azure run

@dongkelun
Contributor Author

@hudi-bot run azure

Member

@xushiyan xushiyan left a comment

@dongkelun Thank you for the change. I just have a small question about the test case.

@hudi-bot
Collaborator

hudi-bot commented Nov 5, 2021

CI report:

Bot commands
@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@xushiyan xushiyan merged commit 844346c into apache:master Nov 5, 2021