
Conversation

@YannByron (Contributor) commented Oct 19, 2021:

What is the purpose of the pull request

Currently we can specify different values for the URL_ENCODE_PARTITIONING and HIVE_STYLE_PARTITIONING_ENABLE configs across multiple write operations without any warning or error. As a result, some partition paths end up hive-style and some do not, or some are url-encoded and some are not, which is inconsistent.

So I want to persist these configs to hoodie.properties on the first write (the first time data is written via a Spark DataFrame, or when a table is created via Spark SQL). After that, users no longer need to specify these configs. If they are specified with values that differ from the existing values in hoodie.properties, an exception is raised.

This is also useful for resolving some of the keyGenerator discrepancies between the DataFrame writer and Spark SQL.
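
Roughly, the intended first-write / validate-on-later-writes behavior could look like the sketch below. This is not the PR's actual code: the object and method names are hypothetical, the option keys are the datasource option names these configs map to as best I recall, and the real change works through hoodie.properties / table config rather than a bare java.util.Properties file.

```scala
import java.io.{File, FileInputStream, FileOutputStream}
import java.util.Properties

object PartitionConfigGuard {
  // Option keys for the two configs discussed above (from memory, not from this PR's diff).
  val persistedKeys: Seq[String] = Seq(
    "hoodie.datasource.write.hive_style_partitioning",
    "hoodie.datasource.write.partitionpath.urlencode")

  def writeOrValidate(propsFile: File, writeOptions: Map[String, String]): Unit = {
    if (!propsFile.exists()) {
      // First write: persist the effective values so later writers inherit them.
      val props = new Properties()
      persistedKeys.foreach(k => writeOptions.get(k).foreach(v => props.setProperty(k, v)))
      val out = new FileOutputStream(propsFile)
      try props.store(out, "persisted table configs") finally out.close()
    } else {
      // Later writes: any explicitly supplied value must match the persisted one.
      val props = new Properties()
      val in = new FileInputStream(propsFile)
      try props.load(in) finally in.close()
      persistedKeys.foreach { k =>
        (writeOptions.get(k), Option(props.getProperty(k))) match {
          case (Some(supplied), Some(persisted)) if supplied != persisted =>
            throw new IllegalArgumentException(
              s"Config $k=$supplied conflicts with persisted value $persisted in hoodie.properties")
          case _ => // not supplied, or consistent with hoodie.properties
        }
      }
    }
  }
}
```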

Brief change log

  • Persist URL_ENCODE_PARTITIONING and HIVE_STYLE_PARTITIONING_ENABLE to hoodie.properties on the first write (first DataFrame write or Spark SQL create table).
  • Validate later writes against the persisted values and raise an exception on conflict.


Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@hudi-bot (Collaborator) commented Oct 19, 2021:

CI report:

@hudi-bot supports the following commands:
  • @hudi-bot run travis: re-run the last Travis build
  • @hudi-bot run azure: re-run the last Azure build

@YannByron (Author) commented:

@leesf @vinothchandar can you help review this?

@YannByron force-pushed the master_persist_configs branch from 5ce2404 to c491ffd on October 26, 2021 12:04
@leesf (Contributor) commented Oct 26, 2021:

What is the difference made by case "org.apache.hudi.keygen.ComplexAvroKeyGenerator": return "org.apache.hudi.keygen.ComplexKeyGenerator";? I don't get the point here.

@YannByron (Author) replied:

KeyGenerators whose names don't include 'Avro' extend SparkKeyGeneratorInterface, which provides getRecordKey and getPartitionPath methods that work on a Row. I want to make sure the KeyGenerator instance works within Spark via convertToSparkKeyGenerator.
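
For illustration, the kind of class-name substitution being discussed might look like the sketch below. Only the ComplexAvroKeyGenerator/ComplexKeyGenerator pair is quoted from the comment above; the Simple* pair and the helper itself are assumptions, and the real logic lives in the PR's convertToSparkKeyGenerator.

```scala
object KeyGenMappingSketch {
  // Minimal sketch of the substitution discussed above, not the PR's code.
  def toSparkKeyGeneratorClass(className: String): String = className match {
    case "org.apache.hudi.keygen.ComplexAvroKeyGenerator" =>
      "org.apache.hudi.keygen.ComplexKeyGenerator" // Spark-aware counterpart
    case "org.apache.hudi.keygen.SimpleAvroKeyGenerator" => // assumed analogous pair
      "org.apache.hudi.keygen.SimpleKeyGenerator"
    case other => other // already Spark-capable (implements SparkKeyGeneratorInterface)
  }
}
```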

Contributor:

Why is the change needed here?

@YannByron (Author) replied:

I refactored the createKeyGenerator method; it no longer throws an exception with this message directly, but the message "Property hoodie.datasource.write.recordkey.field not found" still appears in the exception's stack trace.
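
A rough sketch of that behavior, using hypothetical helper names rather than the real createKeyGenerator code: the missing-property message is no longer the directly thrown exception's message, but it is preserved as the cause and therefore still shows up in the stack trace.

```scala
import java.util.Properties

object KeyGenExceptionSketch {
  // Hypothetical stand-ins for the refactored flow; not the PR's actual code.
  def requiredProperty(props: Properties, key: String): String =
    Option(props.getProperty(key)).getOrElse(
      throw new IllegalArgumentException(s"Property $key not found"))

  def createKeyGeneratorSketch(props: Properties): String =
    try {
      requiredProperty(props, "hoodie.datasource.write.recordkey.field")
    } catch {
      case e: IllegalArgumentException =>
        // The thrown exception has a different message, but the original
        // "Property ... not found" message survives as the cause in the stack trace.
        throw new RuntimeException("Failed to instantiate key generator", e)
    }
}
```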

Contributor:

ditto

Contributor:

Is this change not needed?

@YannByron (Author) replied:

It is needed. The parameters variable includes all of Hudi's default parameters and is used by the code that follows. The optParams variable is only used to validate whether there are conflicts between the user-supplied options and the TableConfig.
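
A small sketch of the split being described; the variable names follow the comment, while the default map and helper below are illustrative stand-ins, not Hudi's actual defaults or code.

```scala
object WriteParamsSketch {
  // Illustrative default only; not Hudi's real default set.
  val defaultParams: Map[String, String] = Map(
    "hoodie.datasource.write.hive_style_partitioning" -> "false")

  // `parameters` layers the user options over the defaults and drives the write path;
  // `optParams` keeps only what the user actually supplied, so it can be checked
  // against the persisted TableConfig for conflicts.
  def splitParams(optParams: Map[String, String]): (Map[String, String], Map[String, String]) = {
    val parameters = defaultParams ++ optParams
    (parameters, optParams)
  }
}
```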

@YannByron force-pushed the master_persist_configs branch 5 times, most recently from 791e312 to 8441914 on October 27, 2021 16:48
@YannByron (Author) commented:

@hudi-bot run azure

(1 similar comment)

Contributor:

Why is this change needed? TypedProperties should behave the same as Properties.

@YannByron (Author), Nov 1, 2021:

Because HoodieSparkUtils.getPartitionColumns uses Properties as its input parameter type.
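
For context, the shape of the call being described might look like the sketch below. TypedOptions and partitionColumnsFrom are stand-ins, not Hudi's real TypedProperties or HoodieSparkUtils.getPartitionColumns; the only point illustrated is that the helper is typed against plain java.util.Properties, so the caller hands it a Properties value.

```scala
import java.util.Properties

object PartitionColumnsSketch {
  // Stand-in for a typed options holder; not Hudi's TypedProperties.
  final case class TypedOptions(values: Map[String, String]) {
    def toProperties: Properties = {
      val props = new Properties()
      values.foreach { case (k, v) => props.setProperty(k, v) }
      props
    }
  }

  // Stand-in for a helper typed against plain Properties, like getPartitionColumns.
  def partitionColumnsFrom(props: Properties): Seq[String] =
    Option(props.getProperty("hoodie.datasource.write.partitionpath.field"))
      .map(_.split(",").map(_.trim).toSeq)
      .getOrElse(Seq.empty)

  // Usage: convert the typed options to Properties before the call.
  val cols: Seq[String] = partitionColumnsFrom(
    TypedOptions(Map("hoodie.datasource.write.partitionpath.field" -> "dt,hh")).toProperties)
}
```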

@YannByron force-pushed the master_persist_configs branch 2 times, most recently from 6c8875d to a9673c0 on November 1, 2021 14:52
@YannByron force-pushed the master_persist_configs branch from a9673c0 to 38d68b3 on November 2, 2021 10:07
@leesf merged commit 6351e5f into apache:master on Nov 3, 2021