[HUDI-2538] persist some configs to hoodie.properties when the first write #3823
Conversation
@leesf @vinothchandar can you help review this?
Force-pushed from 5ce2404 to c491ffd
...park-client/src/main/java/org/apache/hudi/keygen/factory/HoodieSparkKeyGeneratorFactory.java
What's the point of a mapping like case "org.apache.hudi.keygen.ComplexAvroKeyGenerator": return "org.apache.hudi.keygen.ComplexKeyGenerator";? I don't get it.
KeyGenerators whose names don't include 'Avro' extend SparkKeyGeneratorInterface, which has getRecordKey and getPartitionPath methods that work on a Row. I want to make sure the KeyGenerator instance works within Spark, via convertToSparkKeyGenerator.
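To make the mapping concrete, here is a minimal sketch of the class-name translation being discussed. The two Complex* class names come from the review comment above; SimpleAvroKeyGenerator/SimpleKeyGenerator and the method name toSparkKeyGenerator are illustrative assumptions, not necessarily Hudi's exact API.

```java
public class KeyGenMappingSketch {
    // Map an Avro-only key generator class name to its Spark-capable
    // counterpart (one that also implements SparkKeyGeneratorInterface
    // and can extract keys from a Row, not just a GenericRecord).
    public static String toSparkKeyGenerator(String className) {
        switch (className) {
            case "org.apache.hudi.keygen.ComplexAvroKeyGenerator":
                return "org.apache.hudi.keygen.ComplexKeyGenerator";
            case "org.apache.hudi.keygen.SimpleAvroKeyGenerator":
                return "org.apache.hudi.keygen.SimpleKeyGenerator";
            default:
                // Names without "Avro" are assumed to already be Spark-capable.
                return className;
        }
    }

    public static void main(String[] args) {
        System.out.println(
            toSparkKeyGenerator("org.apache.hudi.keygen.ComplexAvroKeyGenerator"));
    }
}
```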
Why is this change needed here?
I refactored the createKeyGenerator method so it no longer throws an exception with this message directly, but the message "Property hoodie.datasource.write.recordkey.field not found" still appears in the exception's stacktrace.
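A small sketch of why the message still shows up: if the refactored factory wraps the original lookup failure rather than throwing it directly, the message survives as the cause in the stacktrace. Method names here (requiredProp, createKeyGenerator's wrapper message) are illustrative assumptions, not Hudi's actual code.

```java
import java.util.Properties;

public class ExceptionWrapSketch {
    // Stand-in for a required-property lookup that throws when missing.
    public static String requiredProp(Properties props, String key) {
        String v = props.getProperty(key);
        if (v == null) {
            throw new IllegalArgumentException("Property " + key + " not found");
        }
        return v;
    }

    // The refactored factory no longer throws the message directly; it wraps
    // the failure, so "Property ... not found" appears as the cause.
    public static Object createKeyGenerator(Properties props) {
        try {
            return requiredProp(props, "hoodie.datasource.write.recordkey.field");
        } catch (IllegalArgumentException e) {
            throw new RuntimeException("Unable to instantiate KeyGenerator", e);
        }
    }

    public static void main(String[] args) {
        try {
            createKeyGenerator(new Properties());
        } catch (RuntimeException e) {
            // The original message is still reachable via the cause chain.
            System.out.println(e.getCause().getMessage());
        }
    }
}
```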
ditto
No need to change this?
It's needed. The parameters variable includes all of hoodie's default parameters and is used by the code that follows. The optParams variable is only used to validate whether there are conflicts between the supplied options and the TableConfig.
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
Force-pushed from 791e312 to 8441914
@hudi-bot run azure
1 similar comment
@hudi-bot run azure
Why is this change needed? TypedProperties should behave the same as Properties.
Because HoodieSparkUtils.getPartitionColumns uses Properties as its input parameter type.
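For context on the subtyping point here: a TypedProperties-style subclass of java.util.Properties can be passed anywhere a Properties parameter is declared. The sketch below uses a stand-in class and a stand-in for a Properties-typed API like HoodieSparkUtils.getPartitionColumns; it is not Hudi's actual implementation.

```java
import java.util.Properties;

public class TypedPropertiesSketch extends Properties {
    // Typed accessor in the spirit of TypedProperties.getString(key, default).
    public String getString(String key, String defaultValue) {
        return containsKey(key) ? getProperty(key) : defaultValue;
    }

    // Stand-in for a Properties-typed API such as
    // HoodieSparkUtils.getPartitionColumns(props).
    public static String getPartitionColumns(Properties props) {
        return props.getProperty("hoodie.datasource.write.partitionpath.field", "");
    }

    public static void main(String[] args) {
        TypedPropertiesSketch props = new TypedPropertiesSketch();
        props.setProperty("hoodie.datasource.write.partitionpath.field", "dt");
        // Accepted because TypedPropertiesSketch IS-A Properties.
        System.out.println(getPartitionColumns(props));
    }
}
```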
...udi-spark/src/main/scala/org/apache/spark/sql/hudi/command/MergeIntoHoodieTableCommand.scala
Force-pushed from 6c8875d to a9673c0
Force-pushed from a9673c0 to 38d68b3
What is the purpose of the pull request
Currently we can specify different values for the URL_ENCODE_PARTITIONING and HIVE_STYLE_PARTITIONING_ENABLE configs across multiple write operations without any warnings or errors. As a result, some partition paths may be hive-style and some not, or some url-encoded and some not, which is very confusing.
So I want to persist these configs to hoodie.properties on the first write (the first time data is written via a Spark DataFrame, or when a table is created via Spark SQL). After that, users no longer need to specify these configs. If these configs are specified and differ from the existing values in hoodie.properties, an exception will be raised.
This is also useful for solving some of the keyGenerator discrepancy issues between the DataFrame writer and SQL.
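The persist-on-first-write behavior described above can be sketched as follows. The method name persistOrValidate and the exception message are illustrative assumptions; the config key is one of the partitioning configs the PR targets, but the exact persisted key names in hoodie.properties may differ.

```java
import java.util.Properties;

public class PersistConfigSketch {
    // On the first write, persist the supplied value into the table config;
    // on later writes, reject a supplied value that conflicts with it.
    public static void persistOrValidate(Properties tableConfig, String key, String value) {
        String existing = tableConfig.getProperty(key);
        if (existing == null) {
            tableConfig.setProperty(key, value);        // first write: persist
        } else if (!existing.equals(value)) {
            throw new IllegalArgumentException(         // later write: conflict
                "Config conflict(" + key + "): persisted=" + existing
                    + ", supplied=" + value);
        }
    }

    public static void main(String[] args) {
        Properties tableConfig = new Properties();      // stands in for hoodie.properties
        persistOrValidate(tableConfig, "hoodie.datasource.write.hive_style_partitioning", "true");
        persistOrValidate(tableConfig, "hoodie.datasource.write.hive_style_partitioning", "true"); // ok
        try {
            persistOrValidate(tableConfig, "hoodie.datasource.write.hive_style_partitioning", "false");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```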
Brief change log
(for example:)
Verify this pull request
(Please pick either of the following options)
This pull request is a trivial rework / code cleanup without any test coverage.
(or)
This pull request is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.