[HUDI-2809] Introduce a checksum mechanism for validating hoodie.properties #4712
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
can you help understand this scenario:
btw, we might need to add the checksum property as part of the upgrade.
nsivabalan
left a comment
I would like us to solve two partial-write failure cases.
Case 1:
You take a backup.
Delete the original props file.
Update the props in memory.
Write to the original props file: the process crashes midway.
With this, how do we ensure readers get routed to the backup file, which is pristine, and not the original file, which is corrupt?
Case 2:
While taking the backup, let's say the process crashed and the backup file was only partially written.
I guess we don't need to worry here, because the original file is still intact and readers will read the original props file anyway.
Also, do we need to add the checksum to the props file with an explicit upgrade step?
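One way to picture the reader-side routing asked about in Case 1 is the sketch below. It is not the code in this PR: the class name `ConfigRecoverySketch`, the method `readWithFallback`, and the `isValid` predicate are all hypothetical, and local files stand in for HDFS. The idea is only that a reader accepts the original file when it passes validation and otherwise falls back to the backup, which also covers Case 2 because an intact original is always tried first.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import java.util.function.Predicate;

// Hypothetical sketch: all names here are illustrative, not Hudi APIs.
class ConfigRecoverySketch {

    // Load a properties file from the local file system (stand-in for HDFS).
    static Properties load(Path path) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(path)) {
            props.load(in);
        }
        return props;
    }

    // Try the original file first; use the backup only when the original is
    // missing or fails validation (for example, a partially written file).
    static Properties readWithFallback(Path original, Path backup,
                                       Predicate<Properties> isValid) throws IOException {
        if (Files.exists(original)) {
            Properties props = load(original);
            if (isValid.test(props)) {
                return props;
            }
        }
        if (Files.exists(backup)) {
            Properties props = load(backup);
            if (isValid.test(props)) {
                return props;
            }
        }
        throw new IOException("Neither " + original + " nor " + backup + " holds a valid config");
    }
}
```

The validity check is passed in as a predicate so that the same routing works with whatever validation the table config ends up using, such as a checksum comparison.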
nsivabalan
left a comment
Also, can you update the description of the patch? For example, it does not cover the upgrade info.
nsivabalan
left a comment
One nit. Once it is addressed, you can land. Also, there are some CI failures; please check them out.
…erties
Address review comments
Exclude a dependency to avoid conflict
Fix dependency conflict
Fix repairs command
Implement putIfAbsent for DDB lock provider
Add upgrade step and validate while fetching configs
Validate checksum for latest table version only while fetching config
Move generateChecksum to BinaryUtil
Rebase and resolve conflict
Fix table version check
The CI failure was a build failure caused by a Maven download timing out. The build succeeded on the latest commit.
…erties (apache#4712)
Fix dependency conflict
Fix repairs command
Implement putIfAbsent for DDB lock provider
Add upgrade step and validate while fetching configs
Validate checksum for latest table version only while fetching config
Move generateChecksum to BinaryUtil
Rebase and resolve conflict
Fix table version check
What is the purpose of the pull request
To detect partial writes on HDFS, this PR adds a new property that is appended to the end of the hoodie.properties file whenever the table config is created or modified. The value of the property is the CRC of <database_name>.<table_name>. The PR also changes TypedProperties to maintain insertion order.
Brief change log
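As a rough illustration of the checksum property described under the purpose heading above, the sketch below computes a CRC32 over `<database_name>.<table_name>` and checks a loaded config against it. It is not the code from this PR: the class `TableChecksumSketch`, its method names, and the three property keys are assumptions made for the example, and CRC32 is assumed as the CRC variant. Because the property is written last, a file truncated by a partial write would be expected to lack it or to carry a value that fails to parse or match.

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.zip.CRC32;

// Hypothetical sketch: class, method names, and property keys are assumptions.
class TableChecksumSketch {
    static final String CHECKSUM_KEY = "hoodie.table.checksum"; // assumed key
    static final String DATABASE_KEY = "hoodie.database.name";  // assumed key
    static final String NAME_KEY = "hoodie.table.name";         // assumed key

    // CRC32 over "<database_name>.<table_name>" encoded as UTF-8.
    static long generateChecksum(String databaseName, String tableName) {
        CRC32 crc = new CRC32();
        crc.update((databaseName + "." + tableName).getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // A config is treated as valid only if the stored checksum is present,
    // parses as a number, and matches the value recomputed from the names.
    static boolean validateChecksum(Properties props) {
        String stored = props.getProperty(CHECKSUM_KEY);
        String table = props.getProperty(NAME_KEY);
        if (stored == null || table == null) {
            return false;
        }
        String database = props.getProperty(DATABASE_KEY, "");
        try {
            return Long.parseLong(stored.trim()) == generateChecksum(database, table);
        } catch (NumberFormatException e) {
            return false; // a truncated or garbled value counts as invalid
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(DATABASE_KEY, "db");
        props.setProperty(NAME_KEY, "tbl");
        props.setProperty(CHECKSUM_KEY, String.valueOf(generateChecksum("db", "tbl")));
        System.out.println("valid = " + validateChecksum(props));
    }
}
```

Note that a checksum derived only from the database and table names detects a missing or damaged final line; it does not cover the contents of the other properties.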
Verify this pull request
(Please pick either of the following options)
This pull request is a trivial rework / code cleanup without any test coverage.
(or)
This pull request is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.