Skip to content

Conversation

@codope
Copy link
Member

@codope codope commented Jan 28, 2022

What is the purpose of the pull request

To detect partial writes on HDFS, this PR adds a new property which gets appended at the end of hoodie.properties file while creating or modiying table config. The value of the property is CRC of <database_name>.<table_name>. The PR also changes TypedProperties to maintain order of insertion.

Brief change log

  • Add a new table config.
  • Support the new config in UpgradeDowngrade.
  • Ensure insertion order in TypedProperties.

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added HoodieClientWriteTest to verify the change.
  • Manually verified the change by running a job locally.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@nsivabalan
Copy link
Contributor

can you help understand this scenario:

  1. lets say we take a backup. and delete original table config. start writing new table config w/ updates. but crashed mid -way. Until we restart the hudi writer, primary table config is in corrupted state and backup property file is in good state. So, during this, what does reader do? do they get routed to backup or original?
  2. If in above case, if crash happens mid-way, lets say out of 10 entries, only 5 got added, and if readers are using the primary copy to read table props, wouldn't they be reading partial table props?
    Probably this has nothing to do w/ the checksum patch I guess. but wanted to understand in general.

@codope
Copy link
Member Author

codope commented Feb 1, 2022

can you help understand this scenario:

  1. lets say we take a backup. and delete original table config. start writing new table config w/ updates. but crashed mid -way. Until we restart the hudi writer, primary table config is in corrupted state and backup property file is in good state. So, during this, what does reader do? do they get routed to backup or original?
  2. If in above case, if crash happens mid-way, lets say out of 10 entries, only 5 got added, and if readers are using the primary copy to read table props, wouldn't they be reading partial table props?
    Probably this has nothing to do w/ the checksum patch I guess. but wanted to understand in general.
  1. Get routed to backup props.
  2. You mean while writing props from backup file, some crash happened even before checksum match? In that case, yes partial files would be written to hdfs but the writer will fail due to IOException. Next time, when it comes up then props are recovered from the backup.

@nsivabalan
Copy link
Contributor

btw, we might need to add the checksum property as part of the upgrade.

@codope codope force-pushed the HUDI-2809-table-checksum branch 2 times, most recently from 53074e0 to f402ca3 Compare February 7, 2022 16:34
@nsivabalan nsivabalan added the priority:critical Production degraded; pipelines stalled label Feb 8, 2022
@codope codope force-pushed the HUDI-2809-table-checksum branch from 952302a to 39e9fb7 Compare February 8, 2022 10:30
@nsivabalan nsivabalan added priority:blocker Production down; release blocker and removed priority:critical Production degraded; pipelines stalled labels Feb 8, 2022
Copy link
Contributor

@nsivabalan nsivabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like for us to solve 2 partial write failure cases.

Case1:
you take backup.
delete original props file
update in memory props file.
write to original props file: mid way process crashed.

With this, how do we ensure readers get routed to backup file which is prestine and not the original file which is corrupt.

Case2:
when taking backup, let's say process crashed. but file was partially written.

Guess here we don't need to worry, bcoz, original file is still intact and readers will anyways read the original prop file.

Also, do we need to add checksum to prop file with an explicit upgrade step ?

@codope codope force-pushed the HUDI-2809-table-checksum branch from 39e9fb7 to e7d96f2 Compare February 10, 2022 14:59
Copy link
Contributor

@nsivabalan nsivabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, can you update the description of the patch (for eg, it does not cover upgrade info)

@codope codope force-pushed the HUDI-2809-table-checksum branch from e7d96f2 to 13f8523 Compare February 17, 2022 07:42
Copy link
Contributor

@nsivabalan nsivabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 nit. once addressed you can land. also there are some CI failures. do check it out.

…erties

Address review comments

Exclude a dependency to avoid conflict

Fix dependency conflict

Fix repairs command

Implement putIfAbsent for DDB lock provider

Add upgrade step and validate while fetching configs

Validate checksum for latest table version only while fetching config

Move generateChecksum to BinaryUtil

Rebase and resolve conflict

Fix table version check
@codope codope force-pushed the HUDI-2809-table-checksum branch from 13f8523 to d2e24f8 Compare February 18, 2022 02:34
@hudi-bot
Copy link
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@codope
Copy link
Member Author

codope commented Feb 18, 2022

CI was a build failure due to maven download timing out. It succeeded in the latest commit.

@codope codope merged commit ed106f6 into apache:master Feb 18, 2022
vingov pushed a commit to vingov/hudi that referenced this pull request Apr 3, 2022
…erties (apache#4712)

Fix dependency conflict

Fix repairs command

Implement putIfAbsent for DDB lock provider

Add upgrade step and validate while fetching configs

Validate checksum for latest table version only while fetching config

Move generateChecksum to BinaryUtil

Rebase and resolve conflict

Fix table version check
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

priority:blocker Production down; release blocker

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants