-
Notifications
You must be signed in to change notification settings - Fork 593
HDDS-8101. Add FSO repair tool to ozone CLI in read-only and repair modes. #6608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc. @errose28 |
adoroszlai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @DaveTeng0 for the patch.
Some comments about POM and CLI. Note: I haven't checked the code of the tool itself (FSORepairTool).
hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/debug/FSORepairCLI.java
Outdated
Show resolved
Hide resolved
hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/debug/FSORepairCLI.java
Outdated
Show resolved
Hide resolved
hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/debug/FSORepairCLI.java
Outdated
Show resolved
Hide resolved
hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/debug/FSORepairCLI.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we still need to decide what the CLI for this should look like. We could do ozone {debug,repair} fso-tree or ozone repair fso-tree [--dry-run]. Also as we add more of these type of commands I think ones that are specific to a component should be under their own subcommand for organization, like ozone repair om fso-tree.
Attila also brought up the --dry-run mode. I think if the command is under repair only, then dry run would not be the expected default value. If we add the read-only invocation under debug then that becomes the equivalent of dry run and no flag is needed.
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/fs/ozone/TestFSORepairTool.java
Show resolved
Hide resolved
…d base class FSOBaseCLI and FSOBaseTool
Yeah! extracted common codes between FSODebugCLI and FSORepairCLI to separated base classes FSOBaseCLI and FSOBaseTool, and make them reuse same logic. |
|
Hello team! please feel free to let me know if there is any new comment~ Thanks! |
errose28
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @DaveTeng0 I just did a quick pass on the code outside of the main repair logic.
There seems to be a lot of failures on your branch, although I haven't looked deeply and it could just be a bad CI run. Can you get that to a green state?
For the CLI, If we go the route of different debug and repair commands, I think we want ozone debug om fso-tree and ozone repair om fso-tree to be the two options. Just having ozone repair om fso-tree [--dry-run] isn't a bad approach either. I think I slightly prefer that one and it prevents the need for an extra om subcommand under each option (for now at least), but I'm ok with two separate commands as well.
...ds/managed-rocksdb/src/main/java/org/apache/hadoop/hdds/utils/db/managed/ManagedRocksDB.java
Outdated
Show resolved
Hide resolved
hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/common/FSOBaseCLI.java
Outdated
Show resolved
Hide resolved
hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/common/FSOBaseCLI.java
Outdated
Show resolved
Hide resolved
hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/common/FSOBaseCLI.java
Outdated
Show resolved
Hide resolved
hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/common/FSOBaseCLI.java
Outdated
Show resolved
Hide resolved
|
@sarvekshayr will be resuming the work on this task. The remaining items to be addressed are:
|
|
Continued in #7368. |
What changes were proposed in this pull request?
Bugs like HDDS-7592 can break the FSO tree and cause data to be orphaned in the OM. We have developed a tool to identify and repair this condition in the OM and tested it on affected clusters. This jira is to contribute the tool back to the community under the ozone CLI.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-8101
How was this patch tested?
Unit test, integration test.