HDFS-17564. EC: Fix the issue of inaccurate metrics when decommission mark busy DN. #6911
💔 -1 overall
This message was automatically generated.
The failed unit test is not related to this PR.

Hi @Hexiaoqiao @ZanderXu @zhangshuyan0, could you please help review this PR when you have free time? Thanks.
Hexiaoqiao
left a comment
Almost LGTM. Left one nit comment inline, FYI. Thanks.
    assertEquals(decommisionNodes.size(), liveDecommissioning);

    //4. wait for decommission block to replicate
    Thread.sleep(3000);
What about using GenericTestUtils.waitFor rather than Thread.sleep?
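To illustrate the suggestion, here is a minimal, self-contained sketch of the polling pattern. `PollingWait` is a hypothetical stand-in written for this example, not Hadoop's implementation; Hadoop's `GenericTestUtils.waitFor` takes, as far as I know, the same three arguments (a `Supplier<Boolean>` check, a poll interval in milliseconds, and a total timeout in milliseconds).

```java
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical stand-in for illustration only; not Hadoop's GenericTestUtils.
final class PollingWait {
  private PollingWait() {}

  /**
   * Polls the check until it returns true, sleeping checkEveryMillis between
   * attempts, and throws TimeoutException once waitForMillis has elapsed.
   */
  static void waitFor(Supplier<Boolean> check, long checkEveryMillis, long waitForMillis)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + waitForMillis * 1_000_000L;
    while (true) {
      if (Boolean.TRUE.equals(check.get())) {
        return; // condition met: stop waiting immediately
      }
      if (System.nanoTime() >= deadline) {
        throw new TimeoutException("Condition not met within " + waitForMillis + " ms");
      }
      Thread.sleep(checkEveryMillis);
    }
  }
}
```

Compared with a fixed `Thread.sleep(3000)`, a polling wait returns as soon as the condition holds and fails with a clear timeout when it never does, which tends to make tests both faster and less flaky.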
     */
    @Test(timeout = 120000)
    public void testBusyAfterDecommissionNode() throws Exception {
      byte busyDNIndex = 0;
Is there any particular reason for defining the index as a byte here? Not a blocker, just out of interest.
Thanks @Hexiaoqiao for your comment.
There is no special reason for using the byte type; we can change it to int.

Updated the PR. Hi @Hexiaoqiao, please help review it again, thanks!

💔 -1 overall
This message was automatically generated.
Hexiaoqiao
left a comment
LGTM. +1 from my side.

Will commit if there are no further comments after waiting one workday.

Committed to trunk. Thanks @haiyang1987.

Thanks @Hexiaoqiao for your review and merge.
… mark busy DN. (apache#6911). Contributed by Haiyang Hu. Signed-off-by: He Xiaoqiao <[email protected]>
Description of PR
https://issues.apache.org/jira/browse/HDFS-17564
If a DataNode is marked as busy and contains many EC blocks, then when that DataNode is decommissioned, ErasureCodingWork#addTaskToDatanode generates no replication work for ecBlocksToBeReplicated, but the related metrics (such as DatanodeDescriptor#currApproxBlocksScheduled, pendingReconstruction and needReconstruction) are still updated.
Specific code:
BlockManager#scheduleReconstruction -> BlockManager#chooseSourceDatanodes [2628~2650]
If a DataNode is marked as busy and contains many EC blocks, it will not be added to srcNodes here.
ErasureCodingWork#addTaskToDatanode[149~157]
So we need to fix this logic to avoid inaccurate metrics.
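
The problem described above can be sketched with a deliberately simplified model. This is not the actual patch and not Hadoop code: `ReconstructionSketch`, `Node`, `addTask` and `schedule` are hypothetical names, and the two counters only loosely mirror `currApproxBlocksScheduled` and `pendingReconstruction`. The idea shown is one possible way to keep counters consistent: have the task-adding step report whether any work was really queued, and update the counters only in that case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model for illustration; not the HDFS implementation.
final class ReconstructionSketch {

  /** Hypothetical stand-in for a DataNode's scheduling state. */
  static final class Node {
    final boolean busy;                                 // node is marked busy
    int blocksScheduled;                                // loosely mirrors currApproxBlocksScheduled
    final List<String> queuedWork = new ArrayList<>();  // replication tasks queued on this node

    Node(boolean busy) {
      this.busy = busy;
    }
  }

  int pendingReconstruction; // loosely mirrors the pendingReconstruction count

  /** Queues replication work on the source; returns false if nothing was queued. */
  boolean addTask(Node source, String blockId) {
    if (source.busy) {
      return false; // a busy source is skipped, so no work is generated
    }
    source.queuedWork.add(blockId);
    return true;
  }

  /** Updates the counters only when work was actually queued. */
  void schedule(Node source, Node target, String blockId) {
    if (!addTask(source, blockId)) {
      return; // no task generated: leave the counters untouched
    }
    target.blocksScheduled++;
    pendingReconstruction++;
  }
}
```

In this model, scheduling a block whose only source is busy leaves both counters at zero, whereas an unconditional update would report work that was never queued.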