-
Notifications
You must be signed in to change notification settings - Fork 9.2k
HDFS-16624. Fix flaky unit test TestDFSAdmin#testAllDatanodesReconfig #4412
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…nodesReconfig ERROR.
|
@virajjasani @tomscut please help to review the code. |
|
Thanks for the PR @slfan1989 |
Hi @virajjasani , |
|
That's surprising indeed. One of the recent PRs, I see this test passing: |
But you can look at the screenshot of my local debugger(in jira HDFS-16624), the index is indeed inaccurate. |
|
🎊 +1 overall
This message was automatically generated. |
|
I already know the cause of the problem, because the log is out of order due to multi-threading. I compared the logs of several execution failures, as follows, after comparison, it is found that the log order cannot be It is the disorder caused by the following code. Therefore, using a fixed-order assertion judgment will occasionally succeed and occasionally fail. improve proposals: @virajjasani @tomscut Is it ok for me to do this? Or can you give me some advice? |
| if (!status.stopped()) { | ||
| out.println(" and is still running."); | ||
| return 0; | ||
| synchronized (this) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@slfan1989 This is good for concurrency control but we should avoid this for performance issues. Rather, we can update test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for your suggestion, I will modify the junit test.
| assertEquals("SUCCESS: Changed property dfs.datanode.peer.stats.enabled", outs.get(1)); | ||
| assertEquals("\tFrom: \"false\"", outs.get(2)); | ||
| assertEquals("\tTo: \"true\"", outs.get(3)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that concurrency is at play here, we can do something like this:
assertTrue("SUCCESS: Changed property dfs.datanode.peer.stats.enabled".equals(outs.get(2))
|| "SUCCESS: Changed property dfs.datanode.peer.stats.enabled".equals(outs.get(1)));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will modify the code.
|
Good find btw @slfan1989 |
This reverts commit 689f74b.
|
@virajjasani @tomscut please help me review the code again. |
The code you contributed is very useful, I have learned a lot from you, thank you again! |
tomscut
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change looks good to me. @virajjasani Please also take a look. Thanks.
virajjasani
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 (non-binding)
|
Thank you for the nice fix @slfan1989 and for the review @tomscut !! |
|
🎊 +1 overall
This message was automatically generated. |
|
Thanks @slfan1989 for your contribution! Thanks @virajjasani for your review. |
…apache#4412) Reviewed-by: Viraj Jasani <[email protected]> Signed-off-by: Tao Li <[email protected]>
…#4412) Reviewed-by: Viraj Jasani <[email protected]> Signed-off-by: Tao Li <[email protected]>
…apache#4412) Reviewed-by: Viraj Jasani <[email protected]> Signed-off-by: Tao Li <[email protected]>
JIRA:HDFS-16624. Fix org.apache.hadoop.hdfs.tools.TestDFSAdmin#testAllDatanodesReconfig ERROR.
(#4406) found an error message during Junit unit testing, as follows:
After code debugging, it was found that there was an error in the selection outs.get(2) of the assertion(1208), index should be equal to 1.
Please see jira for details.
Introduce code in #4264