Deflake replica selection test by relaxing cluster configurations #2672
Codecov Report: ✅ All modified and coverable lines are covered by tests.

@@             Coverage Diff              @@
##           unstable    #2672      +/-   ##
============================================
+ Coverage     72.18%   72.62%   +0.44%
============================================
  Files           128      128
  Lines         70994    71273     +279
============================================
+ Hits          51246    51762     +516
+ Misses        19748    19511     -237
zuiderkwast left a comment:
Do you think it will help or is it a wild guess?
Branch updated from 3d1a7a3 to 0d3b28f.
@zuiderkwast Over the past couple of weeks the test has failed only under valgrind, so the failures are likely caused by valgrind's general slowness. I have also had a few passing valgrind runs in my local repo after this change, so it should work!
enjoy-binbin left a comment:
`5000 200` is quite a large timeout and looks odd to me. We have a lot of similar cluster tests (I believe) in the daily run; have you measured its test time in the daily CI? Do you think adjusting `cluster-ping-interval` and `cluster-node-timeout` would help?
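The thread does not spell out what `5000 200` means. Assuming it is the retry count and the per-retry delay in milliseconds of a polling helper in the Tcl test suite, the worst-case wait is 5000 × 200 ms = 1000 s, roughly 17 minutes, which is why it reads as a huge timeout. A minimal Python sketch of such a bounded poll follows; `wait_for_condition` and `worst_case_wait_seconds` here are hypothetical illustration names, not the suite's actual code.

```python
import time
from typing import Callable


def wait_for_condition(cond: Callable[[], bool], max_tries: int, delay_ms: int) -> bool:
    """Poll `cond` up to `max_tries` times, sleeping `delay_ms` after each failed check.

    Hypothetical helper for illustration; not the Valkey test suite's implementation.
    """
    for _ in range(max_tries):
        if cond():
            return True
        time.sleep(delay_ms / 1000.0)  # back off before the next check
    return False


def worst_case_wait_seconds(max_tries: int, delay_ms: int) -> float:
    """Total sleep time if the condition never becomes true (ignores time spent in `cond`)."""
    return max_tries * delay_ms / 1000.0


if __name__ == "__main__":
    # Assumed reading of "5000 200": 5000 tries, 200 ms apart.
    print(worst_case_wait_seconds(5000, 200))

    calls = {"n": 0}

    def becomes_true_on_third_call() -> bool:
        calls["n"] += 1
        return calls["n"] >= 3

    print(wait_for_condition(becomes_true_on_third_call, max_tries=5, delay_ms=1))
```

In this sketch the full budget is only consumed when the condition never holds; a run in which the condition eventually becomes true returns at the first successful check, so a large budget mostly costs time on genuine failures.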
Valgrind tests take about 3 hours 50 minutes, which is also what we see in the daily tests. Let me explore `cluster-ping-interval` and `cluster-node-timeout`.
Branch updated from 0d3b28f to 65d857e.
@enjoy-binbin I think your suggestion worked. I somehow hadn't noticed that the ping interval and node timeout values are low by default. Just increasing them for this test has given me 2-3 successful runs in a row.
Signed-off-by: Sarthak Aggarwal <[email protected]>
Branch updated from 65d857e to 2525015.
enjoy-binbin left a comment:
> Just increasing for this test have gotten me 2-3 successful runs together.
Thanks, please try running it a few more times before we merge it.
@enjoy-binbin The test has been green for the last 6 runs in my local repo!
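How much a streak of green runs says about a flaky test can be estimated with a simple model that is not part of the original discussion. Assuming runs are independent and the test fails with some fixed probability p, n passes in a row happen with probability (1 − p)^n, so the largest p still consistent with the streak at 95% confidence is 1 − 0.05^(1/n). The function names below are hypothetical.

```python
import math


def failure_rate_upper_bound(passes_in_a_row: int, alpha: float = 0.05) -> float:
    """Largest per-run failure probability p with (1 - p) ** n >= alpha.

    Assumes independent runs with a constant failure probability.
    """
    return 1.0 - alpha ** (1.0 / passes_in_a_row)


def runs_needed(max_failure_rate: float, alpha: float = 0.05) -> int:
    """Smallest number of consecutive passes n with (1 - max_failure_rate) ** n <= alpha."""
    return math.ceil(math.log(alpha) / math.log(1.0 - max_failure_rate))


if __name__ == "__main__":
    for n in (3, 6):
        print(n, round(failure_rate_upper_bound(n), 3))
    print(runs_needed(0.05))
```

Under these assumptions, 3 passes only bound the failure rate below about 63% and 6 passes below about 39%, while ruling out a 5% flake rate takes 59 consecutive passes. Real CI runs are not fully independent, so this is a rough guide rather than a guarantee.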
…lkey-io#2672)

We have relaxed the `cluster-ping-interval` and `cluster-node-timeout` so that the cluster has enough time to stabilize and propagate changes. This fixes an occasional failure of this test when running with valgrind:

[err]: Node #10 should eventually replicate node #5 in tests/unit/cluster/slave-selection.tcl
#10 didn't became slave of #5

Signed-off-by: Sarthak Aggarwal <[email protected]>
Backported to the 9.0 branch in #2731.
We have relaxed `cluster-ping-interval` and `cluster-node-timeout` so that the cluster has enough time to stabilize and propagate changes.

Today's failed test run: https://github.com/valkey-io/valkey/actions/runs/18179260254/job/51751751729#step:6:11262
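To make the timeout half of this reasoning concrete, here is a deliberately simplified toy model. It is not Valkey's failure detector and it ignores `cluster-ping-interval` entirely: each reply latency is multiplied by a uniform slowdown factor standing in for valgrind, and any reply slower than the node timeout counts as late. The function name, latencies, slowdown factor, and timeouts are all made-up illustration values.

```python
from typing import Iterable


def late_replies(base_latencies_ms: Iterable[float], slowdown: float, node_timeout_ms: float) -> int:
    """Count replies that would miss the deadline once each latency is scaled by `slowdown`.

    Toy model only: one uniform slowdown factor and a single per-reply deadline.
    """
    return sum(1 for latency in base_latencies_ms if latency * slowdown > node_timeout_ms)


# Hypothetical numbers chosen for illustration, not measured from the test suite.
SAMPLE_LATENCIES_MS = [2.0, 5.0, 12.0, 40.0]
SLOWDOWN = 30.0

if __name__ == "__main__":
    # Tight timeout: the 40 ms reply becomes 1200 ms and misses a 500 ms deadline.
    print(late_replies(SAMPLE_LATENCIES_MS, SLOWDOWN, node_timeout_ms=500))
    # Relaxed timeout: every slowed-down reply still arrives in time.
    print(late_replies(SAMPLE_LATENCIES_MS, SLOWDOWN, node_timeout_ms=5000))
```

In this model a timeout tuned for native speed can be crossed by replies that are merely slow rather than lost, which matches the thread's diagnosis that the failures come from general slowness under valgrind.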