Skip to content

Conversation

@sarthakaggarwal97
Copy link
Contributor

@sarthakaggarwal97 sarthakaggarwal97 commented Oct 2, 2025

We have relaxed the cluster-ping-interval and cluster-node-timeout so that cluster has enough time to stabilize and propagate changes.

Today's failed test run: https://github.com/valkey-io/valkey/actions/runs/18179260254/job/51751751729#step:6:11262

[err]: Node #10 should eventually replicate node #5 in tests/unit/cluster/slave-selection.tcl
#10 didn't became slave of #5

@codecov
Copy link

codecov bot commented Oct 2, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.62%. Comparing base (3073979) to head (2525015).
⚠️ Report is 12 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2672      +/-   ##
============================================
+ Coverage     72.18%   72.62%   +0.44%     
============================================
  Files           128      128              
  Lines         70994    71273     +279     
============================================
+ Hits          51246    51762     +516     
+ Misses        19748    19511     -237     

see 32 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it will help or is it a wild guess?

@zuiderkwast zuiderkwast changed the title Deflake slave selection test by increasing wait condition time Deflake replica selection test by increasing wait condition time Oct 6, 2025
@sarthakaggarwal97 sarthakaggarwal97 force-pushed the slave-selection-flaky-test branch from 3d1a7a3 to 0d3b28f Compare October 8, 2025 03:33
@sarthakaggarwal97
Copy link
Contributor Author

sarthakaggarwal97 commented Oct 8, 2025

@zuiderkwast The test fails with only valgrind in the past couple of weeks, so it should be related to general slowness with valgrind. Also, I have few passing valgrind runs in my local repo after this change, so it should work!

Copy link
Member

@enjoy-binbin enjoy-binbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5000 200 is quite a huge timeout and look odd to me, we have a lot of the same cluster test (i belive) under the daily, have you measured its testing time in daily ci? Do you think adjusting cluster-ping-interval and cluster-node-timeout would help?

@sarthakaggarwal97
Copy link
Contributor Author

Valgrind tests take about 3hrs 50mins ~ something that we see in daily tests too. Let me explore cluster-ping-interval and cluster-node-timeout.

@sarthakaggarwal97 sarthakaggarwal97 force-pushed the slave-selection-flaky-test branch from 0d3b28f to 65d857e Compare October 8, 2025 23:11
@sarthakaggarwal97
Copy link
Contributor Author

@enjoy-binbin I think your suggestion has worked. I somehow didn't notice that the values for ping internal and node timeout are less by default. Just increasing for this test have gotten me 2-3 successful runs together.

Signed-off-by: Sarthak Aggarwal <[email protected]>
@sarthakaggarwal97 sarthakaggarwal97 force-pushed the slave-selection-flaky-test branch from 65d857e to 2525015 Compare October 8, 2025 23:13
@sarthakaggarwal97 sarthakaggarwal97 changed the title Deflake replica selection test by increasing wait condition time Deflake replica selection test by relaxing cluster configurations Oct 8, 2025
Copy link
Member

@enjoy-binbin enjoy-binbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just increasing for this test have gotten me 2-3 successful runs together.

thanks, please try running it a few more times before we merge it.

@sarthakaggarwal97
Copy link
Contributor Author

@enjoy-binbin the test is green for 6 last runs in my local repo!

@zuiderkwast zuiderkwast merged commit f598d11 into valkey-io:unstable Oct 9, 2025
52 checks passed
@sarthakaggarwal97 sarthakaggarwal97 moved this to To be backported in Valkey 9.0 Oct 10, 2025
sarthakaggarwal97 added a commit to sarthakaggarwal97/valkey that referenced this pull request Oct 13, 2025
…lkey-io#2672)

We have relaxed the `cluster-ping-interval` and `cluster-node-timeout`
so that cluster has enough time to stabilize and propagate changes.

Fixes this test occasional failure when running with valgrind:

    [err]: Node #10 should eventually replicate node #5 in tests/unit/cluster/slave-selection.tcl
    #10 didn't became slave of #5

Signed-off-by: Sarthak Aggarwal <[email protected]>
zuiderkwast pushed a commit that referenced this pull request Oct 13, 2025
)

We have relaxed the `cluster-ping-interval` and `cluster-node-timeout`
so that cluster has enough time to stabilize and propagate changes.

Fixes this test occasional failure when running with valgrind:

[err]: Node #10 should eventually replicate node #5 in
tests/unit/cluster/slave-selection.tcl
    #10 didn't became slave of #5

Backported to the 9.0 branch in #2731.

Signed-off-by: Sarthak Aggarwal <[email protected]>
@zuiderkwast zuiderkwast moved this from To be backported to Done in Valkey 9.0 Oct 13, 2025
roshkhatri pushed a commit to roshkhatri/valkey that referenced this pull request Oct 14, 2025
…lkey-io#2672)

We have relaxed the `cluster-ping-interval` and `cluster-node-timeout`
so that cluster has enough time to stabilize and propagate changes.

Fixes this test occasional failure when running with valgrind:

    [err]: Node valkey-io#10 should eventually replicate node valkey-io#5 in tests/unit/cluster/slave-selection.tcl
    valkey-io#10 didn't became slave of valkey-io#5

Signed-off-by: Sarthak Aggarwal <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants