@enjoy-binbin
Member

We may rely on auth_time to determine whether a failover is in progress, like #1009, so it is best to reset it.

Signed-off-by: Binbin <[email protected]>
@enjoy-binbin
Member Author

I found this while debugging something. Here are the logs:

96937:S 11 Feb 2025 19:38:26.268 * Forced failover user request accepted (user request from 'id=3 addr=127.0.0.1:63767 laddr=127.0.0.1:21111 fd=14 name= user=default lib-name= lib-ver=').
96937:S 11 Feb 2025 19:38:26.269 * Start of election delayed for 0 milliseconds (rank #0, primary rank #0, offset 14).
96937:S 11 Feb 2025 19:38:26.269 * Starting a failover election for epoch 4, node config epoch is 1
96937:S 11 Feb 2025 19:38:26.303 * Currently unable to failover: Waiting for votes, but majority still not reached.
96937:S 11 Feb 2025 19:38:26.303 * Needed quorum: 2. Number of votes received so far: 0
96937:S 11 Feb 2025 19:38:26.326 # Failover election in progress for epoch 4, but received a claim from node 42fcebf9af0713b81ff6e0251a9175b7d71b767e () with an equal or higher epoch 4. Resetting the election since we cannot win an election in the past.
96937:S 11 Feb 2025 19:38:26.326 * Start of election delayed for 0 milliseconds (rank #0, primary rank #0, offset 14).
96937:S 11 Feb 2025 19:38:26.326 * Starting a failover election for epoch 5, node config epoch is 1

# Here we won the election.
96937:S 11 Feb 2025 19:38:26.385 * Failover election won: I'm the new primary.
96937:S 11 Feb 2025 19:38:26.385 * configEpoch set to 5 after successful failover
96937:S 11 Feb 2025 19:38:26.385 * Setting myself to primary in shard 6e8f931f6c14ed5de0c1a5b340135f332299907f after failover; my old primary is 33b9f47e71a2b59f81d1ac97bb596d82e0d3d334 ()
96937:M 11 Feb 2025 19:38:26.385 * Connection with primary lost.
96937:M 11 Feb 2025 19:38:26.385 * Caching the disconnected primary state.
96937:M 11 Feb 2025 19:38:26.385 * Discarding previously cached primary state.
96937:M 11 Feb 2025 19:38:26.385 * Setting secondary replication ID to 701eaad3916b38e04ab945f6e045770bfc20c917, valid up to offset: 15. New replication ID is 5d6a17ad6bca6585512d3eca50ec81eac2223afc

# Here we received another message that printed a misleading log line.
96937:M 11 Feb 2025 19:38:26.608 # Failover election in progress for epoch 5, but received a claim from node 33b9f47e71a2b59f81d1ac97bb596d82e0d3d334 () with an equal or higher epoch 6. Resetting the election since we cannot win an election in the past.
96937:M 11 Feb 2025 19:38:26.608 * Configuration change detected. Reconfiguring myself as a replica of node 33b9f47e71a2b59f81d1ac97bb596d82e0d3d334 () in shard 6e8f931f6c14ed5de0c1a5b340135f332299907f
96937:S 11 Feb 2025 19:38:26.609 * Before turning into a replica, using my own primary parameters to synthesize a cached primary: I may be able to synchronize with the new primary with just a partial transfer.
96937:S 11 Feb 2025 19:38:26.609 * Connecting to PRIMARY 127.0.0.1:21114
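
To illustrate why the last warning is misleading, here is a minimal sketch in C. It is a simplified model, not the code in src/cluster_legacy.c: the `node_state` struct and every function name below are hypothetical, and the only assumption carried over from this PR is that a non-zero auth_time is treated as "a failover election is in progress".

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for a node's election state.
 * Assumption: auth_time == 0 means "no election scheduled or running". */
typedef struct {
    int64_t auth_time;   /* time the current election was scheduled, in ms */
    uint64_t auth_epoch; /* epoch of the current election attempt */
    int is_primary;      /* 1 once the node has promoted itself */
} node_state;

/* The check this PR is concerned with: auth_time doubles as the
 * "is a failover in progress?" flag. */
int election_in_progress(const node_state *n) {
    return n->auth_time != 0;
}

/* A replica schedules an election for the given epoch. */
void start_election(node_state *n, int64_t now_ms, uint64_t epoch) {
    n->auth_time = now_ms;
    n->auth_epoch = epoch;
}

/* The replica wins and promotes itself. reset_auth_time selects between
 * the old behavior (0, stale value kept) and the behavior proposed here (1). */
void win_election(node_state *n, int reset_auth_time) {
    n->is_primary = 1;
    if (reset_auth_time) n->auth_time = 0;
}

/* Another node claims an epoch. Returns 1 if the node would log
 * "Failover election in progress ... Resetting the election", else 0. */
int on_epoch_claim(node_state *n, uint64_t claimed_epoch) {
    if (election_in_progress(n) && claimed_epoch >= n->auth_epoch) {
        n->auth_time = 0; /* abandon the election we believe is running */
        return 1;
    }
    return 0;
}
```

Replaying the logs against this model: `start_election` for epoch 5, then `win_election`, then a claim with epoch 6. With the stale value kept, `on_epoch_claim` returns 1 even though the election is already over, which matches the misleading warning at 19:38:26.608. With auth_time reset on the win, it returns 0.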

@codecov

codecov bot commented Feb 11, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.13%. Comparing base (4e0149a) to head (360837a).
Report is 4 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1711      +/-   ##
============================================
+ Coverage     71.11%   71.13%   +0.01%     
============================================
  Files           123      123              
  Lines         65541    65536       -5     
============================================
+ Hits          46610    46617       +7     
+ Misses        18931    18919      -12     
Files with missing lines Coverage Δ
src/cluster_legacy.c 85.81% <100.00%> (-0.27%) ⬇️

... and 13 files with indirect coverage changes

Co-authored-by: Madelyn Olson <[email protected]>
Signed-off-by: Binbin <[email protected]>
@enjoy-binbin enjoy-binbin merged commit eeda8ae into valkey-io:unstable Feb 12, 2025
50 checks passed
@enjoy-binbin enjoy-binbin deleted the reset_auth_time branch February 12, 2025 02:47
xbasel pushed a commit to xbasel/valkey that referenced this pull request Mar 27, 2025
murphyjacob4 pushed a commit to enjoy-binbin/valkey that referenced this pull request Apr 13, 2025