Backport - Fix replica can't finish failover when config epoch is outdated (#2178) to 7.2 (#2232)

ranshid · web-flow · commit 525551ab4a3d · 2025-06-18T11:30:39.000+03:00
When the primary changes its config epoch and then goes down immediately, the replica may not update its config epoch in time. Although the change is broadcast in the cluster (see #1813), there may be a race in the network or in the code. In this case, the replica will never finish the failover, because other primaries refuse to vote while the replica's slot config epoch is old.

We need a way to let the replica finish the failover in this case.

When a primary refuses to vote because the replica's config epoch is less than the dead primary's config epoch, it can send an UPDATE packet to the replica to inform it about the dead primary. The UPDATE message contains the dead primary's config epoch and owned slots. The current failover attempt will time out, but the replica can later try again with the updated config epoch and succeed.

Fixes #2169.

---------

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
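The refuse-then-UPDATE flow described above can be sketched as a small standalone model. This is only an illustration under simplifying assumptions: `voter_view`, `replica_view`, `send_update`, and `handle_auth_request` are hypothetical names that do not exist in src/cluster.c, a single slot is modeled, and the UPDATE packet is reduced to copying one epoch value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model; none of these names exist in src/cluster.c. */

/* What the replica believes about the slot it wants to take over. */
typedef struct {
    uint64_t slot_config_epoch;
} replica_view;

/* What a voting primary has recorded for the (now dead) slot owner. */
typedef struct {
    uint64_t owner_config_epoch;
} voter_view;

/* Stand-in for the UPDATE packet: it carries the dead primary's config
 * epoch, and the replica adopts it only if it is newer than its own. */
void send_update(const voter_view *voter, replica_view *replica) {
    if (voter->owner_config_epoch > replica->slot_config_epoch)
        replica->slot_config_epoch = voter->owner_config_epoch;
}

/* Returns true if the vote is granted. If the vote is refused because the
 * request carries a stale epoch, the voter also sends an UPDATE so that a
 * later attempt can succeed. */
bool handle_auth_request(const voter_view *voter, replica_view *replica) {
    uint64_t request_config_epoch = replica->slot_config_epoch;
    if (voter->owner_config_epoch > request_config_epoch) {
        send_update(voter, replica);
        return false; /* this attempt is refused and will time out */
    }
    return true;
}
```

In this model the first request from a replica with a stale epoch is refused, and a second request made after the UPDATE has been applied is granted, which mirrors the "time out, then retry" behavior the commit message describes.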
diff --git a/src/cluster.c b/src/cluster.c
@@ -3145,9 +3145,11 @@ int clusterProcessPacket(clusterLink *link) {
                         senderConfigEpoch)
                     {
                         serverLog(LL_VERBOSE,
-                            "Node %.40s has old slots configuration, sending "
-                            "an UPDATE message about %.40s",
-                                sender->name, server.cluster->slots[j]->name);
+                            "Node %.40s (%s) has old slots configuration, sending "
+                            "an UPDATE message about %.40s (%s)",
+                                sender->name, sender->human_nodename,
+                                server.cluster->slots[j]->name,
+                                server.cluster->slots[j]->human_nodename);
                         clusterSendUpdate(sender->link,
                             server.cluster->slots[j]);
 
@@ -4080,6 +4082,16 @@ void clusterSendFailoverAuthIfNeeded(clusterNode *node, clusterMsg *request) {
                 node->name, node->human_nodename, j,
                 (unsigned long long) server.cluster->slots[j]->configEpoch,
                 (unsigned long long) requestConfigEpoch);
+        
+        /* Send an UPDATE message to the replica. After receiving the UPDATE message,
+         * the replica will update the slots config so that it can initiate a failover
+         * again later. Otherwise the replica will never get votes if the primary is down. */
+        serverLog(LL_VERBOSE,
+                  "Node %.40s (%s) has old slots configuration, sending "
+                  "an UPDATE message about %.40s (%s)",
+                  node->name, node->human_nodename,
+                  server.cluster->slots[j]->name, server.cluster->slots[j]->human_nodename);
+        clusterSendUpdate(node->link, server.cluster->slots[j]);
         return;
     }
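
Both hunks log with the `%.40s (%s)` pattern. A minimal sketch of why the precision matters, assuming node names are fixed 40-byte buffers that are not NUL-terminated, while the human-readable name is an ordinary C string. `format_update_log` is a hypothetical helper for illustration only; the real code calls `serverLog` directly.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of src/cluster.c. The node name arguments
 * are assumed to point at 40-byte buffers without a terminating NUL, so
 * "%.40s" caps how many bytes printf reads from them. The human-readable
 * names are assumed to be regular NUL-terminated strings. */
int format_update_log(char *buf, size_t buflen,
                      const char *sender_name, const char *sender_human,
                      const char *owner_name, const char *owner_human) {
    return snprintf(buf, buflen,
                    "Node %.40s (%s) has old slots configuration, sending "
                    "an UPDATE message about %.40s (%s)",
                    sender_name, sender_human, owner_name, owner_human);
}
```

Without the precision, `%s` would keep reading past the end of an unterminated 40-byte name, so the `.40` is what makes passing those buffers directly safe.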