-
Notifications
You must be signed in to change notification settings - Fork 962
Description
Overview
This issue outlines the design for implementing the SWIM (Scalable Weakly-consistent Infection-style Process Group Membership) protocol in Valkey's Cluster. The SWIM protocol will replace the current approach where all nodes are contacted every node_timeout/2 period with a more scalable approach that contacts only one node per period. And further uses indirect probing through K additional nodes when failures are suspected.
The implementation will integrate seamlessly with the existing functionality and maintain backward compatibility with current cluster functionality while significantly reducing CPU utilization / message transfer in steady-state operations.
Architecture
Current Implementation
The current clusterCron() function performs the following operations every 100ms:
- Iterates through all nodes in the cluster
- Sends PING messages to nodes that haven't been contacted within
node_timeout/2 - Checks for node failures based on timeout thresholds
- This results in O(N) network operations per period where N is the number of nodes
SWIM Protocol Architecture
The SWIM protocol introduces three key mechanisms:
- Direct Probing: Contact one random node per period
- Indirect Probing: When direct probe fails, contact all primaries to verify
- Dissemination: Spread membership information through gossip
The implementation will integrate with existing cluster infrastructure:
- Utilize existing
clusterNodestructures andclusterLinkconnections - Leverage current message types (PING/PONG) with new message types for indirect probing
- Maintain compatibility with existing failure detection and cluster state management
Error Handling
- Direct Probe Timeout: When a direct probe times out, initiate indirect probing
- Indirect Probe Timeout: If indirect probes also fail, mark node as suspected
- Suspicion Timeout: If node remains unresponsive during suspicion period, mark as failed
Backward Compatibility
- SWIM protocol is disabled by default
- When disabled, system falls back to original ping behavior
- Configuration changes require cluster-wide coordination
- Mixed-mode operation not supported (all nodes must use same protocol)
References
- https://en.wikipedia.org/wiki/SWIM_Protocol
- https://www.cs.cornell.edu/projects/Quicksilver/public_pdfs/SWIM.pdf
- Lifeguard - https://arxiv.org/pdf/1707.00788 (Enhancements over SWIM)
Alternatives
Reduce overall message size of each packet in the current peer to peer failure detection mechanism to reduce CPU utilization