Skip to content

Reduce steady state CPU utilization for clusterbus via SWIM protocol #2369

@hpatro

Description

@hpatro

Overview

This issue outlines the design for implementing the SWIM (Scalable Weakly-consistent Infection-style Process Group Membership) protocol in Valkey's Cluster. The SWIM protocol will replace the current approach where all nodes are contacted every node_timeout/2 period with a more scalable approach that contacts only one node per period. And further uses indirect probing through K additional nodes when failures are suspected.

The implementation will integrate seamlessly with the existing functionality and maintain backward compatibility with current cluster functionality while significantly reducing CPU utilization / message transfer in steady-state operations.

Architecture

Current Implementation

The current clusterCron() function performs the following operations every 100ms:

  • Iterates through all nodes in the cluster
  • Sends PING messages to nodes that haven't been contacted within node_timeout/2
  • Checks for node failures based on timeout thresholds
  • This results in O(N) network operations per period where N is the number of nodes

SWIM Protocol Architecture

The SWIM protocol introduces three key mechanisms:

  1. Direct Probing: Contact one random node per period
  2. Indirect Probing: When direct probe fails, contact all primaries to verify
  3. Dissemination: Spread membership information through gossip
Image

The implementation will integrate with existing cluster infrastructure:

  • Utilize existing clusterNode structures and clusterLink connections
  • Leverage current message types (PING/PONG) with new message types for indirect probing
  • Maintain compatibility with existing failure detection and cluster state management

Error Handling

  1. Direct Probe Timeout: When a direct probe times out, initiate indirect probing
  2. Indirect Probe Timeout: If indirect probes also fail, mark node as suspected
  3. Suspicion Timeout: If node remains unresponsive during suspicion period, mark as failed

Backward Compatibility

  • SWIM protocol is disabled by default
  • When disabled, system falls back to original ping behavior
  • Configuration changes require cluster-wide coordination
  • Mixed-mode operation not supported (all nodes must use same protocol)

References

Alternatives

Reduce overall message size of each packet in the current peer to peer failure detection mechanism to reduce CPU utilization

Metadata

Metadata

Assignees

Labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions