Description
Currently, each PING/PONG/MEET packet carries gossip about a subset of nodes, sized at 10% of the total number of nodes in the cluster. In addition, each packet carries an entry for every node that has timed out, i.e. every node in a partially failed (PFAIL) state.
There are two challenges with that:
- Each gossip entry is 106 bytes, so in a large cluster the gossip section of a single packet can reach around 10-20KB, which can consume a lot of network bandwidth on each node.
- If a node observes multiple PFAIL nodes from its own view, their entries are added on top of the 10% subset, so the overall packet size grows further and isn't bounded. This makes the behavior nondeterministic and can cause high network usage and CPU utilization.
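
To make the numbers above concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a simplified model taken only from this description: 106 bytes per gossip entry, 10% of the nodes gossiped per packet, and one extra entry per PFAIL node. It ignores the packet header and any minimum or maximum the real implementation applies; the function name is hypothetical.

```python
GOSSIP_ENTRY_BYTES = 106  # per-entry size quoted in this issue


def gossip_section_bytes(total_nodes, pfail_nodes=0, gossip_percent=10):
    """Estimate the gossip payload of one PING/PONG/MEET packet.

    Simplified model: a fixed percentage of nodes, plus every PFAIL
    node on top (the unbounded part described above).
    """
    regular_entries = total_nodes * gossip_percent // 100
    return (regular_entries + pfail_nodes) * GOSSIP_ENTRY_BYTES


# 1000 nodes, no failures: 100 entries
print(gossip_section_bytes(1000))        # 10600 bytes, about 10KB
# 2000 nodes, no failures: 200 entries
print(gossip_section_bytes(2000))        # 21200 bytes, about 20KB
# 1000 nodes with 50 PFAIL nodes: 150 entries
print(gossip_section_bytes(1000, 50))    # 15900 bytes
```

Under this model the PFAIL term grows with the number of failing nodes regardless of the 10% setting, which is the unbounded part.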
Solution:
Bound the overall gossip size and count PFAIL nodes within that bound. Expose this as a configuration, cluster-node-gossip-percent, so the administrator can control the rate of gossip transfer and information dissemination.
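
One possible shape of the bounded selection, as a minimal Python sketch rather than the actual implementation. It assumes the budget is `cluster-node-gossip-percent` of the known nodes and that PFAIL nodes are picked first, with the remainder filled by random healthy nodes. That prioritization and the function name `select_gossip_entries` are assumptions for illustration; the proposal only states that PFAIL nodes count within the bound.

```python
import random


def select_gossip_entries(healthy, pfail, gossip_percent, rng=random):
    """Pick the nodes to gossip about in one packet under a fixed budget.

    healthy, pfail: lists of node names as seen from this node's view.
    gossip_percent: hypothetical value of cluster-node-gossip-percent.
    """
    total = len(healthy) + len(pfail)
    # Total budget for the packet; PFAIL entries are counted inside it.
    budget = max(1, total * gossip_percent // 100)

    # Assumption: PFAIL nodes get priority, but never exceed the budget.
    chosen_pfail = rng.sample(pfail, min(len(pfail), budget))

    # Fill whatever is left with randomly chosen healthy nodes.
    remaining = budget - len(chosen_pfail)
    chosen_healthy = rng.sample(healthy, min(len(healthy), remaining))

    return chosen_pfail + chosen_healthy


healthy = [f"node-{i}" for i in range(900)]
pfail = [f"pfail-{i}" for i in range(100)]
picked = select_gossip_entries(healthy, pfail, gossip_percent=10)
print(len(picked))  # 100, never more than the budget
```

With a fixed budget, the packet size depends only on the cluster size and the configured percentage, not on how many nodes are currently in PFAIL.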