Commit e024b4b

Drop the MEET packet if the link node is in handshake state (#1436)
After #1307 was merged, we noticed an assertion failure in setClusterNodeToInboundClusterLink:

```
=== ASSERTION FAILED ===
==> '!link->node' is not true
```

Since #778, we call setClusterNodeToInboundClusterLink to attach the node to the link during MEET processing. If we receive another MEET packet shortly afterwards, while the node is still in handshake state, we hit this assert and crash the server.

If the link is bound to a node, the node is in handshake state, and we receive a MEET packet, the sender has probably sent multiple MEET packets. In that case we drop the MEET to avoid the assert in setClusterNodeToInboundClusterLink.

The assert fires in the following sequence. The other node sends a MEET packet because it detects that there is no inbound link. This node creates a new node in HANDSHAKE state (with a random node name) and responds with a PONG. The other node receives the PONG and removes the CLUSTER_NODE_MEET flag. This node is supposed to open an outbound connection to the other node in the next cron cycle, but before that happens, the other node re-sends a MEET on the same link because it still detects no inbound connection.

Note that in getNodeFromLinkAndMsg, a node in handshake state has a random name and is not truly "known", so we don't know the sender. Dropping the MEET packet prevents us from creating another random node, avoids an incorrect link binding, and prevents a duplicate MEET packet from clearing the handshake state.

Signed-off-by: Binbin <[email protected]>
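The sequence described in the commit message can be sketched as a small, self-contained model. This is only an illustration under stated assumptions: `mini_node`, `mini_link`, `bind_inbound_link`, and `process_packet` are hypothetical stand-ins, not the real `clusterNode`, `clusterLink`, or `clusterProcessPacket`, and the flag and message constants are placeholders. The sketch shows why a second MEET on an already-bound link would trip the `!link->node` assertion, and how an early drop sidesteps it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real cluster structures. */
enum { MSG_PING = 0, MSG_PONG = 1, MSG_MEET = 2 };
enum { NODE_HANDSHAKE = 1 << 0 };

typedef struct mini_node { int flags; } mini_node;
typedef struct mini_link { mini_node *node; } mini_link;

typedef enum { PKT_PROCESSED, PKT_DROPPED } pkt_result;

static bool node_in_handshake(const mini_node *n) {
    return (n->flags & NODE_HANDSHAKE) != 0;
}

/* Models the invariant behind the crash: an inbound link may only be
 * bound to a node while it is still unbound. */
static void bind_inbound_link(mini_link *l, mini_node *n) {
    assert(!l->node);
    l->node = n;
}

/* Models MEET handling with the guard added by this commit. `fresh` is the
 * node that would be created for a first MEET (an assumption of this sketch;
 * the real code allocates it internally). */
static pkt_result process_packet(mini_link *l, int type, mini_node *fresh) {
    if (type == MSG_MEET && l->node && node_in_handshake(l->node)) {
        /* Duplicate MEET on a link already bound to a handshake node:
         * drop it and keep the existing binding. */
        return PKT_DROPPED;
    }
    if (type == MSG_MEET) {
        fresh->flags |= NODE_HANDSHAKE;
        bind_inbound_link(l, fresh); /* would assert without the guard above */
    }
    return PKT_PROCESSED;
}
```

In this model, calling `process_packet` twice with `MSG_MEET` on the same link binds the first node and drops the second packet. Without the guard, the second call would reach `bind_inbound_link` with `l->node` already set. Note that the real patch returns 1 from `clusterProcessPacket` when dropping, which keeps the link open.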
1 parent ad24220 commit e024b4b

1 file changed: +26 -4 lines changed

src/cluster_legacy.c

Lines changed: 26 additions & 4 deletions
@@ -3003,7 +3003,8 @@ int clusterIsValidPacket(clusterLink *link) {
     }

     if (type == server.cluster_drop_packet_filter || server.cluster_drop_packet_filter == -2) {
-        serverLog(LL_WARNING, "Dropping packet that matches debug drop filter");
+        serverLog(LL_WARNING, "Dropping packet of type %s that matches debug drop filter",
+                  clusterGetMessageTypeString(type));
         return 0;
     }

@@ -3094,7 +3095,7 @@ int clusterProcessPacket(clusterLink *link) {
         if (server.debug_cluster_close_link_on_packet_drop &&
             (type == server.cluster_drop_packet_filter || server.cluster_drop_packet_filter == -2)) {
             freeClusterLink(link);
-            serverLog(LL_WARNING, "Closing link for matching packet type %hu", type);
+            serverLog(LL_WARNING, "Closing link for matching packet type %s", clusterGetMessageTypeString(type));
             return 0;
         }
         return 1;
@@ -3110,8 +3111,8 @@ int clusterProcessPacket(clusterLink *link) {
             freeClusterLink(link);
             serverLog(
                 LL_NOTICE,
-                "Closing link for node that sent a lightweight message of type %hu as its first message on the link",
-                type);
+                "Closing link for node that sent a lightweight message of type %s as its first message on the link",
+                clusterGetMessageTypeString(type));
             return 0;
         }
         clusterNode *sender = link->node;
@@ -3120,6 +3121,27 @@ int clusterProcessPacket(clusterLink *link) {
         return 1;
     }

+    if (type == CLUSTERMSG_TYPE_MEET && link->node && nodeInHandshake(link->node)) {
+        /* If the link is bound to a node and the node is in the handshake state, and we receive
+         * a MEET packet, it may be that the sender sent multiple MEET packets so in here we are
+         * dropping the MEET to avoid the assert in setClusterNodeToInboundClusterLink. The assert
+         * will happen if the other sends a MEET packet because it detects that there is no inbound
+         * link, this node creates a new node in HANDSHAKE state (with a random node name), and
+         * respond with a PONG. The other node receives the PONG and removes the CLUSTER_NODE_MEET
+         * flag. This node is supposed to open an outbound connection to the other node in the next
+         * cron cycle, but before this happens, the other node re-sends a MEET on the same link
+         * because it still detects no inbound connection. We improved the re-send logic of MEET in
+         * #1441, now we will only re-send MEET packet once every handshake timeout period.
+         *
+         * Note that in getNodeFromLinkAndMsg, the node in the handshake state has a random name
+         * and not truly "known", so we don't know the sender. Dropping the MEET packet can prevent
+         * us from creating a random node, avoid incorrect link binding, and avoid duplicate MEET
+         * packet eliminate the handshake state. */
+        serverLog(LL_NOTICE, "Dropping MEET packet from node %.40s because the node is already in handshake state",
+                  link->node->name);
+        return 1;
+    }
+
     uint16_t flags = ntohs(hdr->flags);
     uint64_t sender_claimed_current_epoch = 0, sender_claimed_config_epoch = 0;
     clusterNode *sender = getNodeFromLinkAndMsg(link, hdr);
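
The logging hunks above replace the numeric `%hu` message type with a readable name obtained from `clusterGetMessageTypeString`. The idea can be sketched as follows. Everything here is an assumption made for illustration: `message_type_name` and `format_drop_log` are hypothetical helpers, only a subset of message types is covered, and the returned strings are not guaranteed to match what the real function produces.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical subset of cluster message type codes, for illustration only. */
enum { MSG_TYPE_PING = 0, MSG_TYPE_PONG = 1, MSG_TYPE_MEET = 2, MSG_TYPE_FAIL = 3 };

/* Maps a numeric message type to a name suitable for log output.
 * Unrecognized values fall back to "unknown" rather than failing. */
static const char *message_type_name(uint16_t type) {
    switch (type) {
    case MSG_TYPE_PING: return "ping";
    case MSG_TYPE_PONG: return "pong";
    case MSG_TYPE_MEET: return "meet";
    case MSG_TYPE_FAIL: return "fail";
    default: return "unknown";
    }
}

/* Formats a drop-filter log line using the name instead of the raw number.
 * Returns the value of snprintf (the length the full message requires). */
static int format_drop_log(char *buf, size_t len, uint16_t type) {
    return snprintf(buf, len, "Dropping packet of type %s that matches debug drop filter",
                    message_type_name(type));
}
```

A log line that names the type lets a reader match the message against a configured drop filter without looking up numeric constants.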
