Introduce keepalives for ZmqClient and ZmqServer #1162
Draft
prabhataravind wants to merge 2 commits into sonic-net:master
Collaborator
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Force-pushed: 653fed6 to 9c68d7c
Fixes issue: sonic-net/sonic-buildimage#23110

When a DPU is powered off and back on, the ZMQ client on the switch still holds a stale TCP connection. The first message sent after DPU restart is delivered over the dead connection, gets a TCP RST, and is silently lost. ZMQ then auto-reconnects, so subsequent messages succeed.

This patch enables:
1. TCP keepalive on ZmqClient PUSH sockets to detect dead connections proactively (within ~8 seconds of peer going down).
2. ZMQ_IMMEDIATE on ZmqClient PUSH sockets to prevent queueing messages to peers whose underlying TCP connection is not yet completed.
3. TCP keepalive on ZmqServer PULL sockets as defense-in-depth.

With these changes, after DPU power-off:
- TCP keepalive probes will fail, causing ZMQ to tear down the stale connection and reconnect
- ZMQ_IMMEDIATE prevents the first message from being queued to a peer with an incomplete connection, so it stays in the send queue until the reconnection completes

Signed-off-by: Prabhat Aravind <[email protected]>
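For reviewers who want to sanity-check the "~8 seconds" figure: a rough sketch of how TCP keepalive detection time is usually estimated (idle time before the first probe, plus one probe interval per unanswered probe). The description above only states the total, so the `KeepaliveConfig` struct, the `worstCaseDetectionSeconds` helper, and the 5/1/3 values below are hypothetical, picked only because they sum to 8. They are not taken from the patch. In libzmq the three knobs correspond to the `ZMQ_TCP_KEEPALIVE_IDLE`, `ZMQ_TCP_KEEPALIVE_INTVL`, and `ZMQ_TCP_KEEPALIVE_CNT` socket options.

```cpp
#include <cassert>

// Hypothetical container for the three TCP keepalive knobs (not from the patch).
// libzmq exposes them as ZMQ_TCP_KEEPALIVE_IDLE, ZMQ_TCP_KEEPALIVE_INTVL and
// ZMQ_TCP_KEEPALIVE_CNT; the field names here are illustrative only.
struct KeepaliveConfig {
    int idleSeconds;      // quiet time before the first probe is sent
    int intervalSeconds;  // gap between successive probes
    int probeCount;       // unanswered probes before the connection is dropped
};

// Approximate upper bound on the time between the peer going silent and the
// kernel declaring the connection dead: idle + interval * count.
constexpr int worstCaseDetectionSeconds(const KeepaliveConfig& cfg) {
    return cfg.idleSeconds + cfg.intervalSeconds * cfg.probeCount;
}

// Assumed example values that add up to the ~8 s quoted in the description.
static_assert(worstCaseDetectionSeconds(KeepaliveConfig{5, 1, 3}) == 8,
              "5 s idle + 3 probes at 1 s intervals should give 8 s");
```

Any other combination with the same sum would match the stated bound equally well; the actual values are whatever the patch sets on the sockets.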
Collaborator
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Signed-off-by: Prabhat Aravind <[email protected]>
Collaborator
/azp run
Azure Pipelines successfully started running 1 pipeline(s).