Fix DPU restart message drop by Zmq lazy bind. #3837
prsunny merged 4 commits into sonic-net:master from
swss-common PR merged
Pull Request Overview
This PR fixes message drops during DPU restart by implementing lazy binding for ZMQ servers. Instead of binding immediately when the ZmqServer is created, the binding is deferred until after message handlers are registered, preventing messages from being lost during the initialization gap.
- Introduces lazy binding for ZmqServer instances by passing `true` to the constructor
- Updates the main orchestration agent to call `bind()` after all handlers are registered
- Modifies unit tests to explicitly call `bind()` after registering handlers
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| lib/orch_zmq_config.cpp | Enables lazy binding by passing true to ZmqServer constructor |
| orchagent/main.cpp | Adds explicit bind() call after handler registration with logging |
| tests/mock_tests/zmq_orch_ut.cpp | Updates unit test to call bind() after registering message handler |
Cherry-pick PR to msft-202506: Azure/sonic-swss.msft#144
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
Signed-off-by: Baorong Liu <96146196+baorliu@users.noreply.github.com>
Why I did it
Fix issue: sonic-net/sonic-buildimage#23110
When orchagent creates a ZmqServer and then a ZmqProducerStateTable, there can be a time gap between the server starting to receive messages and the table registering its message handler.
Messages that arrive during this gap have no handler and can be dropped. To avoid this, create the server with lazy binding and call bind() only after the handler is registered.
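The race can be reproduced deterministically with a small model. The sketch below is a hypothetical stand-in, not the swss-common `ZmqServer` API: `ModelServer`, `Message`, `startupDrops`, and `EXAMPLE_TABLE` are illustrative names. It assumes the peer already has messages queued when the server starts (as after a DPU restart) and that they are released as soon as the server binds, similar to how a ZeroMQ peer that connects before the remote end binds typically queues outgoing messages. A message whose table has no registered handler is counted as dropped.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a ZMQ-style server; NOT the swss-common ZmqServer API.
struct Message {
    std::string table;
    std::string payload;
};

class ModelServer {
public:
    using Handler = std::function<void(const std::string& payload)>;

    // peerBacklog: messages the peer has already queued for this server.
    // lazyBind=false binds in the constructor, like an eagerly bound server.
    ModelServer(std::deque<Message> peerBacklog, bool lazyBind)
        : backlog_(std::move(peerBacklog)) {
        if (!lazyBind) {
            bind();
        }
    }

    void registerMessageHandler(const std::string& table, Handler handler) {
        handlers_[table] = std::move(handler);
    }

    // Binding releases the peer's backlog to the server.
    void bind() {
        assert(!bound_ && "this model expects bind() to be called once");
        bound_ = true;
        while (!backlog_.empty()) {
            Message msg = std::move(backlog_.front());
            backlog_.pop_front();
            auto it = handlers_.find(msg.table);
            if (it == handlers_.end()) {
                ++dropped_;  // no handler registered yet: the message is lost
            } else {
                it->second(msg.payload);
            }
        }
    }

    int dropped() const { return dropped_; }

private:
    std::deque<Message> backlog_;
    std::map<std::string, Handler> handlers_;
    bool bound_ = false;
    int dropped_ = 0;
};

// Creates the server, then registers the handler. Returns the number of
// dropped messages; delivered payloads are appended to *received.
int startupDrops(bool lazyBind, std::vector<std::string>* received) {
    std::deque<Message> backlog = {{"EXAMPLE_TABLE", "entry1"}, {"EXAMPLE_TABLE", "entry2"}};
    ModelServer server(std::move(backlog), lazyBind);
    server.registerMessageHandler("EXAMPLE_TABLE", [received](const std::string& payload) {
        received->push_back(payload);
    });
    if (lazyBind) {
        server.bind();  // bind only after the handler exists
    }
    return server.dropped();
}
```

With `lazyBind=false`, both queued messages reach the server before any handler exists and are counted as dropped. With `lazyBind=true`, they stay queued on the peer side until `bind()` and are delivered in order.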
How I did it
Update Orchagent to use lazy binding for ZMQ.
Orchagent now creates its ZmqServer with lazy binding and defers the bind() call until all ZmqProducerStateTable instances have been initialized. This ensures every handler is registered before any message is received, preventing data loss during startup.
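Deferring bind() until all tables are initialized matters because binding after only the first registration would still lose messages for tables registered later. The sketch below models this; it is hypothetical (not the orchagent or swss-common API), `LazyModelServer` and `runStartup` are illustrative names, the table names `TABLE_A`, `TABLE_B`, and `TABLE_C` are placeholders, and the peer is assumed to have one message per table queued before the server binds.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical lazily bound server model; NOT the swss-common ZmqServer API.
class LazyModelServer {
public:
    using Handler = std::function<void(const std::string& payload)>;
    using Backlog = std::vector<std::pair<std::string, std::string>>;  // (table, payload)

    explicit LazyModelServer(Backlog peerBacklog) : backlog_(std::move(peerBacklog)) {}

    void registerMessageHandler(const std::string& table, Handler handler) {
        handlers_[table] = std::move(handler);
    }

    // Releases the peer's queued messages; tables without a handler lose theirs.
    void bind() {
        assert(!bound_ && "this model expects bind() to be called once");
        bound_ = true;
        for (const auto& [table, payload] : backlog_) {
            auto it = handlers_.find(table);
            if (it == handlers_.end()) {
                ++dropped_;
            } else {
                it->second(payload);
            }
        }
        backlog_.clear();
    }

    std::size_t dropped() const { return dropped_; }

private:
    Backlog backlog_;
    std::map<std::string, Handler> handlers_;
    bool bound_ = false;
    std::size_t dropped_ = 0;
};

// Registers one handler per table and binds either right after the first
// registration (too early) or once after all registrations (deferred).
std::size_t runStartup(const std::vector<std::string>& tables, bool bindAfterAll,
                       std::map<std::string, int>& delivered) {
    LazyModelServer::Backlog backlog;
    for (const auto& table : tables) {
        backlog.emplace_back(table, "entry");  // one queued message per table
    }
    LazyModelServer server(std::move(backlog));

    for (std::size_t i = 0; i < tables.size(); ++i) {
        const std::string table = tables[i];
        server.registerMessageHandler(table, [&delivered, table](const std::string&) {
            ++delivered[table];
        });
        if (!bindAfterAll && i == 0) {
            server.bind();  // premature bind: later tables have no handler yet
        }
    }
    if (bindAfterAll) {
        server.bind();  // deferred bind: every handler is registered
    }
    return server.dropped();
}
```

Binding after only `TABLE_A` is registered drops the queued messages for `TABLE_B` and `TABLE_C`; deferring the single `bind()` until the loop finishes delivers all three.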
This PR depends on sonic-net/sonic-swss-common#1068
How to verify it
Pass all test cases.
Description for the changelog
Fix DPU restart message drop by Zmq lazy bind.