[action] [PR:3837] Fix DPU restart message drop by Zmq lazy bind.#144

Merged
mssonicbld merged 1 commit into Azure:202506 from mssonicbld:cherry/msft-202506/3837
Aug 30, 2025
@mssonicbld
Collaborator

Fix DPU restart message drop by Zmq lazy bind.

Why I did it

Fixes issue sonic-net/sonic-buildimage#23110.

When creating a ZmqServer followed by a ZmqProducerStateTable, there may be a time gap between the server starting to receive messages and the producer state table registering its handler.

This gap can lead to dropped messages. To avoid this, use lazy binding and invoke bind() only after the handler is registered.

How I did it

Update Orchagent to use lazy binding for ZMQ.
The ZMQ server is now created with lazy bind in Orchagent, and the bind() call is deferred until all ZmqProducerStateTable instances have been initialized. This ensures handlers are registered before any messages are received, preventing potential data loss during startup.
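
The ordering problem and the fix can be illustrated with a small, self-contained model. This is only a sketch: `ToyServer`, `registerHandler`, `poll`, and `runStartup` are hypothetical names invented for illustration, not the swss-common API, and the model assumes that a sender keeps its messages queued while the server is not yet bound.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <map>
#include <string>
#include <utility>

using Message = std::pair<std::string, std::string>;  // (table name, payload)

// Toy stand-in for a ZMQ endpoint. NOT the swss-common ZmqServer class.
class ToyServer {
public:
    using Handler = std::function<void(const std::string&)>;

    // With lazyBind == true the server does not listen until bind() is called.
    explicit ToyServer(bool lazyBind) : bound_(!lazyBind) {}

    void bind() { bound_ = true; }

    void registerHandler(const std::string& table, Handler h) {
        handlers_[table] = std::move(h);
    }

    // Drain the sender's queue. Nothing is consumed until the server is bound
    // (model assumption: the sender keeps unsent messages queued).
    void poll(std::deque<Message>& senderQueue) {
        if (!bound_) return;
        while (!senderQueue.empty()) {
            Message m = std::move(senderQueue.front());
            senderQueue.pop_front();
            auto it = handlers_.find(m.first);
            if (it == handlers_.end()) {
                ++dropped_;  // received, but no handler registered yet
                continue;
            }
            it->second(m.second);
        }
    }

    int dropped() const { return dropped_; }

private:
    bool bound_;
    int dropped_ = 0;
    std::map<std::string, Handler> handlers_;
};

// Runs one startup sequence and returns {delivered, dropped}.
std::pair<int, int> runStartup(bool lazyBind) {
    ToyServer server(lazyBind);
    std::deque<Message> senderQueue;
    senderQueue.emplace_back("TABLE_A", "route-1");
    int delivered = 0;

    server.poll(senderQueue);  // a message shows up before the handler exists
    server.registerHandler("TABLE_A",
                           [&delivered](const std::string&) { ++delivered; });
    if (lazyBind) server.bind();  // deferred bind, after handlers are in place
    server.poll(senderQueue);

    return {delivered, server.dropped()};
}
```

In this model, `runStartup(false)` (eager bind) loses the early message because it is consumed before any handler exists, while `runStartup(true)` (lazy bind) leaves the message with the sender until `bind()` is called and then delivers it.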

This PR depends on sonic-net/sonic-swss-common#1068

Work item tracking
  • Microsoft ADO: 33995986

How to verify it

Pass all test cases.

Which release branch to backport (provide reason below if selected)

  • [ ] 201811
  • [ ] 201911
  • [ ] 202006
  • [ ] 202012
  • [ ] 202106
  • [ ] 202111

Description for the changelog

Fix DPU restart message drop by Zmq lazy bind.

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@mssonicbld
Collaborator Author

Original PR: sonic-net/sonic-swss#3837

@mssonicbld
Collaborator Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld mssonicbld merged commit bee09f8 into Azure:202506 Aug 30, 2025
5 of 8 checks passed