
Fix ZMQ lost data when connection break issue #1059

Closed
liuh-80 wants to merge 17 commits into sonic-net:master from liuh-80:dev/liuh/fix_gnmi_zmq_retry

Conversation

@liuh-80 (Contributor) commented Jul 30, 2025

Fix ZMQ lost data when connection break issue

Why I did it

gNMI configuration fails for the first DASH object after the DPU is powered off and back on, because of the ZMQ connection.
This happens because ZMQ is reconnecting and the client does not set the ZMQ_IMMEDIATE flag.

How I did it

Set the ZMQ_IMMEDIATE flag in the ZMQ client to prevent data loss:
https://libzmq.readthedocs.io/en/latest/zmq_socket.html#:~:text=When%20a%20%27ZMQ_PUSH,are%20not%20discarded.

With this flag set, the ZMQ client queues messages only on completed connections, so data is accepted only when the server side is ready.
Enabling the ZMQ_IMMEDIATE flag does not prevent the PUSH socket from connecting when the PULL side is not ready yet.
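As a rough illustration of what this means for callers: with ZMQ_IMMEDIATE set, a ZMQ_PUSH socket that has no completed connection does not queue the message; a blocking zmq_send() waits, and a send with ZMQ_DONTWAIT fails with errno set to EAGAIN. The sketch below is a hypothetical retry helper, not code from this PR or from sonic-swss-common. `send_with_retry` and the `try_send` callback are assumptions, with `try_send` standing in for a non-blocking zmq_send() so the logic can be exercised without a live socket.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <thread>

// Hypothetical helper; not part of libzmq or sonic-swss-common.
// In a real client, the socket would be configured once before zmq_connect():
//   int one = 1;
//   zmq_setsockopt(sock, ZMQ_IMMEDIATE, &one, sizeof(one));
// and `try_send` would wrap
//   zmq_send(sock, msg.data(), msg.size(), ZMQ_DONTWAIT)
// returning false when that call fails with errno == EAGAIN
// (no completed connection can accept the message yet).
bool send_with_retry(const std::function<bool(const std::string&)>& try_send,
                     const std::string& msg,
                     int max_attempts,
                     std::chrono::milliseconds backoff) {
    assert(max_attempts > 0);
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        if (try_send(msg)) {
            return true;  // a completed connection accepted the message
        }
        if (attempt < max_attempts) {
            std::this_thread::sleep_for(backoff);  // give the PULL side time to come back
        }
    }
    return false;  // still no peer: the caller decides how to report the failure
}
```

Whether to keep retrying, give up, or surface the error after `max_attempts` is a policy choice; the fixed backoff here is only for simplicity.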

Work item tracking
  • Microsoft ADO: 33995986

How to verify it

All existing test cases pass.
Verify with new test case: sonic-net/sonic-mgmt#20107

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111

Description for the changelog

Fix ZMQ data loss when the connection breaks

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@mssonicbld (Collaborator)

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@liuh-80 liuh-80 changed the title Fix ZMQ lost data when conenction break issue Fix ZMQ lost data when connection break issue Jul 30, 2025

@liuh-80 (Contributor, Author) commented Aug 1, 2025

This change will be verified with sonic-net/sonic-buildimage#23525.

@liuh-80 (Contributor, Author) commented Aug 5, 2025

This PR depends on the test case PR being merged first: sonic-net/sonic-mgmt#20107

@liuh-80 (Contributor, Author) commented Aug 14, 2025

/azpw run

@mssonicbld (Collaborator)

/AzurePipelines run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@liuh-80 (Contributor, Author) commented Aug 15, 2025

/azpw run Azure.sonic-swss-common

@mssonicbld (Collaborator)

/AzurePipelines run Azure.sonic-swss-common

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@yue-fred-gao (Contributor) left a comment

Since IMMEDIATE=1 won't change zmq_connect behaviour, the PR is an incremental improvement over current behaviour. Approved.

@liuh-80 (Contributor, Author) commented Aug 16, 2025

PR validation is blocked by a known validation pipeline issue; waiting for that issue to be fixed.

@liuh-80 (Contributor, Author) commented Aug 16, 2025

I suspect the root cause might be a timing gap between the registration of the event handler in ZmqServer and in ZmqConsumerStateTable.
Created a draft PR: #1068
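One way to picture the suspected race: if the server's receive loop starts dispatching before a consumer has registered its handler, the first message for that table has nowhere to go, which would match the symptom of only the first object failing. The toy model below is an illustration under that assumption, not the actual ZmqServer or ZmqConsumerStateTable code; `ToyDispatcher` and its methods are hypothetical. It contrasts dropping such early messages with buffering them until a handler registers.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model only: the names are hypothetical and do not mirror the real
// ZmqServer / ZmqConsumerStateTable implementation.
class ToyDispatcher {
public:
    using Handler = std::function<void(const std::string&)>;

    explicit ToyDispatcher(bool buffer_until_registered)
        : buffer_(buffer_until_registered) {}

    // Called by the receive loop for every incoming message.
    void on_message(const std::string& table, const std::string& payload) {
        auto it = handlers_.find(table);
        if (it != handlers_.end()) {
            it->second(payload);
        } else if (buffer_) {
            pending_[table].push_back(payload);  // hold until a handler registers
        } else {
            ++dropped_;  // no handler yet: the message is lost
        }
    }

    // Called when a consumer registers for a table; replays buffered messages.
    void register_handler(const std::string& table, Handler h) {
        assert(static_cast<bool>(h));
        Handler& slot = handlers_[table];
        slot = std::move(h);
        auto it = pending_.find(table);
        if (it != pending_.end()) {
            for (const auto& payload : it->second) {
                slot(payload);
            }
            pending_.erase(it);
        }
    }

    int dropped() const { return dropped_; }

private:
    bool buffer_;
    int dropped_ = 0;
    std::map<std::string, Handler> handlers_;
    std::map<std::string, std::vector<std::string>> pending_;
};
```

Buffering is only one option; registering every handler before the receive loop starts closes the gap altogether, at the cost of a stricter startup order.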

@liuh-80 liuh-80 marked this pull request as draft August 16, 2025 08:31
@wangxin commented Aug 18, 2025

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@liuh-80 liuh-80 closed this Sep 26, 2025

7 participants