-
Notifications
You must be signed in to change notification settings - Fork 342
Fix ZMQ lost data when connection break issue #1059
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from 10 commits
Commits
Show all changes
17 commits
Select commit
Hold shift + click to select a range
2ad0d1c
Fix ZMQ lost data when conenction break issue
mssonicbld f9b9e9c
Fix test casee
mssonicbld f8a8a93
Fix test case
mssonicbld 4eb1e9b
Fix testcase comments
mssonicbld 7e0f8bb
Fix test case
mssonicbld 749df10
Update zmq_state_ut.cpp
liuh-80 e882628
Merge branch 'master' into dev/liuh/fix_gnmi_zmq_retry
liuh-80 4b9f0b3
Improve retry count
mssonicbld ad18392
Merge branch 'master' into dev/liuh/fix_gnmi_zmq_retry
liuh-80 512d9a0
Merge branch 'master' into dev/liuh/fix_gnmi_zmq_retry
liuh-80 543b645
Update zmqclient.cpp
liuh-80 f360c03
Merge branch 'master' into dev/liuh/fix_gnmi_zmq_retry
liuh-80 fa6b027
Update sync.rs
liuh-80 d662842
Merge branch 'master' into dev/liuh/fix_gnmi_zmq_retry
liuh-80 62582a2
Update zmqclient.cpp
liuh-80 abf7952
Merge branch 'master' into dev/liuh/fix_gnmi_zmq_retry
liuh-80 cab1fae
Update zmqclient.cpp
liuh-80 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lose
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, current hamgrd relies on ZMQ_IMMEDIATE=0 because it doesn't actively retry. With this change, does zmq_connect fails if the peer is not connected? And what the client needs to do if connection is lost?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ZMQ client will retry when connection dropped. and if retry failed, ZMQ client will throw an exception: "zmqerrno: 11:Resource temporarily unavailable"
Using ZMQ_IMMEDIATE=0 is unstable, hamgrd need handle exception and retry to make sure data not lost.
But because ZMQ client already retry before throw exception, so hamgrd may need log the exception and ignore it or crash with a error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate on ZMQ_IMMEDIATE=0 is unstable? Do you mean message could lose when queue is full?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When ZMQ server side down, client side can't queue.
Then when ZMQ server side up, client side can queue data and send to server.
If ZMQ server side down again, then client side can't queue new data to server. client will retry:
If server start before client retry finish, the send will succeed.
If server does not start before client retry finish, the send will failed and throw exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, the last 2 "if" are same condition but different result. Did you miss a "not" somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Basically, if server is not connected, client calling zmq.send() will fail but the messages already queued are safe and zmq will retry to deliver them to the server. Then I don't understand the root cause of this issue. I thought zmq dropped the first message during server is reconnected. But apparently it is not because it is able to resend queued messages without issue. Is the issue in gnmi resend logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ZMQ document doesn't give detail about how data dropped. if connection is setup, the pull/push is safe, but when connection broken, the queued data may drop.