Added LAG member check on addLagMember() (#2464) #2665
Closed
yenlu-keith wants to merge 1 commit into sonic-net:202205 from
*[teammgr] Added LAG member check into addLagMember()
prsunny
approved these changes
Feb 15, 2023
Collaborator
Cherry-picking PR #2464
Collaborator
@yenlu-keith, I've added labels to the original PR. The branch owner will get back to us if there are any conflicts that require a manual PR. Until then, we can hold off on this PR.
Collaborator
@yenlu-keith, closing this PR as it is already included as part of cherry-picking the original PR. Please check the labels.
What I did
Added a check to addLagMember() that verifies the new LAG member still exists in the kernel.
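The ordering of the checks can be sketched as follows. This is a minimal illustration, not the code from this PR: `AddMemberAction`, `MemberChecks`, and `decideAddMember` are hypothetical names, and skipping the add when the device is gone is an assumption about the desired behavior.

```cpp
#include <functional>
#include <string>

// Hypothetical outcome of the pre-checks done before enslaving a port.
enum class AddMemberAction { Skip, Proceed };

// Hypothetical bundle of the two predicates discussed in this PR.
struct MemberChecks {
    std::function<bool(const std::string&)> existsInKernel; // is the netdev still present?
    std::function<bool(const std::string&)> isEnslaved;     // is it already a LAG member?
};

// Decide whether the "add LAG member" logic should run for `member`.
AddMemberAction decideAddMember(const std::string& member, const MemberChecks& checks) {
    // Assumption: if the netdev was removed, there is nothing to enslave,
    // so do not attempt the add at all.
    if (!checks.existsInKernel(member)) {
        return AddMemberAction::Skip;
    }
    // Already enslaved: the operation is expected to be ignored.
    if (checks.isEnslaved(member)) {
        return AddMemberAction::Skip;
    }
    return AddMemberAction::Proceed;
}
```

Checking for existence first matters because an "is enslaved" test that relies on the device's own files cannot tell "not enslaved" apart from "device removed".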
Why I did it
In the syncd container autorestart scenario, when syncd exits, the host interfaces (tun/tap netdevs) go to the DOWN state and are then removed.
Because of the check shown below, teammgr receives the notification about the port state change (the information is updated in the state DB and a pubsub message is sent), but the port state record is not removed from the state DB when the port is deleted:
sonic-swss/portsyncd/linksync.cpp
Line 210 in 7cc035f
if (master && nlmsg_type == RTM_DELLINK)
Because of this, on a port state change notification, isPortStateOk() succeeds and TeamMgr::addLagMember() is executed even though the host interface has actually been removed.
The operation is expected to be ignored if the port is already enslaved:
sonic-swss/cfgmgr/teammgr.cpp
Line 721 in 7cc035f
if (isPortEnslaved(member))
However, the check returns false because the port has already been removed:
sonic-swss/cfgmgr/teammgr.cpp
Line 412 in 7cc035f
return lstat(path.c_str(), &buf) == 0;
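To illustrate why an `lstat()`-based test cannot distinguish a removed port from a port that is not enslaved, here is a minimal sketch. The helper names and the sysfs layout (`/sys/class/net/<port>/master`) are assumptions made for illustration; the exact path built in teammgr.cpp is not shown in this description.

```cpp
#include <string>
#include <sys/stat.h>

// True if `path` exists. lstat() inspects the link itself and does not
// follow a trailing symlink.
bool pathExists(const std::string& path) {
    struct stat buf;
    return lstat(path.c_str(), &buf) == 0;
}

// Hypothetical helper: an enslaved netdev exposes a `master` entry in sysfs.
// Returns false both when the port has no master AND when the port is gone.
bool isEnslavedSketch(const std::string& port,
                      const std::string& sysfsRoot = "/sys/class/net") {
    return pathExists(sysfsRoot + "/" + port + "/master");
}

// Hypothetical helper: the netdev exists in the kernel if its sysfs
// directory is present.
bool netdevExistsSketch(const std::string& port,
                        const std::string& sysfsRoot = "/sys/class/net") {
    return pathExists(sysfsRoot + "/" + port);
}
```

For a removed device both helpers return false, so a caller that only consults the "enslaved" test would go on to attempt the add.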
As a result, the TeamMgr::addLagMember() logic is executed and fails:
Jun 21 11:47:12.265955 cab18-7-dut INFO teamd#/supervisord: teammgrd Cannot find device "Ethernet0"
Jun 21 11:47:12.294550 cab18-7-dut INFO teamd#/supervisord: teammgrd libteamdctl: cli_usock_process_msg: usock: Error message received: "NoSuchDev"
Jun 21 11:47:12.294550 cab18-7-dut INFO teamd#/supervisord: teammgrd libteamdctl: cli_usock_process_msg: usock: Error message content: "No such device."
Jun 21 11:47:12.294550 cab18-7-dut INFO teamd#/supervisord: teammgrd command call failed (Invalid argument)
Jun 21 11:47:12.322497 cab18-7-dut INFO teamd#/supervisord: teammgrd libteamdctl: cli_usock_process_msg: usock: Error message received: "NoSuchDev"
Jun 21 11:47:12.322497 cab18-7-dut INFO teamd#/supervisord: teammgrd libteamdctl: cli_usock_process_msg: usock: Error message content: "No such device."
Jun 21 11:47:12.322497 cab18-7-dut INFO teamd#/supervisord: teammgrd command call failed (Invalid argument)
Jun 21 11:47:12.328844 cab18-7-dut ERR teamd#teammgrd: :- checkPortIffUp: Failed to get port Ethernet0 flags
Jun 21 11:47:12.328844 cab18-7-dut ERR teamd#teammgrd: :- addLagMember: Failed to add Ethernet0 to port channel PortChannel102
The issue became reproducible after #2233.
How I verified it
autorestart/test_container_autorestart.py -k 'syncd'