Merged
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
yxieca approved these changes Jun 28, 2022
yxieca pushed a commit that referenced this pull request Jun 28, 2022
What is the motivation for this PR?
This issue occurs when running config load_minigraph to load new configs at both ToRs.
After write_standby.py, the muxorch will try to direct all traffic downstream to the SoC IP and server IP to the tunnel, but xcvrd might fail to set the adminforwardingstate of the port via gRPC because the gRPC channel could not be established because the other ToR is in standby state.
After linkmgrd starts to run and receives heartbeats from itself, it will try to toggle to active[toggle#1], but xcvrd might not be able to make the hardware toggle at the moment, so linkmgrd will mux wait. Also, because the mux probe table is initialized with Unknown state, linkmgrd will handle the initial mux probe state to have the composite states (active, unknown, up) and tries to probe the mux state.
As the muxorch has been toggled to active[toggle#1], the gRPC channel will be established at some point after, xcvrd will be able to answer the mux probe, so the linkmgrd will be able to change into (active, active, up) state.
But the toggling to active[toggle#1] is only finished half way, the mux status in STATE_DB:MUX_CABLE_TABLE is not updated, so show mux status will show unknown for those ports.
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
How did you do it?
When the linkmgrd changes into states (active, active, up) and has the original mux state as unknown, it will toggle the mux to active again to have those DB tables updated:
linkmgrd -> APP_DB:MUX_CABLE_TABLE -> swss -> APP_DB:HW_MUX_CABLE_TABLE -> xcvrd
xcvrd -> STATE_DB:HW_MUX_CABLE_TABLE -> swss -> STATE_DB:MUX_CABLE_TABLE -> linkmgrd
How did you verify/test it?
On dualtor-mixed topo with icmp_responder running, do config load_minigraph on both ToRs, verify the show mux status on both ToRs.
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
zjswhhh added a commit to sonic-net/sonic-buildimage that referenced this pull request Jul 12, 2022
[master][sonic-linkmgrd] submodule update
58d8aae Longxiang Lyu Sat Jul 2 10:14:50 2022 +0800 Enforce switch after config mux to active (sonic-net/sonic-linkmgrd#95)
600df46 Longxiang Lyu Thu Jun 30 15:09:10 2022 +0800 Add unittest to verify mux toggle active (sonic-net/sonic-linkmgrd#94)
400b1b8 gregshpit Wed Jun 29 21:32:45 2022 +0300 For Sonic cross-compilation build. CC variable is used as gcc compiler. CXX variable is used as g++ compiler. (sonic-net/sonic-linkmgrd#91)
a516668 Jing Zhang Tue Jun 28 11:07:23 2022 -0700 Use Vlan MAC as src MAC for link prober by default (sonic-net/sonic-linkmgrd#93)
6b5d739 Longxiang Lyu Tue Jun 28 22:46:12 2022 +0800 Fix inconsistent mux state (sonic-net/sonic-linkmgrd#92)
9265497 Jing Zhang Fri Jun 24 09:10:12 2022 -0700 Remove exception throwing when initializing missing loopback interface (sonic-net/sonic-linkmgrd#90)
sign-off: Jing Zhang zhangjing@microsoft.com
Description of PR
Summary:
Fixes # (issue)
Type of change
Approach
What is the motivation for this PR?
This issue occurs when running `config load_minigraph` to load new configs on both ToRs.
After `write_standby.py` runs, `muxorch` tries to direct all downstream traffic destined for the SoC IP and server IP into the tunnel, but `xcvrd` might fail to set the `adminforwardingstate` of the port via gRPC, because the gRPC channel cannot be established while the other ToR is in `standby` state.
After `linkmgrd` starts and receives its own heartbeats, it tries to toggle to active [toggle#1], but `xcvrd` might not be able to perform the hardware toggle at that moment, so `linkmgrd` enters mux wait. Also, because the mux probe table is initialized with the `Unknown` state, `linkmgrd` handles the initial mux probe state, ends up in the composite state `(active, unknown, up)`, and tries to probe the mux state.
Because `muxorch` has been toggled to active [toggle#1], the gRPC channel is established at some later point and `xcvrd` can answer the mux probe, so `linkmgrd` can move into the `(active, active, up)` state.
However, the toggle to active [toggle#1] is only half finished: the mux status in `STATE_DB:MUX_CABLE_TABLE` is not updated, so `show mux status` shows `unknown` for those ports.
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
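The sequence described in this motivation can be illustrated with a small toy model. This is only a sketch: `PortModel`, `toggle_to_active`, and `probe_mux` are hypothetical names that do not exist in linkmgrd or xcvrd, and the model assumes the probe reports `active` as soon as the gRPC channel is up.

```python
# Toy model only: every class, field, and function name here is hypothetical.
from dataclasses import dataclass


@dataclass
class PortModel:
    grpc_up: bool = False          # gRPC channel used by xcvrd
    state_db_mux: str = "unknown"  # stands in for STATE_DB:MUX_CABLE_TABLE
    linkmgrd_mux: str = "unknown"  # mux part of linkmgrd's composite state


def toggle_to_active(port: PortModel) -> bool:
    """Toggle request: STATE_DB is written only if xcvrd can act on it."""
    if not port.grpc_up:
        return False  # toggle stalls; linkmgrd stays in mux wait
    port.state_db_mux = "active"
    port.linkmgrd_mux = "active"
    return True


def probe_mux(port: PortModel) -> None:
    """Mux probe: updates linkmgrd's view but never writes STATE_DB.

    Assumption: once the channel is up, the probe reports "active".
    """
    if port.grpc_up:
        port.linkmgrd_mux = "active"


port = PortModel()
toggle_completed = toggle_to_active(port)  # toggle#1, channel still down
port.grpc_up = True                        # channel comes up later
probe_mux(port)                            # probe is now answered

composite = ("active", port.linkmgrd_mux, "up")
print(composite, "vs STATE_DB:", port.state_db_mux)
```

In this model the probe path brings linkmgrd's own view to `active`, while the stand-in for `STATE_DB:MUX_CABLE_TABLE` stays at `unknown` because the only writer, the toggle path, never completed.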
How did you do it?
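A minimal sketch of the decision this section describes, written in Python for brevity; the actual change lives in linkmgrd's C++ state machine and may be structured differently. `needs_retoggle`, `on_state_change`, and the `switch_mux` callback are hypothetical names, and the tuple order (link prober, mux, link) is an assumption based on how the composite states are written in this description.

```python
# Hypothetical sketch; not taken from the linkmgrd source.
from typing import Callable, Tuple

# Assumed order: (link prober state, mux state, link state)
CompositeState = Tuple[str, str, str]


def needs_retoggle(prev: CompositeState, new: CompositeState) -> bool:
    """True when we just reached (active, active, up) from an unknown mux state."""
    return new == ("active", "active", "up") and prev[1] == "unknown"


def on_state_change(
    prev: CompositeState,
    new: CompositeState,
    switch_mux: Callable[[str], None],
) -> None:
    """Re-issue the toggle so the APP_DB -> xcvrd -> STATE_DB path runs again."""
    if needs_retoggle(prev, new):
        switch_mux("active")


requests = []
# Mux state was unknown before the probe answered: toggle again.
on_state_change(("active", "unknown", "up"), ("active", "active", "up"), requests.append)
# Mux state was already active: nothing to do.
on_state_change(("active", "active", "up"), ("active", "active", "up"), requests.append)
print(requests)
```

Keying the check on the previous mux state keeps the extra toggle limited to the startup case; a port that was already `active` is not toggled again.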
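When `linkmgrd` changes into the `(active, active, up)` state and the original mux state was `unknown`, it toggles the mux to `active` again so that the DB tables are updated:
linkmgrd -> APP_DB:MUX_CABLE_TABLE -> swss -> APP_DB:HW_MUX_CABLE_TABLE -> xcvrd
xcvrd -> STATE_DB:HW_MUX_CABLE_TABLE -> swss -> STATE_DB:MUX_CABLE_TABLE -> linkmgrd
How did you verify/test it?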
On the dualtor-mixed topology with `icmp_responder` running, run `config load_minigraph` on both ToRs, then verify `show mux status` on both ToRs.
Any platform specific information?
Documentation