[sonic-mgmt][dualtor-aa] Fix fdb/test_fdb_mac_learning.py failures #15675
1) Add a fixture to set up the topo in active-standby mode. This is needed to make sure packets go to the selected dut (for mac learning to happen correctly).
2) Introduce logic to wait for the mux status to become consistent before sending traffic (instead of relying on a time.sleep delay).
3) Ignore the "...All port channels failed to come up within 3 minutes" syslog, as the test brings down portchannels and restores them at the end.
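To illustrate item 3, here is a minimal sketch of a pattern that would match the portchannel syslog line so a log checker can skip it. The names `IGNORE_PORTCHANNEL_ERR` and `is_ignored` are hypothetical and not taken from the PR; how the real pattern is registered with the test framework's log analyzer is not shown here.

```python
import re

# Hypothetical ignore pattern (an assumption, not the PR's actual entry).
# "port ?channels" tolerates both spellings of the message seen in this PR.
IGNORE_PORTCHANNEL_ERR = re.compile(
    r"ERR swss#tunnel_packet_handler\.py: "
    r"All port ?channels failed to come up within 3 minutes"
)


def is_ignored(line):
    """Return True if the syslog line matches the ignore pattern."""
    return IGNORE_PORTCHANNEL_ERR.search(line) is not None
```

Using `search` rather than `match` means the pattern does not need to account for whatever timestamp or host prefix precedes the message.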
The pre-commit check detected issues in the files touched by this pull request.
tests/fdb/test_fdb_mac_learning.py
    time.sleep(30)
    target_ports = [target_ports_to_ptf_mapping[0][0]]
    duthost.shell("sudo config interface startup {}".format(target_ports[0]))
    pytest_assert(wait_until(150, 5, 0, self.check_mux_status_consistency, duthost, target_ports))
Do we need to check whether this is a dualtor testbed first? What if this is a t0?
Thanks @lolyu for catching this. Yes, for t0 the mux status is irrelevant (muxcable is specific to dualtor). I will update the check_mux_status_consistency method to handle this case.
Hi @lolyu, I have updated the fix to take care of non-dualtor topologies. Please review.
Muxcable is irrelevant for non-dualtor topologies, so the mux status consistency check is now performed only on dualtor; otherwise the test adds a delay using time.sleep (which is the existing behavior).
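The branching described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the PR: `wait_for_ports_ready`, `is_dualtor`, and `check_consistency` are hypothetical names, and the defaults (150 s timeout, 5 s interval, 30 s fallback delay) are taken from the values visible in the diff snippet.

```python
import time


def wait_for_ports_ready(is_dualtor, check_consistency,
                         timeout=150, interval=5, fallback_delay=30,
                         sleep=time.sleep):
    """Wait until ports are ready for traffic.

    On dualtor, poll check_consistency() until it returns True or the
    timeout elapses. On any other topology, fall back to a fixed delay.
    The sleep function is injectable so the logic can be tested quickly.
    """
    if not is_dualtor:
        # Mux status does not exist here, so keep the fixed delay.
        sleep(fallback_delay)
        return True

    waited = 0
    while True:
        if check_consistency():
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

The return value lets the caller assert on it, so a dualtor testbed that never converges fails with a clear assertion instead of sending traffic too early.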
For active-active dualtor, the NIC simulator doesn't install OVS flows for downlink ports until the link status becomes consistent, which seems to happen only once upstream connectivity is restored.
Cherry-pick PR to 202405: #15784
Description of PR
Summary: [dualtor-aa] Fix "fdb/test_fdb_mac_learning.py" failures
Fixes # https://github.com/aristanetworks/sonic-qual.msft/issues/329
Type of change
Back port request
Approach
What is the motivation for this PR?
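The test is currently failing on dualtor-aa topologies due to:
1) Packets sometimes go to the unselected dut (because of the active-active topology), which leads to mac learning failure.
2) After bringing up interfaces (from shutdown state), there is a time.sleep of 30 seconds, which seems not to be enough for the muxcable status on duthost to become consistent with the mux server_status (see SERVER_STATUS shown as unknown below). We need to wait for SERVER_STATUS to match the STATUS field for mac learning to happen.
3) ERR swss#tunnel_packet_handler.py: All portchannels failed to come up within 3 minutes, exiting. is coming during the test and causing test failure (as log_analyzer is complaining).
How did you do it?
1) Added a fixture to set up the topo in active-standby mode. This is needed to make sure packets go to the selected dut (for mac learning to happen correctly).
2) Introduced logic to wait for the mux status to become consistent before sending traffic (instead of relying on a time.sleep delay).
3) Ignored the "All portchannels failed to come up within 3 minutes" syslog, as the test brings down portchannels and restores them at the end.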
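As a rough illustration of the "SERVER_STATUS must match STATUS" condition, the check could look like the sketch below. The function name and the input shape (a dict mapping port name to its STATUS and SERVER_STATUS fields) are assumptions made for this example; the actual check_mux_status_consistency in the PR may gather and compare the data differently.

```python
def mux_status_consistent(mux_status, ports):
    """Return True only if every port reports STATUS == SERVER_STATUS.

    mux_status is assumed to look like:
        {"Ethernet4": {"STATUS": "active", "SERVER_STATUS": "active"}}
    A port that is missing, or has no STATUS yet, counts as inconsistent.
    """
    for port in ports:
        entry = mux_status.get(port)
        if entry is None:
            return False
        status = entry.get("STATUS")
        if status is None or status != entry.get("SERVER_STATUS"):
            return False
    return True
```

With this shape, a port whose SERVER_STATUS is still unknown fails the check, which is the state the test needs to wait out before sending traffic.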
How did you verify/test it?
Stressed the test on the Arista-7260CX3-D108C8 platform with dualtor-aa[-56] deployed, and the test is passing.
Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation