[action] [PR:15675] [sonic-mgmt][dualtor-aa] Fix fdb/test_fdb_mac_learning.py failures#15784

Merged
mssonicbld merged 1 commit into sonic-net:202405 from
mssonicbld:cherry/202405/15675
Nov 28, 2024

@mssonicbld
Collaborator

Description of PR

Summary: [dualtor-aa] Fix "fdb/test_fdb_mac_learning.py" failures
Fixes # https://github.com/aristanetworks/sonic-qual.msft/issues/329

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405

Approach

What is the motivation for this PR?

The test currently fails on dualtor-aa topologies for the following reasons:

  1. Because the topology is active-active, packets sometimes reach the unselected DUT, which causes MAC learning to fail.

  2. After the interfaces are brought back up from the shutdown state, the test calls time.sleep for 30 seconds, which does not appear to be long enough for the muxcable status on the duthost to become consistent with the mux server status (SERVER_STATUS is shown as unknown below). MAC learning only happens once SERVER_STATUS matches the STATUS field, so the test needs to wait for that.

     PORT       STATUS    SERVER_STATUS    HEALTH     HWSTATUS      LAST_SWITCHOVER_TIME
     ---------  --------  ---------------  ---------  ------------  ----------------------
     Ethernet0  active    unknown          unhealthy  inconsistent
  3. Because the test brings down all the interfaces (including portchannels), the error "ERR swss#tunnel_packet_handler.py: All portchannels failed to come up within 3 minutes, exiting." is logged during the test and causes a test failure, because log_analyzer flags it.
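
The consistency condition from item 2 can be sketched as a small parser over table output like the one shown above. This is an illustrative sketch only: `parse_mux_status` and `mux_status_consistent` are hypothetical helper names, not functions taken from sonic-mgmt, and the sketch assumes that every column except possibly the last holds a single whitespace-free token.

```python
# Hypothetical helpers (not from sonic-mgmt): parse a mux status table and
# check whether STATUS and SERVER_STATUS agree for every port.

SAMPLE = """\
PORT       STATUS    SERVER_STATUS    HEALTH     HWSTATUS      LAST_SWITCHOVER_TIME
---------  --------  ---------------  ---------  ------------  ----------------------
Ethernet0  active    unknown          unhealthy  inconsistent
"""


def parse_mux_status(table_text):
    """Parse a whitespace-separated status table into a list of dicts."""
    lines = [ln for ln in table_text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        if set(line.strip()) <= {"-", " "}:
            continue  # skip the dashed separator row
        fields = line.split()
        # Trailing columns (e.g. LAST_SWITCHOVER_TIME) may be empty: pad them.
        fields += [""] * (len(header) - len(fields))
        rows.append(dict(zip(header, fields)))
    return rows


def mux_status_consistent(rows):
    """True only if there is at least one port and all ports are consistent."""
    return bool(rows) and all(r["STATUS"] == r["SERVER_STATUS"] for r in rows)
```

For the sample above, `mux_status_consistent(parse_mux_status(SAMPLE))` is false, because STATUS is active while SERVER_STATUS is unknown.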

How did you do it?

  1. Add a fixture that sets up the topology in active-standby mode. This ensures that packets go to the selected DUT, so that MAC learning happens correctly.
  2. Add logic that waits for the mux status to become consistent before sending traffic, instead of relying on a time.sleep delay.
  3. Ignore the "All port channels failed to come up ..." syslog message, which is expected because the test brings down all the portchannels.
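
Item 3 amounts to filtering one known error message out of the set of errors that fail the test. A minimal standalone sketch of that idea follows. The regular expression is an assumption (the exact expression used in the PR is not shown in this description), and `unexpected_errors` is a hypothetical helper rather than part of the log analyzer.

```python
import re

# Assumed pattern: matches the message quoted in the description, tolerating
# both the "portchannels" and "port channels" spellings that appear there.
IGNORE_PATTERNS = [
    re.compile(
        r"ERR swss#tunnel_packet_handler\.py: "
        r"All port ?channels failed to come up within 3 minutes"
    ),
]


def unexpected_errors(log_lines, ignore_patterns=IGNORE_PATTERNS):
    """Return ERR lines that no ignore pattern matches."""
    return [
        line
        for line in log_lines
        if " ERR " in line and not any(p.search(line) for p in ignore_patterns)
    ]
```

Keeping the pattern specific to this one message means that any other error raised while the portchannels are down is still reported.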

How did you verify/test it?

Stress-tested on the Arista-7260CX3-D108C8 platform with dualtor-aa[-56] deployed; the test passes.

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

…onic-net#15675)

* [sonic-mgmt][dualtor-aa] Fix fdb/test_fdb_mac_learning.py failures

1) Add fixture to set up the topo in active-standby mode. This is needed
   to make sure packets go to the selected dut (for mac learning to
   happen correctly).
2) Introduce logic to wait for mux status to become consistent before
   sending traffic (instead of relying on time.sleep delay).
3) Ignore the "...All port channels failed to come up within 3 minutes"
   syslog, as the test brings down portchannels and restores them at
   the end.

* Fix pre-commit check failures.

* Update fix to handle non-dualtor case.

Muxcable is irrelevant for non-dualtor topologies, so add a condition
that checks for mux status consistency only in the dualtor case;
otherwise add a delay using time.sleep (which is the existing behavior).
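
The branching described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the PR: `wait_for` and `settle_after_link_up` are hypothetical names, `mux_consistent` stands for any zero-argument callable that reports whether the mux status is consistent, and the 120-second timeout and 5-second polling interval are made-up defaults. Only the 30-second fallback delay comes from the description.

```python
import time


def wait_for(condition, timeout, interval, sleep=time.sleep, clock=time.monotonic):
    """Poll condition() until it returns True or timeout seconds elapse."""
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def settle_after_link_up(is_dualtor, mux_consistent, timeout=120, interval=5,
                         fallback_delay=30, sleep=time.sleep,
                         clock=time.monotonic):
    """Wait for the testbed to settle after interfaces come back up.

    Dualtor: poll until the mux status is consistent (True) or the
    timeout expires (False).
    Non-dualtor: muxcable does not apply, so fall back to a fixed delay.
    """
    if is_dualtor:
        return wait_for(mux_consistent, timeout, interval, sleep=sleep, clock=clock)
    sleep(fallback_delay)
    return True
```

Passing `sleep` and `clock` as parameters keeps the helper testable without real delays. Polling returns as soon as the condition holds, which is the main advantage over a fixed delay that is either too short or needlessly long.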

* [dualtor-aa] Bringup upstream connectivity for mac learning to happen

For active-active dualtor, the NIC simulator doesn't install OVS flows
for downlink ports until the link status becomes consistent, which seems
to happen only if upstream connectivity is restored.
@mssonicbld
Collaborator Author

Original PR: #15675

@mssonicbld mssonicbld merged commit 16fa42a into sonic-net:202405 Nov 28, 2024