Re-enable ICMP responder and gratuitous ARP service for active-standby dualtor topologies.#12860

Closed
vivekverma-arista wants to merge 5 commits into sonic-net:master from vivekverma-arista:fix-oscillation

@vivekverma-arista
Contributor

Description of PR

Summary:
Fixes #119

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 201911
  • 202012
  • 202205
  • 202305
  • 202311

Approach

What is the motivation for this PR?

#221 introduced oscillation logic in active-standby dualtor. These oscillations also happen continuously in the test environment, because the ICMP responder does not run there; it was disabled some time ago in #9117.

These continuous oscillations interfere with testing and have made many traffic tests flaky in active-standby dualtor.

Details can be found in #119

How did you do it?

Re-enabled ICMP responder and gratuitous ARP service in active-standby dualtor topologies.

We have introduced two new fixtures, toggle_all_simulator_ports_to_rand_selected_tor_unconditionally and toggle_all_simulator_ports_to_enum_rand_one_per_hwsku_frontend_host_unconditionally, similar to the ones used for active-active dualtor. They let a few tests run in active-standby mode where the ICMP responder interferes with testing and needs to be paused.
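The pause-and-resume idea behind these fixtures can be sketched in isolation. The following is only an illustrative model: `FakeService` and `paused` are hypothetical names that do not exist in the test framework, and the real fixtures operate on the testbed services rather than on in-memory objects.

```python
from contextlib import contextmanager


class FakeService:
    """Hypothetical stand-in for a service such as the ICMP responder."""

    def __init__(self, name):
        self.name = name
        self.running = True

    def stop(self):
        self.running = False

    def start(self):
        self.running = True


@contextmanager
def paused(*services):
    """Stop the given services, then restart only those that were running."""
    previously_running = [svc for svc in services if svc.running]
    for svc in previously_running:
        svc.stop()
    try:
        yield
    finally:
        # Restore the services even if the test body raises.
        for svc in previously_running:
            svc.start()


icmp_responder = FakeService("icmp_responder")
garp_service = FakeService("garp_service")

with paused(icmp_responder, garp_service):
    # A test that cannot tolerate the services would run here.
    state_during_test = (icmp_responder.running, garp_service.running)

state_after_test = (icmp_responder.running, garp_service.running)
```

Restoring the services in a `finally` clause keeps a failing test from leaving the responder stopped for the tests that follow.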

This also affects the following tests, which have been fixed:

  1. everflow tests: These tests shut down BGP on the randomly selected ToR. If the ICMP responder is running, toggle_all_simulator_ports_to_rand_selected_tor fails to toggle the MUX direction towards this ToR, because the ToR is now technically unhealthy after losing its routes (which should be the expected behaviour in the production environment as well). The fix is very similar to the one for active-active dualtor: use the fixture toggle_all_simulator_ports_to_rand_selected_tor_unconditionally. If the ICMP responder is not running, toggle_all_simulator_ports_to_rand_selected_tor successfully toggles the MUX direction towards the unhealthy ToR, which is a testing gap introduced by disabling the ICMP responder.

  2. pfcwd/test_pfcwd_function.py and arp/test_unknown_mac.py: The ICMP responder and GARP service interfere with these tests, so the fix is to selectively pause both services for these tests and run the tests in active-standby mode using the new fixtures.
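The toggle behaviour described for the everflow tests can be summarised as a small truth table. This is a toy rule distilled from the description above, not linkmgrd or mux simulator code, and `toggle_succeeds` is a hypothetical helper introduced only for illustration.

```python
from itertools import product


def toggle_succeeds(tor_healthy: bool, icmp_responder_running: bool) -> bool:
    """Toy rule taken from the PR description, not real linkmgrd logic.

    With the ICMP responder running, a toggle towards an unhealthy ToR
    does not succeed. Without it, the toggle succeeds regardless of health.
    """
    return tor_healthy or not icmp_responder_running


# Evaluate all four combinations of ToR health and responder state.
outcomes = {
    (healthy, responder): toggle_succeeds(healthy, responder)
    for healthy, responder in product([True, False], repeat=2)
}

for (healthy, responder), ok in sorted(outcomes.items(), reverse=True):
    print(
        f"tor_healthy={healthy!s:<5} "
        f"icmp_responder={responder!s:<5} "
        f"toggle_succeeds={ok}"
    )
```

The only failing combination is an unhealthy ToR with the responder running, which is the case the everflow tests hit after shutting down BGP.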

How did you verify/test it?

Tested on Arista-7260 and Arista-7050 platforms with dualtor and dualtor-120.

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@vivekverma-arista
Contributor Author

This has been fixed by modifying the product code (sonic-net/sonic-linkmgrd#250), so I am closing this pull request.

@vivekverma-arista deleted the fix-oscillation branch on May 27, 2024 14:56