[dualtor] Add test to simulate server reboot #14690

Merged
wangxin merged 7 commits into sonic-net:master from opcoder0:testgap/dualtor-test-server-pxe-boot on Oct 25, 2024

@opcoder0

Description of PR

Add a test that simulates a server reboot by restarting icmp_responder and verifies that the ToRs return to a healthy status once the restart is complete.

Summary:
Fixes #12039

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405

Approach

What is the motivation for this PR?

Address test gap.

How did you do it?

Added a test that simulates a server reboot by shutting down the icmp_responder service, then verifies that ToR health is back to normal after the reboot is complete.
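
As a rough illustration of the flow described above, the sketch below stops a stand-in responder, checks that the reported health degrades, restarts it, and polls until health is back to normal. Everything here is hypothetical: `FakeResponder`, `FakeTor`, `wait_until`, and `simulate_server_reboot` are local stand-ins written for this example, not the sonic-mgmt fixtures or the code added in this PR.

```python
import time


class FakeResponder:
    """Hypothetical stand-in for the icmp_responder service on the server side."""

    def __init__(self):
        self.running = True

    def stop(self):
        self.running = False

    def start(self):
        self.running = True


class FakeTor:
    """Hypothetical ToR whose mux health follows the responder's heartbeat replies."""

    def __init__(self, responder):
        self.responder = responder

    def mux_health(self):
        return "healthy" if self.responder.running else "unhealthy"


def wait_until(timeout, interval, condition):
    """Poll condition() until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def simulate_server_reboot(responder, tors, timeout=2.0, interval=0.05):
    """Stop the responder, confirm health degrades, restart, wait for recovery."""
    responder.stop()
    degraded = all(tor.mux_health() == "unhealthy" for tor in tors)
    responder.start()
    recovered = wait_until(
        timeout, interval,
        lambda: all(tor.mux_health() == "healthy" for tor in tors),
    )
    return degraded, recovered
```

Polling with a timeout, rather than asserting once, matters because a real ToR would need some time after the responder comes back before it reports a healthy link again.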

How did you verify/test it?

Ran tests on dualtor-aa testbeds.

Any platform specific information?

None

Supported testbed topology if it's a new test case?

dualtor

Documentation

None

@zjswhhh zjswhhh left a comment

It seems your change doesn't cover scenario 1.

The goal is to confirm that when one ToR toggles the mux direction, the other ToR eventually syncs up, even without ICMP heartbeats.

Scenario 1: link flap after boot up.
One ToR toggles the mux, and the other ToR is able to sync up within a short amount of time.

Comment on lines +101 to +109
    def upper_tor_mux_state_verification(state, health):
        mux_state_upper_tor = show_muxcable_status(upper_tor_host)
        return (mux_state_upper_tor[test_iface]['status'] == state and
                mux_state_upper_tor[test_iface]['health'] == health)

    def lower_tor_mux_state_verfication(state, health):
        mux_state_lower_tor = show_muxcable_status(lower_tor_host)
        return (mux_state_lower_tor[test_iface]['status'] == state and
                mux_state_lower_tor[test_iface]['health'] == health)

Can you try using the dualtor test utilities?

def verify_tor_states(
    expected_active_host, expected_standby_host,
    expected_standby_health='healthy', intf_names='all',
    cable_type=CableType.default_type, skip_state_db=False,
    skip_tunnel_route=True, standalone_tunnel_route=False,
    verify_db_timeout=30
):
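
One way to read this suggestion: the two per-ToR helpers in the diff differ only in the host they query, so a single host-parameterised check could cover both. The sketch below shows that shape against a stubbed status table. `mux_state_matches` and `FAKE_STATUS` are hypothetical names for illustration, `show_muxcable_status` is stubbed here rather than imported, and the real `verify_tor_states` utility does considerably more than this.

```python
# Hypothetical stand-in for the per-host output of show_muxcable_status().
FAKE_STATUS = {
    "upper_tor": {"Ethernet4": {"status": "active", "health": "healthy"}},
    "lower_tor": {"Ethernet4": {"status": "standby", "health": "healthy"}},
}


def show_muxcable_status(host):
    """Stub: return the mux status table for a host name."""
    return FAKE_STATUS[host]


def mux_state_matches(host, iface, state, health):
    """Return True if iface on host reports the expected status and health."""
    entry = show_muxcable_status(host).get(iface)
    if entry is None:
        # Interface not present in the status table: treat as a mismatch.
        return False
    return entry["status"] == state and entry["health"] == health
```

With this shape, the upper and lower ToR checks become two calls to the same function, which also removes the chance of the two copies drifting apart.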

@lolyu lolyu left a comment

LGTM

@zjswhhh zjswhhh left a comment

lgtm

@opcoder0 opcoder0 marked this pull request as draft October 8, 2024 05:18
@opcoder0 opcoder0 marked this pull request as ready for review October 23, 2024 14:16
@opcoder0 opcoder0 requested a review from zjswhhh October 23, 2024 14:21

@zjswhhh zjswhhh left a comment

lgtm

zjswhhh commented Oct 23, 2024

Hi @opcoder0 - can you address the PR checker failure?

dualtor_mgmt/test_server_failure.py::test_server_reboot[active-standby] 
-------------------------------- live log call ---------------------------------
17:01:08 dual_tor_utils._shutdown_fanout_tor_intf L0451 ERROR  | No dut intf "vlab-05|Ethernet4" in full_dut_fanout_port_map
17:01:08 dual_tor_utils._shutdown_fanout_tor_intf L0451 ERROR  | No dut intf "vlab-05|Ethernet8" in full_dut_fanout_port_map
17:01:08 dual_tor_utils._shutdown_fanout_tor_intf L0451 ERROR  | No dut intf "vlab-05|Ethernet12" in full_dut_fanout_port_map
17:01:08 dual_tor_utils._shutdown_fanout_tor_intf L0451 ERROR  | No dut intf "vlab-05|Ethernet16" in full_dut_fanout_port_map
17:01:08 dual_tor_utils._shutdown_fanout_tor_intf L0451 ERROR  | No dut intf "vlab-05|Ethernet20" in full_dut_fanout_port_map
17:01:08 dual_tor_utils._shutdown_fanout_tor_intf L0451 ERROR  | No dut intf "vlab-05|Ethernet24" in full_dut_fanout_port_map
17:01:08 dual_tor_utils._shutdown_fanout_tor_intf L0451 ERROR  | No dut intf "vlab-05|Ethernet28" in full_dut_fanout_port_map

wangxin previously approved these changes Oct 24, 2024
@opcoder0

Thanks @zjswhhh, fixed. The test shouldn't have run on the VS testbed, as there is no fanout there.
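
The checker log shows lookups keyed as "<dut hostname>|<interface>" failing against the fanout port map. A minimal sketch of detecting that case up front follows. `missing_fanout_intfs` and `can_run_fanout_test` are hypothetical helpers, and the map layout is assumed from the log format rather than taken from `dual_tor_utils`; this is not how the PR itself was fixed.

```python
def missing_fanout_intfs(dut_hostname, intf_names, fanout_port_map):
    """Return the interfaces that have no "<hostname>|<intf>" entry in the map."""
    return [
        intf for intf in intf_names
        if f"{dut_hostname}|{intf}" not in fanout_port_map
    ]


def can_run_fanout_test(dut_hostname, intf_names, fanout_port_map):
    """A fanout-dependent test is only meaningful if every interface is mapped."""
    return not missing_fanout_intfs(dut_hostname, intf_names, fanout_port_map)
```

On a testbed with no fanout the map is empty, so every interface is reported as missing and a caller could skip the test instead of logging one error per port.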

@wangxin wangxin merged commit 4eb11d6 into sonic-net:master Oct 25, 2024
sreejithsreekumaran pushed a commit to sreejithsreekumaran/sonic-mgmt that referenced this pull request Nov 15, 2024
Add test to simulate server reboot via restart of icmp_responder and verify that the TORs get back to healthy status after the restart is complete.
yutongzhang-microsoft pushed a commit to yutongzhang-microsoft/sonic-mgmt that referenced this pull request Nov 21, 2024
Add test to simulate server reboot via restart of icmp_responder and verify that the TORs get back to healthy status after the restart is complete.

Development

Successfully merging this pull request may close these issues.

[dualtor][test gap] PXEboot scenario