[action] [PR:13974] [test_ro_disk] Recover DUT to RW state by power-cycle when reboot doesn't work #13985

Merged
mssonicbld merged 1 commit into sonic-net:202405 from mssonicbld:cherry/202405/13974
Aug 5, 2024

@mssonicbld
Collaborator

Description of PR

Summary:
On some platforms, the DUT cannot be recovered from the RO-disk state by a reboot (e.g., on Nokia-7215, we saw the reboot blocked by systemd-journald.service). To avoid leaving the DUT stuck in the RO-disk state, this PR introduces a power-cycle as the last-resort approach to recover the DUT.

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405

Approach

What is the motivation for this PR?

On some platforms, the DUT cannot be recovered from the RO-disk state by a reboot (e.g., on Nokia-7215, we saw the reboot blocked by systemd-journald.service). To avoid leaving the DUT stuck in the RO-disk state, this PR introduces a power-cycle as the last-resort approach to recover the DUT.

How did you do it?

If a reboot fails to recover the DUT from the RO-disk state, try a power-cycle to recover the DUT.
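
The fallback can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `reboot_with_retries`, `recover_rw_state`, and the `reboot`, `power_cycle`, and `is_disk_rw` callables are hypothetical names standing in for whatever the test and the testbed's PDU controller actually provide. The retry count of 3 and the log wording are taken from the verification logs in this PR.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def reboot_with_retries(reboot: Callable[[], None], retries: int = 3) -> bool:
    """Attempt a soft reboot up to `retries` times; return True on first success."""
    for attempt in range(retries):
        try:
            reboot()
            return True
        except Exception as exc:  # assumption: the real test may catch a narrower type
            logger.error("DUT did not go down, exception: %s attempt:%d/%d",
                         exc, attempt, retries)
    logger.error("Failed to reboot DUT after %d retries", retries)
    return False


def recover_rw_state(reboot: Callable[[], None],
                     power_cycle: Callable[[], None],
                     is_disk_rw: Callable[[], bool],
                     retries: int = 3) -> bool:
    """Recover the DUT by soft reboot, falling back to a power-cycle if that fails."""
    if not reboot_with_retries(reboot, retries):
        logger.warning("Failed to reboot DUT, try PDU reboot to restore disk RW state")
        power_cycle()  # hypothetical hook, e.g. turn the PDU outlets off and on
    # Report whether the disk is writable again after recovery.
    return is_disk_rw()
```

Passing the three operations in as callables keeps the sketch independent of any particular testbed API; in a real test they would wrap the DUT reboot helper, the PDU controller, and a disk-state check.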

How did you verify/test it?

Verified on a Nokia-7215 M0 testbed. The test passed with the logs below:

tacacs/test_ro_disk.py::test_ro_disk[dut-7215-4]
-------------------------------------------------------------------------------- live log call --------------------------------------------------------------------------------
10:02:17 test_ro_disk.do_reboot L0089 ERROR | DUT did not go down, exception: run module command failed, Ansible Results =>
{"failed": true, "msg": "Timeout (62s) waiting for privilege escalation prompt: "} attempt:0/3
10:04:02 test_ro_disk.do_reboot L0089 ERROR | DUT did not go down, exception: run module command failed, Ansible Results =>
{"failed": true, "msg": "Timeout (62s) waiting for privilege escalation prompt: "} attempt:1/3
10:05:24 test_ro_disk.do_reboot L0089 ERROR | DUT did not go down, exception: run module command failed, Ansible Results =>
{"failed": true, "msg": "Timeout (62s) waiting for privilege escalation prompt: "} attempt:2/3
10:05:44 test_ro_disk.do_reboot L0095 ERROR | Failed to reboot DUT after 3 retries
10:05:44 test_ro_disk.test_ro_disk L0262 WARNING| Failed to reboot dut-7215-4, try PDU reboot to restore disk RW state
PASSED [100%]

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@mssonicbld
Collaborator Author

Original PR: #13974

@mssonicbld mssonicbld merged commit 54c5bcc into sonic-net:202405 Aug 5, 2024