[action] [PR:13974] [test_ro_disk] Recover DUT to RW state by power-cycle when reboot doesn't work by mssonicbld · Pull Request #13986 · sonic-net/sonic-mgmt

mssonicbld · 2024-08-05T16:19:39Z

Description of PR

Summary:
On some platforms, the DUT cannot be recovered from the RO-disk state by a reboot (e.g., on Nokia-7215, we saw the reboot blocked by systemd-journald.service). To avoid leaving the DUT stuck in the RO-disk state, this PR introduces power-cycle as the final approach to recover the DUT.

Type of change

Bug fix
Testbed and Framework(new/improvement)
Test case(new/improvement)

Back port request

Approach

What is the motivation for this PR?

On some platforms, the DUT cannot be recovered from the RO-disk state by a reboot (e.g., on Nokia-7215, we saw the reboot blocked by systemd-journald.service). To avoid leaving the DUT stuck in the RO-disk state, this PR introduces power-cycle as the final approach to recover the DUT.

How did you do it?

If the reboot fails to recover the DUT from the RO-disk state, fall back to a power-cycle to recover it.
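
The fallback order described above can be sketched roughly as follows. This is a minimal illustration, not the code from test_ro_disk.py: the function name `recover_rw_state` and the three callables (`reboot`, `power_cycle`, `is_disk_rw`) are hypothetical placeholders for whatever the test framework provides, and the default of 3 attempts is only taken from the retry count visible in the logs.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def recover_rw_state(
    reboot: Callable[[], None],        # hypothetical: issues a soft reboot, raises on failure
    power_cycle: Callable[[], None],   # hypothetical: power-cycles the DUT, e.g. through a PDU
    is_disk_rw: Callable[[], bool],    # hypothetical: True once the disk is read-write again
    max_reboot_attempts: int = 3,
) -> str:
    """Try soft reboots first; use a power cycle only as the last resort.

    Returns "reboot" or "power_cycle" to report which method restored the
    RW state, and raises RuntimeError if neither did.
    """
    for attempt in range(max_reboot_attempts):
        try:
            reboot()
        except Exception as exc:
            # A DUT stuck in RO state may never go down, so log and retry.
            logger.error(
                "DUT did not go down, exception: %s attempt:%d/%d",
                exc, attempt, max_reboot_attempts,
            )
            continue
        if is_disk_rw():
            return "reboot"

    logger.warning(
        "Failed to reboot DUT after %d retries, trying power cycle",
        max_reboot_attempts,
    )
    power_cycle()
    if not is_disk_rw():
        raise RuntimeError("DUT is still in RO disk state after power cycle")
    return "power_cycle"
```

Passing the operations in as callables keeps the sketch independent of any particular reboot helper or PDU controller; a real test would bind them to the fixtures available on the testbed.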

How did you verify/test it?

Verified on a Nokia-7215 M0 testbed. The test passed with the logs below:

tacacs/test_ro_disk.py::test_ro_disk[dut-7215-4]
-------------------------------------------------------------------------------- live log call --------------------------------------------------------------------------------
10:02:17 test_ro_disk.do_reboot L0089 ERROR | DUT did not go down, exception: run module command failed, Ansible Results =>
{"failed": true, "msg": "Timeout (62s) waiting for privilege escalation prompt: "} attempt:0/3
10:04:02 test_ro_disk.do_reboot L0089 ERROR | DUT did not go down, exception: run module command failed, Ansible Results =>
{"failed": true, "msg": "Timeout (62s) waiting for privilege escalation prompt: "} attempt:1/3
10:05:24 test_ro_disk.do_reboot L0089 ERROR | DUT did not go down, exception: run module command failed, Ansible Results =>
{"failed": true, "msg": "Timeout (62s) waiting for privilege escalation prompt: "} attempt:2/3
10:05:44 test_ro_disk.do_reboot L0095 ERROR | Failed to reboot DUT after 3 retries
10:05:44 test_ro_disk.test_ro_disk L0262 WARNING| Failed to reboot dut-7215-4, try PDU reboot to restore disk RW state
PASSED [100%]

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation


mssonicbld · 2024-08-05T16:19:43Z

Original PR: #13974

mssonicbld · 2024-08-06T01:47:31Z

/azp run Azure.sonic-mgmt

azure-pipelines · 2024-08-06T01:47:42Z

Azure Pipelines successfully started running 1 pipeline(s).

mssonicbld · 2024-08-07T01:48:41Z

/azp run Azure.sonic-mgmt

azure-pipelines · 2024-08-07T01:48:52Z

Azure Pipelines successfully started running 1 pipeline(s).

lizhijianrd · 2024-08-07T05:14:44Z

I'll manually backport and fix the PR test failure.

mssonicbld added the automerge label Aug 5, 2024

mssonicbld mentioned this pull request Aug 5, 2024

[test_ro_disk] Recover DUT to RW state by power-cycle when reboot doesn't work #13974

Merged

8 tasks

lizhijianrd closed this Aug 7, 2024

mssonicbld wants to merge 1 commit into sonic-net:202311 from mssonicbld:cherry/202311/13974

2 participants
