[action] [PR:13974] [test_ro_disk] Recover DUT to RW state by power-cycle when reboot doesn't work#13986
Closed
mssonicbld wants to merge 1 commit intosonic-net:202311from
Closed
[action] [PR:13974] [test_ro_disk] Recover DUT to RW state by power-cycle when reboot doesn't work#13986mssonicbld wants to merge 1 commit intosonic-net:202311from
mssonicbld wants to merge 1 commit intosonic-net:202311from
Conversation
…sn't work (sonic-net#13974) What is the motivation for this PR? On some platforms, DUT cannot be recovered from RO-disk state by reboot. (e.g., On Nokia-7215, we saw the reboot is blocked by systemd-journald.service) To avoid DUT stuck at RO disk state, this PR introduce power-cycle as the final approach to recover DUT. How did you do it? If reboot failed to recover DUT from RO disk state, try power-cycle to recover the DUT. How did you verify/test it? Verified on Nokia-7215 M0 testbed. Get test passed with below logs: tacacs/test_ro_disk.py::test_ro_disk[dut-7215-4] -------------------------------------------------------------------------------- live log call -------------------------------------------------------------------------------- 10:02:17 test_ro_disk.do_reboot L0089 ERROR | DUT did not go down, exception: run module command failed, Ansible Results => {"failed": true, "msg": "Timeout (62s) waiting for privilege escalation prompt: "} attempt:0/3 10:04:02 test_ro_disk.do_reboot L0089 ERROR | DUT did not go down, exception: run module command failed, Ansible Results => {"failed": true, "msg": "Timeout (62s) waiting for privilege escalation prompt: "} attempt:1/3 10:05:24 test_ro_disk.do_reboot L0089 ERROR | DUT did not go down, exception: run module command failed, Ansible Results => {"failed": true, "msg": "Timeout (62s) waiting for privilege escalation prompt: "} attempt:2/3 10:05:44 test_ro_disk.do_reboot L0095 ERROR | Failed to reboot DUT after 3 retries 10:05:44 test_ro_disk.test_ro_disk L0262 WARNING| Failed to reboot dut-7215-4, try PDU reboot to restore disk RW state PASSED
8 tasks
Collaborator
Author
|
Original PR: #13974 |
Collaborator
Author
|
/azp run Azure.sonic-mgmt |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Collaborator
Author
|
/azp run Azure.sonic-mgmt |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Contributor
|
I'll manually backport and fix the PR test failure. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description of PR
Summary:
On some platforms, DUT cannot be recovered from RO-disk state by reboot. (e.g., On Nokia-7215, we saw the reboot is blocked by systemd-journald.service) To avoid DUT stuck at RO disk state, this PR introduce power-cycle as the final approach to recover DUT.
Type of change
Back port request
Approach
What is the motivation for this PR?
On some platforms, DUT cannot be recovered from RO-disk state by reboot. (e.g., On Nokia-7215, we saw the reboot is blocked by systemd-journald.service) To avoid DUT stuck at RO disk state, this PR introduce power-cycle as the final approach to recover DUT.
How did you do it?
If reboot failed to recover DUT from RO disk state, try power-cycle to recover the DUT.
How did you verify/test it?
Verified on Nokia-7215 M0 testbed. Get test passed with below logs:
Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation