[recover_server] Add script to recover lab testbeds#2801
Merged
lolyu merged 1 commit intosonic-net:masterfrom Jan 19, 2021
This pull request introduces 1 alert when merging 16069362f49be6ace0795a9fbedbff1eef0db37e into 1db5db1 - view on LGTM.com new alerts:
Add script `recover_server` to recover lab testbeds after a lab power outage or server reboot. The script creates one thread per server to run that server's recovery jobs. Each thread first cleans up the server, then runs start-vms, add-topo, and deploy-mg for each testbed defined. Note that if the cleanup fails for a server, all recoveries for its testbeds (start-vms, add-topo, deploy-mg) are skipped. Within the recovery of a single testbed, if one task fails, all following tasks are skipped. Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
bingwang-ms
reviewed
Jan 19, 2021
bingwang-ms
approved these changes
Jan 19, 2021
qiluo-msft
reviewed
Feb 27, 2021
"""Task deploy-mg."""

def __init__(self, tbname, inventory, passfile, log_save_dir, tbfile=None, vmfile=None, dry_run=False):
    Task.__init__(self, tbname + '_deloy_mg', log_save_dir=log_save_dir, tbfile=tbfile, vmfile=vmfile, dry_run=dry_run)
Description of PR
Add script `recover_server` to recover lab testbeds after a lab power outage or server reboot.
The script creates one thread per server to run that server's recovery jobs. Each thread first cleans up the server, then runs start-vms, add-topo, and deploy-mg for each testbed defined.
Note that if the cleanup fails for a server, all recoveries for its testbeds (start-vms, add-topo, deploy-mg) are skipped. Within the recovery of a single testbed, if one task fails, all following tasks are skipped.
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
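The control flow described above can be sketched roughly as follows. This is a minimal illustration, not the code from `recover_server.py`: the names `TaskError`, `recover_testbed`, `recover_server`, and `recover_all` are hypothetical, tasks are modeled as plain callables, and the sketch assumes that a failed testbed does not prevent the remaining testbeds on the same server from being attempted.

```python
import threading


class TaskError(Exception):
    """Hypothetical error raised by a cleanup or recovery task that fails."""


def recover_testbed(testbed, tasks, log):
    """Run the tasks for one testbed in order, skipping the rest after a failure."""
    for name, task in tasks:
        try:
            task(testbed)
        except TaskError:
            log.append((testbed, name, "failed"))
            return False  # all following tasks for this testbed are skipped
        log.append((testbed, name, "ok"))
    return True


def recover_server(server, testbeds, cleanup, tasks, log):
    """Clean up one server, then recover each of its testbeds."""
    try:
        cleanup(server)
    except TaskError:
        log.append((server, "cleanup", "failed"))
        return  # cleanup failed: skip every testbed on this server
    log.append((server, "cleanup", "ok"))
    for testbed in testbeds:
        # Assumption: one testbed failing does not stop the next testbed.
        recover_testbed(testbed, tasks, log)


def recover_all(servers, cleanup, tasks):
    """Start one thread per server and return a per-server log of outcomes."""
    # Each thread appends only to its own list, so no lock is needed.
    results = {server: [] for server in servers}
    threads = [
        threading.Thread(
            target=recover_server,
            args=(server, testbeds, cleanup, tasks, results[server]),
        )
        for server, testbeds in servers.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Here `servers` maps a server name to its list of testbed names, and `tasks` is an ordered list such as `[("start-vms", f1), ("add-topo", f2), ("deploy-mg", f3)]`, where each callable raises `TaskError` on failure. Giving each thread its own log list keeps the per-server results separate without extra synchronization.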
Summary:
Fixes # (issue)
Type of change
Approach
What is the motivation for this PR?
Automate the process of server cleanup, testbed deployment after lab power outage or server reboot.
How did you do it?
Add a new script `recover_server.py` to run the testbed deployments in parallel.
How did you verify/test it?
Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation