[recover_server] Add script to recover lab testbeds by lolyu · Pull Request #2801 · sonic-net/sonic-mgmt

lolyu · 2021-01-13T11:17:37Z

Description of PR

Add script recover_server to recover lab testbeds after a lab power
outage or server reboot.

The script creates one thread per server to run the recovery jobs.
Each thread first tries to clean up its server, then runs start-vms,
add-topo, and deploy-mg for each testbed defined.

Note that for each server, if the cleanup fails, all testbed recoveries
(start-vms, add-topo, deploy-mg) are skipped. Within the recovery of a
single testbed, if one task fails, all following tasks are skipped.

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>

Summary:
Fixes # (issue)

Type of change

Bug fix
Testbed and Framework(new/improvement)
Test case(new/improvement)

Approach

What is the motivation for this PR?

Automate server cleanup and testbed deployment after a lab power outage or server reboot.

How did you do it?

Add a new script recover_server.py to run the testbed deployments in parallel.
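
The control flow in the PR description (one thread per server, cleanup first, then start-vms, add-topo, deploy-mg per testbed, skipping on failure) could look roughly like the sketch below. This is not the code in `recover_server.py`: the names `recover_testbed`, `recover_server`, `recover_all`, and the `cleanup` / `run_task` callables are hypothetical and stand in for the real Ansible-driven steps.

```python
import threading

# Task order taken from the PR description.
TASKS = ("start-vms", "add-topo", "deploy-mg")


def recover_testbed(testbed, run_task):
    """Run the tasks in order and stop at the first failure.

    `run_task(task, testbed)` is a hypothetical callable returning True on success.
    Returns the list of tasks that completed.
    """
    completed = []
    for task in TASKS:
        if not run_task(task, testbed):
            break  # one task failed: skip all following tasks for this testbed
        completed.append(task)
    return completed


def recover_server(server, testbeds, cleanup, run_task, results):
    """Recover one server; intended to run in its own thread."""
    if not cleanup(server):
        results[server] = None  # cleanup failed: skip every testbed on this server
        return
    results[server] = {tb: recover_testbed(tb, run_task) for tb in testbeds}


def recover_all(servers, cleanup, run_task):
    """Start one thread per server and wait for all of them to finish."""
    results = {}
    threads = [
        threading.Thread(target=recover_server,
                         args=(server, testbeds, cleanup, run_task, results))
        for server, testbeds in servers.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Each thread writes only to its own key in `results`, so the sketch needs no explicit lock.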

How did you verify/test it?

./recover_server.py --testbed-servers server_1 --testbed-servers server_2 --testbed-servers server_3 --inventory str --testbed testbed1.csv --log-level=debug
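
The repeated `--testbed-servers` flag in the command above suggests an appending option. A minimal `argparse` sketch that accepts the same command line follows; only the flag names come from the command, while the defaults, help strings, and the `build_parser` helper are assumptions and may differ from the actual script.

```python
import argparse


def build_parser():
    """Hypothetical parser accepting the flags used in the verification command."""
    parser = argparse.ArgumentParser(description="Recover lab testbed servers.")
    # action="append" collects every occurrence of the flag into one list.
    parser.add_argument("--testbed-servers", action="append", default=[],
                        help="server to recover; repeat the flag for several servers")
    parser.add_argument("--inventory", help="inventory name")
    parser.add_argument("--testbed", help="testbed file, e.g. testbed1.csv")
    parser.add_argument("--log-level", default="info",
                        help="logging verbosity (the default here is an assumption)")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```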

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

lgtm-com · 2021-01-13T12:02:14Z

This pull request introduces 1 alert when merging 16069362f49be6ace0795a9fbedbff1eef0db37e into 1db5db1 - view on LGTM.com

new alerts:

1 for Unused import


ansible/recover_server.py

qiluo-msft · 2021-02-27T01:08:44Z

ansible/recover_server.py

+    """Task deploy-mg."""
+
+    def __init__(self, tbname, inventory, passfile, log_save_dir, tbfile=None, vmfile=None, dry_run=False):
+        Task.__init__(self, tbname + '_deloy_mg', log_save_dir=log_save_dir, tbfile=tbfile, vmfile=vmfile, dry_run=dry_run)


typo "deloy"?


lolyu force-pushed the add_recover_server branch 2 times, most recently from fb0e357 to 1606936 Compare January 13, 2021 11:33

lolyu force-pushed the add_recover_server branch 3 times, most recently from 852f6da to e908f05 Compare January 13, 2021 16:37

lolyu force-pushed the add_recover_server branch from e908f05 to 2742a30 Compare January 15, 2021 04:41

lolyu marked this pull request as ready for review January 15, 2021 04:44

lolyu requested a review from a team January 15, 2021 04:44

lolyu added the Testbed label Jan 15, 2021

bingwang-ms reviewed Jan 19, 2021

bingwang-ms approved these changes Jan 19, 2021

lolyu merged commit f696e02 into sonic-net:master Jan 19, 2021

qiluo-msft reviewed Feb 27, 2021

lolyu merged 1 commit into sonic-net:master from lolyu:add_recover_server
