
[recover_server] Add script to recover lab testbeds#2801

Merged
lolyu merged 1 commit intosonic-net:masterfrom
lolyu:add_recover_server
Jan 19, 2021

Conversation

@lolyu
Collaborator

@lolyu lolyu commented Jan 13, 2021

Description of PR

Add script recover_server to recover lab testbeds after a lab power
outage or server reboot.

The script creates one thread per server to run the recovery jobs. Each
thread first cleans up its server, then runs start-vms, add-topo and
deploy-mg for each testbed defined.

Note that for each server, if the cleanup fails, all testbed recoveries
(start-vms, add-topo, deploy-mg) on that server are skipped. Within the
recovery of a testbed, if one task fails, all the following tasks are
skipped.
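The per-server threading and skip-on-failure rules described above can be sketched as follows. This is a minimal illustration, not the code in recover_server.py: the functions `recover_server` and `recover_all`, the shape of the `servers` mapping, and the task callables (stand-ins for the real cleanup, start-vms, add-topo and deploy-mg steps) are all hypothetical.

```python
import threading


def recover_server(server, cleanup, testbeds, results, lock):
    """Recover one server: clean it up, then run each testbed's tasks in order.

    Hypothetical shape: `testbeds` maps a testbed name to a list of
    (task_name, task) pairs; each task takes the testbed name and
    returns True on success.
    """
    if not cleanup(server):
        # Cleanup failed: every testbed recovery on this server is skipped.
        outcome = {tbname: "skipped" for tbname in testbeds}
    else:
        outcome = {}
        for tbname, tasks in testbeds.items():
            status = "ok"
            for task_name, task in tasks:
                if not task(tbname):
                    # A failed task skips all following tasks for this testbed.
                    status = "failed at " + task_name
                    break
            outcome[tbname] = status
    with lock:
        results[server] = outcome


def recover_all(servers, cleanup):
    """Start one thread per server, wait for all of them, and collect outcomes."""
    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=recover_server,
                         args=(server, cleanup, testbeds, results, lock),
                         name="recover_" + server)
        for server, testbeds in servers.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    ok = lambda tbname: True
    fail = lambda tbname: False
    servers = {
        "server_1": {"testbed1": [("start-vms", ok), ("add-topo", fail), ("deploy-mg", ok)]},
        "server_2": {"testbed2": [("start-vms", ok), ("add-topo", ok), ("deploy-mg", ok)]},
        "server_3": {"testbed3": [("start-vms", ok)]},
    }
    # Pretend that only server_3 fails its cleanup.
    results = recover_all(servers, cleanup=lambda server: server != "server_3")
    for server in sorted(results):
        print(server, results[server])
```

With the demo inputs, server_1 reports testbed1 as "failed at add-topo" (deploy-mg never runs), server_2 reports "ok", and server_3 reports "skipped" because its cleanup failed. One thread per server keeps servers independent, so a slow or failing server does not hold up the others; the lock only guards the shared results dictionary.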

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>

Summary:
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Approach

What is the motivation for this PR?

Automate server cleanup and testbed deployment after a lab power outage or server reboot.

How did you do it?

Add a new script recover_server.py to run the testbed deployments in parallel.

How did you verify/test it?

./recover_server.py --testbed-servers server_1 --testbed-servers server_2 --testbed-servers server_3 --inventory str --testbed testbed1.csv --log-level=debug
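The test command passes `--testbed-servers` once per server. One standard way to accept a repeated flag like this is argparse's `action="append"`, which collects every occurrence into a list. The parser below is a hypothetical sketch that models only the flags visible in that command; it is not the script's actual argument definition, and the default log level is an assumption.

```python
import argparse


def build_parser():
    """Hypothetical parser covering only the flags used in the test command."""
    parser = argparse.ArgumentParser(description="Recover lab testbeds (sketch).")
    # Each repetition of the flag appends one more server to the list.
    parser.add_argument("--testbed-servers", action="append", required=True,
                        help="testbed server to recover; repeat for multiple servers")
    parser.add_argument("--inventory", required=True, help="inventory file")
    parser.add_argument("--testbed", required=True, help="testbed CSV file")
    # Assumed default; the real script may choose differently.
    parser.add_argument("--log-level", default="info", help="logging level")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.testbed_servers, args.inventory, args.testbed, args.log_level)
```

Given the arguments from the test command, `args.testbed_servers` would be `['server_1', 'server_2', 'server_3']`, which maps naturally onto one recovery thread per server.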

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@lolyu lolyu force-pushed the add_recover_server branch 2 times, most recently from fb0e357 to 1606936 January 13, 2021 11:33
@lgtm-com

lgtm-com bot commented Jan 13, 2021

This pull request introduces 1 alert when merging 16069362f49be6ace0795a9fbedbff1eef0db37e into 1db5db1 - view on LGTM.com

new alerts:

  • 1 for Unused import

@lolyu lolyu force-pushed the add_recover_server branch 3 times, most recently from 852f6da to e908f05 January 13, 2021 16:37
@lolyu lolyu force-pushed the add_recover_server branch from e908f05 to 2742a30 January 15, 2021 04:41
@lolyu lolyu marked this pull request as ready for review January 15, 2021 04:44
@lolyu lolyu requested a review from a team January 15, 2021 04:44
@lolyu lolyu added the Testbed label Jan 15, 2021
@lolyu lolyu merged commit f696e02 into sonic-net:master Jan 19, 2021
    """Task deploy-mg."""

    def __init__(self, tbname, inventory, passfile, log_save_dir, tbfile=None, vmfile=None, dry_run=False):
        Task.__init__(self, tbname + '_deloy_mg', log_save_dir=log_save_dir, tbfile=tbfile, vmfile=vmfile, dry_run=dry_run)
Contributor

typo "deloy"?

kazinator-arista pushed a commit to kazinator-arista/sonic-mgmt that referenced this pull request Mar 4, 2026
…atically (sonic-net#14752)

src/sonic-utilities

* ece22b7d - (HEAD -> 202205, origin/202205) Revert "[GCU] Add PFC_WD RDMA validator  (sonic-net#2781)" (4 minutes ago) [Ying Xie]
* 7d16b184 - Remove the no use new line in show version (sonic-net#2792) (21 hours ago) [xumia]
* 3a880a2b - Support to display the SONiC OS Version in the command show version (sonic-net#2787) (21 hours ago) [xumia]
* a5199f75 - [voq][chassis][generate_dump] [BCM] Dump only the relevant BCM commands for fabric cards (sonic-net#2606) (21 hours ago) [saksarav-nokia]
* 2410d364 - Fixed a bug in "show vnet routes all" causing screen overrun. (sonic-net#2644) (sonic-net#2801) (

3 participants