Skip to content

Fix test_warm_reboot issues blocking PR merge#2309

Merged
bingwang-ms merged 3 commits intosonic-net:masterfrom
vaibhavhd:check-skip-wb
Jun 8, 2022
Merged

Fix test_warm_reboot issues blocking PR merge#2309
bingwang-ms merged 3 commits intosonic-net:masterfrom
vaibhavhd:check-skip-wb

Conversation

@vaibhavhd
Copy link
Copy Markdown
Contributor

@vaibhavhd vaibhavhd commented Jun 3, 2022

What I did

Added fixes to test_warm_reboot which is consistently failing on master branch.
This PR should unblock the pending PR merges on sonic-swss due to PR test issues.

Why I did it

There are multiple issues that I think should be fixed in the test_warm_reboot.

Changes added in this PR:

  1. Failures to assert restore_count during warm-restart of different services. The cause is that sometimes when service is stopped and immediately started back again, the start command is being missed/ignored/denied by supervisord. This is fixed by adding sleep(1) after service stop and before start.
  2. Failures in test_routing_WarmRestart: the errors (below snippet) are supposedly coming due to some change that added weight attribute to return value of routes. Due to this if exact match of dict is done, it would fail as the device has more attributes than what are being expected.
>       assert rt_val == {"ifname": "Ethernet0,Ethernet4,Ethernet8", "nexthop": "111.0.0.2,122.0.0.2,133.0.0.2"}
E       AssertionError: assert {'ifname': 'E...ght': '2,2,2'} == {'ifname': 'E....2,133.0.0.2'}
E         Omitting 2 identical items, use -vv to show
E         Left contains 1 more item:
E         {'weight': '2,2,2'}
  1. Whenever failure # 1 is triggered, the device is left in bad state (service stopped, and never restarted), wherein when next testcase gets executed, device does not respond the requests from test, and test keeps waiting indefinitely. Due to # 1 being fixed, hung issue should be seen much less.

Future fixes:

  1. Fix how testbed setup and teardown is handled per case within test_warm_reboot.
  2. Understand why explicit start is needed for some services, rather then waiting for timed auto restart to kick in.
  3. Add a timeout for each case so that even if the test is hung, it does not block pipeline for 8+hours.
  4. Collect the artifacts from test_warm_reboot for easier debug.

How I verified it
Tested locally and on pipeline.

Details if related

@stephenxs
Copy link
Copy Markdown
Collaborator

Hi @vaibhavhd
I noticed that it took 6 hours to finish swss vs test, which caused it eventually timeout.
is this the WA to avoid this test issue or we need to extend the timeout?
thank you.

@vaibhavhd vaibhavhd changed the title Do not merge - test without warmboot Fix test_warm_reboot issues blocking PR merge Jun 8, 2022
@vaibhavhd vaibhavhd marked this pull request as ready for review June 8, 2022 00:18
@vaibhavhd vaibhavhd requested a review from prsunny as a code owner June 8, 2022 00:18
@vaibhavhd
Copy link
Copy Markdown
Contributor Author

Hi @vaibhavhd I noticed that it took 6 hours to finish swss vs test, which caused it eventually timeout. is this the WA to avoid this test issue or we need to extend the timeout? thank you.

The previous workaround was added just to confirm if without warmboot the test suite passes or not. It did pass.
Now, I have added fixes for test_warm_reboot, and identified some items to be fixed in future PR.

@bingwang-ms bingwang-ms merged commit 2ff763f into sonic-net:master Jun 8, 2022
@prsunny
Copy link
Copy Markdown
Collaborator

prsunny commented Jun 8, 2022

@vaibhavhd , thanks for fixing.

@vivekrnv
Copy link
Copy Markdown
Contributor

vivekrnv commented Jun 8, 2022

PR to 202111 (Eg: #2291, #2313, #2303) are also failing/timing out because of test_warm_reboot. Can you backport this fix to 202111?

liushilongbuaa pushed a commit to liushilongbuaa/sonic-swss that referenced this pull request Jun 9, 2022
* Two fixes: sleep after stop and check values in routes
prsunny pushed a commit that referenced this pull request Jun 9, 2022
* Two fixes: sleep after stop and check values in routes

Co-authored-by: Vaibhav Hemant Dixit <vaibhav.dixit@microsoft.com>
dgsudharsan pushed a commit to dgsudharsan/sonic-swss that referenced this pull request Jun 13, 2022
* Two fixes: sleep after stop and check values in routes
judyjoseph pushed a commit that referenced this pull request Jun 14, 2022
* Two fixes: sleep after stop and check values in routes
qiluo-msft pushed a commit that referenced this pull request Jun 14, 2022
* Two fixes: sleep after stop and check values in routes
preetham-singh pushed a commit to preetham-singh/sonic-swss that referenced this pull request Aug 6, 2022
* Two fixes: sleep after stop and check values in routes
Janetxxx pushed a commit to Janetxxx/sonic-swss that referenced this pull request Nov 10, 2025
* Two fixes: sleep after stop and check values in routes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants