
[autorestart] parametize autorestart test #2533

Merged
yxieca merged 3 commits into sonic-net:master from yxieca:autorestart
Nov 17, 2020

@yxieca yxieca commented Nov 15, 2020

Description of PR

Summary:
Improve autorestart test.

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Approach

What is the motivation for this PR?

The autorestart test runs in a loop over multiple containers, so it is hard to see which container caused a failure. And when one container fails, the remaining containers are not tested.

I also noticed that the autorestart test can leave all BGP sessions down, with no indication of which container restart caused it.
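
A per-feature BGP check of the kind motivated here could look like the following. This is a minimal sketch under stated assumptions, not the code in this PR: the function names, the shape of the `neighbors` dictionary, and the `recover` callback are all hypothetical.

```python
def find_down_bgp_sessions(neighbors):
    """Return the sorted neighbor addresses whose session is not established.

    `neighbors` is a hypothetical mapping: address -> {"state": "<bgp state>"}.
    """
    return sorted(
        addr
        for addr, info in neighbors.items()
        if info.get("state", "").lower() != "established"
    )


def check_bgp_after_feature(feature, neighbors, recover):
    """Fail the current test case if any session is down, after trying to recover.

    `recover` is a hypothetical callback that restores the DUT (for example a
    config reload) so that the next parameterized case starts from a sane state.
    """
    down = find_down_bgp_sessions(neighbors)
    if down:
        recover()
        raise AssertionError(
            "Some BGP sessions went down after testing feature {}: {}".format(
                feature, down
            )
        )
```

Running the check once per feature ties a BGP outage to the container restart that preceded it, instead of discovering it only at the end of the loop.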

How did you do it?

  • Add infrastructure to enumerate dut features.
  • Address an issue with DutHosts nodes indexing.
  • Parameterize autorestart test.
  • Add BGP session check and recover code to autorestart.
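
The parameterization step can be sketched with pytest's `pytest_generate_tests` hook. This is a minimal sketch, not the code merged in this PR: the helper `enumerate_dut_features`, the `dut_feature` argument name, and the inventory values are hypothetical. Only the `hostname|feature` ID shape is taken from the test output in this description.

```python
# Hypothetical inventory: (hostname, enabled features) pairs.
# In a real run this would be read from the testbed, not hard-coded.
DUT_FEATURES = [
    ("dut-a", ["lldp", "bgp"]),
    ("dut-b", ["lldp"]),
]


def enumerate_dut_features(dut_features):
    """Return one 'hostname|feature' string per (DUT, feature) pair."""
    return [
        "{}|{}".format(hostname, feature)
        for hostname, features in dut_features
        for feature in features
    ]


def pytest_generate_tests(metafunc):
    """pytest hook: generate one test case per DUT/feature pair."""
    if "dut_feature" in metafunc.fixturenames:
        params = enumerate_dut_features(DUT_FEATURES)
        # Reusing the parameter string as the test ID makes each result
        # line name the DUT and container it exercised.
        metafunc.parametrize("dut_feature", params, ids=params)


def test_containers_autorestart(dut_feature):
    hostname, feature = dut_feature.split("|")
    # A real test would kill the container here and wait for it to restart.
    assert hostname and feature
```

Because each pair is its own test case, one failing container is reported under its own ID and does not stop the remaining cases from running.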

Signed-off-by: Ying Xie [email protected]

How did you verify/test it?

Ran the autorestart test on a multi-DUT testbed and a single-DUT testbed. The single-DUT testbed passes the test on a 201911 branch image. The dualtor testbed is running a master image and shows some failures:

autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-08|lldp] PASSED [ 3%]
autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-08|pmon] PASSED [ 7%]
autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-08|pmon] ERROR [ 7%]
autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-08|sflow] SKIPPED [ 10%]
autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-08|database] SKIPPED [ 14%]
autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-08|snmp] PASSED [ 17%]
autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-08|telemetry] PASSED [ 21%]
autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-08|bgp] PASSED [ 25%]
autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-08|radv] PASSED [ 28%]
autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-08|radv] ERROR [ 28%]
autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-08|mgmt-framework] PASSED [ 32%]
autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-08|nat] SKIPPED [ 35%]
autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-08|teamd] FAILED [ 39%]
autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-08|teamd] ERROR [ 39%]
autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-08|dhcp_relay] PASSED [ 42%]
autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-08|swss] PASSED [ 46%]
autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-08|swss] ERROR [ 46%]
autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-08|syncd] FAILED [ 50%]
autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-09|lldp] SKIPPED [ 53%]
autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-09|pmon] SKIPPED [ 57%]
autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-09|sflow] SKIPPED [ 60%]
autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-09|database] SKIPPED [ 64%]
autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-09|snmp] SKIPPED [ 67%]
autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-09|telemetry] SKIPPED [ 71%]
autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-09|bgp] SKIPPED [ 75%]
autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-09|radv] SKIPPED [ 78%]
autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-09|mgmt-framework] SKIPPED [ 82%]
autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-09|nat] SKIPPED [ 85%]
autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-09|teamd] SKIPPED [ 89%]
autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-09|dhcp_relay] SKIPPED [ 92%]
autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-09|swss] SKIPPED [ 96%]
autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-09|syncd] SKIPPED [100%]

ERROR autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-08|pmon] - LogAnalyzerError: {'match_messages': {'/tmp/syslog.2020-11-15-08:03:54': ["Nov...
ERROR autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-08|radv] - LogAnalyzerError: {'match_messages': {'/tmp/syslog.2020-11-15-08:11:05': ['Nov...
ERROR autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-08|teamd] - LogAnalyzerError: {'match_messages': {'/tmp/syslog.2020-11-15-08:17:36': ['No...
ERROR autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-08|swss] - LogAnalyzerError: {'match_messages': {'/tmp/syslog.2020-11-15-08:21:59': ['Nov...
FAILED autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-08|teamd] - Failed: Some BGP sessions went down after testing feature teamd
FAILED autorestart/test_container_autorestart.py::test_containers_autorestart[str2-7050cx3-acs-08|syncd] - Failed: Failed to restart container 'syncd'

except Exception as e:
    logging.error("Hack for https://github.com/ansible/pytest-ansible/issues/47 failed: {}".format(repr(e)))

logger = logging.getLogger(__name__)
Collaborator Author

This one looks strange, but this file uses logger without defining it. I'm not sure how this code was tested, or whether it was tested at all.

lgtm-com bot commented Nov 15, 2020

This pull request introduces 1 alert when merging 2989c15 into 85d0b3a - view on LGTM.com

new alerts:

  • 1 for Unused import

@bingwang-ms
Collaborator

Is LogAnalyzer enabled when verifying this case? I remember that there were several ERROR entries in syslog when I debugged this case.

yxieca commented Nov 16, 2020

> Is LogAnalyzer enabled when verifying this case? I remember that there were several ERROR entries in syslog when I debugged this case.

Yes. There are some loganalyzer failures when testing against a master branch image. I included a brief summary in the PR description.

yozhao101 commented Nov 17, 2020

The changes in the file test_container_autorestart.py look good to me. For LogAnalyzer, do we need to disable it for specific containers?

yxieca commented Nov 17, 2020

> The changes in the file test_container_autorestart.py look good to me. For LogAnalyzer, do we need to disable it for specific containers?

Thanks Yong. I think we want per-feature loganalyzer skipping rules. I want to limit this change to parameterization, BGP checking, and recovery; I don't want to keep bloating this change.
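
A per-feature skipping rule of the kind deferred here might be shaped like the following. This is a sketch of the idea only. It is not part of this PR and does not use the real LogAnalyzer API: the mapping, the regular expressions, and the function name are all hypothetical.

```python
import re

# Hypothetical per-feature ignore patterns. The regexes are made up for
# illustration and do not correspond to real SONiC syslog messages.
FEATURE_IGNORE_REGEX = {
    "teamd": [r".*teamd.*terminated.*"],
    "swss": [r".*orchagent.*exited.*"],
}


def unexpected_errors(feature, log_lines, ignore_map=FEATURE_IGNORE_REGEX):
    """Return the log lines not covered by the ignore rules for `feature`."""
    patterns = [re.compile(p) for p in ignore_map.get(feature, [])]
    return [
        line
        for line in log_lines
        if not any(p.match(line) for p in patterns)
    ]
```

Keying the rules by feature lets the errors expected from restarting one container be ignored without hiding the same messages when a different container is under test.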

@yxieca yxieca merged commit f8f9201 into sonic-net:master Nov 17, 2020
@yxieca yxieca deleted the autorestart branch November 17, 2020 20:01
kazinator-arista pushed a commit to kazinator-arista/sonic-mgmt that referenced this pull request Mar 4, 2026
swss:
* 6902a98 2022-12-13 | [muxorch] Skip programming SoC IP kernel tunnel route (sonic-net#2557) (HEAD -> 202205) [Longxiang Lyu]
* 8a86404 2022-12-07 | [portinit] Do not call GET on SAI_PORT_ATTR_SPEED when AUTONEG is enabled (sonic-net#2484) [Vaibhav Hemant Dixit]
* d16f51d 2022-12-07 | Revert "sonic-swss: Fix orchagent crash in generateQueueMapPerPort. (sonic-net#2552)" (github/202205) [Ying Xie]
* abc6a81 2022-12-05 | sonic-swss: Fix orchagent crash in generateQueueMapPerPort. (sonic-net#2552) [Sambath Kumar Balasubramanian]

sonic-utilities:
* 2c29fde 2022-12-13 | [202205][route_check]: Ignore ASIC only SOC IPs (cherry-picking sonic-net#2548) (sonic-net#2552) (HEAD -> 202205, github/202205) [Ying Xie]
* aaa8d25 2022-12-13 | [202205][generate_dump]: Enhance show techsupport for cisco-8000 platform (sonic-net#2533) [Geert Vlaemynck]
* 25d581e 2022-12-13 | [202205][show]Fix show route return code on error (sonic-net#2547) [Sudharsan Dhamal Gopalarathnam]
* da870fc 2022-11-17 | [azure-pipelines] update azp from buster to bullseye (sonic-net#2455) [Mai Bui]

Signed-off-by: Ying Xie <[email protected]>

Signed-off-by: Ying Xie <[email protected]>