neethajohn reviewed on Jul 29, 2020
```python
    status, _ = _get_critical_processes_status(dut)
    return status


def check_critical_processes(dut, watch_secs=0):
```
Contributor
Why are we not using the existing 'all_critical_process_status' in devices.py?
Collaborator
Author
We are using it. See line 14 :-)
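The helper in the diff, `_get_critical_processes_status`, collapses the per-container result of `all_critical_process_status` into one verdict. Its body is not shown in this excerpt, so the sketch below is an assumption: it supposes the method returns a dict shaped like the `details` printed in the failure log (container name mapped to `status`, `exited_critical_process`, and `running_critical_process`). `FakeDut` and `get_critical_processes_status` are hypothetical names used only for illustration.

```python
class FakeDut:
    """Hypothetical stand-in for SonicHost that returns a canned status dict."""

    def __init__(self, statuses):
        self._statuses = statuses

    def all_critical_process_status(self):
        return self._statuses


def get_critical_processes_status(dut):
    """Return (all_healthy, details) for every container on the DUT.

    Assumption: a container is unhealthy if its own status is False or if
    any of its critical processes has exited.
    """
    details = dut.all_critical_process_status()
    for container in details.values():
        if not container["status"] or container["exited_critical_process"]:
            return False, details
    return True, details


healthy = FakeDut({
    "syncd": {"status": True, "exited_critical_process": [],
              "running_critical_process": ["syncd"]},
})
broken = FakeDut({
    "swss": {"status": False, "exited_critical_process": ["orchagent"],
             "running_critical_process": ["portmgrd"]},
})
print(get_critical_processes_status(healthy)[0])  # True
print(get_critical_processes_status(broken)[0])   # False
```

Returning the full `details` alongside the boolean lets the caller put the whole picture into the assertion message, as the failure log in this PR does.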
neethajohn reviewed on Jul 29, 2020
```python
    logging.info("Check all critical processes are healthy for {} seconds".format(watch_secs))
    while watch_secs >= 0:
        status, details = _get_critical_processes_status(dut)
        pytest_assert(status, "Not all critical processes are healthy: {}".format(details))
```
Contributor
Shouldn't this just log an error, since we want this loop to continue?
Collaborator
Author
The purpose of this loop is to make sure that no critical process fails during the 60-second watch window (or, if watch_secs == 0, to do a single spot check). If there is a failure, the test should fail.
Collaborator
Author
What you have in mind is the other method: wait_critical_process()
Contributor
I confused this with the wait_critical_process approach. It is clear now.
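The distinction settled in this thread, a watch loop that fails on any unhealthy sample versus a wait loop that retries until things look healthy, can be sketched side by side. The real `wait_critical_process()` is not shown in this PR, so `wait_until_healthy` below is an assumed wait-until-healthy poll, not its actual implementation. `get_status` is a hypothetical callable returning `(healthy, details)`, and `status_sequence` is a hypothetical test double with placeholder details.

```python
import time


def check_critical_processes(get_status, watch_secs=0, interval=5):
    """Watch loop: fail on the FIRST unhealthy sample.

    watch_secs == 0 means a single spot check; otherwise the status is
    re-checked roughly every `interval` seconds until watch_secs elapse.
    """
    while watch_secs >= 0:
        healthy, details = get_status()
        # No retry: any unhealthy sample means the test must fail.
        assert healthy, "Not all critical processes are healthy: {}".format(details)
        if watch_secs > 0:
            time.sleep(min(interval, watch_secs))
        watch_secs -= interval


def wait_until_healthy(get_status, timeout=120, interval=5):
    """Wait loop (assumed behavior): tolerate unhealthy samples until timeout.

    Returns True as soon as one sample is healthy, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        healthy, _ = get_status()
        if healthy:
            return True
        if time.monotonic() >= deadline:
            return False
        # Unhealthy is expected while services start up; just retry.
        time.sleep(interval)


def status_sequence(*samples):
    """Hypothetical status source replaying fixed verdicts with placeholder details."""
    it = iter(samples)
    return lambda: (next(it), {"swss": ["orchagent"]})


# Healthy at first, then orchagent is gone on the second sample.
print(wait_until_healthy(status_sequence(True, False), timeout=1, interval=0.1))  # True
try:
    check_critical_processes(status_sequence(True, False), watch_secs=0.2, interval=0.1)
except AssertionError as exc:
    print("watch loop caught it:", exc)
```

With the same samples, the wait loop reports success while the watch loop fails. Only logging the error inside the watch loop, as first suggested in review, would turn it into a loop that can never fail.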
neethajohn approved these changes on Jul 29, 2020
kazinator-arista pushed a commit to kazinator-arista/sonic-mgmt that referenced this pull request on Mar 4, 2026:
swss 73caba3 Allow interface type value none (sonic-net#1991) utilities 32e530f Allow interface type value none (sonic-net#1902) 53f066c Fix log_ssd_health hang issue (sonic-net#1904)
Summary:
Fixes # (issue)
Type of change
Approach
What is the motivation for this PR?
The test should have failed because a critical process crashed (due to an image issue), but it did not detect the crash and left the system in an unhealthy state.
How did you do it?
The swss restart test discovers an image issue that would cause orchagent to crash. As a result, the test should fail instead of passing and leaving the system in a bad state.
This PR addresses the test's false negative. The 'leaving the system unhealthy' part will be addressed by a subsequent PR.
Signed-off-by: Ying Xie [email protected]
How did you verify/test it?
```
yinxi@acs-trusty8:/var/src/sonic-mgmt/tests$ ./run_tests.sh -d str-dx010-acs-1 -n vms3-t1-dx010-1 -i /var/src/sonic-mgmt/ansible/str,/var/src/sonic-mgmt/ansible/veos -u -p /tmp/logs-01 -c platform_tests/test_sequential_restart.py::test_restart_swss
=== Running tests in groups ===
================================================================================== test session starts ===================================================================================
platform linux2 -- Python 2.7.12, pytest-4.6.5, py-1.8.1, pluggy-0.13.1
ansible: 2.8.7
rootdir: /var/src/sonic-mgmt/tests, inifile: pytest.ini
plugins: ansible-2.2.2, xdist-1.28.0, forked-1.1.3, repeat-0.8.0
collected 1 item
platform_tests/test_sequential_restart.py::test_restart_swss FAILED [100%]
======================================================================================== FAILURES ========================================================================================
___________________________________________________________________________________ test_restart_swss ____________________________________________________________________________________
duthost = <tests.common.devices.SonicHost object at 0x7f8abb159950>, localhost = <tests.common.devices.Localhost object at 0x7f8abb1592d0>
conn_graph_facts = {'device_conn': {'Ethernet0': {'peerdevice': u'str-7060cx-32s-21', 'peerport': u'Ethernet1/1', 'speed': u'100000'}, 'E...ss', 'vlanids': u'2006', 'vlanlist': [2006]}, ...}, 'device_vlan_list': [1981, 1979, 1980, 2006, 2004, 2005, ...], ...}
conn_graph_facts = {'device_conn': {'Ethernet0': {'peerdevice': u'str-7060cx-32s-21', 'peerport': u'Ethernet1/1', 'speed': u'100000'}, 'E...ss', 'vlanids': u'2006', 'vlanlist': [2006]}, ...}, 'device_vlan_list': [1981, 1979, 1980, 2006, 2004, 2005, ...], ...}
duthost = <tests.common.devices.SonicHost object at 0x7f8abb159950>
localhost = <tests.common.devices.Localhost object at 0x7f8abb1592d0>
platform_tests/test_sequential_restart.py:61:
platform_tests/test_sequential_restart.py:54: in restart_service_and_check
check_critical_processes(dut, 60)
dut = <tests.common.devices.SonicHost object at 0x7f8abb159950>, watch_secs = 60
E Failed: Not all critical processes are healthy: {'lldp': {'status': True, 'exited_critical_process': [], 'running_critical_process': [u'lldp-syncd', u'lldpd', u'lldpmgrd']}, 'pmon': {'status': True, 'exited_critical_process': [], 'running_critical_process': [u'psud', u'xcvrd']}, 'database': {'status': True, 'exited_critical_process': [], 'running_critical_process': [u'redis']}, 'snmp': {'status': True, 'exited_critical_process': [], 'running_critical_process': [u'snmp-subagent', u'snmpd']}, 'bgp': {'status': True, 'exited_critical_process': [], 'running_critical_process': [u'bgpcfgd', u'bgpd', u'fpmsyncd', u'staticd', u'zebra']}, 'teamd': {'status': True, 'exited_critical_process': [], 'running_critical_process': [u'teammgrd', u'teamsyncd']}, 'syncd': {'status': True, 'exited_critical_process': [], 'running_critical_process': [u'syncd']}, 'swss': {'status': False, 'exited_critical_process': [u'orchagent'], 'running_critical_process': [u'buffermgrd', u'intfmgrd', u'nbrmgrd', u'neighsyncd', u'portmgrd', u'portsyncd', u'vlanmgrd', u'vrfmgrd', u'vxlanmgrd']}}
details = {'bgp': {'exited_critical_process': [], 'running_critical_process': ['bgpcfgd', 'bgpd', 'fpmsyncd', 'staticd', 'zebra'...s': True}, 'pmon': {'exited_critical_process': [], 'running_critical_process': ['psud', 'xcvrd'], 'status': True}, ...}
dut = <tests.common.devices.SonicHost object at 0x7f8abb159950>
status = False
watch_secs = 60
common/platform/processes_utils.py:37: Failed
==================================================================================== warnings summary ====================================================================================
/usr/local/lib/python2.7/dist-packages/_pytest/cacheprovider.py:127
/usr/local/lib/python2.7/dist-packages/_pytest/cacheprovider.py:127: PytestCacheWarning: could not create cache path /var/src/sonic-mgmt/tests/.pytest_cache/v/cache/stepwise
self.warn("could not create cache path {path}", path=path)
/usr/local/lib/python2.7/dist-packages/_pytest/cacheprovider.py:127
/usr/local/lib/python2.7/dist-packages/_pytest/cacheprovider.py:127: PytestCacheWarning: could not create cache path /var/src/sonic-mgmt/tests/.pytest_cache/v/cache/nodeids
self.warn("could not create cache path {path}", path=path)
/usr/local/lib/python2.7/dist-packages/_pytest/cacheprovider.py:127
/usr/local/lib/python2.7/dist-packages/_pytest/cacheprovider.py:127: PytestCacheWarning: could not create cache path /var/src/sonic-mgmt/tests/.pytest_cache/v/cache/lastfailed
self.warn("could not create cache path {path}", path=path)
-- Docs: https://docs.pytest.org/en/latest/warnings.html
------------------------------------------------------------------------ generated xml file: /tmp/logs-01/tr.xml -------------------------------------------------------------------------
================================================================================ short test summary info =================================================================================
FAILED platform_tests/test_sequential_restart.py::test_restart_swss - Failed: Not all critical processes are healthy: {'lldp': {'status': True, 'exited_critical_process': [], 'running...
========================================================================= 1 failed, 3 warnings in 242.44 seconds =========================================================================
```
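The assertion message dumps the full per-container dict, which is hard to scan. A small helper such as the hypothetical `exited_critical_processes()` below (not part of this PR) keeps only the containers that report exited critical processes. Run on a trimmed copy of the `details` from this log (most containers and running processes omitted), it points straight at `orchagent` in `swss`.

```python
def exited_critical_processes(details):
    """Map each container with exited critical processes to those processes."""
    return {
        container: info["exited_critical_process"]
        for container, info in details.items()
        if info["exited_critical_process"]
    }


# Trimmed copy of the `details` dict from the failure message above.
details = {
    "syncd": {"status": True, "exited_critical_process": [],
              "running_critical_process": ["syncd"]},
    "swss": {"status": False, "exited_critical_process": ["orchagent"],
             "running_critical_process": ["buffermgrd", "intfmgrd", "portmgrd"]},
}
print(exited_critical_processes(details))  # {'swss': ['orchagent']}
```

Feeding this summary into the assertion message, rather than the full dict, would be one way to make a failure like the one above readable at a glance.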