
[action] [PR:12153] Revert "Fix qos/test_qos_sai.py teardown. (#11934)" #12164

Merged
mssonicbld merged 1 commit into sonic-net:202311 from mssonicbld:cherry/202311/12153
Mar 26, 2024

Conversation

@mssonicbld
Collaborator

This reverts commit 66751e5.

Description of PR

Summary:
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 201911
  • 202012
  • 202205
  • 202305
  • 202311

Approach

What is the motivation for this PR?

#11934 caused an SNMP critical-process crash during the qos test:

11:15:47 __init__._fixture_generator_decorator L0081 INFO | -------------------- fixture swapSyncd_on_selected_duts setup starts --------------------
11:16:43 sonic.is_container_running L0450 INFO | container syncd is not running
11:16:43 sonic.is_container_running L0450 INFO | container swss is not running
11:17:32 docker.swap_syncd L0169 INFO | Reloading config and restarting swss...
11:17:32 config_reload.config_reload L0092 INFO | reloading config_db
11:19:13 processes_utils.wait_critical_processes L0070 INFO | Wait until all critical processes are healthy in 300 sec
11:19:13 processes_utils._all_critical_processes_ L0039 INFO | Check critical processes status
11:19:38 processes_utils._all_critical_processes_ L0039 INFO | Check critical processes status
11:20:02 processes_utils._all_critical_processes_ L0039 INFO | Check critical processes status
11:20:27 processes_utils._all_critical_processes_ L0039 INFO | Check critical processes status
11:20:52 processes_utils._all_critical_processes_ L0039 INFO | Check critical processes status
11:21:17 processes_utils._all_critical_processes_ L0039 INFO | Check critical processes status
11:21:41 processes_utils._all_critical_processes_ L0039 INFO | Check critical processes status
11:22:06 processes_utils._all_critical_processes_ L0039 INFO | Check critical processes status
11:22:31 processes_utils._all_critical_processes_ L0039 INFO | Check critical processes status
11:22:55 processes_utils._all_critical_processes_ L0039 INFO | Check critical processes status
11:23:21 processes_utils._all_critical_processes_ L0039 INFO | Check critical processes status
11:23:46 processes_utils._all_critical_processes_ L0039 INFO | Check critical processes status
11:24:10 processes_utils._all_critical_processes_ L0039 INFO | Check critical processes status
11:25:25 docker.restore_default_syncd L0206 INFO | Reloading config and restarting swss...
11:25:25 config_reload.config_reload L0092 INFO | reloading config_db
ERROR [ 33%]

... omitted ...


    def wait_critical_processes(dut):
        """
        @summary: wait until all critical processes are healthy.
        @param dut: The AnsibleHost object of DUT. For interacting with DUT.
        """
        timeout = reset_timeout(dut)
        # No matter what we set in inventory file, we always set sup timeout to 900
        # because most SUPs have 10+ dockers that need to come up
        if dut.is_supervisor_node():
            timeout = 900
        logging.info("Wait until all critical processes are healthy in {} sec"
                     .format(timeout))
>       pytest_assert(wait_until(timeout, 20, 0, _all_critical_processes_healthy, dut),
                      "Not all critical processes are healthy")
E       Failed: Not all critical processes are healthy

dut = <MultiAsicSonicHost str3-8111-01>
timeout = 300
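For context, the `wait_until` helper that the assertion above relies on simply polls a condition function at a fixed interval until it returns True or the timeout elapses. The following is a minimal sketch of that polling pattern, not the actual sonic-mgmt implementation (the real helper lives in the test framework's common utilities and takes the same timeout/interval/delay arguments seen in the traceback):

```python
import time

def wait_until(timeout, interval, delay, condition, *args):
    """Poll condition(*args) every `interval` seconds until it returns
    True or `timeout` seconds have elapsed. An optional initial `delay`
    is applied before the first check. Returns True on success, False
    if the deadline is hit first.

    Minimal sketch of the sonic-mgmt polling helper, for illustration.
    """
    if delay:
        time.sleep(delay)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition(*args):
            return True
        time.sleep(interval)
    return False
```

With timeout=300 and interval=20, this yields roughly one "Check critical processes status" probe every ~25 seconds (check time included), which matches the cadence of the log lines above before the final failure.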

The snmp container's critical processes hit an issue:

$ docker exec snmp bash -c "[ -f /etc/supervisor/critical_processes ] && cat /etc/supervisor/critical_processes" 
program:snmpd
program:snmp-subagent

$ docker exec snmp supervisorctl status
dependent-startup                RUNNING   pid 8, uptime 0:00:31
rsyslogd                         RUNNING   pid 19, uptime 0:00:30
snmp-subagent                    STOPPED   Not started
snmpd                            FATAL     Exited too quickly (process log may have details)
start                            EXITED    Mar 22 10:20 AM
supervisor-proc-exit-listener    RUNNING   pid 9, uptime 0:00:31

How did you do it?

Reverting #11934 resolves the issue.

How did you verify/test it?

After reverting #11934, the SNMP crash no longer appears in local qos test runs.

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@mssonicbld
Collaborator Author

Original PR: #12153

@mssonicbld mssonicbld merged commit 8c0cb1a into sonic-net:202311 Mar 26, 2024