kill zombie process before running test #4191
Conversation
This behavior change is a little risky, I think. Since we may run
The run_tests.sh tool is mainly used for nightly tests. In that case, there is no need to worry about deploy jobs running at the same time. This change works around the possible issue that the pytest process is not terminated properly, in which case run_tests.sh would fail without a chance to clean up. I think the purpose of this kind of change is similar to "restart-ptf": to prepare a clean environment for nightly tests.
/azp run
Azure Pipelines successfully started running 1 pipeline(s). |
Co-authored-by: yuxuanye <[email protected]>
Description of PR
Summary:
Fixes # (issue)
Type of change
Back port request
Approach
What is the motivation for this PR?
If something goes wrong, a previous test run may leave zombie processes running in the container even after testing has completed. These processes can negatively affect subsequent test runs. Killing any possible zombie processes before a test run starts makes new test runs more robust.
How did you do it?
Use pkill to kill the pytest/ansible-playbook processes and the ssh processes initiated by ansible.
How did you verify/test it?
Run a first test to simulate the zombie process:
./run_tests.sh -n vms-kvm-t0 -d vlab-01 -c bgp/test_bgp_fact.py -f vtestbed.csv -i veos_vtb
While the first test is running, run the test command again and check whether the previous process was killed.
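The pkill-based cleanup described under "How did you do it?" could be sketched roughly as below. This is an illustration only, not the actual diff of this PR: the function name kill_leftover_processes, the --dry-run flag, and the match patterns are all assumptions. In particular, the "ssh .*ansible" pattern assumes that the ssh processes started by ansible mention an .ansible path on their command line.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; the function name, the --dry-run flag and the
# patterns are assumptions, not the code added by this PR.

# Kill processes that a previous test run may have left behind.
# With --dry-run, only print the pkill commands that would be executed.
# The body runs in a subshell, so its variables do not leak to the caller.
kill_leftover_processes() (
    mode="${1:-}"
    # Assumed patterns: pytest, ansible-playbook, and ssh sessions whose
    # command line mentions ansible (for example an .ansible control path).
    for pattern in "pytest" "ansible-playbook" "ssh .*ansible"; do
        if [ "$mode" = "--dry-run" ]; then
            printf "pkill -KILL -f '%s'\n" "$pattern"
        else
            # -f matches against the full command line.
            # pkill exits non-zero when nothing matched; not an error here.
            pkill -KILL -f "$pattern" || true
        fi
    done
)
```

A wrapper script would call kill_leftover_processes once before the test run starts; kill_leftover_processes --dry-run prints the commands without killing anything. Because -f matches the full command line, broad patterns can also hit unrelated processes, such as a deploy job running in the same container, so the patterns would need to be as narrow as the environment allows.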
Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation