kill zombie process before running test #4191

Merged
wangxin merged 1 commit into sonic-net:master from diaryevil:kill-zombie-process
Sep 11, 2021

Conversation

@diaryevil
Contributor

Description of PR

Summary:
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 201911

Approach

What is the motivation for this PR?

If something goes wrong, a previous test run may leave zombie processes running in the container even after testing has completed. These zombie processes can negatively affect subsequent test runs. Starting new tests would be more robust if we try to kill any possible zombie processes before the tests run.

How did you do it?

Use pkill to kill any pytest and ansible-playbook processes, as well as ssh processes initiated by ansible.
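A minimal sketch of what such a cleanup step could look like, not necessarily the code merged in this PR. The helper names `kill_stale_processes` and `kill_ansible_ssh` are hypothetical, and the ssh pattern assumes that ansible keeps its ssh control sockets under a `.ansible/` directory, so that the path shows up in the ssh command line.

```shell
#!/usr/bin/env bash
# Sketch only: both helper names below are hypothetical.

# Send SIGKILL to every process whose name exactly matches an argument.
kill_stale_processes() {
    local name
    for name in "$@"; do
        # pkill exits non-zero when nothing matched; that is not an error here.
        pkill -KILL -x "$name" || true
    done
}

# Send SIGKILL to ssh processes that look like they were started by ansible.
# Assumption: their full command line starts with "ssh" and contains a
# "/.ansible/" path (for example a ControlPath option).
kill_ansible_ssh() {
    pkill -KILL -f '^ssh.*/\.ansible/' || true
}
```

At the start of a test run this might be called as `kill_stale_processes pytest ansible-playbook` followed by `kill_ansible_ssh`. Matching names exactly with `-x` keeps the kill from hitting unrelated processes whose names merely contain `pytest`.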

How did you verify/test it?

Run a first test to simulate the zombie process:

./run_tests.sh -n vms-kvm-t0 -d vlab-01 -c bgp/test_bgp_fact.py -f vtestbed.csv -i veos_vtb

While the first test is running, run the test command again and check whether the previous process was killed.
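One way to make that check concrete is to record the PID of the first run and test that specific PID afterwards, since the second run starts its own pytest process under a new PID. This is a sketch under that assumption; `pid_is_alive` is a hypothetical helper, not part of run_tests.sh.

```shell
#!/usr/bin/env bash
# Sketch only: pid_is_alive is a hypothetical helper name.

# Return 0 if a process with the given PID still exists, non-zero otherwise.
# kill -0 delivers no signal; it only checks that the PID can be signalled,
# so it also fails when the caller lacks permission to signal that process.
pid_is_alive() {
    kill -0 "$1" 2>/dev/null
}
```

For example, note the PID of the first run's pytest process (`pgrep -x pytest`) before starting the second run, then call `pid_is_alive <pid>` once the second run is under way. A non-zero exit status indicates the earlier process is gone.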

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@diaryevil diaryevil requested a review from a team as a code owner September 7, 2021 09:55
@bingwang-ms
Collaborator

This behavior change is a little risky, I think. We may be running ansible-playbook for some other job, say deploying a testbed, while also using run_tests.sh to debug a test case. The deploy job would then be killed unexpectedly. I think users should be responsible for cleaning up after running tests manually. What do you think?

@wangxin
Collaborator

wangxin commented Sep 8, 2021

This behavior change is a little risky, I think. We may be running ansible-playbook for some other job, say deploying a testbed, while also using run_tests.sh to debug a test case. The deploy job would then be killed unexpectedly. I think users should be responsible for cleaning up after running tests manually. What do you think?

The run_tests.sh tool is mainly used for nightly tests. In that case, there is no need to worry about deploy jobs running at the same time. This change works around the possible issue that the pytest process does not terminate properly, in which case run_tests.sh would fail without getting a chance to do cleanup.

I think the purpose of this kind of change is similar to "restart-ptf": to prepare a clean environment for nightly tests.

@bingwang-ms
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@wangxin wangxin merged commit 647a5fe into sonic-net:master Sep 11, 2021
vmittal-msft pushed a commit to vmittal-msft/sonic-mgmt that referenced this pull request Sep 28, 2021
Co-authored-by: yuxuanye <[email protected]>