[pytest] Test the feature of container checker. #2852

Merged
yozhao101 merged 7 commits into sonic-net:master from yozhao101:test_container_status on Jan 27, 2021
Contributor

yozhao101 commented Jan 22, 2021

Description of PR

Summary:
This PR adds a test for the container checker feature. The container checker PR is sonic-net/sonic-buildimage#6251.

Fixes # (issue)

Type of change

  • [ ] Bug fix
  • [ ] Testbed and Framework(new/improvement)
  • [x] Test case(new/improvement)

Approach

What is the motivation for this PR?

This PR tests the container checker feature, which is implemented in sonic-net/sonic-buildimage#6251.

The container_checker script is run periodically by Monit and monitors the running status of each container. The auto-restart feature is currently enabled: if a critical process exits unexpectedly, its container is restarted. If a container is restarted 3 times within 20 minutes, it will not be started again until the failed flag is cleared manually with the command sudo systemctl reset-failed <container_name>.
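
The restart limit described above can be pictured as a sliding-window check over restart timestamps. The sketch below only illustrates that rule and is not how systemd itself tracks restarts; the function name and its parameters are hypothetical.

```python
from typing import Sequence


def hits_restart_limit(restart_times: Sequence[float],
                       limit: int = 3,
                       window_secs: float = 20 * 60) -> bool:
    """Return True if any `limit` restarts fall within `window_secs` seconds."""
    times = sorted(restart_times)
    # Pair each restart with the one `limit - 1` positions later and
    # check whether that pair fits inside the window.
    for first, last in zip(times, times[limit - 1:]):
        if last - first <= window_secs:
            return True
    return False
```

With the defaults, restarts at 0, 300 and 900 seconds would trip the limit, while restarts spaced 15 minutes apart would not.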

How did you do it?

This pytest script tests container_checker in the following steps:

  1. Stop the containers explicitly.
  2. Check whether the names of stopped containers appear in the Monit alerting message.
  3. Restart the corresponding stopped containers.
  4. Post-check that all critical processes are running and BGP sessions are established.
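
Step 2 can be sketched as a plain substring check of each stopped container name against the alert text. This is a minimal illustration, not the code in the PR; the function name is hypothetical, and the alert lines are assumed to have been extracted from syslog already.

```python
from typing import Iterable, List


def containers_missing_from_alerts(alert_lines: Iterable[str],
                                   stopped_containers: Iterable[str]) -> List[str]:
    """Return the stopped containers that no alert line mentions."""
    alerts = list(alert_lines)
    # A container counts as reported if its name appears in any alert line.
    return [name for name in stopped_containers
            if not any(name in alert for alert in alerts)]
```

An empty result means every stopped container was reported; a non-empty one names the containers the test should fail on.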

How did you verify/test it?

I tested this pytest script on a virtual testbed.

Any platform specific information?

N/A

Supported testbed topology if it's a new test case?

N/A

Documentation

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Contributor

lguohan commented Jan 25, 2021

can you add this to kvmtest.sh list?

yozhao101 requested a review from yxieca January 25, 2021 23:43
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
alerting_message = duthost.shell("sudo cat /var/log/syslog | grep -m 1 '.*monit.*container_checker'",
                                 module_ignore_errors=True)

pytest_assert(len(alerting_message["stdout_lines"]) > 0,
Collaborator

Suggested change
- pytest_assert(len(alerting_message["stdout_lines"]) > 0,
+ pytest_assert(len(stopped_containers_list) == 0 or len(alerting_message["stdout_lines"]) > 0,

Although it's not likely to happen, I think it's better to add this check.

Contributor Author

Great suggestion! Will update.

Contributor Author

Updated!

              "Failed to get Monit alerting message from container_checker!")

for container_name in stopped_containers_list:
    if container_name not in alerting_message["stdout_lines"][0]:
Collaborator

Why alerting_message["stdout_lines"][0]? Is the message always on the first line?

Contributor Author

Here, once the running containers (all except database) have been stopped for 6 minutes, the container_checker script writes an alerting message to syslog. So at line 223 I used the command sudo cat /var/log/syslog | grep -m 1 '.*monit.*container_checker' to fetch the message generated by container_checker. Because this command returns at most one matching line from syslog, I used alerting_message["stdout_lines"][0]. I am not sure whether this approach is appropriate. Any suggestions?
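
For reference, the behaviour of grep -m 1 (stop after the first matching line) and the empty-list guard suggested earlier in this review can be sketched in plain Python. This is only an illustration under assumed names: first_alert_line and alert_found are hypothetical helpers, not functions from the PR.

```python
import re
from typing import List, Optional


def first_alert_line(syslog_lines: List[str],
                     pattern: str = r".*monit.*container_checker") -> Optional[str]:
    """Return the first line matching `pattern`, or None (like `grep -m 1`)."""
    regex = re.compile(pattern)
    return next((line for line in syslog_lines if regex.search(line)), None)


def alert_found(stopped_containers: List[str], alert: Optional[str]) -> bool:
    """If nothing was stopped, no alert is expected; otherwise one must exist."""
    return len(stopped_containers) == 0 or alert is not None
```

Because at most one line is returned, indexing the first element is safe only after checking that a match exists.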

Contributor Author

Updated with a flexible method! Please review again!

Collaborator

I see. Thanks!

return False


def clear_failed_flag_and_restart(duthost, container_name):
Collaborator

bingwang-ms commented Jan 26, 2021

The same definition exists in test_container_autorestart.py. Could you make it a common utility in tests/common/helpers/dut_utils.py? Thanks.

Contributor Author

yozhao101 commented Jan 26, 2021

Great suggestion! Will move this function and its related functions to the file tests/common/helpers/dut_utils.py.
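
A shared helper of this kind could look roughly like the sketch below. It is an assumption, not the code merged in this PR: only the reset-failed command comes from the PR description, while the systemctl start call and the RecordingHost stub (a stand-in for duthost so the sketch runs without a testbed) are hypothetical.

```python
from typing import List


class RecordingHost:
    """Hypothetical stand-in for the DUT host object; it only records commands."""

    def __init__(self) -> None:
        self.commands: List[str] = []

    def shell(self, command: str) -> None:
        self.commands.append(command)


def clear_failed_flag_and_restart(host, container_name: str) -> None:
    """Clear the systemd failed flag for a container, then start it again."""
    # Command taken from the PR description.
    host.shell("sudo systemctl reset-failed {}".format(container_name))
    # Assumed follow-up step: start the container's service again.
    host.shell("sudo systemctl start {}".format(container_name))
```

Keeping one copy of the helper in a common module means both test files exercise the same restart logic.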

Contributor Author

Updated!

Contributor

lguohan commented Jan 26, 2021

can you fix your test failing?

tests/kvmtest.sh Outdated
  platform_tests/test_cpu_memory_usage.py \
- bgp/test_bgpmon.py"
+ bgp/test_bgpmon.py" \
+ container_checker/test_container_checker.py
Collaborator

The above two lines should be:

bgp/test_bgpmon.py \
container_checker/test_container_checker.py"

Otherwise, the kvmtest.sh script will try to run container_checker/test_container_checker.py and will fail.

Contributor Author

Good catch! Fixed.

Args:
duthost: Host DUT.
container_name: Name of container.
should be running: Boolean value.
Collaborator

should be running -> should_be_running

Contributor Author

Fixed.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
1. Move the functions such as 'is_container_running(...)' to the file /tests/common/helpers/dut_utils.py.

2. Fix the logic in the function "check_alerting_message(...)".

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Contributor Author

> can you fix your test failing?

Fixed.

Contributor Author

yozhao101 commented Jan 27, 2021

> can you add this to kvmtest.sh list?

Added.

Collaborator

LGTM. Let's watch test results in regression test.

yozhao101 merged commit e8bfd88 into sonic-net:master Jan 27, 2021
yozhao101 deleted the test_container_status branch January 27, 2021 16:48
lguohan added a commit that referenced this pull request Jan 28, 2021