
[Monit] Monitoring the running status of containers. #6251

Merged
lguohan merged 18 commits into sonic-net:master from yozhao101:monitoring_containers on Jan 8, 2021

@yozhao101
Contributor

- Why I did it
This PR monitors the running status of each container. The auto-restart feature is currently enabled, so if a critical process exits unexpectedly, its container is restarted. However, if a container is restarted 3 times within 20 minutes, it will not be started again until the failure flag is cleared manually with `sudo systemctl reset-failed <container_name>`. Such a container can therefore stay down without any obvious indication.

- How I did it
Monit is used to run and monitor a script. The script builds the list of containers that are expected to be running and compares it with the containers that are currently running. If any expected container is not running, an alert message is written to syslog.
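A minimal Python sketch of the comparison step described above, not the script shipped in this PR. It assumes the expected container names are passed in by the caller (how the real script derives them is not shown here), reads the running names with `docker ps --format "{{.Names}}"`, and logs through Python's standard `syslog` module. The function names and the alert wording are hypothetical.

```python
import subprocess
import syslog


def get_running_containers():
    """Return the names of the containers Docker reports as running.

    `docker ps --format {{.Names}}` prints one container name per line.
    """
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    )
    return {line.strip() for line in result.stdout.splitlines() if line.strip()}


def find_missing_containers(expected, running):
    """Return the expected containers that are not running, sorted by name."""
    return sorted(set(expected) - set(running))


def report_missing(missing):
    """Write one syslog alert per missing container (hypothetical wording)."""
    for name in missing:
        syslog.syslog(syslog.LOG_ERR, f"Expected container '{name}' is not running")


def main(expected):
    """Compare expected vs. running containers and alert on the difference."""
    missing = find_missing_containers(expected, get_running_containers())
    report_missing(missing)
    # Assumption: a non-zero exit status signals "something is missing"
    # to whatever runs this script.
    return 1 if missing else 0
```

For example, `sys.exit(main(["swss", "syncd", "bgp"]))` would log an alert for each of those containers that Docker does not report as running. Returning a non-zero status is one way to let a Monit `check program` entry with `if status != 0 then alert` react to the result; whether the PR's script does exactly this is an assumption.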

- How to verify it
I tested this feature on two lab devices: str-a7050-acs-3 (single ASIC) and str2-n3164-acs-3 (multi-ASIC). On each device, I manually stopped a container with `sudo systemctl stop <container_name>` and then checked that an alert message appeared in syslog.
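Because the feature was also verified on a multi-ASIC device, the expected list there has to cover one container per ASIC for namespaced services. The sketch below is hypothetical: it assumes each feature carries two flags (runs once in the global/host scope, runs once per ASIC) and that per-ASIC instances are named `<feature><asic_index>`, such as `bgp0`. The function name and the input shape are illustrative only.

```python
def expand_expected_containers(features, num_asics):
    """Expand feature names into concrete container names.

    features: mapping of feature name -> (has_global_scope, has_per_asic_scope),
              a hypothetical input shape used only for this sketch.
    num_asics: number of ASICs on the device (1 for single-ASIC devices).
    Assumption: per-ASIC instances are named <feature><asic_index>, e.g. "bgp0".
    """
    names = set()
    for feature, (global_scope, per_asic_scope) in features.items():
        if num_asics <= 1:
            # Single-ASIC devices run one instance under the plain feature name.
            names.add(feature)
            continue
        if global_scope:
            names.add(feature)
        if per_asic_scope:
            names.update(f"{feature}{asic}" for asic in range(num_asics))
    return sorted(names)
```

With `{"bgp": (False, True), "pmon": (True, False), "database": (True, True)}` and two ASICs, this yields `bgp0`, `bgp1`, `database`, `database0`, `database1` and `pmon`; on a single-ASIC device the same input yields just `bgp`, `database` and `pmon`.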

- Which release branch to backport (provide reason below if selected)

  • [ ] 201811
  • [ ] 201911
  • [x] 202006

- Description for the changelog

