[pull] master from Azure:master#1302

Merged
pull[bot] merged 2 commits into pphuchar:master from sonic-net:master on Jan 8, 2021

pull[bot] commented Jan 8, 2021

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

lguohan and others added 2 commits January 7, 2021 19:50
**- Why I did it**
This PR monitors the running status of each container. The auto-restart feature is currently enabled: if a critical process exits unexpectedly, its container is restarted. If the container is restarted 3 times within 20 minutes, it will not be started again until the flag is cleared manually with the command `sudo systemctl reset-failed <container_name>`.

**- How I did it**
We use Monit to monitor a script. The script generates the list of containers that are expected to be running and compares it with the containers that are currently running. If any expected container is not running, an alert message is written to syslog.
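
The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the script added by this PR: the function names are hypothetical, the expected container list is passed in by the caller rather than generated, and the sketch writes to syslog directly through Python's standard `syslog` module (so it assumes a Unix-like host with the `docker` CLI available).

```python
import subprocess
import syslog


def get_running_containers():
    """Return the names of the containers Docker reports as running."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    )
    return set(result.stdout.split())


def find_missing_containers(expected, running):
    """Return the expected containers that are not running, sorted by name."""
    return sorted(set(expected) - set(running))


def check_containers(expected, running=None):
    """Log an alert for missing containers and return them (empty if healthy).

    `expected` is supplied by the caller here (an assumption of this sketch);
    `running` defaults to whatever `docker ps` reports.
    """
    if running is None:
        running = get_running_containers()
    missing = find_missing_containers(expected, running)
    if missing:
        syslog.syslog(
            syslog.LOG_ERR,
            "Expected containers not running: " + ", ".join(missing),
        )
    return missing
```

For example, `check_containers(["swss", "syncd", "bgp"])` would log an alert naming `syncd` if only `swss` and `bgp` were up. How the actual script builds the expected list and hands its result to Monit is not covered in this description.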

**- How to verify it**
I tested this feature on two lab devices: `str-a7050-acs-3`, which has a single ASIC, and `str2-n3164-acs-3`, which is Multi-ASIC. I first stopped a container manually by running `sudo systemctl stop <container_name>`, then checked whether an alert message appeared in syslog.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
pull[bot] added the ⤵️ pull label Jan 8, 2021
pull[bot] merged commit 04cd1d6 into pphuchar:master Jan 8, 2021

2 participants