[system-health] No longer check critical process/service status via monit #9068
This pull request introduces 1 alert when merging 5a8d671 into b0c73d9 - view on LGTM.com new alerts:
/azpw run

/AzurePipelines run

Azure Pipelines successfully started running 1 pipeline(s).

Force-pushed from 32fbcdf to a218d1b.
Add a link to your design doc in the PR description.

Force-pushed from 84fbf7b to f8982b6.
Hi @qiluo-msft, I have addressed all review comments. Could you please review and sign off?

Hi @qiluo-msft, could you please review and sign off?
/azpw run Azure.sonic-buildimage

/AzurePipelines run Azure.sonic-buildimage

Azure Pipelines successfully started running 1 pipeline(s).

Force-pushed from 06c8b23 to 042fd95.
/azpw run Azure.sonic-buildimage

/AzurePipelines run Azure.sonic-buildimage

Azure Pipelines successfully started running 1 pipeline(s).

Could you rebase onto the latest master?

Force-pushed from 042fd95 to e799843.
This does not cherry-pick cleanly to 202106/202012; separate PRs are needed.
…tus via monit (#9367)

Backport #9068 to 202012

#### Why I did it
Command `monit summary -B` can no longer display the status of each critical process, so system-health should not depend on it and needs another way to monitor the status of critical processes. This PR addresses that. monit is still used by system-health for the file system check as well as for custom checks.

#### How I did it
1. Get container names from the FEATURE table.
2. For each container, collect critical process names from the file critical_processes.
3. Use `docker exec -it <container_name> bash -c 'supervisorctl status'` to get the status of processes inside the container, parse the output, and check whether any critical process has exited.

#### How to verify it
1. Add unit test cases to cover it.
2. Adjust sonic-mgmt cases to cover it.
3. Manual test.
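The parsing in steps 2 and 3 of the backport message can be sketched in Python. This is a minimal, hypothetical sketch, not the system-health implementation: it assumes each `critical_processes` entry has the form `program:<name>` (`group:` entries are skipped here), and it assumes each `supervisorctl status` line starts with the process name followed by its state. The function names and sample data are illustrative only; reading the FEATURE table and running `docker exec` are left out.

```python
from typing import Dict, List


def parse_critical_processes(text: str) -> List[str]:
    """Return process names from critical_processes file content.

    Assumption for this sketch: entries look like 'program:<name>';
    'group:<name>' entries and blank lines are skipped.
    """
    names = []
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        kind, name = line.split(":", 1)
        if kind.strip() == "program":
            names.append(name.strip())
    return names


def parse_supervisorctl_status(output: str) -> Dict[str, str]:
    """Map each process name to its state (RUNNING, EXITED, ...).

    Assumption for this sketch: the first two whitespace-separated
    fields of each line are the process name and its state.
    """
    status = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            status[fields[0]] = fields[1]
    return status


def find_failed_critical(critical: List[str], status: Dict[str, str]) -> List[str]:
    """Return critical processes that are not RUNNING or not reported at all."""
    return [name for name in critical if status.get(name) != "RUNNING"]


# Hypothetical sample input for one container.
SAMPLE_CRITICAL = "program:orchagent\nprogram:portsyncd\n"
SAMPLE_STATUS = (
    "orchagent                        RUNNING   pid 45, uptime 1:02:03\n"
    "portsyncd                        EXITED    Oct 06 12:00 AM\n"
    "rsyslogd                         RUNNING   pid 20, uptime 1:02:10\n"
)

failed = find_failed_critical(
    parse_critical_processes(SAMPLE_CRITICAL),
    parse_supervisorctl_status(SAMPLE_STATUS),
)
print(failed)  # ['portsyncd']
```

Treating a process that `supervisorctl status` does not report as failed is a design choice of this sketch: a critical process that supervisord does not list cannot be confirmed as running.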
Can you submit separate PRs against 202012/202106, please?

Hi @yozhao101, I suppose they are already in 202012 and 202106. Please check #9366 and #9367.
HLD updated here: sonic-net/SONiC#887
Why I did it
Command `monit summary -B` can no longer display the status of each critical process, so system-health should not depend on it and needs another way to monitor the status of critical processes. This PR addresses that. monit is still used by system-health for the file system check as well as for custom checks.
How I did it
How to verify it
Which release branch to backport (provide reason below if selected)
Description for the changelog
A picture of a cute animal (not mandatory but encouraged)