
[system-health] [202106] No longer check critical process/service status via monit#108

Closed
Junchao-Mellanox wants to merge 1 commit into 202106 from system-health-202106

@Junchao-Mellanox
Owner

Why I did it

The `monit summary -B` command can no longer display the status of each critical process, so system-health should not depend on it and needs another way to monitor the status of critical processes. This PR addresses that. monit is still used by system-health for file system checks as well as custom checks.

How I did it

  1. Get container names from FEATURE table
  2. For each container, collect critical process names from file critical_processes
  3. Use `docker exec -it <container_name> bash -c 'supervisorctl status'` to get the status of the processes inside the container, parse the output, and check whether any critical process has exited
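The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code in this PR. All function names are hypothetical. The FEATURE table is passed in as a plain dict with an assumed `state` field (on a device it might be read from CONFIG_DB, for example with `ConfigDBConnector.get_table("FEATURE")`). The `program:<name>` / `group:<name>` line format of `critical_processes` is assumed. Every supervisor state other than `RUNNING` is treated as a failure.

```python
import subprocess
from typing import Callable, Dict, List, Tuple


def enabled_containers(feature_table: Dict[str, Dict[str, str]]) -> List[str]:
    """Step 1: names of FEATURE entries whose (assumed) 'state' field is 'enabled'."""
    return sorted(name for name, attrs in feature_table.items()
                  if attrs.get("state") == "enabled")


def parse_critical_processes(text: str) -> List[Tuple[str, str]]:
    """Step 2: parse critical_processes content into (kind, name) pairs.

    Assumes lines look like 'program:<name>' or 'group:<name>'; a bare name
    is treated as a program. Blank lines and '#' comments are skipped.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        kind, sep, name = line.partition(":")
        if sep:
            entries.append((kind.strip(), name.strip()))
        else:
            entries.append(("program", line))
    return entries


def parse_supervisorctl_status(output: str) -> Dict[str, str]:
    """Step 3a: map each name in `supervisorctl status` output to its state column."""
    status = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            status[parts[0]] = parts[1]
    return status


def failed_critical_processes(critical: List[Tuple[str, str]],
                              status: Dict[str, str]) -> List[str]:
    """Step 3b: return critical entries that are missing or not RUNNING."""
    failed = []
    for kind, name in critical:
        if kind == "group":
            # supervisorctl lists group members as '<group>:<program>'.
            members = [state for proc, state in status.items()
                       if proc.startswith(name + ":")]
            if not members or any(state != "RUNNING" for state in members):
                failed.append(name)
        elif status.get(name) != "RUNNING":
            failed.append(name)
    return failed


def supervisorctl_status(container: str,
                         run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> str:
    """Run `supervisorctl status` inside a container and return its stdout.

    -it is omitted in this sketch because a background daemon has no terminal.
    check=False because supervisorctl may exit non-zero when a process is down.
    """
    result = run(["docker", "exec", container, "bash", "-c", "supervisorctl status"],
                 capture_output=True, text=True, check=False)
    return result.stdout


def check_container(container: str, critical_text: str,
                    run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> List[str]:
    """Combine the steps for one container; `critical_text` is the file content."""
    critical = parse_critical_processes(critical_text)
    status = parse_supervisorctl_status(supervisorctl_status(container, run))
    return failed_critical_processes(critical, status)
```

Keeping the parsing separate from the `docker exec` call lets the parsers be unit-tested against captured `supervisorctl status` output without a running switch.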

How to verify it

  1. Add a unit test case to cover it
  2. Adjust sonic-mgmt test cases to cover it
  3. Manual test

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106

Description for the changelog


Junchao-Mellanox deleted the system-health-202106 branch December 2, 2021 02:52
Junchao-Mellanox pushed a commit that referenced this pull request Aug 25, 2022
[master][sonic-linkmgrd] submodule update

4bf8b3d Jing Zhang Fri Aug 12 12:07:40 2022 -0700 wait for handler to be completed (#114)
cf849a0 Longxiang Lyu Fri Aug 12 17:21:43 2022 +0800 Use table to toggle peer forwarding state (#108)
d4540ba Jing Zhang Thu Aug 11 16:08:03 2022 -0700 Adjust DbInterfaceRaceConditionCheck to Wait Longer for Handlers to be executed (#111)
d5c47b3 Jing Zhang Thu Aug 11 15:31:22 2022 -0700 [lgtm]: add uuid-dev to lgtm prepare (#112)
f4bb5d5 Jing Zhang Thu Aug 11 10:03:05 2022 -0700 Backoff mux probing for server down scenario (#106)
3f7a6f2 Jing Zhang Tue Aug 9 10:42:51 2022 -0700 Fix race condition caused by strand wrap method (#104)
4cff43f Jing Zhang Mon Aug 8 10:36:18 2022 -0700 [Active-Standby]Remove unnecessary handleMuxWaitTimeout logs (#100)
3b22533 Jing Zhang Tue Aug 2 13:18:01 2022 -0700 [active-active] Update unhealthy label definition (#102)

sign-off: Jing Zhang zhangjing@microsoft.com
Junchao-Mellanox pushed a commit that referenced this pull request Aug 25, 2022
…submodule head (sonic-net#11761)

linkmgrd:
* 476f85e 2022-08-17 | Update linkmgr health after getting default route update (#117) (HEAD -> 202205, github/202205) [Longxiang Lyu]
* fc589e9 2022-08-17 | Use `table` to toggle peer forwarding state (#108) (#120) [Longxiang Lyu]
* bcb5a56 2022-08-17 | Fix azure pipeline (#118) (#121) [Longxiang Lyu]

swss:
* ef3a601 2022-08-17 | [muxorch] Returning true if nbr in skip_neighbor_ in isNeighborActive() (sonic-net#2415) (HEAD -> 202205) [Nikola Dancejic]

sairedis:
* aed01cd 2022-08-12 | Fix: missing sonic-db-cli in docker-sonic-vs image (sonic-net#1072) (sonic-net#1104) (github/202205) [Hua Liu]

platform-daemon:
* 5a68073 2022-08-01 | Xcvrd changes to support 400G ZR configuration (sonic-net#270) (HEAD -> 202205) [Prince George]

swsssdk:
* ca785a2 2022-06-01 | Remove sonic-db-cli (#122) (HEAD -> 202205, origin/202205) [Hua Liu]

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Junchao-Mellanox pushed a commit that referenced this pull request Jun 3, 2024
…lly (sonic-net#19038)

#### Why I did it
src/sonic-gnmi
```
* 585f441 - (HEAD -> master, origin/master, origin/HEAD) Add SaveOnSet (#108) (28 hours ago) [Ryan Lucus]
* 81174c0 - Fix full config update (#240) (2 days ago) [ganglv]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Junchao-Mellanox pushed a commit that referenced this pull request Sep 15, 2025
…ically (sonic-net#23947)

#### Why I did it
src/sonic-dash-ha
```
* d14d54b - (HEAD -> master, origin/master, origin/HEAD) Implement cleanup logic for all the actors (#102) (10 hours ago) [yue-fred-gao]
* 4e3706a - Fix ha state. (#107) (20 hours ago) [dypet]
* 58dbc27 - fix show hamgrd actor command (#108) (20 hours ago) [yue-fred-gao]
```
#### How I did it
#### How to verify it
#### Description for the changelog