Skip to content

[Sys Health] Fix the service entry delete in state_db because of timer job#45

Closed
vivekrnv wants to merge 1 commit intomasterfrom
fix_sysmon_timer
Closed

[Sys Health] Fix the service entry delete in state_db because of timer job#45
vivekrnv wants to merge 1 commit intomasterfrom
fix_sysmon_timer

Conversation

@vivekrnv
Copy link
Owner

@vivekrnv vivekrnv commented Apr 17, 2023

Why I did it

systemd stop event on service with timers can sometime delete the state_db entry for the corresponding service.

Note: This won't be observed on the master label since the dependency on timer was removed with the recent config reload enhancement. However, it is better to have the fix since there might be some systemd services added to system health daemon in the future which may contain timers

root@qa-eth-vt01-4-3700c:/home/admin# systemctl stop snmp
root@qa-eth-vt01-4-3700c:/home/admin# show system-health sysready-status 
System is not ready - one or more services are not up

Service-Name            Service-Status    App-Ready-Status    Down-Reason
----------------------  ----------------  ------------------  -------------
<Truncated>
ssh                     OK                OK                  -
swss                    OK                OK                  -
syncd                   OK                OK                  -
sysstat                 OK                OK                  -
teamd                   OK                OK                  -
telemetry               OK                OK                  -
what-just-happened      OK                OK                  -
ztp                     OK                OK                  -
<Truncated>

Expected

Should see a Down entry for SNMP instead of the entry being deleted from the STATE_DB

root@qa-eth-vt01-4-3700c:/home/admin# show system-health sysready-status 
System is not ready - one or more services are not up

Service-Name            Service-Status    App-Ready-Status    Down-Reason
----------------------  ----------------  ------------------  -------------
<Truncated>
snmp                    Down              Down                Inactive
ssh                     OK                OK                  -
swss                    OK                OK                  -
syncd                   OK                OK                  -
sysstat                 OK                OK                  -
teamd                   OK                OK                  -
telemetry               OK                OK                  -
what-just-happened      OK                OK                  -
ztp                     OK                OK                  -
<Truncated>

How I did it

Happens because the timer is usually a PartOf service and thus a stop on service is propagated to timer. Fixed the logic to handle this

Apr 18 02:06:47.711252 r-lionfish-16 DEBUG healthd: Main process- received event:snmp.service from source:sysbus time:2023-04-17 23:06:47
Apr 18 02:06:47.711347 r-lionfish-16 INFO healthd: check_unit_status for [ snmp.service ] 
Apr 18 02:06:47.722363 r-lionfish-16 INFO healthd: snmp.service service state changed to [inactive/dead]

Apr 18 02:06:47.723230 r-lionfish-16 DEBUG healthd: Main process- received event:snmp.timer from source:sysbus time:2023-04-17 23:06:47
Apr 18 02:06:47.723328 r-lionfish-16 INFO healthd: check_unit_status for [ snmp.timer ] 

How to verify it

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@vivekrnv vivekrnv changed the title [Sys Mon] Fix the service entry delete in state_db because of timer job [Sys Health] Fix the service entry delete in state_db because of timer job Apr 17, 2023
@vivekrnv
Copy link
Owner Author

@Junchao-Mellanox, Can you review this?

@vivekrnv vivekrnv closed this Apr 18, 2023
vivekrnv pushed a commit that referenced this pull request Jan 12, 2024
…ly (sonic-net#17572)

#### Why I did it
src/dhcprelay
```
* 5ae186f - (HEAD -> master, origin/master, origin/HEAD) [counter] Clear counter table when init (#45) (10 hours ago) [Yaqiang Zhu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
vivekrnv pushed a commit that referenced this pull request Apr 2, 2024
5ae186f Yaqiang Zhu Tue Dec 19 12:05:15 2023 -0500 [counter] Clear counter table when init (#45)
vivekrnv pushed a commit that referenced this pull request Mar 24, 2025
… automatically (sonic-net#760)

#### Why I did it
src/sonic-platform-common
```
* 047e12b - (HEAD -> 202412, origin/202412) [code sync] Merge code from sonic-net/sonic-platform-common:202411 to 202412 (#45) (21 hours ago) [mssonicbld]
```
#### How I did it
#### How to verify it
#### Description for the changelog
vivekrnv pushed a commit that referenced this pull request Aug 19, 2025
…sonic-net#23654)

#### Why I did it
src/dhcpmon
```
* 1cb6ced - (HEAD -> master, origin/master, origin/HEAD) Update clear_counter_timeout to fix clear counter issue (#45) (19 hours ago) [Yaqiang Zhu]
* 848304e - [build] Update to use libyang3 (#46) (4 days ago) [Yaqiang Zhu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants