[action] [PR:25874] Fix rsyslogd memory growth in syncd swss containers over long term #26298
Merged
mssonicbld merged 1 commit into sonic-net:202511 from Mar 21, 2026
<!--
Please make sure you've read and understood our contributing guidelines:
https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md
** Make sure all your commits include a signature generated with `git commit -s` **
If this is a bug fix, make sure your description includes "fixes #xxxx", or
"closes #xxxx" or "resolves #xxxx"
Please provide the following information:
-->
#### Why I did it
1. We observed long-term rsyslogd memory growth in the syncd container.
2. Deep diagnostics (impstats) showed `imuxsock.ratelimit.numratelimiters` growing continuously (about 2/min) while queue depth stayed near zero, which points to sender/PID churn rather than queue backlog.
3. phcsync.sh runs every 60 seconds and repeatedly invokes phc_ctl for the /dev/ptp* devices. These short-lived invocations appear to imuxsock as new sender identities, which correlates with the ratelimiter-state growth. Memory increases over time because rsyslogd keeps rate-limiting data structures for the senders it has seen.
##### Work item tracking
- Microsoft ADO **(number only)**:
#### How I did it
- Updated phcsync.sh in SONiC to keep successful phc_ctl execution silent:
  - Use `phc_ctl -q -Q ... >/dev/null 2>&1`
  - Keep explicit error handling and error logs on non-zero exit.
- Added stable logger identity in service debug helpers:
  - `logger -i "$$" -- "$1"` in syncd_common.sh and swss.sh. This reduces per-call sender churn during script execution phases (start/wait/stop).
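The "silent on success, log on failure" pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual phcsync.sh code: the helper name `run_quiet` and the error message format are hypothetical, and `true`/`false` stand in for the real phc_ctl call.

```shell
#!/bin/sh
# Hypothetical helper (not taken from phcsync.sh): run a command with all
# output discarded, and report only when it exits non-zero.
run_quiet() {
    # Discard stdout and stderr so a successful run produces no log line.
    if "$@" >/dev/null 2>&1; then
        return 0
    else
        rc=$?
        # Hypothetical message format; a real script might call logger here.
        echo "ERROR: '$*' failed with exit code $rc" >&2
        return "$rc"
    fi
}

# In phcsync.sh the wrapped command would be the phc_ctl call from this PR
# (phc_ctl -q -Q ...); `true` and `false` stand in for it here.
run_quiet true
run_quiet false || echo "handled failure"
```

With this shape, a healthy 60-second cycle emits nothing to syslog, while a failing command still leaves an error line for diagnosis.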
syncd
Every second we currently see the following log from syncd, and each occurrence creates a new ratelimiter context in rsyslogd because it comes from a new PID:
```
syslog.1:15477:2026 Mar 2 22:25:01.754471 sonic NOTICE syncd#phc_ctl: [561375.455] set clock time to 1772490301.754287973 or Mon Mar 2 22:25:01 2026
```
logger commands
Before
```
Mar 04 03:55:44 sonic root[1775781]: Starting swss service...
Mar 04 03:55:44 sonic root[1775785]: Locking /tmp/swss-syncd-lock from swss service
Mar 04 03:55:44 sonic root[1775792]: Locked /tmp/swss-syncd-lock (10) from swss service
Mar 04 03:55:44 sonic root[1775816]: Warm boot flag: swss false.
Mar 04 03:55:44 sonic root[1775822]: Flushing APP, ASIC, COUNTER, CONFIG, and partial STATE databases ...
Mar 04 03:55:45 sonic root[1776045]: Started swss service...
Mar 04 03:55:45 sonic root[1776051]: Unlocking /tmp/swss-syncd-lock (10) from swss service
```
After
```
Mar 04 03:58:52 sonic root[1891651]: Starting swss service...
Mar 04 03:58:52 sonic root[1891651]: Locking /tmp/swss-syncd-lock from swss service
Mar 04 03:58:52 sonic root[1891651]: Locked /tmp/swss-syncd-lock (10) from swss service
Mar 04 03:58:52 sonic root[1891651]: Warm boot flag: swss false.
Mar 04 03:58:52 sonic root[1891651]: Flushing APP, ASIC, COUNTER, CONFIG, and partial STATE databases ...
Mar 04 03:58:53 sonic root[1891651]: Started swss service...
Mar 04 03:58:53 sonic root[1891651]: Unlocking /tmp/swss-syncd-lock (10) from swss service
```
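The difference between the two logs comes down to which PID accompanies each message. As a small illustration (a sketch only, with hypothetical variable names and no actual syslog calls), the script's own `$$` stays fixed for its lifetime, while every separately spawned process gets a fresh PID:

```shell
#!/bin/sh
# $$ is the PID of the running script and does not change between uses.
script_pid_first=$$
script_pid_second=$$

# Each separately spawned short-lived process has its own PID. This is the
# kind of churn a per-message helper process introduces.
child_pid_first=$(sh -c 'echo $$')
child_pid_second=$(sh -c 'echo $$')

echo "script PID (stable): $script_pid_first $script_pid_second"
echo "child PIDs (churn):  $child_pid_first $child_pid_second"
```

Tagging every message from one script run with the script's PID therefore presents a single sender identity per run instead of one per message.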
#### How to verify it
- `imuxsock.ratelimit.numratelimiters` in syncd should stop growing continuously (or grow far more slowly).
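
One way to quantify the check is to sample the counter twice and compute its growth per minute. The sketch below assumes the two values have already been read from the impstats output (how to read them depends on the local impstats configuration and is not shown). The helper name `growth_per_min` and the sample numbers are hypothetical, not measurements from this PR.

```shell
#!/bin/sh
# Hypothetical helper: growth per minute between two samples of a counter.
# Arguments: first value, second value, seconds elapsed between the samples.
growth_per_min() {
    awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN {
        if (s <= 0) { print "invalid interval" > "/dev/stderr"; exit 1 }
        printf "%.2f\n", (b - a) * 60 / s
    }'
}

# Made-up sample values for illustration only: 20 new ratelimiters in
# 600 seconds corresponds to 2 per minute.
growth_per_min 1000 1020 600   # prints 2.00
```

A value near the roughly 2/min seen before the fix would suggest the churn is still present. A value near zero over a long window would be consistent with the fix working.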
<!--
If PR needs to be backported, then the PR must be tested against the base branch and the earliest backport release branch and provide tested image version on these two branches. For example, if the PR is requested for master, 202211 and 202012, then the requester needs to provide test results on master and 202012.
-->
#### Which release branch to backport (provide reason below if selected)
<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [X] 202511
Signed-off-by: Sonic Build Admin <[email protected]>
Original PR: #25874

/azp run Azure.sonic-buildimage

Azure Pipelines successfully started running 1 pipeline(s).