[action] [PR:25874] Fix rsyslogd memory growth in syncd swss containers over long term #26298

Merged
mssonicbld merged 1 commit into sonic-net:202511 from mssonicbld:cherry/202511/25874 on Mar 21, 2026

@mssonicbld (Collaborator)


#### Why I did it

1. We observed long-term rsyslogd memory growth in the syncd container.
2. Deep diagnostics (impstats) showed `imuxsock.ratelimit.numratelimiters` growing continuously (about 2 per minute) while queue depth stayed near zero. This points to sender/PID churn rather than a queue backlog.
3. phcsync.sh runs every 60 seconds and invokes phc_ctl for the /dev/ptp* devices. Each of these short-lived processes has a new PID, so imuxsock sees it as a new sender identity. This correlates with the ratelimiter-state growth: rsyslogd keeps ratelimiting data structures per sender, so its memory increases over time.
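To make the causal chain in point 3 concrete, here is a deliberately simplified model as a bash sketch. It assumes, for illustration only, that rsyslogd keeps one ratelimiter entry per sender PID and never evicts entries; `on_message`, `num_ratelimiters`, and the `ratelimiters` table are hypothetical names, not rsyslog internals. Under that assumption every short-lived phc_ctl process adds an entry, while repeated messages from one stable PID reuse a single entry. If the observed rate of about 2 new ratelimiters per minute stayed constant, that would be roughly 2 × 60 × 24 = 2,880 new entries per day.

```bash
#!/usr/bin/env bash
# Hypothetical, simplified model of per-sender ratelimiter state:
# one entry per distinct sender PID, never evicted. Not rsyslog's code.
declare -gA ratelimiters=()

# Record a message from sender PID $1; creates an entry only for an unseen PID.
on_message() {
    ratelimiters["$1"]=1
}

# Print how many ratelimiter entries the model currently holds.
num_ratelimiters() {
    echo "${#ratelimiters[@]}"
}
```

Feeding it four messages from four different PIDs leaves four entries, whereas four messages from a single PID leave one, which is the behaviour the fix aims for.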

##### Work item tracking
- Microsoft ADO **(number only)**:

#### How I did it

- Updated `phcsync.sh` in SONiC to keep successful `phc_ctl` execution silent:
  - Use `phc_ctl -q -Q ... >/dev/null 2>&1`.
  - Keep explicit error handling and error logs on non-zero exit.
- Added a stable logger identity in the service debug helpers:
  - `logger -i "$$" -- "$1"` in `syncd_common.sh` and `swss.sh`. This reduces per-call sender churn during the script execution phases (start/wait/stop).
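A minimal sketch of the phcsync.sh pattern described above, assuming bash and linuxptp's phc_ctl, where `-q` suppresses syslog output and `-Q` suppresses stdout. `run_phc_ctl_quiet` is a hypothetical helper name, and the device and phc_ctl command are passed through unchanged because the exact arguments phcsync.sh uses are elided here. A successful run produces no log traffic; only a non-zero exit is reported through `logger`, with an illustrative tag and priority.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run phc_ctl without generating syslog traffic on
# success, and log only when it fails. Arguments are passed through as-is.
run_phc_ctl_quiet() {
    local rc=0
    # -q: no messages to syslog, -Q: no messages to stdout (linuxptp phc_ctl).
    # Any remaining stdout/stderr output is discarded as well.
    phc_ctl -q -Q "$@" >/dev/null 2>&1 || rc=$?
    if [ "$rc" -ne 0 ]; then
        # Error path only; the "phcsync" tag and user.err priority are illustrative.
        logger -t phcsync -p user.err -- "phc_ctl $* failed (exit code $rc)"
    fi
    return "$rc"
}
```

A caller would loop over its devices and call `run_phc_ctl_quiet <device> <command>` for each one. The helper returns phc_ctl's own exit code, so the caller can still react to failures.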

syncd

Every time phc_ctl runs, syncd logs the following message. Because each run has a new PID, rsyslogd creates a new ratelimiter context for it:
```
syslog.1:15477:2026 Mar  2 22:25:01.754471 sonic NOTICE syncd#phc_ctl: [561375.455] set clock time to 1772490301.754287973 or Mon Mar  2 22:25:01 2026
```

logger commands

Before (each message is logged under a different PID):
```
Mar 04 03:55:44 sonic root[1775781]: Starting swss service...
Mar 04 03:55:44 sonic root[1775785]: Locking /tmp/swss-syncd-lock from swss service
Mar 04 03:55:44 sonic root[1775792]: Locked /tmp/swss-syncd-lock (10) from swss service
Mar 04 03:55:44 sonic root[1775816]: Warm boot flag: swss false.
Mar 04 03:55:44 sonic root[1775822]: Flushing APP, ASIC, COUNTER, CONFIG, and partial STATE databases ...
Mar 04 03:55:45 sonic root[1776045]: Started swss service...
Mar 04 03:55:45 sonic root[1776051]: Unlocking /tmp/swss-syncd-lock (10) from swss service
```

After (every message is logged under the same PID):
```
Mar 04 03:58:52 sonic root[1891651]: Starting swss service...
Mar 04 03:58:52 sonic root[1891651]: Locking /tmp/swss-syncd-lock from swss service
Mar 04 03:58:52 sonic root[1891651]: Locked /tmp/swss-syncd-lock (10) from swss service
Mar 04 03:58:52 sonic root[1891651]: Warm boot flag: swss false.
Mar 04 03:58:52 sonic root[1891651]: Flushing APP, ASIC, COUNTER, CONFIG, and partial STATE databases ...
Mar 04 03:58:53 sonic root[1891651]: Started swss service...
Mar 04 03:58:53 sonic root[1891651]: Unlocking /tmp/swss-syncd-lock (10) from swss service
```
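The before/after difference comes from giving every `logger` call in a script the same identity. A hypothetical `debug` helper in the style of the ones in syncd_common.sh and swss.sh could look like the sketch below. It uses util-linux logger's long form `--id="$$"`, which that tool's man page recommends for scripts that send several messages; whether the SONiC scripts use exactly this spelling is an assumption. `$$` expands to the PID of the running script, so all messages are attributed to one sender even though each `logger` invocation is a separate process.

```bash
#!/usr/bin/env bash
# Hypothetical debug helper modelled on the ones in syncd_common.sh/swss.sh.
# --id="$$" makes logger report the calling script's PID instead of its own,
# so imuxsock sees one stable sender for the whole script run.
debug() {
    logger --id="$$" -- "$1"
}
```

Calling `debug "Starting swss service..."` and then `debug "Started swss service..."` from the same script logs both lines under one PID, matching the After output.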
#### How to verify it
- `imuxsock.ratelimit.numratelimiters` in syncd should stop growing continuously (or its growth should slow down drastically).
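One way to check this against collected impstats output is a small bash sketch like the following. It assumes impstats writes its legacy `key=value` format (lines containing `ratelimit.numratelimiters=<N>` for imuxsock) to a file you can read. The function names are hypothetical, and impstats' JSON format would need a different pattern. The check reports growth when the last sample is larger than the first.

```bash
#!/usr/bin/env bash
# Print every imuxsock ratelimit.numratelimiters value found in impstats
# output on stdin (assumes the legacy key=value impstats format).
numratelimiters_series() {
    grep -o 'ratelimit\.numratelimiters=[0-9][0-9]*' | cut -d= -f2
}

# Exit 0 only if the last sample in file $1 is larger than the first one;
# returns non-zero when there is no growth or no samples at all.
numratelimiters_growing() {
    local first last
    first=$(numratelimiters_series < "$1" | head -n 1)
    last=$(numratelimiters_series < "$1" | tail -n 1)
    [ -n "$first" ] && [ "$last" -gt "$first" ]
}
```

For example, `numratelimiters_growing /path/to/impstats.log && echo "still growing"`, where the path depends on how impstats is configured in the container. Sampling over a longer window makes a flat series more convincing than two close samples.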


#### Which release branch to backport (provide reason below if selected)


- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [X] 202511

Signed-off-by: Sonic Build Admin <[email protected]>
@mssonicbld (Collaborator, Author)

Original PR: #25874

@mssonicbld (Collaborator, Author)

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

mssonicbld merged commit 9ca8407 into sonic-net:202511 on Mar 21, 2026
18 checks passed