Fix rsyslogd memory growth in syncd swss containers over long term #25874

Merged
yxieca merged 1 commit into sonic-net:master from tirupatihemanth:rsyslogd_fix on Mar 20, 2026

@tirupatihemanth tirupatihemanth commented Mar 4, 2026

Why I did it

  1. We observed long-term rsyslogd memory growth in the syncd container.
  2. Deep diagnostics (impstats) showed imuxsock.ratelimit.numratelimiters growing continuously (about 2/min) while queue depth stayed near zero, which points to sender/PID churn rather than queue backlog.
  3. phcsync.sh runs every 60 seconds and repeatedly invokes phc_ctl for /dev/ptp* devices. These short-lived invocations contribute to new sender identities seen by imuxsock, which correlates with the ratelimiter-state growth. Because rsyslogd stores rate-limiting data structures per sender, its memory increases over time.

Work item tracking

  • Microsoft ADO (number only):

How I did it

  • Updated phcsync.sh in SONiC to keep successful phc_ctl execution silent:
      • Use phc_ctl -q -Q ... >/dev/null 2>&1
      • Keep explicit error handling and error logs on non-zero exit.
  • Added a stable logger identity in the service debug helpers:
      • logger --id=$$ -- "$1" in syncd_common.sh and swss.sh. This reduces per-call sender churn during script execution phases (start/wait/stop).

syncd
Every second we currently see the following log line from syncd, and each one creates a new ratelimiter context in rsyslogd because it comes from a new PID:

syslog.1:15477:2026 Mar  2 22:25:01.754471 sonic NOTICE syncd#phc_ctl: [561375.455] set clock time to 1772490301.754287973 or Mon Mar  2 22:25:01 2026
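
The quiet-on-success pattern described above can be sketched as follows. This is a hypothetical illustration, not the actual phcsync.sh: the function name `sync_phc_devices`, the `phcsync` tag, and the use of the `set` subcommand (inferred from the "set clock time to" message) are assumptions. Only the `-q` (no syslog) and `-Q` (no stdout) flags and the error-on-non-zero-exit behavior come from the PR description.

```shell
# Hypothetical sketch; names and the "set" subcommand are assumptions.
# Sync each PHC device passed as an argument, staying silent on success.
sync_phc_devices() {
    local dev rc=0
    for dev in "$@"; do
        # Skip entries that do not exist (e.g. an unmatched /dev/ptp* glob).
        [ -e "$dev" ] || continue
        # -q: phc_ctl sends nothing to syslog; -Q: nothing to stdout.
        if ! phc_ctl -q -Q "$dev" set >/dev/null 2>&1; then
            # Failure path only: one message, tagged with this script's PID.
            logger --id=$$ -p user.err -t phcsync -- "phc_ctl failed for $dev"
            rc=1
        fi
    done
    return "$rc"
}
```

A periodic caller could invoke it as `sync_phc_devices /dev/ptp*`. On the success path no syslog message is emitted at all, so no new sender reaches imuxsock.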

logger commands

Before

Mar 04 03:55:44 sonic root[1775781]: Starting swss service...
Mar 04 03:55:44 sonic root[1775785]: Locking /tmp/swss-syncd-lock from swss service
Mar 04 03:55:44 sonic root[1775792]: Locked /tmp/swss-syncd-lock (10) from swss service
Mar 04 03:55:44 sonic root[1775816]: Warm boot flag: swss false.
Mar 04 03:55:44 sonic root[1775822]: Flushing APP, ASIC, COUNTER, CONFIG, and partial STATE databases ...
Mar 04 03:55:45 sonic root[1776045]: Started swss service...
Mar 04 03:55:45 sonic root[1776051]: Unlocking /tmp/swss-syncd-lock (10) from swss service

After

Mar 04 03:58:52 sonic root[1891651]: Starting swss service...
Mar 04 03:58:52 sonic root[1891651]: Locking /tmp/swss-syncd-lock from swss service
Mar 04 03:58:52 sonic root[1891651]: Locked /tmp/swss-syncd-lock (10) from swss service
Mar 04 03:58:52 sonic root[1891651]: Warm boot flag: swss false.
Mar 04 03:58:52 sonic root[1891651]: Flushing APP, ASIC, COUNTER, CONFIG, and partial STATE databases ...
Mar 04 03:58:53 sonic root[1891651]: Started swss service...
Mar 04 03:58:53 sonic root[1891651]: Unlocking /tmp/swss-syncd-lock (10) from swss service
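
The difference between the two log excerpts comes from which PID logger reports. A minimal sketch of such a helper, assuming util-linux logger (which supports `--id[=id]`), is shown below. The function body is illustrative and may differ from the real debug() helpers in syncd_common.sh and swss.sh.

```shell
# Illustrative sketch of a debug() helper; the real helpers may differ.
debug() {
    # Without --id, each message carries the PID of the short-lived logger
    # process. With --id=$$, every message carries the PID of the calling
    # script, so rsyslogd sees one stable sender for the whole run.
    logger --id=$$ -- "$1"
}
```

Calling `debug "Starting swss service..."` several times from one script then yields lines that all share the script's PID, as in the "After" excerpt.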

How to verify it

  • imuxsock.ratelimit.numratelimiters in syncd should stop growing continuously (or grow far more slowly).
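
One way to check this is to pull the counter out of the impstats output and compare the first and last samples. The sketch below assumes impstats writes its default key=value line format and that the lines are fed on stdin. The function names are hypothetical, and where the stats lines end up depends on the local impstats configuration.

```shell
# Hypothetical helpers; assume impstats key=value lines on stdin.
# Print the ratelimit.numratelimiters value from each imuxsock stats line.
numratelimiters() {
    sed -n 's/.*imuxsock.*ratelimit\.numratelimiters=\([0-9][0-9]*\).*/\1/p'
}

# Print the difference between the last and the first sampled value.
# Prints nothing if no imuxsock stats line was found.
ratelimiter_growth() {
    numratelimiters | awk 'NR == 1 { first = $1 } { last = $1 } END { if (NR) print last - first }'
}
```

A growth value that stays near zero across a long sampling window would match the expected behavior after the fix.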

Which release branch to backport (provide reason below if selected)

  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

@tirupatihemanth tirupatihemanth requested a review from lguohan as a code owner March 4, 2026 04:05
Copilot AI review requested due to automatic review settings March 4, 2026 04:05
@mssonicbld (Collaborator)

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Copilot AI (Contributor) left a comment

Pull request overview

This PR addresses rsyslogd memory growth in the syncd and swss containers by reducing PID churn that was causing rsyslog's imuxsock ratelimiter to accumulate entries for short-lived senders. Two strategies are applied: suppressing unnecessary output from phc_ctl in phcsync.sh, and anchoring syslog messages to a stable PID ($$) in syncd_common.sh and swss.sh.

Changes:

  • phcsync.sh now runs phc_ctl with -q -Q flags and redirects stdout to /dev/null to suppress normal output, with explicit error logging on non-zero exit.
  • syncd_common.sh and swss.sh debug() functions use logger --id=$$ to emit all messages under the parent shell's PID, preventing a new ratelimiter entry per logger invocation.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File | Description
platform/mellanox/docker-syncd-mlnx/phcsync.sh | Adds -q -Q flags to silence normal phc_ctl output; redirects only stdout to /dev/null, removing the previous 2>/dev/null stderr suppression
files/scripts/syncd_common.sh | Adds --id=$$ to logger in the debug() function to anchor all log messages to the parent shell's PID
files/scripts/swss.sh | Same --id=$$ fix as syncd_common.sh for the debug() function in the swss service script

@vmittal-msft (Contributor)

@yxieca @rlhui please help merge

@yxieca (Contributor) left a comment

Looks good. The PID-stable logger and quiet phc_ctl usage match the known pattern to reduce imuxsock ratelimiter growth. AI agent on behalf of Ying.

@yxieca yxieca merged commit 468cd8b into sonic-net:master Mar 20, 2026
21 checks passed
@mssonicbld (Collaborator)

Cherry-pick PR to 202511: #26298
