Add missing [Install] section to container service templates#25932
StormLiangMS wants to merge 1 commit into sonic-net:master
After the systemd-sonic-generator rework (PR sonic-net#23340), the generator only creates sonic.target.wants/ symlinks for services that have an explicit [Install] section with WantedBy=. Container services (pmon, lldp, gnmi, snmp, telemetry, otel, sflow, bmp, mgmt-framework) use BindsTo=sonic.target but lacked an [Install] section, so the generator skipped creating symlinks for them.

This caused two problems:

1. _reset_failed_services() iterates sonic.target dependencies and never resets rate limits for these services, causing start-limit-hit after multiple config reloads.
2. featured checks unit_file_state == 'enabled' but these services report 'static', causing redundant start attempts on every reload.

Fix by adding [Install] WantedBy=sonic.target to all affected service templates, consistent with other container services like dhcp_relay, swss, syncd, and teamd that already have this section.

Fixes: sonic-net#25931

Signed-off-by: Storm Liang <[email protected]>
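To make the generator condition concrete, here is a minimal sketch of a check for "has an explicit [Install] section with WantedBy=". It is not the systemd-sonic-generator code; the function name `install_wanted_by` and the sample unit text are hypothetical and only illustrate the before/after shape of an affected template.

```python
def install_wanted_by(unit_text):
    """Return the targets listed by WantedBy= inside the [Install] section."""
    section = None
    targets = []
    for raw in unit_text.splitlines():
        line = raw.strip()
        # Skip blank lines and unit-file comments.
        if not line or line.startswith(("#", ";")):
            continue
        # Track which section we are currently in.
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "Install" and "=" in line:
            key, _, value = line.partition("=")
            if key.strip() == "WantedBy":
                targets.extend(value.split())
    return targets


# Hypothetical unit text, reduced to the directives named in this PR.
BEFORE = """[Unit]
BindsTo=sonic.target

[Service]
RestartSec=30
"""

# The same unit with the section this PR adds.
AFTER = BEFORE + "\n[Install]\nWantedBy=sonic.target\n"
```

With this check, `BEFORE` yields an empty list (no symlink would be requested) and `AFTER` yields `sonic.target`.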
/azp run Azure.sonic-buildimage

Azure Pipelines successfully started running 1 pipeline(s).
Pull request overview
This PR adds the missing [Install] section with WantedBy=sonic.target to 9 container service templates that were affected by the systemd-sonic-generator rework (PR #23340, cherry-picked to 202511 in PR #24988). Without this section, the generator couldn't create sonic.target.wants/ symlinks for these services, leading to start-limit-hit failures after multiple config reloads (issue #25931).
Changes:
- Added `[Install]` section with `WantedBy=sonic.target` to 9 container service templates (pmon, gnmi, snmp, telemetry, otel, sflow, mgmt-framework, lldp, bmp) to match the pattern already used by other container services like dhcp_relay, swss, syncd, and teamd.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| files/build_templates/pmon.service.j2 | Add [Install] section with WantedBy=sonic.target |
| files/build_templates/gnmi.service.j2 | Add [Install] section with WantedBy=sonic.target |
| files/build_templates/snmp.service.j2 | Add [Install] section with WantedBy=sonic.target |
| files/build_templates/telemetry.service.j2 | Add [Install] section with WantedBy=sonic.target |
| files/build_templates/otel.service.j2 | Add [Install] section with WantedBy=sonic.target |
| files/build_templates/sflow.service.j2 | Add [Install] section with WantedBy=sonic.target |
| files/build_templates/mgmt-framework.service.j2 | Add [Install] section with WantedBy=sonic.target |
| files/build_templates/per_namespace/lldp.service.j2 | Add [Install] section with WantedBy=sonic.target |
| files/build_templates/per_namespace/bmp.service.j2 | Add [Install] section with WantedBy=sonic.target |
Cross-reference: Related fix in sonic-utilities

Related PR: sonic-net/sonic-utilities#4314 by @stephenxs fixes the same symptom from the sonic-utilities side by adding

Root cause analysis

After deeper investigation, we found the issue has three contributing factors:

Why 202505 doesn't have this issue

On 202505,

How the two PRs complement each other

Both PRs are valid fixes. This PR is the more complete fix (addresses all 3 factors), while sonic-utilities#4314 provides defense-in-depth for the
Skip test_load_minigraph_with_golden_config when issue #25931 is open.

This test performs 4 consecutive config reloads, which causes pmon to hit start-limit-hit due to missing sonic.target.wants/ symlinks after the systemd-sonic-generator rework (sonic-net/sonic-buildimage#23340). The test leaves pmon in a bad state (start-limit-hit), which can affect subsequent tests in the nightly run.

Fix PRs:
- sonic-net/sonic-buildimage#25932 (add [Install] to service templates)
- sonic-net/sonic-utilities#4314 (fix _reset_failed_services)

The skip will auto-resolve when sonic-net/sonic-buildimage#25931 is closed.

Signed-off-by: Storm Liang <[email protected]>
RestartSec=30

[Install]
WantedBy=sonic.target
Some of the services that don't have WantedBy=sonic.target appear to be intentionally delayed after port init (i.e. start after everything else has started). This might break this.
/var/log/syslog.2.gz:2026 Mar 5 17:27:43.300869 vlab-01 INFO featured: Feature is gnmi delayed for port init
/var/log/syslog.2.gz:2026 Mar 5 17:27:43.951951 vlab-01 INFO featured: Feature is lldp delayed for port init
/var/log/syslog.2.gz:2026 Mar 5 17:27:45.212249 vlab-01 INFO featured: Feature is mgmt-framework delayed for port init
/var/log/syslog.2.gz:2026 Mar 5 17:27:47.239979 vlab-01 INFO featured: Feature is pmon delayed for port init
/var/log/syslog.2.gz:2026 Mar 5 17:27:48.117732 vlab-01 INFO featured: Feature is sflow delayed for port init
/var/log/syslog.2.gz:2026 Mar 5 17:27:48.526676 vlab-01 INFO featured: Feature is snmp delayed for port init
/var/log/syslog.2.gz:2026 Mar 5 17:27:50.220405 vlab-01 INFO featured: Updating delayed features after port initializatio
Both of these match 202505 behavior, and are not new for 202511.
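As a rough aid for checking which features `featured` holds back, the log lines quoted above can be scanned for the "delayed for port init" message. This is a minimal sketch; the helper name `delayed_features` is hypothetical, and the pattern is taken only from the syslog excerpt in this comment.

```python
import re

# Message format copied from the syslog excerpt in the review comment.
DELAYED_RE = re.compile(r"featured: Feature is (\S+) delayed for port init")


def delayed_features(log_lines):
    """Return feature names reported as delayed, in first-seen order."""
    names = []
    for line in log_lines:
        match = DELAYED_RE.search(line)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


SAMPLE_LOG = [
    "/var/log/syslog.2.gz:2026 Mar 5 17:27:43.300869 vlab-01 INFO featured: Feature is gnmi delayed for port init",
    "/var/log/syslog.2.gz:2026 Mar 5 17:27:43.951951 vlab-01 INFO featured: Feature is lldp delayed for port init",
    "/var/log/syslog.2.gz:2026 Mar 5 17:27:47.239979 vlab-01 INFO featured: Feature is pmon delayed for port init",
    "/var/log/syslog.2.gz:2026 Mar 5 17:27:50.220405 vlab-01 INFO featured: Updating delayed features after port initializatio",
]
```

Comparing the result against the list of templates touched by this PR shows which of them overlap with the intentionally delayed set.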
Thanks @saiarcot895 for the review. Both points are valid.

On the 202505 behavior being the same

You're correct. After deeper investigation we confirmed that on 202505:

The actual differentiator between 202505 and 202511 is in `featured`'s `enable_feature()`:

On the port-init delay concern

Good catch. This is a valid concern. These services are intentionally delayed by `featured` until `PortInitDone`. Adding `WantedBy=sonic.target` could cause systemd to start them immediately as part of `sonic.target`, bypassing the port-init delay logic.

Given these points, I think the safer approach is:

I'll close this PR in favor of the sonic-utilities approach. Thanks for the thorough review!
Cherry-pick of PR sonic-net#22775 to 202511, rebased on latest 202511 to include the duplicate key fix from PR sonic-net#22796. The test performs multiple config reloads that cause pmon start-limit-hit due to missing sonic.target symlinks after systemd-sonic-generator rework. Fix PRs: sonic-net/sonic-buildimage#25932, sonic-net/sonic-utilities#4314 Tracking issue: sonic-net/sonic-buildimage#25931 Signed-off-by: Storm Liang <[email protected]> Co-authored-by: Copilot <[email protected]>
…fig (#22830) Cherry-pick of PR #22775 to 202511, rebased on latest 202511 to include the duplicate key fix from PR #22796. The test performs multiple config reloads that cause pmon start-limit-hit due to missing sonic.target symlinks after systemd-sonic-generator rework. Fix PRs: sonic-net/sonic-buildimage#25932, sonic-net/sonic-utilities#4314 Tracking issue: sonic-net/sonic-buildimage#25931 Co-authored-by: Copilot <[email protected]>
Why I did it
After the systemd-sonic-generator rework (PR #23340), the generator only creates `sonic.target.wants/` symlinks for services that have an explicit `[Install]` section with `WantedBy=`. Nine container services (pmon, lldp, gnmi, snmp, telemetry, otel, sflow, bmp, mgmt-framework) use `BindsTo=sonic.target` in `[Unit]` but lacked an `[Install]` section, so the generator skipped creating symlinks for them.

This caused two problems:

1. `_reset_failed_services()` in sonic-utilities iterates `systemctl list-dependencies --plain sonic.target` and never resets rate limits for these services, causing `start-limit-hit` after multiple config reloads.
2. The `featured` daemon checks `unit_file_state == 'enabled'`, but these services now report `static` (no `[Install]` = static), causing redundant `systemctl start` calls on every reload.

The most visible symptom is pmon hitting `start-limit-hit` during tests that perform multiple config reloads (e.g., `test_load_minigraph_with_golden_config`).

Fixes #25931
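The first problem can be illustrated with a small sketch: any code that derives its service list from `systemctl list-dependencies --plain sonic.target` never sees units that are missing from that listing. This is not the actual `_reset_failed_services()` implementation; the helper names and the sample listing are hypothetical.

```python
def units_in_dependency_listing(listing):
    """Parse `systemctl list-dependencies --plain <target>` output into unit names."""
    units = set()
    for line in listing.splitlines():
        fields = line.split()
        if fields:
            # The unit name is the last field; any leading status marker is ignored.
            units.add(fields[-1])
    return units


def services_missed_by_reset(listing, services):
    """Return the services that a dependency-driven reset loop would skip."""
    listed = units_in_dependency_listing(listing)
    return [name for name in services if f"{name}.service" not in listed]


# Illustrative listing: only services that already carry [Install] appear.
SAMPLE_LISTING = """sonic.target
  dhcp_relay.service
  swss.service
  syncd.service
  teamd.service
"""
```

For example, `services_missed_by_reset(SAMPLE_LISTING, ["swss", "pmon", "lldp"])` reports pmon and lldp, which is why their start-rate counters would never be cleared between reloads.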
Work item tracking
How I did it
Added `[Install]` section with `WantedBy=sonic.target` to all 9 affected service templates, consistent with other container services (dhcp_relay, swss, syncd, teamd, etc.) that already have this section.

Affected templates:

- `files/build_templates/pmon.service.j2`
- `files/build_templates/gnmi.service.j2`
- `files/build_templates/snmp.service.j2`
- `files/build_templates/telemetry.service.j2`
- `files/build_templates/otel.service.j2`
- `files/build_templates/sflow.service.j2`
- `files/build_templates/mgmt-framework.service.j2`
- `files/build_templates/per_namespace/lldp.service.j2`
- `files/build_templates/per_namespace/bmp.service.j2`

How to verify it

- `sonic.target` dependencies: `systemctl list-dependencies --plain sonic.target | grep pmon`
- `UnitFileState` is no longer `static`: `systemctl show pmon.service --property=UnitFileState`
- `test_load_minigraph_with_golden_config`: pmon should not hit `start-limit-hit`

Workaround verified on testbed: Manually creating the symlinks on a live DUT confirmed the fix resolves the issue.
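The `UnitFileState` check above could be scripted along these lines. This is a minimal sketch, assuming it runs on the DUT where `systemctl` is available; the function names are hypothetical, and only the `systemctl show ... --property=UnitFileState` invocation comes from the steps above.

```python
import subprocess


def parse_unit_file_state(show_output):
    """Extract the value from `UnitFileState=<value>` output, or None if absent."""
    for line in show_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "UnitFileState":
            return value.strip()
    return None


def query_unit_file_state(unit):
    """Run `systemctl show <unit> --property=UnitFileState` and parse the result."""
    result = subprocess.run(
        ["systemctl", "show", unit, "--property=UnitFileState"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_unit_file_state(result.stdout)
```

On an image with the fix, `query_unit_file_state("pmon.service")` would be expected to return something other than `static`.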
Which release branch to backport (provide reason below if selected)
The bug was introduced by the systemd-sonic-generator rework cherry-picked to 202511 via PR #24988.
Tested branch (Please provide the tested image version)
Description for the changelog
Add missing [Install] WantedBy=sonic.target to 9 container service templates to fix start-limit-hit failures after config reloads.
A picture of a cute animal (not mandatory but encouraged)
🦔