[action] [PR:24219] retry the chassis db cleanup operations #1730
Merged
arlakshm merged 1 commit into Azure:202405 (Oct 17, 2025)
<!--
Please make sure you've read and understood our contributing guidelines:
https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md
** Make sure all your commits include a signature generated with `git commit -s` **
If this is a bug fix, make sure your description includes "fixes #xxxx", or
"closes #xxxx" or "resolves #xxxx"
Please provide the following information:
-->
#### Why I did it
When running `load_minigraph` or reloading configuration on the linecards, `interface-config.service` restarts, which causes the midplane interface to flap. If `swss.sh` on a linecard deletes its state from chassis_db while the midplane is flapping, the cleanup can be partial: some tables are cleaned up and others are not. For example, cleanup for `SYSTEM_NEIGHBOR` or `SYSTEM_INTF` may fail while `SYSTEM_LAG` cleanup succeeds. This can lead to inconsistent LAG IDs for the remote LC.
##### Work item tracking
- Microsoft ADO **35454463**
#### How I did it
Add retry logic to the chassis_db cleanup operations in the `swss.sh` script.
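
For illustration only, a retry wrapper in bash could look like the sketch below. It is not the code from this PR: the function name `retry_cmd`, the attempt count, and the delay are assumptions chosen for the example, and the command being retried is left to the caller.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from swss.sh): run a command until it
# succeeds, up to max_attempts times, sleeping delay seconds between tries.
# Usage: retry_cmd <max_attempts> <delay_seconds> <command> [args...]
retry_cmd() {
    local max_attempts=$1
    local delay=$2
    shift 2
    local attempt=1
    while true; do
        # Run the command in the current shell; stop on the first success.
        if "$@"; then
            return 0
        fi
        # Give up once the attempt budget is exhausted.
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after ${attempt} attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

A cleanup step would then be wrapped as `retry_cmd 5 2 <cleanup command>`, so that a transient failure during a midplane flap is retried instead of leaving some chassis_db tables cleaned and others not. The values 5 and 2 are placeholders, not the values used in the PR.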
#### How to verify it
Run a test that does `load_minigraph` on all the linecards, then check the logs for LAG removal failures for LAGs on the remote LC.
<!--
If PR needs to be backported, then the PR must be tested against the base branch and the earliest backport release branch and provide tested image version on these two branches. For example, if the PR is requested for master, 202211 and 202012, then the requester needs to provide test results on master and 202012.
-->
#### Which release branch to backport (provide reason below if selected)
<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->
- [ ] 202205
- [ ] 202211
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
#### Tested branch (Please provide the tested image version)
<!--
- Please provide tested image version
- e.g.
- [x] 20201231.100
-->
- [ ] <!-- image version 1 -->
- [ ] <!-- image version 2 -->
#### Description for the changelog
<!--
Write a short (one line) summary that describes the changes in this
pull request for inclusion in the changelog:
-->
<!--
Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.
-->
#### Link to config_db schema for YANG module changes
<!--
Provide a link to config_db schema for the table for which YANG model
is defined
Link should point to correct section on https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md
-->
#### A picture of a cute animal (not mandatory but encouraged)
Original PR: sonic-net/sonic-buildimage#24219
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
arlakshm approved these changes on Oct 17, 2025
liushilongbuaa pushed a commit that referenced this pull request on Mar 25, 2026
…tically (#25251)
#### Why I did it
src/sonic-sairedis
```
* 8c3d40d5 - (HEAD -> master, origin/master, origin/HEAD) Add PORT_PHY_ATTR flex counter support (#1674) (9 hours ago) [Dhanasekar Rathinavel]
* 5966a71b - Update SAI Header to latest (#1742) (2 days ago) [Tejaswini Chadaga]
* 2752ad6c - [multi-asic][Mellanox] add support for Mellanox multi-asic (#1683) (3 days ago) [Yakiv Huryk]
* c4e3c142 - Fix deadlock between syncd and orchagent syncd during initialization failure (#1723) (7 days ago) [DavidZagury]
* 50c5626c - Fix dash meter COUNTERS_DB keys to use VID instead of RID (#1725) (8 days ago) [Mukesh Moopath Velayudhan]
* 7632eebb - [Mellanox] Add phcsync activation for mellanox platforms. (#1734) (9 days ago) [Zili Bombach]
* 9eeffe38 - fix docker slave name (#1730) (10 days ago) [yijingyan2]
* 90670166 - Use matching slave container from the branch we're building against (#1743) (10 days ago) [Saikrishna Arcot]
```
#### How I did it
#### How to verify it
#### Description for the changelog