retry the chassis db cleanup operations #24219
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
/azp run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
@saksarav-nokia, @ysmanman, @mlok-nokia, @arista-nwolfe, please help review...
@arlakshm, the code changes LGTM. However, I am unable to understand when we could hit this issue. If the IMM can't reach CHASSIS_APP_DB, then the following code waits until it is reachable, right?
Hi @saksarav-nokia, during load_minigraph the networking service is restarted in parallel, so the midplane interface can flap after this code has executed.
Cherry-pick PR to msft-202405: Azure/sonic-buildimage-msft#1730
When running load_minigraph or reloading configuration on the linecards, the interface-config.service restarts, which causes the midplane interface to flap. If swss.sh on the linecards deletes state from chassis_db, some states may not be cleaned up correctly, while others are successfully removed. For example, cleanup for SYSTEM_NEIGHBOR or SYSTEM_INTF may fail, but SYSTEM_LAG cleanup might succeed. This can lead to inconsistent lag IDs for the remote LC.
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
Signed-off-by: Feng Pan <fenpan@microsoft.com>
Why I did it
When running load_minigraph or reloading configuration on the linecards, the interface-config.service restarts, which causes the midplane interface to flap. If swss.sh on the linecards deletes state from chassis_db while the interface is flapping, some state may not be cleaned up while other state is removed successfully. For example, cleanup for SYSTEM_NEIGHBOR or SYSTEM_INTF may fail, but SYSTEM_LAG cleanup might succeed. This can lead to inconsistent LAG IDs for the remote LC.
Work item tracking
How I did it
Add retry logic to the chassis_db cleanup operations in the swss.sh script.
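A retry of this kind could look roughly like the sketch below. It is not the code from this PR: the helper name `retry_cmd`, its arguments, and the attempt and delay values are all hypothetical, and the actual change in swss.sh may be structured differently. The sketch only illustrates the general idea of re-running a cleanup command until it succeeds or a retry budget is exhausted.

```shell
#!/bin/bash

# Hypothetical helper (not taken from swss.sh): run a command up to
# max_attempts times, sleeping delay_secs between failed attempts.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry_cmd() {
    local max_attempts="$1"
    local delay_secs="$2"
    shift 2
    local attempt=1

    while true; do
        # Run the remaining arguments as the command to retry.
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after ${attempt} attempts: $*" >&2
            return 1
        fi
        echo "attempt ${attempt} failed, retrying in ${delay_secs}s: $*" >&2
        attempt=$((attempt + 1))
        sleep "$delay_secs"
    done
}
```

A cleanup step would then be wrapped as, for example, `retry_cmd 5 2 some_cleanup_function`, where `some_cleanup_function` is a placeholder for whatever command removes this linecard's entries from chassis_db. Retrying the whole cleanup as a unit, rather than individual table deletions, is one way to avoid the partial-cleanup state described above, assuming the cleanup is safe to run more than once.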
How to verify it
Run a test that performs load_minigraph on all the linecards, then check the logs for LAG removal failures for LAGs on the remote LC.