[Mux] Clear bulkers when rolling back mux switchover #3788

Merged
prsunny merged 4 commits into sonic-net:master from theasianpianist:mux-rollback-clear-bulk on Jul 25, 2025
@theasianpianist
Contributor

What I did
When a switchover failure is detected in MuxOrch, clear relevant bulkers to provide a clean slate for the rollback process.

Why I did it
In certain failure scenarios, if an exception is thrown inside the bulker, the bulker may not be cleared and can still contain data in creating_entries or removing_entries. When the rollback process begins, these entries are programmed to the SAI a second time, which (a) is incorrect and (b) could trigger the same exception a second time.
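
To make the failure mode concrete, here is a minimal, self-contained sketch. It is not the sonic-swss code: `ToyBulker`, `Outcome`, `run_switchover`, and the entry names `nh_a` and `bad` are hypothetical names invented for illustration. Only the field names `creating_entries` and `removing_entries` come from the description above. The sketch assumes a bulker whose flush leaves its queues populated when programming throws.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a bulker: queues entries and programs them on flush().
class ToyBulker {
public:
    using Programmer = std::function<void(const std::string&)>;

    explicit ToyBulker(Programmer program) : program_(std::move(program)) {}

    void create_entry(const std::string& key) { creating_entries.push_back(key); }
    void remove_entry(const std::string& key) { removing_entries.push_back(key); }

    // Programs all queued entries. If program_ throws, clear() is never
    // reached, so the queues keep their contents (the assumed failure mode).
    void flush() {
        for (const auto& key : removing_entries) program_("remove:" + key);
        for (const auto& key : creating_entries) program_("create:" + key);
        clear();
    }

    void clear() {
        creating_entries.clear();
        removing_entries.clear();
    }

    std::vector<std::string> creating_entries;
    std::vector<std::string> removing_entries;

private:
    Programmer program_;
};

struct Outcome {
    std::vector<std::string> programmed;  // operations that reached the fake SAI
    bool rollback_failed = false;         // true if the rollback flush threw too
};

// Simulates a switchover that fails mid-flush, followed by a rollback.
Outcome run_switchover(bool clear_before_rollback) {
    Outcome out;
    ToyBulker bulker([&out](const std::string& op) {
        if (op == "create:bad") throw std::runtime_error("simulated SAI failure");
        out.programmed.push_back(op);
    });

    // Switchover: the first entry succeeds, the second one throws.
    bulker.create_entry("nh_a");
    bulker.create_entry("bad");
    try {
        bulker.flush();
    } catch (const std::runtime_error&) {
        // Rollback: optionally start from a clean slate, then undo nh_a.
        if (clear_before_rollback) bulker.clear();
        bulker.remove_entry("nh_a");
        try {
            bulker.flush();
        } catch (const std::runtime_error&) {
            out.rollback_failed = true;
        }
    }
    return out;
}
```

Calling `run_switchover(false)` in this sketch replays the stale `nh_a` creation during rollback and hits the simulated failure again, while `run_switchover(true)` programs only the rollback's own removal.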

How I verified it
Run the MuxRollbackTest.StandbyToActiveExceptionRollbackToStandby test

Details if related

@theasianpianist theasianpianist requested a review from prsunny as a code owner July 25, 2025 01:40
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@prsunny prsunny merged commit 71219b0 into sonic-net:master Jul 25, 2025
15 checks passed
@mssonicbld
Collaborator

Cherry-pick PR to 202505: #3855

@yejianquan

Adding the 202505 label to see whether it fixes the unit test failure in #3843:

https://dev.azure.com/mssonic/build/_build/results?buildId=935100&view=logs&jobId=80130f07-32fc-5c19-809b-35051dcc5ef5&j=80130f07-32fc-5c19-809b-35051dcc5ef5&t=cef16533-2186-5392-d596-cb28cc45bbb3

[----------] Global test environment tear-down
[==========] 233 tests from 42 test suites ran. (6182 ms total)
[  PASSED  ] 232 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] MuxRollbackTest.StandbyToActiveExceptionRollbackToStandby

@mssonicbld
Collaborator

Cherry-pick PR to msft-202506: Azure/sonic-swss.msft#152

Janetxxx pushed a commit to Janetxxx/sonic-swss that referenced this pull request Nov 10, 2025
balanokia pushed a commit to balanokia/sonic-swss that referenced this pull request Nov 17, 2025
theasianpianist added a commit to theasianpianist/sonic-swss that referenced this pull request Feb 4, 2026
Signed-off-by: Lawrence Lee <[email protected]>
baorliu pushed a commit to baorliu/sonic-swss that referenced this pull request Feb 23, 2026
Signed-off-by: Baorong Liu <[email protected]>
6 participants