[action] [PR:22895] Fix test_add_rack flakiness by waiting for BGP convergence before DB comparison #22938
Open
mssonicbld wants to merge 1 commit into sonic-net:202511
…comparison (sonic-net#22895)

* Fix test_add_rack flakiness by waiting for BGP convergence before DB comparison

  The test_add_rack test was failing ~1% of runs with 'DB compare failed after adding T0 via generic patch updater'.

  Root cause: DB comparison ran before BGP sessions established, causing app-db route entry mismatches.

  Changes:
  - Add is_bgp_session_established() helper that returns bool for use with wait_until retry mechanism
  - Move BGP session convergence check BEFORE DB comparison in generic_patch_add_t0(), so app-db routes are populated before comparing against baseline
  - BGP check now retries with wait_until instead of bare assert

  Co-authored-by: Copilot <[email protected]>
  Signed-off-by: Storm Liang <[email protected]>

* Remove unused chk_bgp_session import to fix flake8 F401

  Signed-off-by: Storm Liang <[email protected]>

---------

Signed-off-by: Storm Liang <[email protected]>
Co-authored-by: Copilot <[email protected]>
Signed-off-by: mssonicbld <[email protected]>
Original PR: #22895

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

/azp run

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

/azpw run

Retrying failed(or canceled) jobs...

Build not found. Please close and reopen the PR or rebase your branch to trigger a new build.

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

/azpw run

Retrying failed(or canceled) jobs...

Retrying failed(or canceled) stages in build 1072424: ✅Stage Test:
Description
`test_add_rack` was failing ~1% of PR and baseline runs with:
```
AssertionError: DB compare failed after adding T0 via generic patch updater
```
Root Cause
In `generic_patch_add_t0()`, the DB comparison (config-db, app-db, state-db) ran before BGP sessions were established. After applying a JSON patch to add T0 config (BGP_NEIGHBOR, INTERFACE, PORT, etc.), BGP peers need time to converge and populate app-db route entries. The DB comparison would repeatedly fail throughout its 5-minute retry window because app-db routes hadn't settled.
The BGP session check was placed after the DB comparison and used a bare `assert` with no retry, so it never had a chance to gate the comparison.
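To illustrate why a bool-returning check matters here, the following is a minimal, self-contained sketch and not the code from this PR. The local `wait_until` is a stand-in written for this example (its `timeout, interval, delay, condition` argument order is an assumption about the shape of the test framework's helper), and `fake_get_states`, the neighbor addresses, and the `get_states` parameter are hypothetical.

```python
import time


def wait_until(timeout, interval, delay, condition, *args, **kwargs):
    """Stand-in poller (assumption): retry `condition` until it returns True
    or `timeout` seconds have elapsed. Returns the final bool."""
    time.sleep(delay)
    deadline = time.monotonic() + timeout
    while True:
        if condition(*args, **kwargs):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def is_bgp_session_established(get_states, neighbors):
    """Return True only if every expected neighbor reports Established.

    Returning a bool (instead of asserting) lets a poller call this
    repeatedly without the first failure aborting the test.
    """
    states = get_states()
    return all(states.get(n) == "Established" for n in neighbors)


# Hypothetical neighbor table that converges on the third poll.
polls = {"count": 0}


def fake_get_states():
    polls["count"] += 1
    state = "Established" if polls["count"] >= 3 else "Active"
    return {"10.0.0.1": state, "fc00::1": state}


neighbors = ["10.0.0.1", "fc00::1"]

# A single look right after the patch is what a bare assert would evaluate.
first_look = is_bgp_session_established(fake_get_states, neighbors)

# Polling gives the sessions time to come up before anything else runs.
converged = wait_until(2, 0.01, 0, is_bgp_session_established,
                       fake_get_states, neighbors)
```

In this sketch the single check sees `Active` and would fail an assert immediately, while the polled check succeeds once the simulated sessions converge.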
Evidence (Kusto, last 30 days on master baseline+PR)
Changes
Before (broken ordering)
```
Apply patch → 60s pause → DB comparison (5min retry) ❌ → BGP check (no retry) ❌
```
After (fixed ordering)
```
Apply patch → 60s pause → Wait BGP Established (5min retry) ✅ → DB comparison (5min retry) ✅
```
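The fixed ordering can be sketched as a small gate function. This is an illustration under stated assumptions, not the body of `generic_patch_add_t0()`: the function name `add_t0_and_verify`, its callable parameters, and the BGP failure message are all hypothetical; only the DB-compare message is taken from the failure quoted in the description.

```python
def add_t0_and_verify(apply_patch, pause, wait_bgp_established, compare_dbs):
    """Run the steps in the fixed order: the DB comparison is only
    attempted after the BGP wait has reported success."""
    apply_patch()
    pause()
    if not wait_bgp_established():
        # Hypothetical message; fails early with a BGP-specific reason.
        raise AssertionError("BGP sessions not established after adding T0")
    if not compare_dbs():
        raise AssertionError(
            "DB compare failed after adding T0 via generic patch updater")


def run_with(bgp_result):
    """Drive the gate with fake steps and record which ones ran."""
    order = []

    def step(name, result=None):
        def _step():
            order.append(name)
            return result
        return _step

    error = None
    try:
        add_t0_and_verify(
            step("apply_patch"),
            step("pause"),
            step("wait_bgp", bgp_result),
            step("compare_dbs", True),
        )
    except AssertionError as exc:
        error = str(exc)
    return order, error


ok_order, ok_error = run_with(bgp_result=True)
bad_order, bad_error = run_with(bgp_result=False)
```

With this shape, a BGP convergence timeout surfaces as its own failure instead of as a misleading DB mismatch, and the comparison never runs against an app-db that is still being populated.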