Fix test_add_rack flakiness by waiting for BGP convergence before DB comparison#22895
Merged
StormLiangMS merged 2 commits intosonic-net:masterfrom Mar 13, 2026
Merged
Conversation
…comparison The test_add_rack test was failing ~1% of runs with 'DB compare failed after adding T0 via generic patch updater'. Root cause: DB comparison ran before BGP sessions established, causing app-db route entry mismatches. Changes: - Add is_bgp_session_established() helper that returns bool for use with wait_until retry mechanism - Move BGP session convergence check BEFORE DB comparison in generic_patch_add_t0(), so app-db routes are populated before comparing against baseline - BGP check now retries with wait_until instead of bare assert Co-authored-by: Copilot <[email protected]> Signed-off-by: Storm Liang <[email protected]>
Collaborator
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Collaborator
Author
|
hi @xincunli-sonic @vaibhavhd could you help to review? This is to improve the flakyness of this case. |
Signed-off-by: Storm Liang <[email protected]>
xincunli-sonic
approved these changes
Mar 12, 2026
Collaborator
Author
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Contributor
|
@StormLiangMS the format of the description is not good enough, you can ask your agency to use this .github/PULL_REQUEST_TEMPLATE.md to generate PR. |
mssonicbld
pushed a commit
to mssonicbld/sonic-mgmt
that referenced
this pull request
Mar 13, 2026
…comparison (sonic-net#22895) * Fix test_add_rack flakiness by waiting for BGP convergence before DB comparison The test_add_rack test was failing ~1% of runs with 'DB compare failed after adding T0 via generic patch updater'. Root cause: DB comparison ran before BGP sessions established, causing app-db route entry mismatches. Changes: - Add is_bgp_session_established() helper that returns bool for use with wait_until retry mechanism - Move BGP session convergence check BEFORE DB comparison in generic_patch_add_t0(), so app-db routes are populated before comparing against baseline - BGP check now retries with wait_until instead of bare assert Co-authored-by: Copilot <[email protected]> Signed-off-by: Storm Liang <[email protected]> * Remove unused chk_bgp_session import to fix flake8 F401 Signed-off-by: Storm Liang <[email protected]> --------- Signed-off-by: Storm Liang <[email protected]> Co-authored-by: Copilot <[email protected]> Signed-off-by: mssonicbld <[email protected]>
Collaborator
|
Cherry-pick PR to 202511: #22938 |
selldinesh
pushed a commit
to selldinesh/sonic-mgmt
that referenced
this pull request
Mar 16, 2026
…comparison (sonic-net#22895) * Fix test_add_rack flakiness by waiting for BGP convergence before DB comparison The test_add_rack test was failing ~1% of runs with 'DB compare failed after adding T0 via generic patch updater'. Root cause: DB comparison ran before BGP sessions established, causing app-db route entry mismatches. Changes: - Add is_bgp_session_established() helper that returns bool for use with wait_until retry mechanism - Move BGP session convergence check BEFORE DB comparison in generic_patch_add_t0(), so app-db routes are populated before comparing against baseline - BGP check now retries with wait_until instead of bare assert Co-authored-by: Copilot <[email protected]> Signed-off-by: Storm Liang <[email protected]> * Remove unused chk_bgp_session import to fix flake8 F401 Signed-off-by: Storm Liang <[email protected]> --------- Signed-off-by: Storm Liang <[email protected]> Co-authored-by: Copilot <[email protected]> Signed-off-by: selldinesh <[email protected]>
abhishek-nexthop
pushed a commit
to nexthop-ai/sonic-mgmt
that referenced
this pull request
Mar 17, 2026
…comparison (sonic-net#22895) * Fix test_add_rack flakiness by waiting for BGP convergence before DB comparison The test_add_rack test was failing ~1% of runs with 'DB compare failed after adding T0 via generic patch updater'. Root cause: DB comparison ran before BGP sessions established, causing app-db route entry mismatches. Changes: - Add is_bgp_session_established() helper that returns bool for use with wait_until retry mechanism - Move BGP session convergence check BEFORE DB comparison in generic_patch_add_t0(), so app-db routes are populated before comparing against baseline - BGP check now retries with wait_until instead of bare assert Co-authored-by: Copilot <[email protected]> Signed-off-by: Storm Liang <[email protected]> * Remove unused chk_bgp_session import to fix flake8 F401 Signed-off-by: Storm Liang <[email protected]> --------- Signed-off-by: Storm Liang <[email protected]> Co-authored-by: Copilot <[email protected]> Signed-off-by: Abhishek <[email protected]>
vrajeshe
pushed a commit
to vrajeshe/sonic-mgmt
that referenced
this pull request
Mar 23, 2026
…comparison (sonic-net#22895) * Fix test_add_rack flakiness by waiting for BGP convergence before DB comparison The test_add_rack test was failing ~1% of runs with 'DB compare failed after adding T0 via generic patch updater'. Root cause: DB comparison ran before BGP sessions established, causing app-db route entry mismatches. Changes: - Add is_bgp_session_established() helper that returns bool for use with wait_until retry mechanism - Move BGP session convergence check BEFORE DB comparison in generic_patch_add_t0(), so app-db routes are populated before comparing against baseline - BGP check now retries with wait_until instead of bare assert Co-authored-by: Copilot <[email protected]> Signed-off-by: Storm Liang <[email protected]> * Remove unused chk_bgp_session import to fix flake8 F401 Signed-off-by: Storm Liang <[email protected]> --------- Signed-off-by: Storm Liang <[email protected]> Co-authored-by: Copilot <[email protected]> Signed-off-by: Venkata Gouri Rajesh Etla <[email protected]>
Contributor
|
Hi @StormLiangMS , This PR unconditionally checks both IPv4 and IPv6 BGP sessions in assert wait_until(DB_COMP_WAIT_TIME, 20, 0, is_bgp_session_established,
duthost, tor_data["ip"]["remote"]), \
"BGP IPv4 session for {} not established before DB comparison".format(tor_data["ip"]["remote"])
assert wait_until(DB_COMP_WAIT_TIME, 20, 0, is_bgp_session_established,
duthost, tor_data["ipv6"]["remote"].lower()), \
"BGP IPv6 session for {} not established before DB comparison".format(tor_data["ipv6"]["remote"].lower())On IPv6-only topologies (e.g. I submit PR #23241 to fix the issue |
weiguo-nvidia
added a commit
to weiguo-nvidia/sonic-mgmt
that referenced
this pull request
Mar 24, 2026
PR sonic-net#22895 added BGP session convergence wait before DB comparison in generic_patch_add_t0(), but unconditionally checks both IPv4 and IPv6 BGP sessions. On IPv6-only topologies (e.g. t1-isolated-v6-d56u1-lag), tor_data["ip"]["remote"] is empty, causing is_bgp_session_established() to fail. Fix by checking whether each neighbor IP exists before waiting for the BGP session, consistent with the chk_any_bgp_session() approach from PR sonic-net#21591. Change-Id: I694fd17b7b49b4ce4ece53e0521773b71f301b70 Signed-off-by: weiguo-nvidia <[email protected]>
12 tasks
yxieca
pushed a commit
that referenced
this pull request
Mar 26, 2026
…#23241) What is the motivation for this PR?\nPR #22895 added BGP session convergence wait before DB comparison in generic_patch_add_t0(), but unconditionally checks both IPv4 and IPv6 BGP sessions. On IPv6-only topologies (e.g. t1-isolated-v6-d56u1-lag), tor_data["ip"]["remote"] is empty, causing is_bgp_session_established() to fail.\n\nHow did you do it?\nFix by checking whether each neighbor IP exists before waiting for the BGP session, consistent with the chk_any_bgp_session() approach from PR #21591.\n\nHow did you verify/test it?\nRegression test pass\n\nSigned-off-by:\nweiguo-nvidia <[email protected]>
ravaliyel
pushed a commit
to ravaliyel/sonic-mgmt
that referenced
this pull request
Mar 27, 2026
…comparison (sonic-net#22895) * Fix test_add_rack flakiness by waiting for BGP convergence before DB comparison The test_add_rack test was failing ~1% of runs with 'DB compare failed after adding T0 via generic patch updater'. Root cause: DB comparison ran before BGP sessions established, causing app-db route entry mismatches. Changes: - Add is_bgp_session_established() helper that returns bool for use with wait_until retry mechanism - Move BGP session convergence check BEFORE DB comparison in generic_patch_add_t0(), so app-db routes are populated before comparing against baseline - BGP check now retries with wait_until instead of bare assert Co-authored-by: Copilot <[email protected]> Signed-off-by: Storm Liang <[email protected]> * Remove unused chk_bgp_session import to fix flake8 F401 Signed-off-by: Storm Liang <[email protected]> --------- Signed-off-by: Storm Liang <[email protected]> Co-authored-by: Copilot <[email protected]>
ravaliyel
pushed a commit
to ravaliyel/sonic-mgmt
that referenced
this pull request
Mar 27, 2026
…sonic-net#23241) What is the motivation for this PR?\nPR sonic-net#22895 added BGP session convergence wait before DB comparison in generic_patch_add_t0(), but unconditionally checks both IPv4 and IPv6 BGP sessions. On IPv6-only topologies (e.g. t1-isolated-v6-d56u1-lag), tor_data["ip"]["remote"] is empty, causing is_bgp_session_established() to fail.\n\nHow did you do it?\nFix by checking whether each neighbor IP exists before waiting for the BGP session, consistent with the chk_any_bgp_session() approach from PR sonic-net#21591.\n\nHow did you verify/test it?\nRegression test pass\n\nSigned-off-by:\nweiguo-nvidia <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
\ est_add_rack\ was failing ~1% of PR and baseline runs with:
\
AssertionError: DB compare failed after adding T0 via generic patch updater
\\
Root Cause
In \generic_patch_add_t0(), the DB comparison (config-db, app-db, state-db) ran before BGP sessions established. After applying a JSON patch to add T0 config (BGP_NEIGHBOR, INTERFACE, PORT, etc.), BGP peers need time to converge and populate app-db route entries. The DB comparison would repeatedly fail its 5-minute retry window because app-db routes hadn't settled.
The BGP session check was placed after the DB comparison and used a bare \�ssert\ with no retry — so it never had a chance to gate the comparison.
Evidence (Kusto, last 30 days on master baseline+PR)
Changes
Before (broken ordering)
\
Apply patch → 60s pause → DB comparison (5min retry) ❌ → BGP check (no retry) ❌
\\
After (fixed ordering)
\
Apply patch → 60s pause → Wait BGP Established (5min retry) ✅ → DB comparison (5min retry) ✅
\\