[bgp/agg]: Add BGP aggregate address test cases for Config Persistence and Recovery #23347
Open
shixizhang wants to merge 2 commits into sonic-net:master from
Collaborator
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
f123c6f to
e967b7f
Compare
e967b7f to
27700c7
Compare
Co-authored-by: Copilot <[email protected]> Signed-off-by: Shixi Zhang <[email protected]>
27700c7 to
391c099
Compare
Replace safe_reboot=True with explicit service and BGP session recovery, matching the pattern from test_bgp_session.py.

safe_reboot=True calls wait_critical_processes(), which requires every process in every container to be healthy; this fails on VS when fpmsyncd crashes during warm reboot.

The lighter approach:
1. reboot() handles SSH reconnect and warmboot finalizer
2. Explicit critical_services_fully_started wait (480s, 120s initial delay)
3. Explicit BGP session wait

This avoids the hard wait_critical_processes check while still validating that the DUT recovers enough for aggregate address verification.

Co-authored-by: Copilot <[email protected]>
Signed-off-by: Shixi Zhang <[email protected]>
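The three-step recovery described in this commit message can be sketched as follows. This is a minimal illustration, not the code in the PR: poll_until and recover_after_warm_reboot are hypothetical names, the callables are stand-ins for reboot(), the critical_services_fully_started check, and the BGP session check, and only the 480 s timeout and 120 s initial delay come from the commit message. The polling intervals and the BGP timeout are assumptions.

```python
import time
from typing import Callable


def poll_until(condition: Callable[[], bool], timeout: float, interval: float,
               initial_delay: float = 0.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll condition() until it returns True or `timeout` seconds pass.

    The timeout is counted from the end of the initial delay (a choice made
    for this sketch). `sleep` and `clock` are injectable so the helper can
    be exercised without real waiting.
    """
    if initial_delay > 0:
        sleep(initial_delay)
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def recover_after_warm_reboot(do_reboot: Callable[[], None],
                              services_started: Callable[[], bool],
                              bgp_sessions_up: Callable[[], bool],
                              sleep: Callable[[float], None] = time.sleep,
                              clock: Callable[[], float] = time.monotonic) -> None:
    """Reboot, then wait for services and BGP sessions, in that order."""
    # Step 1: hypothetical stand-in for reboot(), which handles SSH reconnect.
    do_reboot()
    # Step 2: 480 s timeout and 120 s initial delay are from the commit
    # message; the 20 s polling interval is an assumption.
    if not poll_until(services_started, timeout=480, interval=20,
                      initial_delay=120, sleep=sleep, clock=clock):
        raise AssertionError("critical services did not fully start")
    # Step 3: the BGP timeout and interval here are assumptions.
    if not poll_until(bgp_sessions_up, timeout=300, interval=10,
                      sleep=sleep, clock=clock):
        raise AssertionError("BGP sessions did not re-establish")
```

Because each wait is scoped to one condition, a crashed process that is not needed for aggregate verification does not fail the whole recovery step.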
Description of PR
Summary:
Add new test file test_bgp_aggregate_address_resilience.py (Test Group 5) that validates BGP aggregate-address configuration persistence and recovery across various disruption scenarios. These 5 new test cases verify that aggregate address configuration written via GCU survives BGP container restarts, config reloads, cold reboots, warm reboots, and BBR state transitions.
New test cases:
- test_aggregate_persists_bgp_container_restart: Aggregate config survives BGP container restart; CONFIG_DB + STATE_DB + FRR are consistent after recovery.
- test_aggregate_persists_config_reload: Aggregate config (with summary-only=true) survives config save + config reload.
- test_aggregate_persists_config_save_and_reboot: IPv6 aggregate config survives config save + cold reboot.
- test_aggregate_bbr_required_inactive_persists_bgp_restart: BBR-required aggregate stays inactive after BGP restart when BBR is disabled; activates once BBR is enabled.
- test_aggregate_persists_warm_reboot: Aggregate config survives warm reboot.
Type of change
Back port request
Approach
What is the motivation for this PR?
Existing BGP aggregate-address tests cover configuration validation and route propagation behavior, but there are no tests verifying that aggregate address configuration persists across operational disruptions such as BGP container restarts, config reloads, and device reboots. This PR fills that gap by adding resilience tests that validate CONFIG_DB, STATE_DB, and FRR running-config consistency after each disruption type.
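To make "consistency" across the three views concrete, here is a rough sketch of such a check. It is not the PR's verify_bgp_aggregate_consistence helper: the function names are hypothetical, the field names ("summary-only", "as-set", "state") and their string values are assumptions about the DB schema, and the rule that an inactive aggregate is absent from FRR is also an assumption. Only the aggregate-address line format is standard FRR syntax.

```python
from typing import Dict, List, Optional, Set


def parse_frr_aggregates(running_config: str) -> Dict[str, Set[str]]:
    """Map each aggregate prefix in an FRR running-config to its option set."""
    aggregates: Dict[str, Set[str]] = {}
    for line in running_config.splitlines():
        tokens = line.split()
        if len(tokens) >= 2 and tokens[0] == "aggregate-address":
            aggregates[tokens[1]] = set(tokens[2:])
    return aggregates


def find_mismatches(prefix: str,
                    config_entry: Dict[str, str],
                    state_entry: Optional[Dict[str, str]],
                    running_config: str) -> List[str]:
    """Return human-readable inconsistencies; an empty list means consistent."""
    if state_entry is None:
        return [f"{prefix}: missing from STATE_DB"]
    frr = parse_frr_aggregates(running_config)
    # Assumed schema: option fields hold the strings "true" or "false".
    expected = {opt for opt in ("summary-only", "as-set")
                if config_entry.get(opt) == "true"}
    problems: List[str] = []
    if state_entry.get("state") == "active":
        if prefix not in frr:
            problems.append(f"{prefix}: active in STATE_DB but absent from FRR")
        elif frr[prefix] != expected:
            problems.append(f"{prefix}: FRR options {sorted(frr[prefix])} "
                            f"differ from CONFIG_DB {sorted(expected)}")
    elif prefix in frr:
        # Assumption: an inactive aggregate is not programmed into FRR.
        problems.append(f"{prefix}: inactive in STATE_DB but present in FRR")
    return problems
```

Returning a list of mismatches rather than a boolean keeps the failure message informative when the check is polled after a disruption.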
How did you do it?
- test_bgp_aggregate_address_resilience.py, reusing existing helpers and fixtures from test_bgp_aggregate_address.py (AggregateCfg, gcu_add_aggregate, gcu_remove_aggregate, verify_bgp_aggregate_consistence, verify_bgp_aggregate_cleanup, dump_db, and the setup_teardown checkpoint/rollback fixture).
- A bgp_neighbors fixture to discover BGP neighbor IPs for session-state polling after disruptions.
- A wait_for_aggregate_state() helper to handle the asynchronous bgpcfgd STATE_DB population after disruptions.
- Cleanup in finally blocks, with graceful fallback to checkpoint rollback.
How did you verify/test it?
Ran all test cases on a physical m1-48 testbed with Arista EOS neighbors.
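The cleanup pattern mentioned under "How did you do it?" (finally blocks with fallback to checkpoint rollback) could look roughly like the sketch below. run_with_cleanup and its five callables are hypothetical stand-ins, not the PR's code: add and remove stand for the GCU add/remove helpers, disrupt for a restart, reload, or reboot, and rollback for the checkpoint rollback.

```python
from typing import Callable


def run_with_cleanup(add: Callable[[], None],
                     disrupt: Callable[[], None],
                     verify: Callable[[], None],
                     remove: Callable[[], None],
                     rollback: Callable[[], None]) -> None:
    """Configure, disrupt, verify, and always attempt cleanup afterwards."""
    add()
    try:
        disrupt()
        verify()
    finally:
        # Prefer the targeted removal; if it fails, fall back to restoring
        # the checkpoint so later tests start from a clean configuration.
        try:
            remove()
        except Exception:
            rollback()
```

A failure in verify() still propagates to the test runner after cleanup, so the fallback never masks the original assertion.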

Any platform specific information?
No platform-specific dependencies. Tests use GCU for configuration and standard SONiC reboot/reload utilities, which are platform-agnostic.
Supported testbed topology if it's a new test case?
t1, m1 (declared via @pytest.mark.topology("t1", "m1"))
Documentation
Aligned with BGP-Aggregate-Address test plan