[bgp/agg]: Add BGP aggregate address test cases for Config Persistence and Recovery #23347

Open
shixizhang wants to merge 2 commits into sonic-net:master from shixizhang:addbgptest-pr

Conversation

@shixizhang

Description of PR

Summary:
Add new test file test_bgp_aggregate_address_resilience.py (Test Group 5) that validates BGP aggregate-address configuration persistence and recovery across various disruption scenarios. These 5 new test cases verify that aggregate address configuration written via GCU survives BGP container restarts, config reloads, cold reboots, warm reboots, and BBR state transitions.

New test cases:

  • TC 5.1 test_aggregate_persists_bgp_container_restart: Aggregate config survives BGP container restart; CONFIG_DB + STATE_DB + FRR are consistent after recovery.
  • TC 5.2 test_aggregate_persists_config_reload: Aggregate config (with summary-only=true) survives config save + config reload.
  • TC 5.3 test_aggregate_persists_config_save_and_reboot: IPv6 aggregate config survives config save + cold reboot.
  • TC 5.4 test_aggregate_bbr_required_inactive_persists_bgp_restart: BBR-required aggregate stays inactive after BGP restart when BBR is disabled; activates once BBR is enabled.
  • TC 5.5 test_aggregate_persists_warm_reboot: Aggregate config survives warm reboot.
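
All five cases follow the same skeleton: write the aggregate via GCU, verify CONFIG_DB, disrupt the DUT, wait for recovery, verify the full stack, and clean up in a `finally` block. A minimal sketch of that shared shape (every helper name here is a hypothetical stand-in for the real fixtures in the test file, not the PR's actual code):

```python
def run_persistence_case(apply_cfg, verify_config_db, disrupt,
                         wait_recovered, verify_full_stack, cleanup):
    """Generic skeleton shared by TC 5.1-5.5 (all callables hypothetical)."""
    apply_cfg()                 # GCU write is synchronous
    verify_config_db()          # pre-disruption: CONFIG_DB only
    try:
        disrupt()               # container restart / reload / reboot / BBR flip
        assert wait_recovered(), "DUT did not recover in time"
        verify_full_stack()     # post-disruption: CONFIG_DB + STATE_DB + FRR
    finally:
        cleanup()               # graceful fallback to checkpoint rollback
```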

Type of change

  • Bug fix
  • Testbed and Framework (new/improvement)
  • New Test case
    • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202205
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Approach

What is the motivation for this PR?

Existing BGP aggregate-address tests cover configuration validation and route propagation behavior, but there are no tests verifying that aggregate address configuration persists across operational disruptions such as BGP container restarts, config reloads, and device reboots. This PR fills that gap by adding resilience tests that validate CONFIG_DB, STATE_DB, and FRR running-config consistency after each disruption type.

How did you do it?

  • Created test_bgp_aggregate_address_resilience.py reusing existing helpers and fixtures from test_bgp_aggregate_address.py (AggregateCfg, gcu_add_aggregate, gcu_remove_aggregate, verify_bgp_aggregate_consistence, verify_bgp_aggregate_cleanup, dump_db, and the setup_teardown checkpoint/rollback fixture).
  • Added a bgp_neighbors fixture to discover BGP neighbor IPs for session-state polling after disruptions.
  • Pre-disruption verification checks only CONFIG_DB (the GCU write is synchronous). Post-disruption verification checks the full stack (CONFIG_DB + STATE_DB + FRR) once bgpcfgd has re-processed the config.
  • Added wait_for_aggregate_state() helper to handle the asynchronous bgpcfgd STATE_DB population after disruptions.
  • All test cases include proper cleanup in finally blocks with graceful fallback to checkpoint rollback.
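
The `wait_for_aggregate_state()` helper boils down to a bounded poll; a sketch under the assumption that the STATE_DB check is passed in as a callable (the actual helper takes SONiC fixtures rather than a plain function):

```python
import time

def wait_for_aggregate_state(check_state_db, timeout=60.0, interval=5.0):
    """Poll until check_state_db() reports the aggregate entry, or time out.

    check_state_db is a hypothetical callable returning True once bgpcfgd
    has repopulated STATE_DB after the disruption.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check_state_db():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```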

How did you verify/test it?

Ran all test cases on a physical m1-48 testbed with Arista EOS neighbors.

Any platform specific information?

No platform-specific dependencies. Tests use GCU for configuration and standard SONiC reboot/reload utilities, which are platform-agnostic.

Supported testbed topology if it's a new test case?

t1, m1 (declared via @pytest.mark.topology("t1", "m1"))

Documentation

Aligned with BGP-Aggregate-Address test plan

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

shixizhang enabled auto-merge (squash) March 26, 2026 11:47
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Co-authored-by: Copilot <[email protected]>
Signed-off-by: Shixi Zhang <[email protected]>
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Replace safe_reboot=True with explicit service and BGP session recovery,
matching the pattern from test_bgp_session.py. safe_reboot=True calls
wait_critical_processes(), which requires every process in every container
to be healthy; this fails on VS when fpmsyncd crashes during warm reboot.

The lighter approach:
  1. reboot() handles SSH reconnect and warmboot finalizer
  2. Explicit critical_services_fully_started wait (480s, 120s initial delay)
  3. Explicit BGP session wait

This avoids the hard wait_critical_processes check while still validating
that the DUT recovers enough for aggregate address verification.
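
The three steps amount to an ordered sequence of bounded waits rather than one monolithic health check. A generic sketch of that staging (the check callables are hypothetical placeholders for the critical-services and BGP-session polls; the real test uses the 480s/120s budgets quoted above):

```python
import time

def wait_stages(stages, initial_delay=0.0):
    """Run ordered recovery checks after a reboot, e.g.:
      1. critical services fully started
      2. BGP sessions established
    Each stage is (name, check_fn, timeout_s, poll_s).
    Returns the name of the first stage to time out, or None on success."""
    time.sleep(initial_delay)
    for name, check, timeout, poll in stages:
        deadline = time.monotonic() + timeout
        while not check():
            if time.monotonic() >= deadline:
                return name
            time.sleep(poll)
    return None
```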

Co-authored-by: Copilot <[email protected]>
Signed-off-by: Shixi Zhang <[email protected]>
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).
