Skip to content

bgpd: Fix BGP-LS initial TED sync and cleanup on peer deactivation#21286

Open
cscarpitta wants to merge 3 commits intoFRRouting:masterfrom
cscarpitta:fix_bgp_ls_initial_db_sync
Open

bgpd: Fix BGP-LS initial TED sync and cleanup on peer deactivation#21286
cscarpitta wants to merge 3 commits intoFRRouting:masterfrom
cscarpitta:fix_bgp_ls_initial_db_sync

Conversation

@cscarpitta
Copy link
Contributor

This PR fixes two issues in the BGP-LS code.

Initial sync: After registering with the LS database, BGP-LS never requests a sync, leaving the TED empty until the IGP sends unsolicited updates. Fix by calling `ls_request_sync() after registration.

Peer deactivation: When the last BGP-LS peer is deactivated, self-originated routes are not withdrawn and the TED is not cleared, leaving stale state. Fix by adding bgp_ls_withdraw_ted() and calling it from peer_deactivate().

@frrbot frrbot bot added bgp tests Topotests, make check, etc labels Mar 22, 2026
@greptile-apps
Copy link

greptile-apps bot commented Mar 22, 2026

Greptile Summary

This PR fixes two correctness gaps in the BGP-LS subsystem: the TED was never populated on startup because the initial sync was never requested, and stale self-originated routes plus TED state were left behind when the last BGP-LS peer was deactivated.

  • Initial sync: ls_request_sync() is now called immediately after ls_register() in bgp_ls_register(). Failure is non-fatal (logged as a warning) and registration is still marked successful, which is the right trade-off.
  • LS_MSG_EVENT_SYNC dispatch: LS_MSG_EVENT_SYNC was missing from all three per-object switch statements (bgp_ls_process_vertex, bgp_ls_process_edge, bgp_ls_process_subnet) and from the edge-level bidirectional-linking condition in bgp_ls_process_message. All four sites now treat SYNC identically to ADD, matching the intended semantics.
  • Peer deactivation cleanup: New bgp_ls_ted_clear() (static) and bgp_ls_withdraw_ted() (exported) clean up the RIB and TED when the last peer goes away. bgp_clear_route(bgp->peer_self, AFI_BGP_LS, SAFI_BGP_LS) correctly targets only self-originated paths; bgp_ls_ted_clear uses frr_each_safe for safe in-place deletion. The call in bgpd.c happens before bgp_ls_unregister(), ensuring withdrawals are issued while the zebra subscription is still live.
  • Test coverage: Two new topotests (test_bgp_ls_peer_deactivate / test_bgp_ls_peer_reactivate) exercise the full deactivate-then-reactivate cycle on both the producer (r2) and consumer (rr) routers.

Confidence Score: 4/5

  • PR is safe to merge; both fixes are correct and well-targeted, with topotest coverage for the new behaviour.
  • The logic is sound: ls_request_sync is placed correctly after registration, LS_MSG_EVENT_SYNC is now handled consistently across all dispatch sites, and bgp_ls_withdraw_ted correctly uses bgp_clear_route(peer_self, …) to hit only self-originated paths before clearing the TED. The one noteworthy design point — bgp_clear_route enqueues work asynchronously while bgp_ls_ted_clear runs synchronously — is functionally safe because the withdrawal work queue does not consult TED state. Score of 4 rather than 5 only because this is new non-trivial state-management code that benefits from a second pair of eyes before landing.
  • bgpd/bgp_ls_ted.c — new bgp_ls_withdraw_ted function and SYNC dispatch additions are the most impactful changes and deserve careful review.

Important Files Changed

Filename Overview
bgpd/bgp_ls.c Adds ls_request_sync() call after successful ls_register() to trigger initial TED population; correctly continues even on sync failure with a warning log.
bgpd/bgp_ls_ted.c Adds LS_MSG_EVENT_SYNC to all switch-case handlers (vertex/edge/subnet) and edge-level bidirectional linking logic; introduces bgp_ls_ted_clear() and bgp_ls_withdraw_ted() for bulk TED cleanup on peer deactivation. bgp_clear_route is async while bgp_ls_ted_clear is synchronous — functionally safe but worth noting.
bgpd/bgp_ls_ted.h Exports the new bgp_ls_withdraw_ted() declaration with a clear docstring describing its contract.
bgpd/bgpd.c Calls bgp_ls_withdraw_ted() before bgp_ls_unregister() in the last-peer deactivation path — correct ordering ensures routes are withdrawn before the zebra subscription is dropped.
tests/topotests/bgp_link_state/test_bgp_link_state.py Adds check_bgp_ls_empty() helper and two new end-to-end tests: test_bgp_ls_peer_deactivate and test_bgp_ls_peer_reactivate, following existing file patterns.

Sequence Diagram

sequenceDiagram
    participant Peer as BGP-LS Peer
    participant bgpd as bgpd (peer_deactivate)
    participant LSTD as bgp_ls_withdraw_ted
    participant RIB as BGP RIB (async work queue)
    participant TED as ls_ted
    participant Zebra as Zebra LS DB

    Note over Peer,Zebra: --- Initial Registration ---
    Peer->>bgpd: activate (first peer)
    bgpd->>Zebra: ls_register()
    bgpd->>Zebra: ls_request_sync()
    Zebra-->>bgpd: LS_MSG_EVENT_SYNC messages
    bgpd->>TED: ls_vertex/edge/subnet add
    bgpd->>RIB: bgp_ls_originate_node/link/prefix

    Note over Peer,Zebra: --- Peer Deactivation (last peer) ---
    Peer->>bgpd: deactivate (last peer)
    bgpd->>LSTD: bgp_ls_withdraw_ted(bgp)
    LSTD->>RIB: bgp_clear_route(peer_self, AFI_BGP_LS, SAFI_BGP_LS)
    RIB-->>Peer: BGP UPDATE withdrawals (async)
    LSTD->>TED: bgp_ls_ted_clear() — remove all vertices/edges/subnets
    bgpd->>Zebra: bgp_ls_unregister()

    Note over Peer,Zebra: --- Re-activation ---
    Peer->>bgpd: activate (new peer)
    bgpd->>Zebra: ls_register()
    bgpd->>Zebra: ls_request_sync()
    Zebra-->>bgpd: LS_MSG_EVENT_SYNC messages
    bgpd->>TED: re-populate
    bgpd->>RIB: re-originate all NLRIs
Loading

Reviews (1): Last reviewed commit: "tests: Verify BGP-LS routes withdrawn on..." | Re-trigger Greptile

After registering with the LS database, no initial sync is requested,
so the TED remains empty until the IGP sends unsolicited updates.
Any topology changes that occurred before registration are
permanently missed and never originated as BGP-LS NLRIs.

Additionally, LS_MSG_EVENT_SYNC messages are not handled in the TED
processors, so any sync response from zebra is silently dropped.

Request a sync via ls_request_sync() immediately after registration,
following the same pattern used by other TED consumers such as pathd.
Handle LS_MSG_EVENT_SYNC alongside LS_MSG_EVENT_ADD in
bgp_ls_process_vertex(), bgp_ls_process_edge(), and
bgp_ls_process_subnet().

Signed-off-by: Carmine Scarpitta <[email protected]>
When the last BGP-LS peer is deactivated, locally originated BGP-LS
routes are not withdrawn from the RIB, leaving stale routes on peers.
The TED is also not cleared, so the next registration re-originates
on top of stale state.

Add bgp_ls_withdraw_ted() which removes all self-originated paths via
bgp_clear_route() and clears all TED entries. Call it in
peer_deactivate() when the last BGP-LS peer is being deactivated,
before unregistering from the LS database.

Signed-off-by: Carmine Scarpitta <[email protected]>
Add test_bgp_ls_peer_deactivate() to verify that deactivating the
last BGP-LS peer on r2 withdraws all locally originated routes on
r2 and clears all received routes on rr.

Add test_bgp_ls_peer_reactivate() to verify that reactivating the
peer triggers a fresh TED sync, re-originates all BGP-LS NLRIs on
r2, and re-advertises them to rr.

Signed-off-by: Carmine Scarpitta <[email protected]>
@cscarpitta cscarpitta force-pushed the fix_bgp_ls_initial_db_sync branch from a4c3b6e to 431b681 Compare March 22, 2026 19:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bgp master size/L tests Topotests, make check, etc

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant