Skip to content

Update EVPN prefix routes properly instead of withdraw/install#18158

Merged
riw777 merged 3 commits intoFRRouting:masterfrom
chdxD1:evpn-no-withdraw-for-update
Apr 22, 2025
Merged

Update EVPN prefix routes properly instead of withdraw/install#18158
riw777 merged 3 commits intoFRRouting:masterfrom
chdxD1:evpn-no-withdraw-for-update

Conversation

@chdxD1
Copy link
Copy Markdown
Contributor

@chdxD1 chdxD1 commented Feb 14, 2025

Today bgpd sends down withdraw and install to zebra when a EVPN prefix route is updated. There was no mechanism in zebra to track such changes and references to prefixes on EVPN nexthops.

The commits implement the respective tracking in zebra and stops sending withdraws for updates from bgpd.

@chdxD1
Copy link
Copy Markdown
Contributor Author

chdxD1 commented Feb 15, 2025

There is one caveat after looking at this again - EVPN MH uses the same functions but slightly different, doing reference counting already in bgpd.

@chdxD1 chdxD1 force-pushed the evpn-no-withdraw-for-update branch 2 times, most recently from ea2dd4f to f451ea1 Compare February 17, 2025 11:37
@chdxD1
Copy link
Copy Markdown
Contributor Author

chdxD1 commented Feb 17, 2025

The behaviour for EVPN MH / A-D routes should now be the same again. I've also squashed the zebra commits into one now and fixed the code style issues hinted by CI.

@donaldsharp
Copy link
Copy Markdown
Member

donaldsharp commented Feb 18, 2025

Why is zebra responsible for the refcounting? This seems like a bgp problem not a zebra problem. As a note -> the typical pattern in zebra is to just do what it is told. This breaks that pattern.

@chdxD1
Copy link
Copy Markdown
Contributor Author

chdxD1 commented Feb 18, 2025

As far as I understood zebra is already "refcounting" today, it tracks the prefixes for a given EVPN VTEP / nexthop. zebra is being told to add / update or withdraw a route, not an evpn nexthop, therefore it has to take care of managing the references for it.
With this change it tracks the count of individual nexthops per route, making the add / update action actually working without having to know about the specifics for EVPN in e.g. bgpd that EVPN routes have to be withdrawn first.

@chdxD1
Copy link
Copy Markdown
Contributor Author

chdxD1 commented Feb 25, 2025

Can we follow up on this? There is also #18240 which might be fixed by this.

I am happy to rework this PR but I am not sure if refcounting can be moved to bgpd.

@chdxD1
Copy link
Copy Markdown
Contributor Author

chdxD1 commented Mar 4, 2025

@donaldsharp maybe I misunderstood something in the interaction between bgpd and zebra, but:

  • bgpd passes zapi_routes to zebra. Except for EVPN MAC-IP (type 2) with ESI routes bgpd does not track next-hop groups (bgp_evpn_path_es_use_nhg)
  • Therefore bgpd always passes the full set of next-hops with the zapi_route to zebra (within ZEBRA_ROUTE_ADD), referenced in the code as legacy-exploded multipath a couple of times
  • So in this case there isn't ref counting in bgpd at all

zebra needs to find out if, for a given EVPN IP-Prefix (type 5) / L3-VNI nexthop, it needs to a) keep it b) update it c) remove it (when ZEBRA_ROUTE_DEL is called).

Before this PR bgpd always sent a delete before an add for every update to a bgpd route to zebra. This ensured that there are no leftover L3-VNI next-hop when a route next-hop changes (e.g. old next-hop 1.1.1.1, new next-hop 2.2.2.2 --> old next-hop was removed first, then new next-hop installed). However this also applies to other updates that do not change next-hops (e.g. old next-hop 1.1.1.1, new next-hop 1.1.1.1, L3-VNI next-hop is removed first, then re-added after). This also sends del/add operations to the Kernel, leading to brief traffic interruptions.

This PR is tracking if a L3-VNI next-hop is still used by route_entries (by counting the paths), removing it when the paths reach zero.

It is based on my understanding of bgpd and zebra that zebra does indeed do refcounting, especially for next-hop groups (see above, bgpd only does that in certain, limited cases): zebra_nhg_increment_ref / zebra_nhg_decrement_ref. Because this is ultimately linked to next-hops it makes sense to include refcounting in L3-VNI next-hops as well.

@chdxD1
Copy link
Copy Markdown
Contributor Author

chdxD1 commented Mar 4, 2025

ci:rerun

@ton31337
Copy link
Copy Markdown
Member

Can we test it?

@chdxD1
Copy link
Copy Markdown
Contributor Author

chdxD1 commented Mar 21, 2025

I have two ideas:

  1. Testing that pathcounting is working properly
  2. Testing that no withdraw is ever sent to the Kernel for EVPN prefixes (except VTEP changes) when routes get updated

Testing 1. should be straight forward, but should already be covered by existing topotests as well (e.g. the one I implemented here: #18325 and the others checking for RMACs). Testing 2. would require spinning up ip monitor (or directly some netlink socket) inside the respective router namespace.

@chdxD1
Copy link
Copy Markdown
Contributor Author

chdxD1 commented Mar 21, 2025

For 1 it probably makes sense to include the pathcount in the expected JSONs, validating that as well

@chdxD1 chdxD1 force-pushed the evpn-no-withdraw-for-update branch from f451ea1 to 74713a2 Compare March 25, 2025 16:21
@github-actions github-actions bot added size/L and removed size/M labels Mar 25, 2025
@chdxD1 chdxD1 force-pushed the evpn-no-withdraw-for-update branch from 74713a2 to 1d98182 Compare March 26, 2025 08:39
@chdxD1
Copy link
Copy Markdown
Contributor Author

chdxD1 commented Mar 26, 2025

ci:rerun unrelated

@chdxD1
Copy link
Copy Markdown
Contributor Author

chdxD1 commented Mar 26, 2025

so - I think let's get this reviewed first before I try to satisfy some unrelated topotests :D

Copy link
Copy Markdown
Member

@riw777 riw777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good ...

@riw777
Copy link
Copy Markdown
Member

riw777 commented Apr 1, 2025

seeing if we can get the ci to pass ... the lint error needs to be fixed up

@chdxD1 chdxD1 force-pushed the evpn-no-withdraw-for-update branch from 1d98182 to 70b5f4c Compare April 1, 2025 18:30
@github-actions github-actions bot added the rebase PR needs rebase label Apr 1, 2025
@chdxD1
Copy link
Copy Markdown
Contributor Author

chdxD1 commented Apr 1, 2025

forgot to run black, should be fixed now

chdxD1 added 2 commits April 1, 2025 20:32
Previously bgpd needed to send a withdraw followed by an install
to update an EVPN prefix route. With refcount tracking in zebra
this is no longer needed

Signed-off-by: Christopher Dziomba <[email protected]>
With bgpd no longer sending withdraws for EVPN prefix routes
zebra needs to track the path/ref count of prefixes for an EVPN
nexthop. When a route is updated the count is first increased
with the new paths and then decreased with the paths of the
existing/"same" route entry.
However the same evpn route methods are used for EVPN MH as well,
where bgpd already tracks the references. It is expected that an
ADD operation for the respective A-D routes is handled as an
upsert, a DEL operation should really remove the respective A-D
reference on a next-hop. For this the old behaviour (no path/ref
counting in zebra) is preserved.

Signed-off-by: Christopher Dziomba <[email protected]>
@chdxD1 chdxD1 force-pushed the evpn-no-withdraw-for-update branch from 70b5f4c to 85d1700 Compare April 1, 2025 18:32
@chdxD1
Copy link
Copy Markdown
Contributor Author

chdxD1 commented Apr 1, 2025

That force-push after rebase was probably too fast for CI?

@mwinter-osr
Copy link
Copy Markdown
Member

CI:rerun Rerunning the CI after fix on "[CI] Verify Source" incorrectly reporting bad status

@riw777
Copy link
Copy Markdown
Member

riw777 commented Apr 8, 2025

are we waiting on a topo test for the counts?

@chdxD1
Copy link
Copy Markdown
Contributor Author

chdxD1 commented Apr 9, 2025

are we waiting on a topo test for the counts?

pathCounts are checked in _test_router_check_evpn_next_hop (which gets called a couple of times inside the tests, also during the flap tests). From my perspective: no

_test_rmac_present(dut)

# Enable dataplane logs in FRR
dut.vtysh_cmd("configure terminal\nno debug zebra dplane detailed\n")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the point of this testing?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's for the test in lines 604-606 but good catch, it should be without the no in front.

Copy link
Copy Markdown
Contributor Author

@chdxD1 chdxD1 Apr 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the no. I've also made sure that the test fails when 068a00f is reverted.

@chdxD1 chdxD1 force-pushed the evpn-no-withdraw-for-update branch from 85d1700 to 1f304cf Compare April 14, 2025 07:47
Adding tests in the bgp_evpn_rt5 topology to cover the changed
bgp -> zebra interaction that does no longer rely on withdrawing and
then re-installing the route. The newly introduced pathCount of EVPN
next-hops is checked. In addition the log is checked for MAC_DELETE or
NEIGH_DELETE during multipath flaps that must no longer be present for
the test to succeed.

Signed-off-by: Christopher Dziomba <[email protected]>
@chdxD1 chdxD1 force-pushed the evpn-no-withdraw-for-update branch from 1f304cf to 4e5d3b6 Compare April 14, 2025 18:10
@chdxD1
Copy link
Copy Markdown
Contributor Author

chdxD1 commented Apr 14, 2025

Okay, on slower systems (like the CI) the clear might not be completed until I check for multipath convergence. This lead to the CI failure in https://ci1.netdef.org/browse/FRR-PULLREQ3-TOPO3D12I386-8602. I now check for the bgp session to be a) established again and b) That the established epoch has changed. This should make the test independent of the system speed.

@chdxD1
Copy link
Copy Markdown
Contributor Author

chdxD1 commented Apr 14, 2025

CI:rerun unrelated

@riw777 riw777 merged commit e525972 into FRRouting:master Apr 22, 2025
13 checks passed
dawkopagh pushed a commit to dawkopagh/frr that referenced this pull request Mar 17, 2026
…pdate

Update EVPN prefix routes properly instead of withdraw/install

(cherry picked from commit e525972)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants