
bgpd: RD caching #21266

Draft
soumyar-roy wants to merge 4 commits into FRRouting:master from soumyar-roy:soumya/cacherd

Conversation


soumyar-roy (Contributor) commented Mar 19, 2026

RD Caching - Summary
Problem: FRR auto-derives EVPN Route Distinguishers (RDs) as Type 1 format (RouterID:2-byte-number). Since VNIs are 24-bit (up to 16M) but the 2-byte field only holds up to 65535, FRR can't embed the VNI directly. Instead it uses a sequential index from bf_assign_index() which is non-deterministic across restarts. This means after a bgpd restart, VRFs and VNIs get different RD values, causing EVPN peers to see them as entirely new routes rather than updates to existing ones — leading to traffic disruption during warm/graceful restart.
Solution: Persist the rd_id assignments to disk so they survive bgpd restarts.
How it works:
State files (/var/lib/frr/.bgp_vrf_rd.txt and .bgp_vni_rd.txt) store the mapping of VRF name → rd_id and VNI → rd_id.
On startup, the state files are loaded into in-memory hash tables. The corresponding bits in the rd_idspace bitfield are reserved so new allocations don't collide.
When a VRF/VNI is configured, it looks up the hash first — if a cached rd_id exists, it reuses it. Otherwise it allocates a new one and appends to the state file.
When a VRF/VNI is deleted, its entry is removed from the hash and the state file is rewritten from the hash (atomic write via tmp + rename).
Orphan cleanup: After config is fully loaded, a bgp_config_end hook walks both hashes and removes entries that were never claimed (VRFs/VNIs that were in the state file but no longer in the running config). Their rd_id bits are freed and state files are rewritten.
Result: RDs remain stable across restarts, enabling seamless EVPN graceful restart without traffic loss.

UT log>>>

Part 1: Trigger GR and verify that RDs don't change across reboots

VXLAN flooding: Enabled
Number of L2 VNIs: 4
Number of L3 VNIs: 2
Flags: * - Kernel
VNI Type RD Import RT Export RT MAC-VRF Site-of-Origin Tenant VRF

1000114  L2  6.0.0.1:3    4640:1000114  4640:1000114  vrf1
1000112  L2  6.0.0.1:4    4640:1000112  4640:1000112  vrf1
1000113  L2  6.0.0.1:5    4640:1000113  4640:1000113  vrf2
1000111  L2  6.0.0.1:6    4640:1000111  4640:1000111  vrf2
104001   L3  144.1.1.2:2  4640:104001   4640:104001   vrf1
104002   L3  144.1.1.6:9  4640:104002   4640:104002   vrf2
bordertor-11# exit
root@bordertor-11:mgmt:/var/home/cumulus# sudo kill -9 $(pidof zebra)
sudo kill -9 $(pidof bgpd)
sudo systemctl stop frr
root@bordertor-11:mgmt:/var/home/cumulus# sudo sed -i -e 's/cumulus_mlag/cumulus_mlag -K 60/g' /etc/frr/daemons
sudo systemctl start frr
sudo sed -i -e 's/cumulus_mlag -K 60/cumulus_mlag/g' /etc/frr/daemons
root@bordertor-11:mgmt:/var/home/cumulus# vtysh

Hello, this is FRRouting (version 10.0.3).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

bordertor-11# show bgp l2vpn evpn vni
Advertise Gateway Macip: Disabled
Advertise SVI Macip: Disabled
Advertise All VNI flag: Enabled
BUM flooding: Head-end replication
VXLAN flooding: Enabled
Number of L2 VNIs: 4
Number of L3 VNIs: 2
Flags: * - Kernel
VNI Type RD Import RT Export RT MAC-VRF Site-of-Origin Tenant VRF

1000114  L2  6.0.0.1:3    4640:1000114  4640:1000114  vrf1
1000112  L2  6.0.0.1:4    4640:1000112  4640:1000112  vrf1
1000113  L2  6.0.0.1:5    4640:1000113  4640:1000113  vrf2
1000111  L2  6.0.0.1:6    4640:1000111  4640:1000111  vrf2
104001   L3  144.1.1.2:2  4640:104001   4640:104001   vrf1
104002   L3  144.1.1.6:9  4640:104002   4640:104002   vrf2
bordertor-11# exit
root@bordertor-11:mgmt:/var/home/cumulus# sudo kill -9 $(pidof zebra)
sudo kill -9 $(pidof bgpd)
sudo systemctl stop frr
root@bordertor-11:mgmt:/var/home/cumulus# sudo sed -i -e 's/cumulus_mlag/cumulus_mlag -K 60/g' /etc/frr/daemons
sudo systemctl start frr
sudo sed -i -e 's/cumulus_mlag -K 60/cumulus_mlag/g' /etc/frr/daemons
root@bordertor-11:mgmt:/var/home/cumulus# vtysh

Hello, this is FRRouting (version 10.0.3).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

bordertor-11# show bgp l2vpn evpn vni
Advertise Gateway Macip: Disabled
Advertise SVI Macip: Disabled
Advertise All VNI flag: Enabled
BUM flooding: Head-end replication
VXLAN flooding: Enabled
Number of L2 VNIs: 4
Number of L3 VNIs: 2
Flags: * - Kernel
VNI Type RD Import RT Export RT MAC-VRF Site-of-Origin Tenant VRF

1000114  L2  6.0.0.1:3    4640:1000114  4640:1000114  vrf1
1000112  L2  6.0.0.1:4    4640:1000112  4640:1000112  vrf1
1000113  L2  6.0.0.1:5    4640:1000113  4640:1000113  vrf2
1000111  L2  6.0.0.1:6    4640:1000111  4640:1000111  vrf2
104001   L3  144.1.1.2:2  4640:104001   4640:104001   vrf1
104002   L3  144.1.1.6:9  4640:104002   4640:104002   vrf2
bordertor-11# exit
root@bordertor-11:mgmt:/var/home/cumulus# sudo kill -9 $(pidof zebra)
sudo kill -9 $(pidof bgpd)
sudo systemctl stop frr
root@bordertor-11:mgmt:/var/home/cumulus# sudo sed -i -e 's/cumulus_mlag/cumulus_mlag -K 60/g' /etc/frr/daemons
sudo systemctl start frr
sudo sed -i -e 's/cumulus_mlag -K 60/cumulus_mlag/g' /etc/frr/daemons
root@bordertor-11:mgmt:/var/home/cumulus# vtysh

Hello, this is FRRouting (version 10.0.3).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

bordertor-11# show bgp l2vpn evpn vni
Advertise Gateway Macip: Disabled
Advertise SVI Macip: Disabled
Advertise All VNI flag: Enabled
BUM flooding: Head-end replication
VXLAN flooding: Enabled
Number of L2 VNIs: 4
Number of L3 VNIs: 2
Flags: * - Kernel
VNI Type RD Import RT Export RT MAC-VRF Site-of-Origin Tenant VRF

1000114  L2  6.0.0.1:3    4640:1000114  4640:1000114  vrf1
1000112  L2  6.0.0.1:4    4640:1000112  4640:1000112  vrf1
1000113  L2  6.0.0.1:5    4640:1000113  4640:1000113  vrf2
1000111  L2  6.0.0.1:6    4640:1000111  4640:1000111  vrf2
104001   L3  144.1.1.2:2  4640:104001   4640:104001   vrf1
104002   L3  144.1.1.6:9  4640:104002   4640:104002   vrf2
bordertor-11# exit
root@bordertor-11:mgmt:/var/home/cumulus# sudo kill -9 $(pidof zebra)
sudo kill -9 $(pidof bgpd)
sudo systemctl stop frr
root@bordertor-11:mgmt:/var/home/cumulus# sudo sed -i -e 's/cumulus_mlag/cumulus_mlag -K 60/g' /etc/frr/daemons
sudo systemctl start frr
sudo sed -i -e 's/cumulus_mlag -K 60/cumulus_mlag/g' /etc/frr/daemons
root@bordertor-11:mgmt:/var/home/cumulus# vtysh

Hello, this is FRRouting (version 10.0.3).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

bordertor-11# show bgp l2vpn evpn vni
Advertise Gateway Macip: Disabled
Advertise SVI Macip: Disabled
Advertise All VNI flag: Enabled
BUM flooding: Head-end replication
VXLAN flooding: Enabled
Number of L2 VNIs: 4
Number of L3 VNIs: 2
Flags: * - Kernel
VNI Type RD Import RT Export RT MAC-VRF Site-of-Origin Tenant VRF

1000114  L2  6.0.0.1:3    4640:1000114  4640:1000114  vrf1
1000112  L2  6.0.0.1:4    4640:1000112  4640:1000112  vrf1
1000113  L2  6.0.0.1:5    4640:1000113  4640:1000113  vrf2
1000111  L2  6.0.0.1:6    4640:1000111  4640:1000111  vrf2
104001   L3  144.1.1.2:2  4640:104001   4640:104001   vrf1
104002   L3  144.1.1.6:9  4640:104002   4640:104002   vrf2
bordertor-11# sh run

bordertor-11# exit
root@bordertor-11:mgmt:/var/home/cumulus# sudo kill -9 $(pidof zebra)
sudo kill -9 $(pidof bgpd)
sudo systemctl stop frr
root@bordertor-11:mgmt:/var/home/cumulus# sudo sed -i -e 's/cumulus_mlag/cumulus_mlag -K 60/g' /etc/frr/daemons
sudo systemctl start frr
sudo sed -i -e 's/cumulus_mlag -K 60/cumulus_mlag/g' /etc/frr/daemons
root@bordertor-11:mgmt:/var/home/cumulus# vtysh

Hello, this is FRRouting (version 10.0.3).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

bordertor-11# show bgp l2vpn evpn vni
Advertise Gateway Macip: Disabled
Advertise SVI Macip: Disabled
Advertise All VNI flag: Enabled
BUM flooding: Head-end replication
VXLAN flooding: Enabled
Number of L2 VNIs: 4
Number of L3 VNIs: 2
Flags: * - Kernel
VNI Type RD Import RT Export RT MAC-VRF Site-of-Origin Tenant VRF

1000114  L2  6.0.0.1:3    4640:1000114  4640:1000114  vrf1
1000112  L2  6.0.0.1:4    4640:1000112  4640:1000112  vrf1
1000113  L2  6.0.0.1:5    4640:1000113  4640:1000113  vrf2
1000111  L2  6.0.0.1:6    4640:1000111  4640:1000111  vrf2
104001   L3  144.1.1.2:2  4640:104001   4640:104001   vrf1
104002   L3  144.1.1.6:9  4640:104002   4640:104002   vrf2
bordertor-11# exit
root@bordertor-11:mgmt:/var/home/cumulus# sudo kill -9 $(pidof zebra)
sudo kill -9 $(pidof bgpd)
sudo systemctl stop frr
root@bordertor-11:mgmt:/var/home/cumulus# sudo sed -i -e 's/cumulus_mlag/cumulus_mlag -K 60/g' /etc/frr/daemons
sudo systemctl start frr
sudo sed -i -e 's/cumulus_mlag -K 60/cumulus_mlag/g' /etc/frr/daemons
root@bordertor-11:mgmt:/var/home/cumulus# vtysh

Hello, this is FRRouting (version 10.0.3).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

bordertor-11# show bgp l2vpn evpn vni
Advertise Gateway Macip: Disabled
Advertise SVI Macip: Disabled
Advertise All VNI flag: Enabled
BUM flooding: Head-end replication
VXLAN flooding: Enabled
Number of L2 VNIs: 4
Number of L3 VNIs: 2
Flags: * - Kernel
VNI Type RD Import RT Export RT MAC-VRF Site-of-Origin Tenant VRF

1000114  L2  6.0.0.1:3    4640:1000114  4640:1000114  vrf1
1000112  L2  6.0.0.1:4    4640:1000112  4640:1000112  vrf1
1000113  L2  6.0.0.1:5    4640:1000113  4640:1000113  vrf2
1000111  L2  6.0.0.1:6    4640:1000111  4640:1000111  vrf2
104001   L3  144.1.1.2:2  4640:104001   4640:104001   vrf1
104002   L3  144.1.1.6:9  4640:104002   4640:104002   vrf2
bordertor-11# exit
root@bordertor-11:mgmt:/var/home/cumulus# sudo sed -i -e 's/cumulus_mlag/cumulus_mlag -K 60/g' /etc/frr/daemons
sudo systemctl start frr
sudo sed -i -e 's/cumulus_mlag -K 60/cumulus_mlag/g' /etc/frr/daemons
root@bordertor-11:mgmt:/var/home/cumulus# sudo sed -i -e 's/cumulus_mlag/cumulus_mlag -K 60/g' /etc/frr/daemons
sudo systemctl start frr
sudo sed -i -e 's/cumulus_mlag -K 60/cumulus_mlag/g' /etc/frr/daemons
root@bordertor-11:mgmt:/var/home/cumulus# vtysh

Hello, this is FRRouting (version 10.0.3).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

bordertor-11# show bgp l2vpn evpn vni
Advertise Gateway Macip: Disabled
Advertise SVI Macip: Disabled
Advertise All VNI flag: Enabled
BUM flooding: Head-end replication
VXLAN flooding: Enabled
Number of L2 VNIs: 4
Number of L3 VNIs: 2
Flags: * - Kernel
VNI Type RD Import RT Export RT MAC-VRF Site-of-Origin Tenant VRF

1000114  L2  6.0.0.1:3    4640:1000114  4640:1000114  vrf1
1000112  L2  6.0.0.1:4    4640:1000112  4640:1000112  vrf1
1000113  L2  6.0.0.1:5    4640:1000113  4640:1000113  vrf2
1000111  L2  6.0.0.1:6    4640:1000111  4640:1000111  vrf2
104001   L3  144.1.1.2:2  4640:104001   4640:104001   vrf1
104002   L3  144.1.1.6:9  4640:104002   4640:104002   vrf2
bordertor-11#

Part 2 >>> Config-based trigger
bordertor-11# sh run
Building configuration...

Current configuration:
!
frr version 10.0.3
frr defaults datacenter
hostname bordertor-11
log file /var/log/frr/bgpd.log
log syslog informational
log timestamp precision 6
service integrated-vtysh-config
!
ip prefix-list LOCAL_HOST_VRF1 seq 1 permit 50.1.110.0/24
ip prefix-list LOCAL_HOST_VRF1 seq 2 deny any
ip prefix-list LOCAL_HOST_VRF2 seq 1 permit 60.1.110.0/24
ip prefix-list LOCAL_HOST_VRF2 seq 2 deny any
!
ipv6 prefix-list LOCAL_HOST_VRF1_v6 seq 1 permit 2050:1:1:110::/64
ipv6 prefix-list LOCAL_HOST_VRF1_v6 seq 2 deny any
ipv6 prefix-list LOCAL_HOST_VRF2_v6 seq 1 permit 2060:1:1:110::/64
ipv6 prefix-list LOCAL_HOST_VRF2_v6 seq 2 deny any
!
vrf mgmt
exit-vrf
!
vrf vrf1
vni 104001
exit-vrf
!
vrf vrf2
vni 104002
exit-vrf
!
router bgp 660000
bgp router-id 6.0.0.1
bgp graceful-restart
bgp bestpath as-path multipath-relax
bgp bestpath compare-routerid
neighbor TOR_LEAF_SPINE peer-group
neighbor TOR_LEAF_SPINE advertisement-interval 0
neighbor TOR_LEAF_SPINE timers 3 10
neighbor TOR_LEAF_SPINE timers connect 5
neighbor TOR_LEAF_SPINE capability extended-nexthop
neighbor swp1 interface peer-group TOR_LEAF_SPINE
neighbor swp1 remote-as external
neighbor swp1 advertisement-interval 0
neighbor swp1 timers 3 10
neighbor swp1 timers connect 5
neighbor swp2 interface peer-group TOR_LEAF_SPINE
neighbor swp2 remote-as external
neighbor swp2 advertisement-interval 0
neighbor swp2 timers 3 10
neighbor swp2 timers connect 5
!
address-family ipv4 unicast
redistribute connected route-map ALLOW_LOBR
neighbor TOR_LEAF_SPINE allowas-in
neighbor swp1 allowas-in
neighbor swp2 allowas-in
maximum-paths 16
maximum-paths ibgp 64
exit-address-family
!
address-family l2vpn evpn
neighbor TOR_LEAF_SPINE activate
advertise-all-vni
exit-address-family
exit
!
router bgp 660000 vrf vrf1
bgp router-id 144.1.1.2
no bgp network import-check
neighbor 144.1.1.1 remote-as external
neighbor 144.1.1.1 advertisement-interval 0
neighbor 144.1.1.1 timers 3 9
neighbor 144.1.1.1 timers connect 10
neighbor 155.1.1.1 remote-as external
neighbor 155.1.1.1 advertisement-interval 0
neighbor 155.1.1.1 timers 3 9
neighbor 155.1.1.1 timers connect 10
neighbor 2144:1:1:1::1 remote-as external
neighbor 2144:1:1:1::1 advertisement-interval 0
neighbor 2144:1:1:1::1 timers 3 9
neighbor 2144:1:1:1::1 timers connect 10
neighbor 2155:1:1:1::1 remote-as external
neighbor 2155:1:1:1::1 advertisement-interval 0
neighbor 2155:1:1:1::1 timers 3 9
neighbor 2155:1:1:1::1 timers connect 10
!
address-family ipv4 unicast
network 50.1.1.112/32
redistribute connected route-map HOST_ALLOW_1
maximum-paths 64
maximum-paths ibgp 64
exit-address-family
!
address-family ipv6 unicast
network 2050:1:1:1::112/128
redistribute connected route-map HOST_ALLOW_1_v6
neighbor 2144:1:1:1::1 activate
neighbor 2155:1:1:1::1 activate
maximum-paths 64
maximum-paths ibgp 64
exit-address-family
!
address-family l2vpn evpn
advertise ipv4 unicast
advertise ipv6 unicast
exit-address-family
exit
!
router bgp 660000 vrf vrf2
bgp router-id 144.1.1.6
no bgp network import-check
neighbor 144.1.1.5 remote-as external
neighbor 144.1.1.5 advertisement-interval 0
neighbor 144.1.1.5 timers 3 9
neighbor 144.1.1.5 timers connect 10
neighbor 155.1.1.5 remote-as external
neighbor 155.1.1.5 advertisement-interval 0
neighbor 155.1.1.5 timers 3 9
neighbor 155.1.1.5 timers connect 10
neighbor 2144:1:1:2::5 remote-as external
neighbor 2144:1:1:2::5 advertisement-interval 0
neighbor 2144:1:1:2::5 timers 3 9
neighbor 2144:1:1:2::5 timers connect 10
neighbor 2155:1:1:2::5 remote-as external
neighbor 2155:1:1:2::5 advertisement-interval 0
neighbor 2155:1:1:2::5 timers 3 9
neighbor 2155:1:1:2::5 timers connect 10
!
address-family ipv4 unicast
network 60.1.1.111/32
redistribute connected route-map HOST_ALLOW_2
maximum-paths 64
maximum-paths ibgp 64
exit-address-family
!
address-family ipv6 unicast
network 2060:1:1:1::111/128
redistribute connected route-map HOST_ALLOW_2_v6
neighbor 2144:1:1:2::5 activate
neighbor 2155:1:1:2::5 activate
maximum-paths 64
maximum-paths ibgp 64
exit-address-family
!
address-family l2vpn evpn
advertise ipv4 unicast
advertise ipv6 unicast
exit-address-family
exit
!
route-map ALLOW_LO permit 10
match interface lo
!
route-map ALLOW_LOBR permit 10
match interface lo
!
route-map ALLOW_LOBR permit 20
match interface br_l3vni
!
route-map HOST_ALLOW_1 permit 1
match ip address prefix-list LOCAL_HOST_VRF1
!
route-map HOST_ALLOW_1_v6 permit 1
match ipv6 address prefix-list LOCAL_HOST_VRF1_v6
!
route-map HOST_ALLOW_2 permit 1
match ip address prefix-list LOCAL_HOST_VRF2
!
route-map HOST_ALLOW_2_v6 permit 1
match ipv6 address prefix-list LOCAL_HOST_VRF2_v6
!
end

root@bordertor-11:mgmt:/var/home/cumulus# cat /var/lib/frr/.bgp_vrf_rd.txt
default 1
vrf1 2
vrf2 9
root@bordertor-11:mgmt:/var/home/cumulus# cat /var/lib/frr/.bgp_vni_rd.txt
1000114 3
1000112 4
1000113 5
1000111 6
root@bordertor-11:mgmt:/var/home/cumulus# vtysh

Hello, this is FRRouting (version 10.0.3).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

bordertor-11# show bgp l2vpn evpn vni
Advertise Gateway Macip: Disabled
Advertise SVI Macip: Disabled
Advertise All VNI flag: Enabled
BUM flooding: Head-end replication
VXLAN flooding: Enabled
Number of L2 VNIs: 4
Number of L3 VNIs: 2
Flags: * - Kernel
VNI Type RD Import RT Export RT MAC-VRF Site-of-Origin Tenant VRF

1000114  L2  6.0.0.1:3    4640:1000114  4640:1000114  vrf1
1000112  L2  6.0.0.1:4    4640:1000112  4640:1000112  vrf1
1000113  L2  6.0.0.1:5    4640:1000113  4640:1000113  vrf2
1000111  L2  6.0.0.1:6    4640:1000111  4640:1000111  vrf2
104001   L3  144.1.1.2:2  4640:104001   4640:104001   vrf1
104002   L3  144.1.1.6:9  4640:104002   4640:104002   vrf2
--- RESULT 3b: PASS --- (same as TEST 2)
104001 L3 with vrf1 rd_id=2, orphan cleaned, all L2 VNIs unchanged
================================================================================

================================================================================
TEST Full VRF removal — stop FRR, remove BOTH the VRF definition block
("vrf vrf1 / vni 104001 / exit-vrf") AND "router bgp 660000 vrf vrf1"
from frr.conf. Start FRR. Then add both back, restart FRR.
Expect: vrf1 BGP instance is gone so vrf1 entry is removed from VRF
state file. 104001 demotes L3->L2 with rd_id=7.
On re-add, vrf1 gets rd_id=2 (lowest free bit).

--- Step 4a: Remove VRF block + BGP VRF instance from config, stop/start FRR ---
bordertor-11# exit
root@bordertor-11:mgmt:/var/home/cumulus# vi /etc/frr/frr.conf   <<<<< remove vrf vrf1 from global config and BGP
root@bordertor-11:mgmt:/var/home/cumulus# systemctl stop frr
root@bordertor-11:mgmt:/var/home/cumulus# systemctl start frr
root@bordertor-11:mgmt:/var/home/cumulus#
root@bordertor-11:mgmt:/var/home/cumulus#
root@bordertor-11:mgmt:/var/home/cumulus# cat /var/lib/frr/.bgp_vni_rd.txt
1000114 3
1000112 4
1000113 5
104001 7
1000111 6
root@bordertor-11:mgmt:/var/home/cumulus# cat /var/lib/frr/.bgp_vrf_rd.txt
default 1
vrf2 9   <<<<< vrf1 RD gone as expected
root@bordertor-11:mgmt:/var/home/cumulus# vtysh

Hello, this is FRRouting (version 10.0.3).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

bordertor-11# show bgp l2vpn evpn vni
Advertise Gateway Macip: Disabled
Advertise SVI Macip: Disabled
Advertise All VNI flag: Enabled
BUM flooding: Head-end replication
VXLAN flooding: Enabled
Number of L2 VNIs: 5
Number of L3 VNIs: 1
Flags: * - Kernel
VNI Type RD Import RT Export RT MAC-VRF Site-of-Origin Tenant VRF

1000114  L2  6.0.0.1:3    4640:1000114  4640:1000114  vrf1
1000112  L2  6.0.0.1:4    4640:1000112  4640:1000112  vrf1
1000113  L2  6.0.0.1:5    4640:1000113  4640:1000113  vrf2
104001   L2  6.0.0.1:7    4640:104001   4640:104001   vrf1
1000111  L2  6.0.0.1:6    4640:1000111  4640:1000111  vrf2
104002   L3  144.1.1.6:9  4640:104002   4640:104002   vrf2
--- RESULT 4a: PASS ---
104001 L2 with rd_id=7. VRF file has no vrf1 (BGP VRF instance removed).
vrf1 rd_id=2 bit is free. All L2 VNIs unchanged.
================================================================================

--- Step 4b: Add vrf1 back to config, restart FRR ---
bordertor-11# exit
root@bordertor-11:mgmt:/var/home/cumulus# vi /etc/frr/frr.conf   <<<<< add back vrf vrf1
root@bordertor-11:mgmt:/var/home/cumulus# systemctl restart frr
root@bordertor-11:mgmt:/var/home/cumulus# cat /var/lib/frr/.bgp_vrf_rd.txt
default 1
vrf2 9
vrf1 2   <<<<< RD shows up again for vrf1
root@bordertor-11:mgmt:/var/home/cumulus# cat /var/lib/frr/.bgp_vni_rd.txt
1000114 3
1000112 4
1000113 5
1000111 6
root@bordertor-11:mgmt:/var/home/cumulus# vtysh

Hello, this is FRRouting (version 10.0.3).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

bordertor-11# show bgp l2vpn evpn vni
Advertise Gateway Macip: Disabled
Advertise SVI Macip: Disabled
Advertise All VNI flag: Enabled
BUM flooding: Head-end replication
VXLAN flooding: Enabled
Number of L2 VNIs: 4
Number of L3 VNIs: 2
Flags: * - Kernel
VNI Type RD Import RT Export RT MAC-VRF Site-of-Origin Tenant VRF

1000114  L2  6.0.0.1:3    4640:1000114  4640:1000114  vrf1
1000112  L2  6.0.0.1:4    4640:1000112  4640:1000112  vrf1
1000113  L2  6.0.0.1:5    4640:1000113  4640:1000113  vrf2
1000111  L2  6.0.0.1:6    4640:1000111  4640:1000111  vrf2
104001   L3  144.1.1.2:2  4640:104001   4640:104001   vrf1   <<<<< vrf1 with RD 2 reappears
104002   L3  144.1.1.6:9  4640:104002   4640:104002   vrf2
vrf1 got rd_id=2 (bf_assign_index gave lowest free bit).
104001 L3 with RD 144.1.1.2:2, orphan cleaned. All L2 VNIs unchanged.
================================================================================

bordertor-11# sh run
Building configuration...

Current configuration:
!
frr version 10.0.3
frr defaults datacenter
hostname bordertor-11
log file /var/log/frr/bgpd.log
log syslog informational
log timestamp precision 6
service integrated-vtysh-config
!
ip prefix-list LOCAL_HOST_VRF1 seq 1 permit 50.1.110.0/24
ip prefix-list LOCAL_HOST_VRF1 seq 2 deny any
ip prefix-list LOCAL_HOST_VRF2 seq 1 permit 60.1.110.0/24
ip prefix-list LOCAL_HOST_VRF2 seq 2 deny any
!
ipv6 prefix-list LOCAL_HOST_VRF1_v6 seq 1 permit 2050:1:1:110::/64
ipv6 prefix-list LOCAL_HOST_VRF1_v6 seq 2 deny any
ipv6 prefix-list LOCAL_HOST_VRF2_v6 seq 1 permit 2060:1:1:110::/64
ipv6 prefix-list LOCAL_HOST_VRF2_v6 seq 2 deny any
!
vrf mgmt
exit-vrf
!
vrf vrf1
vni 104001
exit-vrf
!
vrf vrf2
vni 104002
exit-vrf
!
router bgp 660000
bgp router-id 6.0.0.1
bgp graceful-restart
bgp bestpath as-path multipath-relax
bgp bestpath compare-routerid
neighbor TOR_LEAF_SPINE peer-group
neighbor TOR_LEAF_SPINE advertisement-interval 0
neighbor TOR_LEAF_SPINE timers 3 10
neighbor TOR_LEAF_SPINE timers connect 5
neighbor TOR_LEAF_SPINE capability extended-nexthop
neighbor swp1 interface peer-group TOR_LEAF_SPINE
neighbor swp1 remote-as external
neighbor swp1 advertisement-interval 0
neighbor swp1 timers 3 10
neighbor swp1 timers connect 5
neighbor swp2 interface peer-group TOR_LEAF_SPINE
neighbor swp2 remote-as external
neighbor swp2 advertisement-interval 0
neighbor swp2 timers 3 10
neighbor swp2 timers connect 5
!
address-family ipv4 unicast
redistribute connected route-map ALLOW_LOBR
neighbor TOR_LEAF_SPINE allowas-in
neighbor swp1 allowas-in
neighbor swp2 allowas-in
maximum-paths 16
maximum-paths ibgp 64
exit-address-family
!
address-family l2vpn evpn
neighbor TOR_LEAF_SPINE activate
advertise-all-vni
exit-address-family
exit
!
router bgp 660000 vrf vrf1
bgp router-id 144.1.1.2
no bgp network import-check
neighbor 144.1.1.1 remote-as external
neighbor 144.1.1.1 advertisement-interval 0
neighbor 144.1.1.1 timers 3 9
neighbor 144.1.1.1 timers connect 10
neighbor 155.1.1.1 remote-as external
neighbor 155.1.1.1 advertisement-interval 0
neighbor 155.1.1.1 timers 3 9
neighbor 155.1.1.1 timers connect 10
neighbor 2144:1:1:1::1 remote-as external
neighbor 2144:1:1:1::1 advertisement-interval 0
neighbor 2144:1:1:1::1 timers 3 9
neighbor 2144:1:1:1::1 timers connect 10
neighbor 2155:1:1:1::1 remote-as external
neighbor 2155:1:1:1::1 advertisement-interval 0
neighbor 2155:1:1:1::1 timers 3 9
neighbor 2155:1:1:1::1 timers connect 10
!
address-family ipv4 unicast
network 50.1.1.112/32
redistribute connected route-map HOST_ALLOW_1
maximum-paths 64
maximum-paths ibgp 64
exit-address-family
!
address-family ipv6 unicast
network 2050:1:1:1::112/128
redistribute connected route-map HOST_ALLOW_1_v6
neighbor 2144:1:1:1::1 activate
neighbor 2155:1:1:1::1 activate
maximum-paths 64
maximum-paths ibgp 64
exit-address-family
!
address-family l2vpn evpn
advertise ipv4 unicast
advertise ipv6 unicast
exit-address-family
exit
!
router bgp 660000 vrf vrf2
bgp router-id 144.1.1.6
no bgp network import-check
neighbor 144.1.1.5 remote-as external
neighbor 144.1.1.5 advertisement-interval 0
neighbor 144.1.1.5 timers 3 9
neighbor 144.1.1.5 timers connect 10
neighbor 155.1.1.5 remote-as external
neighbor 155.1.1.5 advertisement-interval 0
neighbor 155.1.1.5 timers 3 9
neighbor 155.1.1.5 timers connect 10
neighbor 2144:1:1:2::5 remote-as external
neighbor 2144:1:1:2::5 advertisement-interval 0
neighbor 2144:1:1:2::5 timers 3 9
neighbor 2144:1:1:2::5 timers connect 10
neighbor 2155:1:1:2::5 remote-as external
neighbor 2155:1:1:2::5 advertisement-interval 0
neighbor 2155:1:1:2::5 timers 3 9
neighbor 2155:1:1:2::5 timers connect 10
!
address-family ipv4 unicast
network 60.1.1.111/32
redistribute connected route-map HOST_ALLOW_2
maximum-paths 64
maximum-paths ibgp 64
exit-address-family
!
address-family ipv6 unicast
network 2060:1:1:1::111/128
redistribute connected route-map HOST_ALLOW_2_v6
neighbor 2144:1:1:2::5 activate
neighbor 2155:1:1:2::5 activate
maximum-paths 64
maximum-paths ibgp 64
exit-address-family
!
address-family l2vpn evpn
advertise ipv4 unicast
advertise ipv6 unicast
exit-address-family
exit
!
route-map ALLOW_LO permit 10
match interface lo
!
route-map ALLOW_LOBR permit 10
match interface lo
!
route-map ALLOW_LOBR permit 20
match interface br_l3vni
!
route-map HOST_ALLOW_1 permit 1
match ip address prefix-list LOCAL_HOST_VRF1
!
route-map HOST_ALLOW_1_v6 permit 1
match ipv6 address prefix-list LOCAL_HOST_VRF1_v6
!
route-map HOST_ALLOW_2 permit 1
match ip address prefix-list LOCAL_HOST_VRF2
!
route-map HOST_ALLOW_2_v6 permit 1
match ipv6 address prefix-list LOCAL_HOST_VRF2_v6
!
end

--- Second VNI check + state files ---
bordertor-11# show bgp l2vpn evpn vni
Advertise Gateway Macip: Disabled
Advertise SVI Macip: Disabled
Advertise All VNI flag: Enabled
BUM flooding: Head-end replication
VXLAN flooding: Enabled
Number of L2 VNIs: 4
Number of L3 VNIs: 2
Flags: * - Kernel
VNI Type RD Import RT Export RT MAC-VRF Site-of-Origin Tenant VRF

1000114  L2  6.0.0.1:3    4640:1000114  4640:1000114  vrf1
1000112  L2  6.0.0.1:4    4640:1000112  4640:1000112  vrf1
1000113  L2  6.0.0.1:5    4640:1000113  4640:1000113  vrf2
1000111  L2  6.0.0.1:6    4640:1000111  4640:1000111  vrf2
104001   L3  144.1.1.2:2  4640:104001   4640:104001   vrf1
104002   L3  144.1.1.6:9  4640:104002   4640:104002   vrf2
bordertor-11# exit
root@bordertor-11:mgmt:/var/home/cumulus# cat /var/lib/frr/.bgp_vni_rd.txt
1000114 3
1000112 4
1000113 5
1000111 6
root@bordertor-11:mgmt:/var/home/cumulus# cat /var/lib/frr/.bgp_vrf_rd.txt
default 1
vrf2 9
vrf1 2
================================================================================
    ================================================================================

================================================================================
TEST Runtime VTY removal — remove vrf1 via VTY commands at runtime
(no vni, no router bgp vrf vrf1), then restart FRR.
Expect: 104001 demotes L3->L2 with rd_id=7, vrf1 removed from VRF
file. After restart, config still has vrf1 (frr.conf not
changed via "write memory" yet), so vrf1 comes back with
rd_id=2.

root@bordertor-11:mgmt:/var/home/cumulus# exit
exit
cumulus@bordertor-11:mgmt:~$ sudo bash
root@bordertor-11:mgmt:/var/home/cumulus# vtysh

Hello, this is FRRouting (version 10.0.3).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

--- Step 6a: Runtime VTY removal of vrf1 ---
bordertor-11# conf
bordertor-11(config)# no router bgp 660000 vrf vrf1
% Please unconfigure l3vni 104001
bordertor-11(config)# vrf vrf1
bordertor-11(config-vrf)# no vni 104001
bordertor-11(config-vrf)# exit
bordertor-11(config)# no router bgp 660000 vrf vrf1
bordertor-11(config)# exit
bordertor-11# show bgp l2vpn evpn vni
Advertise Gateway Macip: Disabled
Advertise SVI Macip: Disabled
Advertise All VNI flag: Enabled
BUM flooding: Head-end replication
VXLAN flooding: Enabled
Number of L2 VNIs: 5
Number of L3 VNIs: 1
Flags: * - Kernel
VNI Type RD Import RT Export RT MAC-VRF Site-of-Origin Tenant VRF

1000114  L2  6.0.0.1:3    4640:1000114  4640:1000114  vrf1
1000112  L2  6.0.0.1:4    4640:1000112  4640:1000112  vrf1
1000113  L2  6.0.0.1:5    4640:1000113  4640:1000113  vrf2
104001   L2  6.0.0.1:7    4640:104001   4640:104001   vrf1
1000111  L2  6.0.0.1:6    4640:1000111  4640:1000111  vrf2   <<<<< vrf1 disappeared
104002   L3  144.1.1.6:9  4640:104002   4640:104002   vrf2
bordertor-11# exit
root@bordertor-11:mgmt:/var/home/cumulus# cat /var/lib/frr/.bgp_vrf_rd.txt
default 1
vrf2 9   <<<<< vrf1 disappeared
root@bordertor-11:mgmt:/var/home/cumulus# cat /var/lib/frr/.bgp_vni_rd.txt
1000114 3
1000112 4
1000113 5
1000111 6
104001 7
Runtime VTY deletion works: 104001 L2 with rd_id=7, vrf1 removed from the VRF file, 104001=7 added to the VNI file. All L2 VNIs unchanged.
================================================================================

--- Step 6b: Restart FRR (frr.conf still has vrf1 — not saved in config file) ---
root@bordertor-11:mgmt:/var/home/cumulus# systemctl resatrt frr
Unknown command verb resatrt.
root@bordertor-11:mgmt:/var/home/cumulus# systemctl restart frr
root@bordertor-11:mgmt:/var/home/cumulus# cat /var/lib/frr/.bgp_vrf_rd.txt
default 1
vrf2 9
vrf1 2   <<<<< as we didn't do "write memory", it correctly reappeared
root@bordertor-11:mgmt:/var/home/cumulus# cat /var/lib/frr/.bgp_vni_rd.txt
1000114 3
1000112 4
1000113 5
1000111 6

--- Verify: "sh run" shows vrf1 restored from frr.conf (was not saved) ---
root@bordertor-11:mgmt:/var/home/cumulus# sh run
sh: 0: cannot open run: No such file
root@bordertor-11:mgmt:/var/home/cumulus# vtysh

Hello, this is FRRouting (version 10.0.3).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

bordertor-11# sh run
Building configuration...

Current configuration:
!
frr version 10.0.3
frr defaults datacenter
hostname bordertor-11
log file /var/log/frr/bgpd.log
log syslog informational
log timestamp precision 6
service integrated-vtysh-config
!
ip prefix-list LOCAL_HOST_VRF1 seq 1 permit 50.1.110.0/24
ip prefix-list LOCAL_HOST_VRF1 seq 2 deny any
ip prefix-list LOCAL_HOST_VRF2 seq 1 permit 60.1.110.0/24
ip prefix-list LOCAL_HOST_VRF2 seq 2 deny any
!
ipv6 prefix-list LOCAL_HOST_VRF1_v6 seq 1 permit 2050:1:1:110::/64
ipv6 prefix-list LOCAL_HOST_VRF1_v6 seq 2 deny any
ipv6 prefix-list LOCAL_HOST_VRF2_v6 seq 1 permit 2060:1:1:110::/64
ipv6 prefix-list LOCAL_HOST_VRF2_v6 seq 2 deny any
!
vrf mgmt
exit-vrf
!
vrf vrf1
vni 104001
exit-vrf
!
vrf vrf2
vni 104002
exit-vrf
!
router bgp 660000
bgp router-id 6.0.0.1
bgp graceful-restart
bgp bestpath as-path multipath-relax
bgp bestpath compare-routerid
neighbor TOR_LEAF_SPINE peer-group
neighbor TOR_LEAF_SPINE advertisement-interval 0
neighbor TOR_LEAF_SPINE timers 3 10
neighbor TOR_LEAF_SPINE timers connect 5
neighbor TOR_LEAF_SPINE capability extended-nexthop
neighbor swp1 interface peer-group TOR_LEAF_SPINE
neighbor swp1 remote-as external
neighbor swp1 advertisement-interval 0
neighbor swp1 timers 3 10
neighbor swp1 timers connect 5
neighbor swp2 interface peer-group TOR_LEAF_SPINE
neighbor swp2 remote-as external
neighbor swp2 advertisement-interval 0
neighbor swp2 timers 3 10
neighbor swp2 timers connect 5
!
address-family ipv4 unicast
redistribute connected route-map ALLOW_LOBR
neighbor TOR_LEAF_SPINE allowas-in
neighbor swp1 allowas-in
neighbor swp2 allowas-in
maximum-paths 16
maximum-paths ibgp 64
exit-address-family
!
address-family l2vpn evpn
neighbor TOR_LEAF_SPINE activate
advertise-all-vni
exit-address-family
exit
!
router bgp 660000 vrf vrf1
bgp router-id 144.1.1.2
no bgp network import-check
neighbor 144.1.1.1 remote-as external
neighbor 144.1.1.1 advertisement-interval 0
neighbor 144.1.1.1 timers 3 9
neighbor 144.1.1.1 timers connect 10
neighbor 155.1.1.1 remote-as external
neighbor 155.1.1.1 advertisement-interval 0
neighbor 155.1.1.1 timers 3 9
neighbor 155.1.1.1 timers connect 10
neighbor 2144:1:1:1::1 remote-as external
neighbor 2144:1:1:1::1 advertisement-interval 0
neighbor 2144:1:1:1::1 timers 3 9
neighbor 2144:1:1:1::1 timers connect 10
neighbor 2155:1:1:1::1 remote-as external
neighbor 2155:1:1:1::1 advertisement-interval 0
neighbor 2155:1:1:1::1 timers 3 9
neighbor 2155:1:1:1::1 timers connect 10
!
address-family ipv4 unicast
network 50.1.1.112/32
redistribute connected route-map HOST_ALLOW_1
maximum-paths 64
maximum-paths ibgp 64
exit-address-family
!
address-family ipv6 unicast
network 2050:1:1:1::112/128
redistribute connected route-map HOST_ALLOW_1_v6
neighbor 2144:1:1:1::1 activate
neighbor 2155:1:1:1::1 activate
maximum-paths 64
maximum-paths ibgp 64
exit-address-family
!
address-family l2vpn evpn
advertise ipv4 unicast
advertise ipv6 unicast
exit-address-family
exit
!
router bgp 660000 vrf vrf2
bgp router-id 144.1.1.6
no bgp network import-check
neighbor 144.1.1.5 remote-as external
neighbor 144.1.1.5 advertisement-interval 0
neighbor 144.1.1.5 timers 3 9
neighbor 144.1.1.5 timers connect 10
neighbor 155.1.1.5 remote-as external
neighbor 155.1.1.5 advertisement-interval 0
neighbor 155.1.1.5 timers 3 9
neighbor 155.1.1.5 timers connect 10
neighbor 2144:1:1:2::5 remote-as external
neighbor 2144:1:1:2::5 advertisement-interval 0
neighbor 2144:1:1:2::5 timers 3 9
neighbor 2144:1:1:2::5 timers connect 10
neighbor 2155:1:1:2::5 remote-as external
neighbor 2155:1:1:2::5 advertisement-interval 0
neighbor 2155:1:1:2::5 timers 3 9
neighbor 2155:1:1:2::5 timers connect 10
!
address-family ipv4 unicast
network 60.1.1.111/32
redistribute connected route-map HOST_ALLOW_2
maximum-paths 64
maximum-paths ibgp 64
exit-address-family
!
address-family ipv6 unicast
network 2060:1:1:1::111/128
redistribute connected route-map HOST_ALLOW_2_v6
neighbor 2144:1:1:2::5 activate
neighbor 2155:1:1:2::5 activate
maximum-paths 64
maximum-paths ibgp 64
exit-address-family
!
address-family l2vpn evpn
advertise ipv4 unicast
advertise ipv6 unicast
exit-address-family
exit
!
route-map ALLOW_LO permit 10
match interface lo
!
route-map ALLOW_LOBR permit 10
match interface lo
!
route-map ALLOW_LOBR permit 20
match interface br_l3vni
!
route-map HOST_ALLOW_1 permit 1
match ip address prefix-list LOCAL_HOST_VRF1
!
route-map HOST_ALLOW_1_v6 permit 1
match ipv6 address prefix-list LOCAL_HOST_VRF1_v6
!
route-map HOST_ALLOW_2 permit 1
match ip address prefix-list LOCAL_HOST_VRF2
!
route-map HOST_ALLOW_2_v6 permit 1
match ipv6 address prefix-list LOCAL_HOST_VRF2_v6
!
end
bordertor-11# conf
bordertor-11(config)# exit
bordertor-11# show bgp l2vpn evpn vni
Advertise Gateway Macip: Disabled
Advertise SVI Macip: Disabled
Advertise All VNI flag: Enabled
BUM flooding: Head-end replication
VXLAN flooding: Enabled
Number of L2 VNIs: 4
Number of L3 VNIs: 2
Flags: * - Kernel
VNI Type RD Import RT Export RT MAC-VRF Site-of-Origin Tenant VRF

* 1000114 L2 6.0.0.1:3 4640:1000114 4640:1000114 vrf1
* 1000112 L2 6.0.0.1:4 4640:1000112 4640:1000112 vrf1
* 1000113 L2 6.0.0.1:5 4640:1000113 4640:1000113 vrf2
* 1000111 L2 6.0.0.1:6 4640:1000111 4640:1000111 vrf2
* 104001 L3 144.1.1.2:2 4640:104001 4640:104001 vrf1
* 104002 L3 144.1.1.6:9 4640:104002 4640:104002 vrf2
bordertor-11# exit
root@bordertor-11:mgmt:/var/home/cumulus# cat /var/lib/frr/.bgp_vni_rd.txt
1000114 3
1000112 4
1000113 5
1000111 6
root@bordertor-11:mgmt:/var/home/cumulus# cat /var/lib/frr/.bgp_vrf_rd.txt
default 1
vrf2 9
vrf1 2

vrf1 restored from frr.conf with rd_id=2. 104001 back to L3 with
RD 144.1.1.2:2. Orphan cleaned. All L2 VNIs unchanged.
    ================================================================================
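The state files shown above use one "<key> <rd_id>" pair per line. A minimal sketch of a loader for that format (hypothetical: the toy 64-bit bitfield and function names are illustrative; FRR's real loader populates its hash tables and calls bf_set_bit() on the rd_idspace bitfield):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for the rd_idspace bitfield: bit i set == rd_id i in use. */
static uint64_t rd_idspace;

/* Parse one "<name> <rd_id>" state-file line and reserve the rd_id bit
 * so later allocations cannot collide with a cached assignment. */
static int parse_vrf_rd_line(const char *line, char *name, size_t nlen,
			     uint16_t *rd_id)
{
	unsigned int id;
	char buf[64];

	if (sscanf(line, "%63s %u", buf, &id) != 2 || id >= 64)
		return -1;		/* malformed or out-of-range: skip */
	snprintf(name, nlen, "%s", buf);
	*rd_id = (uint16_t)id;
	rd_idspace |= (1ULL << id);	/* reserve this rd_id */
	return 0;
}
```

A real loader would iterate fgets() over the whole file and insert each parsed entry into the in-memory hash before any VRF/VNI is configured.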

================================================================================
TEST : VTY removal + "write memory" + restart + VTY re-add.
This tests the full operational workflow:
1) Remove vrf1 via VTY at runtime
2) "write memory" to persist removal to frr.conf
3) Restart FRR (config has no vrf1)
4) Re-add vrf1 via VTY at runtime
Expect: Step 3 keeps 104001 as L2 (rd_id=7) since config lacks vrf1.
Step 4 promotes 104001 back to L3 with rd_id=2 (new assignment).

--- Step 7a: Runtime VTY removal + write memory ---
root@bordertor-11:mgmt:/var/home/cumulus# exit
exit
cumulus@bordertor-11:mgmt:~$ sudo bash
root@bordertor-11:mgmt:/var/home/cumulus# vtysh

Hello, this is FRRouting (version 10.0.3).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

bordertor-11# conf
bordertor-11(config)# vrf vrf1
bordertor-11(config-vrf)# no vni 104001
bordertor-11(config-vrf)# exit
bordertor-11(config)# no router bgp 660000 vrf vrf1
bordertor-11(config)# exit
bordertor-11# write memory
Note: this version of vtysh never writes vtysh.conf
Building Configuration...
Integrated configuration saved to /etc/frr/frr.conf
[OK]
bordertor-11# exit
root@bordertor-11:mgmt:/var/home/cumulus# vtysh

Hello, this is FRRouting (version 10.0.3).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

bordertor-11# show bgp l2vpn evpn vni
Advertise Gateway Macip: Disabled
Advertise SVI Macip: Disabled
Advertise All VNI flag: Enabled
BUM flooding: Head-end replication
VXLAN flooding: Enabled
Number of L2 VNIs: 5
Number of L3 VNIs: 1
Flags: * - Kernel
VNI Type RD Import RT Export RT MAC-VRF Site-of-Origin Tenant VRF

* 1000114 L2 6.0.0.1:3 4640:1000114 4640:1000114 vrf1
* 1000112 L2 6.0.0.1:4 4640:1000112 4640:1000112 vrf1
* 1000113 L2 6.0.0.1:5 4640:1000113 4640:1000113 vrf2
* 104001 L2 6.0.0.1:7 4640:104001 4640:104001 vrf1
* 1000111 L2 6.0.0.1:6 4640:1000111 4640:1000111 vrf2
* 104002 L3 144.1.1.6:9 4640:104002 4640:104002 vrf2
bordertor-11# exit
root@bordertor-11:mgmt:/var/home/cumulus# cat /var/lib/frr/.bgp_vrf_rd.txt
default 1
vrf2 9   <<<<< vrf1 disappeared
root@bordertor-11:mgmt:/var/home/cumulus# cat /var/lib/frr/.bgp_vni_rd.txt
1000114 3
1000112 4
1000113 5
1000111 6
104001 7

VTY removal + write memory: 104001 L2 with rd_id=7, vrf1 removed from
the VRF state file, config saved to frr.conf without vrf1.
    ================================================================================

Restart FRR (frr.conf has no vrf1, saved by write memory) ---
root@bordertor-11:mgmt:/var/home/cumulus# systemctl restart frr
root@bordertor-11:mgmt:/var/home/cumulus#
root@bordertor-11:mgmt:/var/home/cumulus# cat /var/lib/frr/.bgp_vrf_rd.txt
default 1
vrf2 9   <<<<< vrf1 doesn't show up anymore
root@bordertor-11:mgmt:/var/home/cumulus# cat /var/lib/frr/.bgp_vni_rd.txt
1000114 3
1000112 4
1000113 5
1000111 6
104001 7
root@bordertor-11:mgmt:/var/home/cumulus# vtysh

Hello, this is FRRouting (version 10.0.3).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

bordertor-11# show bgp l2vpn evpn vni
Advertise Gateway Macip: Disabled
Advertise SVI Macip: Disabled
Advertise All VNI flag: Enabled
BUM flooding: Head-end replication
VXLAN flooding: Enabled
Number of L2 VNIs: 5
Number of L3 VNIs: 1
Flags: * - Kernel
VNI Type RD Import RT Export RT MAC-VRF Site-of-Origin Tenant VRF

* 1000114 L2 6.0.0.1:3 4640:1000114 4640:1000114 vrf1
* 1000112 L2 6.0.0.1:4 4640:1000112 4640:1000112 vrf1
* 1000113 L2 6.0.0.1:5 4640:1000113 4640:1000113 vrf2
* 104001 L2 6.0.0.1:7 4640:104001 4640:104001 vrf1
* 1000111 L2 6.0.0.1:6 4640:1000111 4640:1000111 vrf2
* 104002 L3 144.1.1.6:9 4640:104002 4640:104002 vrf2

Restart with saved config: 104001 stays L2 with rd_id=7 (persistent).
No vrf1 in VRF file (correctly absent). VNI 104001=7 persisted.
    ================================================================================

--- Step 7c: Re-add vrf1 + full BGP config via VTY at runtime ---
bordertor-11# conf
bordertor-11(config)# vrf vrf1
bordertor-11(config-vrf)# vni 104001
bordertor-11(config-vrf)# exit-vrf
bordertor-11(config)# !
bordertor-11(config)# router bgp 660000 vrf vrf1
bordertor-11(config-router)# bgp router-id 144.1.1.2
bordertor-11(config-router)# no bgp network import-check
bordertor-11(config-router)# neighbor 144.1.1.1 remote-as external
bordertor-11(config-router)# neighbor 144.1.1.1 advertisement-interval 0
bordertor-11(config-router)# neighbor 144.1.1.1 timers 3 9
bordertor-11(config-router)# neighbor 144.1.1.1 timers connect 10
bordertor-11(config-router)# neighbor 155.1.1.1 remote-as external
bordertor-11(config-router)# neighbor 155.1.1.1 advertisement-interval 0
bordertor-11(config-router)# neighbor 155.1.1.1 timers 3 9
bordertor-11(config-router)# neighbor 155.1.1.1 timers connect 10
bordertor-11(config-router)# neighbor 2144:1:1:1::1 remote-as external
bordertor-11(config-router)# neighbor 2144:1:1:1::1 advertisement-interval 0
bordertor-11(config-router)# neighbor 2144:1:1:1::1 timers 3 9
bordertor-11(config-router)# neighbor 2144:1:1:1::1 timers connect 10
bordertor-11(config-router)# neighbor 2155:1:1:1::1 remote-as external
bordertor-11(config-router)# neighbor 2155:1:1:1::1 advertisement-interval 0
bordertor-11(config-router)# neighbor 2155:1:1:1::1 timers 3 9
bordertor-11(config-router)# neighbor 2155:1:1:1::1 timers connect 10
bordertor-11(config-router)# !
bordertor-11(config-router)# address-family ipv4 unicast
bordertor-11(config-router-af)# network 50.1.1.112/32
bordertor-11(config-router-af)# redistribute connected route-map HOST_ALLOW_1
bordertor-11(config-router-af)# maximum-paths 64
bordertor-11(config-router-af)# maximum-paths ibgp 64
bordertor-11(config-router-af)# exit-address-family
bordertor-11(config-router)# !
bordertor-11(config-router)# address-family ipv6 unicast
bordertor-11(config-router-af)# network 2050:1:1:1::112/128
bordertor-11(config-router-af)# redistribute connected route-map HOST_ALLOW_1_v6
bordertor-11(config-router-af)# neighbor 2144:1:1:1::1 activate
bordertor-11(config-router-af)# neighbor 2155:1:1:1::1 activate
bordertor-11(config-router-af)# maximum-paths 64
bordertor-11(config-router-af)# maximum-paths ibgp 64
bordertor-11(config-router-af)# exit-address-family
bordertor-11(config-router)# !
bordertor-11(config-router)# address-family l2vpn evpn
bordertor-11(config-router-af)# advertise ipv4 unicast
bordertor-11(config-router-af)# advertise ipv6 unicast
bordertor-11(config-router-af)# exit-address-family
bordertor-11(config-router)# exit
bordertor-11(config)# !
bordertor-11(config)# exit
bordertor-11# show bgp l2vpn evpn vni
Advertise Gateway Macip: Disabled
Advertise SVI Macip: Disabled
Advertise All VNI flag: Enabled
BUM flooding: Head-end replication
VXLAN flooding: Enabled
Number of L2 VNIs: 4
Number of L3 VNIs: 2
Flags: * - Kernel
VNI Type RD Import RT Export RT MAC-VRF Site-of-Origin Tenant VRF

* 1000114 L2 6.0.0.1:3 4640:1000114 4640:1000114 vrf1
* 1000112 L2 6.0.0.1:4 4640:1000112 4640:1000112 vrf1
* 1000113 L2 6.0.0.1:5 4640:1000113 4640:1000113 vrf2
* 1000111 L2 6.0.0.1:6 4640:1000111 4640:1000111 vrf2
* 104002 L3 144.1.1.6:9 4640:104002 4640:104002 vrf2
* 104001 L3 144.1.1.2:2 4640:104001 4640:104001 vrf1   <<< vrf1 came back
bordertor-11# exit
root@bordertor-11:mgmt:/var/home/cumulus# cat /var/lib/frr/.bgp_vni_rd.txt
1000114 3
1000112 4
1000113 5
1000111 6
root@bordertor-11:mgmt:/var/home/cumulus# cat /var/lib/frr/.bgp_vrf_rd.txt
default 1
vrf2 9
vrf1 2

VTY re-add: vrf1 got rd_id=2 (bf_assign_index gave the lowest free bit).
104001 promoted L2->L3 with RD 144.1.1.2:2. Orphan "104001 7" cleaned.
All L2 VNIs unchanged.
================================================================================

================================================================================
TEST : Final stability — "write memory" to persist re-added vrf1,
restart FRR, verify RDs are identical.
Expect: All RDs unchanged after restart. Final steady state.

root@bordertor-11:mgmt:/var/home/cumulus# vtysh

Hello, this is FRRouting (version 10.0.3).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

bordertor-11# write memory
Note: this version of vtysh never writes vtysh.conf
Building Configuration...
Integrated configuration saved to /etc/frr/frr.conf
[OK]
bordertor-11# vtysh
% Unknown command: vtysh
bordertor-11# exit
root@bordertor-11:mgmt:/var/home/cumulus# systemctl restart frr
root@bordertor-11:mgmt:/var/home/cumulus# cat /var/lib/frr/.bgp_vrf_rd.txt
default 1
vrf2 9
vrf1 2
root@bordertor-11:mgmt:/var/home/cumulus# cat /var/lib/frr/.bgp_vni_rd.txt
1000114 3
1000112 4
1000113 5
1000111 6
root@bordertor-11:mgmt:/var/home/cumulus# vtysh

Hello, this is FRRouting (version 10.0.3).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

--- First check after restart ---
bordertor-11# show bgp l2vpn evpn vni
Advertise Gateway Macip: Disabled
Advertise SVI Macip: Disabled
Advertise All VNI flag: Enabled
BUM flooding: Head-end replication
VXLAN flooding: Enabled
Number of L2 VNIs: 4
Number of L3 VNIs: 2
Flags: * - Kernel
VNI Type RD Import RT Export RT MAC-VRF Site-of-Origin Tenant VRF

* 1000114 L2 6.0.0.1:3 4640:1000114 4640:1000114 vrf1
* 1000112 L2 6.0.0.1:4 4640:1000112 4640:1000112 vrf1
* 1000113 L2 6.0.0.1:5 4640:1000113 4640:1000113 vrf2
* 1000111 L2 6.0.0.1:6 4640:1000111 4640:1000111 vrf2
* 104002 L3 144.1.1.6:9 4640:104002 4640:104002 vrf2
* 104001 L3 144.1.1.2:2 4640:104001 4640:104001 vrf1

--- Second check (idempotency verification) ---
bordertor-11# show bgp l2vpn evpn vni
Advertise Gateway Macip: Disabled
Advertise SVI Macip: Disabled
Advertise All VNI flag: Enabled
BUM flooding: Head-end replication
VXLAN flooding: Enabled
Number of L2 VNIs: 4
Number of L3 VNIs: 2
Flags: * - Kernel
VNI Type RD Import RT Export RT MAC-VRF Site-of-Origin Tenant VRF

* 1000114 L2 6.0.0.1:3 4640:1000114 4640:1000114 vrf1
* 1000112 L2 6.0.0.1:4 4640:1000112 4640:1000112 vrf1
* 1000113 L2 6.0.0.1:5 4640:1000113 4640:1000113 vrf2
* 1000111 L2 6.0.0.1:6 4640:1000111 4640:1000111 vrf2
* 104002 L3 144.1.1.6:9 4640:104002 4640:104002 vrf2
* 104001 L3 144.1.1.2:2 4640:104001 4640:104001 vrf1
bordertor-11#

Both checks identical. Final steady state achieved.
All RDs stable: vrf1=2, vrf2=9, 1000114=3, 1000112=4, 1000113=5, 1000111=6.
104001 L3 inherits vrf1 rd_id=2, 104002 L3 inherits vrf2 rd_id=9.

souroy@souroy-mlt ~ %

@frrbot frrbot bot added bgp bugfix tests Topotests, make check, etc labels Mar 19, 2026
@soumyar-roy soumyar-roy marked this pull request as draft March 19, 2026 23:38
@greptile-apps

greptile-apps bot commented Mar 19, 2026

Greptile Summary

This PR implements RD (Route Distinguisher) caching for EVPN VRFs and L2VNIs so that rd_id assignments survive bgpd restarts. Without this, auto-derived Type-1 RDs are reassigned non-deterministically after a restart, causing EVPN peers to treat re-advertised routes as new rather than updates — breaking graceful/warm restart.

The implementation writes two state files (/var/lib/frr/.bgp_vrf_rd.txt and .bgp_vni_rd.txt) mapping VRF name/VNI to their rd_id, loads them at startup into in-memory hashes, and performs atomic rewrites (write-to-tmp + rename) on deletion. An orphan-cleanup hook removes stale entries after config load. The approach is well-structured, but two correctness issues and one style concern need addressing before merge:

  • Memory leak on exit: hash_clean_and_free is called with a NULL free-callback for both RD-state hashes in bgp_exit, leaving all remaining vrf_rd_state_entry (and their heap-allocated name fields) and vni_rd_state_entry objects unreleased. A hash_vrf_rd_state_entry_free helper was defined in bgpd.c for exactly this purpose but was never wired up.
  • Missing mkdir for VNI state path: bgp_evpn_log_vni_rd_to_statefile opens the VNI file for append without first ensuring the parent directory exists, unlike the VRF counterpart. On a fresh system this silently drops all VNI RD persistence.
  • O(n) file rewrites on bulk deletion: Each individual bgp_evpn_free call triggers a full state-file rewrite, producing O(n) rewrites when many VNIs are freed at once (e.g., disabling advertise-all-vni).
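The "atomic rewrite (write-to-tmp + rename)" mentioned above is the standard POSIX pattern; a minimal sketch (path handling and error checking are illustrative, not FRR's actual code):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Rewrite a state file atomically: write the full new contents to a
 * sibling ".tmp" file, then rename() it over the original.  rename()
 * replaces atomically on POSIX, so readers see either the complete old
 * file or the complete new one, never a partial write. */
static int rewrite_state_file_atomic(const char *path, const char *contents)
{
	char tmp[512];
	FILE *fp;

	snprintf(tmp, sizeof(tmp), "%s.tmp", path);
	fp = fopen(tmp, "w");
	if (!fp)
		return -1;
	if (fputs(contents, fp) == EOF || fclose(fp) == EOF) {
		remove(tmp);		/* don't leave a partial tmp behind */
		return -1;
	}
	return rename(tmp, path) == 0 ? 0 : -1;
}
```

In the PR's scheme the contents string would be rebuilt by walking the in-memory hash, so the file always mirrors the current cache.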

Confidence Score: 2/5

  • Not safe to merge as-is — two memory-management bugs need fixing before merge.
  • The feature logic is sound and well-structured, but there is a confirmed memory leak (NULL free-callback to hash_clean_and_free while a dedicated free helper sits unused), and a silent failure path for VNI state persistence on systems where the state directory does not yet exist. Both issues are straightforward to fix but represent correctness/reliability regressions relative to the status quo.
  • bgpd/bgp_main.c (NULL free_func leak), bgpd/bgp_evpn.c (missing mkdir + O(n) rewrites), bgpd/bgpd.c (unused hash_vrf_rd_state_entry_free)

Important Files Changed

Filename Overview
bgpd/bgpd.c Core RD-caching logic: adds VRF/VNI state loading, persisting, and orphan cleanup. Key issue: hash_vrf_rd_state_entry_free is defined but never passed to hash_clean_and_free, causing memory leaks on exit.
bgpd/bgp_evpn.c VNI RD-caching logic: adds bgp_evpn_assign_rd_id_for_vni and rewrite functions. Missing mkdir before VNI state-file creation, and O(n) rewrites on bulk VNI deletion.
bgpd/bgp_main.c Cleanup of RD-state hashes in bgp_exit passes NULL free_func, leaking all remaining vrf_rd_state_entry and vni_rd_state_entry objects on shutdown.
bgpd/bgpd.h Adds vrf_rd_state and vni_rd_state hash pointers to bgp_master, plus new struct definitions and state file name macros. Clean and straightforward.
bgpd/bgp_evpn.h Adds extern declaration for bgp_evpn_rewrite_vni_rd_statefile. Minimal and correct.
tests/topotests/bgp_evpn_gr/test_bgp_evpn_gr.py Adds three new topotests for RD state file creation, persistence across restart, and orphan cleanup. Tests use hardcoded /var/lib/frr paths and leave a PE1_minimal.conf artifact, but logic is sound.

Sequence Diagram

sequenceDiagram
    participant FS as State Files<br/>(disk)
    participant BM as bgp_master<br/>(vrf/vni_rd_state hashes)
    participant RD as rd_idspace<br/>(bitfield)
    participant BGP as bgp instance<br/>/ bgpevpn

    Note over FS,BGP: Startup (bgp_master_init)
    FS->>BM: bgp_load_vrf_rd_statefile()<br/>load name→rd_id entries
    FS->>BM: bgp_load_vni_rd_statefile()<br/>load vni→rd_id entries
    BM->>RD: bf_set_bit() for each loaded rd_id

    Note over FS,BGP: VRF/VNI configured (bgp_create / bgp_evpn_new)
    BGP->>BM: lookup cached rd_id
    alt cache hit
        BM-->>BGP: reuse rd_id, mark used=true
        BM->>RD: bf_set_bit() if not already set
    else cache miss
        BM->>RD: bf_assign_index() allocate new rd_id
        BGP->>FS: append new entry to state file
        BGP->>BM: insert new entry into hash
    end

    Note over FS,BGP: bgp_config_end hook (orphan cleanup)
    BM->>BM: collect entries where used==false
    BM->>RD: bf_release_index() for each orphan
    BM->>FS: rewrite state files (orphans removed)

    Note over FS,BGP: VRF/VNI deleted (bgp_free / bgp_evpn_free)
    BGP->>BM: hash_release + free entry
    BM->>FS: rewrite state file (entry removed)
    BGP->>RD: bf_release_index()

    Note over FS,BGP: Shutdown (bgp_exit)
    BM->>BM: hash_clean_and_free(vrf_rd_state, NULL) ⚠️ leaks entries
    BM->>BM: hash_clean_and_free(vni_rd_state, NULL) ⚠️ leaks entries
    RD->>RD: bf_free(rd_idspace)
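The cache-hit / cache-miss branch in the diagram reduces to a small lookup-or-allocate step. A toy sketch (a fixed array stands in for both FRR's hash table and the rd_idspace bitfield, purely for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Toy cache: cached_vni[id] holds the VNI assigned rd_id "id"; 0 = free.
 * Entries pre-populated before first use model the loaded state file. */
#define MAX_IDS 64
static uint32_t cached_vni[MAX_IDS];

static uint16_t assign_rd_id_for_vni(uint32_t vni)
{
	uint16_t id;

	/* Cache hit: reuse the rd_id persisted across restarts. */
	for (id = 1; id < MAX_IDS; id++)
		if (cached_vni[id] == vni)
			return id;

	/* Cache miss: take the lowest free id (like bf_assign_index())
	 * and record it; the real code also appends to the state file. */
	for (id = 1; id < MAX_IDS; id++)
		if (cached_vni[id] == 0) {
			cached_vni[id] = vni;
			return id;
		}
	return 0;	/* idspace exhausted */
}
```

Because loaded entries reserve their slots up front, a restart can never hand a cached rd_id to a newly configured VNI.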
Last reviewed commit: "tests: add topotests..."

bgpd/bgp_main.c Outdated
Comment on lines +245 to +248
if (bm->vrf_rd_state)
hash_clean_and_free(&bm->vrf_rd_state, NULL);
if (bm->vni_rd_state)
hash_clean_and_free(&bm->vni_rd_state, NULL);

P1 Memory leak: NULL free callback leaks hash entry data

hash_clean_and_free only frees the hash bucket structs themselves when the free_func argument is NULL — it does not call any destructor on hb->data. This means:

  • Every vrf_rd_state_entry (including its heap-allocated name field) left in vrf_rd_state at exit is leaked.
  • Every vni_rd_state_entry left in vni_rd_state at exit is leaked.

This is evidenced by hash_vrf_rd_state_entry_free being defined in bgpd.c but never called — it was clearly written to serve as this callback but was accidentally omitted. The correct calls should be:

if (bm->vrf_rd_state)
    hash_clean_and_free(&bm->vrf_rd_state,
                        (void (*)(void *))hash_vrf_rd_state_entry_free);
if (bm->vni_rd_state)
    hash_clean_and_free(&bm->vni_rd_state, XFREE_BGP_VNI_STATE_OR_SIMILAR);

For vni_rd_state, since vni_rd_state_entry has no nested heap pointers, you only need to XFREE(MTYPE_BGP, entry) on each element. Both entry types will currently be leaked on every clean shutdown.


Comment on lines +136 to +150
}
}

/*
* Append a single line with L2 VNI and RD-ID info to a file in frr_libstatedir.
* Line format: "<vni> <rd_id>\n".
*/
static void bgp_evpn_log_vni_rd_to_statefile(uint32_t vni, uint16_t rd_id)
{
char path[512];
FILE *fp;

snprintf(path, sizeof(path), "%s/%s", frr_libstatedir, BGP_EVPN_VNI_RD_STATEFILE);

fp = fopen(path, "a");

P1 Missing directory creation before VNI state-file append

bgp_evpn_log_vni_rd_to_statefile opens the VNI state file for append without first ensuring the parent directory exists. In contrast, the analogous VRF function bgp_log_vrf_rd_to_statefile in bgpd.c explicitly calls (void)mkdir(frr_libstatedir, 0755) before opening the file.

If bgpd starts on a system where frr_libstatedir (/var/lib/frr) does not yet exist (e.g., a fresh install before the VRF state path creates it), the fopen(..., "a") call will fail silently (only an error is logged), and the VNI RD mapping will never be persisted — defeating the entire purpose of the feature for L2VNIs.

The mkdir call should be added here, mirroring the VRF implementation:

static void bgp_evpn_log_vni_rd_to_statefile(uint32_t vni, uint16_t rd_id)
{
    char path[512];
    FILE *fp;

    /* Ensure state dir exists (best-effort) */
    (void)mkdir(frr_libstatedir, 0755);

    snprintf(path, sizeof(path), "%s/%s", frr_libstatedir, BGP_EVPN_VNI_RD_STATEFILE);
    ...

bgpd/bgp_evpn.c Outdated
Comment on lines +6858 to +6868
/* Remove this VNI's entry from cache; rewrite file only on normal deletion */
if (bm->vni_rd_state) {
struct vni_rd_state_entry key = { .vni = vpn->vni, .rd_id = 0 };
struct vni_rd_state_entry *found = hash_lookup(bm->vni_rd_state, &key);
if (found) {
hash_release(bm->vni_rd_state, found);
XFREE(MTYPE_BGP, found);
}
}
if (!bm->terminating)
bgp_evpn_rewrite_vni_rd_statefile();

P2 O(n) full state-file rewrites during bulk VNI deletion

bgp_evpn_free calls bgp_evpn_rewrite_vni_rd_statefile() (a full rewrite) for every individual VNI deletion. When many VNIs are freed in one pass — for example, when advertise-all-vni is disabled or a BGP instance is torn down via bgp_evpn_cleanup — this results in O(n) complete file rewrites for n VNIs, each iterating the shrinking hash and doing a rename-over.

Consider deferring the rewrite: remove the hash_release/XFREE step immediately (mark the entry as "pending removal" or simply batch the deletes), then do a single rewrite at the end of the batch operation, or use the same bm->terminating guard pattern but extended to a "bulk-deletion-in-progress" flag. At minimum, the file rewrite could be made conditional on whether the hash actually changed since the last write.
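One way to implement the suggested batching (a sketch with hypothetical names, not FRR code): track a dirty flag on delete and flush a single rewrite when the bulk operation ends.

```c
#include <assert.h>
#include <stdbool.h>

/* Dirty-flag batching: each deletion only marks the cache dirty; the
 * caller flushes once after the whole batch, so n deletions cost one
 * file rewrite instead of n. */
static bool vni_rd_state_dirty;
static int flush_count;			/* test instrumentation */

static void vni_rd_state_mark_dirty(void)
{
	vni_rd_state_dirty = true;
}

static void vni_rd_state_flush(void (*rewrite_fn)(void))
{
	if (!vni_rd_state_dirty)
		return;			/* nothing changed since last write */
	rewrite_fn();			/* one full atomic rewrite */
	vni_rd_state_dirty = false;
}

/* Stand-in for the real full-rewrite function. */
static void fake_rewrite(void)
{
	flush_count++;
}
```

The flush call would sit at the end of bulk paths such as disabling advertise-all-vni, with bm->terminating still suppressing writes at shutdown.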


bgpd/bgpd.c Outdated
Comment on lines +303 to +310
static void hash_vrf_rd_state_entry_free(struct vrf_rd_state_entry *entry)
{
if (!entry)
return;
if (entry->name)
XFREE(MTYPE_BGP_NAME, entry->name);
XFREE(MTYPE_BGP, entry);
}

P2 Defined but effectively unused hash_vrf_rd_state_entry_free

This function is defined but is not used anywhere — it is not passed to hash_clean_and_free in bgp_exit (which instead passes NULL). It is also not used during orphan cleanup in bgp_rd_state_cleanup_after_config (where the free is inlined manually).

This is exactly the helper that should be passed as the free_func argument to hash_clean_and_free in bgp_main.c. Leaving it unused is a red flag and results in the memory leak described in the bgp_main.c comment above. Consider either using it consistently or removing it and documenting the expected cleanup path.


@ton31337
Member

ton31337 commented Mar 20, 2026

What if we had a knob to specify the RD encoding type (e.g. `evpn rd-format type (0-2)`)? Yes, that has a limitation for the ASN too, but maybe it would be enough in most cases?

Or, if type 1 with the loopback is a hard requirement, couldn't the VNI be carried as a large/extended community instead of in the NLRI?

@soumyar-roy soumyar-roy force-pushed the soumya/cacherd branch 2 times, most recently from 5c4bfb8 to 06a7696 Compare March 20, 2026 18:08
Contributor

@mjstapp mjstapp left a comment


this is exactly the sort of change that really must have a design for review - adding statefulness to the bgp daemon is a significant change. it seems (at first look) out of proportion to the issue described. if unpredictable values are a problem ... configure stable values as part of provisioning/orchestration. if the existing format is an issue, then ... consider other formats that might work better. those are the kinds of things that should be explored in a design proposal.

@soumyar-roy soumyar-roy force-pushed the soumya/cacherd branch 3 times, most recently from 9971817 to 9e19950 Compare March 20, 2026 19:44
1. Fix memory leak on duplicate hash entries: when loading VRF/VNI
   state files, duplicate entries could cause the newly allocated
   entry to be silently ignored. Free the unused allocation when a
   duplicate is detected.

2. Reserve rd_id bits during statefile load: call bf_set_bit() while
   reading state files so cached rd_ids are not reassigned to new
   VRFs or VNIs.

3. Replace line-by-line statefile removal with full rewrite from
   in-memory hash: instead of reading the file, filtering one entry,
   and writing back, rewrite the entire file from the hash via
   bgp_rewrite_vrf_rd_statefile() and bgp_evpn_rewrite_vni_rd_statefile().
   This is simpler and avoids format-parsing fragility.

4. Add 'used' flag to vrf_rd_state_entry and vni_rd_state_entry:
   set the flag when an entry is consumed during config replay so
   orphan detection is possible after config load.

5. Post-config orphan cleanup: register bgp_rd_state_cleanup_after_config
   on the bgp_config_end hook. After config is fully loaded, walk both
   hashes and remove entries whose 'used' flag is still false (VRFs/VNIs
   that were in the state file but no longer in frr.conf). Release their
   rd_id bits and rewrite the state files.

6. Proper hash cleanup at shutdown: free vrf_rd_state and vni_rd_state
   hashes in bgp_exit() to avoid memory leaks on termination.

Signed-off-by: Soumya Roy <[email protected]>
Verify that RD state files are created correctly at startup, RD values
remain stable across bgpd restarts, and orphan entries are cleaned
from state files when VRFs are removed from config.

Signed-off-by: Soumya Roy <[email protected]>
Made-with: Cursor