[T2][202405] Zebra process consuming a large amount of memory resulting in OOM kernel panics #20337
Description
On full T2 devices running 202405, Arista is seeing the zebra process in FRR consume a large amount of memory (roughly 10x compared to 202205).
202405:
```
root@cmp206-4:~# docker exec -it bgp0 bash
root@cmp206-4:/# ps aux
USER  PID %CPU %MEM     VSZ     RSS TTY   STAT START  TIME COMMAND
root    1  0.0  0.2   38116   32024 pts/0 Ss+  21:23  0:01 /usr/bin/python3 /usr/local/bin/supervisord
root   44  0.1  0.2  131684   31888 pts/0 Sl   21:23  0:06 python3 /usr/bin/supervisor-proc-exit-listener --container-name bgp
root   47  0.0  0.0  230080    4164 pts/0 Sl   21:23  0:00 /usr/sbin/rsyslogd -n -iNONE
frr    51 27.5  8.1 2018736 1283692 pts/0 Sl   21:23 16:57 /usr/lib/frr/zebra -A 127.0.0.1 -s 90000000 -M dplane_fpm_nl -M snmp
```
202205:
```
root@cmp210-3:~# docker exec -it bgp bash
root@cmp210-3:/# ps aux
USER  PID %CPU %MEM    VSZ    RSS TTY   STAT START TIME COMMAND
root    1  0.0  0.1  30524  26232 pts/0 Ss+  21:59 0:00 /usr/bin/python3 /usr/local/bin/supervisord
root   26  0.0  0.1  30808  25712 pts/0 S    21:59 0:00 python3 /usr/bin/supervisor-proc-exit-listener --container-name bgp
root   27  0.0  0.0 220836   3764 pts/0 Sl   21:59 0:00 /usr/sbin/rsyslogd -n -iNONE
frr    31  9.7  0.7 730360 128852 pts/0 Sl   21:59 2:32 /usr/lib/frr/zebra -A 127.0.0.1 -s 90000000 -M fpm -M snmp
```
This results in the system having very little free memory:
```
> free -m
       total   used   free   shared   buff/cache   available
Mem:   15388  15304    158      284          481          83
```
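As a side note, the low-memory condition above is easy to script against. Below is a minimal sketch that parses `free -m`-style output and flags when available memory drops below a threshold; the parsing approach and the 512 MiB threshold are illustrative assumptions, not anything from the platform's actual monitoring.

```python
# Sketch: parse `free -m`-style output and flag low available memory.
# The 512 MiB warning threshold is a hypothetical value for illustration.
def parse_free_mem(output: str) -> dict:
    """Return the numeric fields of the 'Mem:' row, keyed by header name."""
    lines = [l for l in output.strip().splitlines() if l.strip()]
    headers = lines[0].split()  # total, used, free, shared, buff/cache, available
    for line in lines[1:]:
        if line.startswith("Mem:"):
            values = [int(v) for v in line.split()[1:]]
            return dict(zip(headers, values))
    raise ValueError("no 'Mem:' row found")

# Sample taken from the `free -m` output quoted in this issue.
sample = """\
       total   used   free   shared   buff/cache   available
Mem:   15388  15304    158      284          481          83
"""
mem = parse_free_mem(sample)
low = mem["available"] < 512  # hypothetical threshold in MiB
```

With the numbers from this issue, `available` is 83 MiB, so such a check would fire well before the OOM panic below.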
If we run a command that causes zebra to consume even more memory, such as `show ip route`, it can trigger a kernel panic due to OOM:
```
[74531.234009] Kernel panic - not syncing: Out of memory: compulsory panic_on_oom is enabled
[74531.260707] CPU: 1 PID: 735 Comm: auditd Kdump: loaded Tainted: G OE 6.1.0-11-2-amd64 #1 Debian 6.1.38-4
[74531.313431] Call Trace:
[74531.365891] <TASK>
[74531.418342] dump_stack_lvl+0x44/0x5c
[74531.470844] panic+0x118/0x2ed
[74531.523334] out_of_memory.cold+0x67/0x7e
```
When we look at `show memory` in FRR, we see the Nexthop Max# (peak allocation count) is significantly higher on 202405 than on 202205.
202405:
```
show memory
Memory statistics for zebra:
Total heap allocated: > 2GB
--- qmem libfrr ---
Type    : Current# Size  Total   Max#    MaxBytes
Nexthop : 1669     160   280536  8113264 1363218720  # ASIC0
Nexthop : 1535     160   258120  2097270 352476288   # ASIC1
```
202205:
```
show memory
Memory statistics for zebra:
Total heap allocated: 72 MiB
--- qmem libfrr ---
Type    : Current# Size  Total   Max#   MaxBytes
Nexthop : 1173     152   178312  36591  5563080
```
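To put the two tables side by side, the arithmetic below simply restates the peak Nexthop figures quoted above; the variable names are ours, the numbers are from the `show memory` output.

```python
# Peak Nexthop memory (MaxBytes) from the `show memory` tables in this issue.
maxbytes_202405_asic0 = 1_363_218_720  # 202405, ASIC0
maxbytes_202405_asic1 = 352_476_288    # 202405, ASIC1
maxbytes_202205 = 5_563_080            # 202205

# Peak Nexthop allocation counts (Max#).
maxcount_202405_asic0 = 8_113_264
maxcount_202205 = 36_591

growth_bytes = maxbytes_202405_asic0 / maxbytes_202205
growth_count = maxcount_202405_asic0 / maxcount_202205
print(f"ASIC0 peak Nexthop bytes grew ~{growth_bytes:.0f}x, peak count ~{growth_count:.0f}x")
```

So even on the worse ASIC, the ~245x growth in peak Nexthop bytes dwarfs the 10x growth in overall zebra RSS, which points at Nexthop allocation churn rather than a uniform increase.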
NOTES:
- Both 202205 and 202405 have the same number of routes installed.
- We have also seen an increase on t2-min topologies, but the absolute memory usage there is at least half of what T2 is seeing, so we are not hitting OOMs on t2-min.
- The FRR version changed between the releases: 202205 runs FRRouting 8.2.2 and 202405 runs FRRouting 8.5.4.