Bug: staticd crashes on SRv6 configuration removal #22690

@Yakiv-Huryk

Is it platform specific

generic

Importance or Severity

Critical

Description of the bug

When processing the removal of SRv6 configuration (locator and sid), staticd accesses freed memory (the locator), which can lead to a crash.

When removing the locator and then the sid, the following happens:

  1. On locator removal, the locator is freed in static_zebra_process_srv6_locator_delete() while the sid still holds a reference to it (sid->locator)
    https://github.com/Azure/sonic-buildimage-msft/blob/07d7dd051f63b28492baf66456cff8531f6b2f36/src/sonic-frr/patch/0079-staticd-add-support-for-srv6.patch#L1749
  2. Then, during sid removal, static_zebra_srv6_sid_uninstall() accesses the already freed locator
    https://github.com/Azure/sonic-buildimage-msft/blob/07d7dd051f63b28492baf66456cff8531f6b2f36/src/sonic-frr/patch/0079-staticd-add-support-for-srv6.patch#L1433-L1435
  3. In the rare case where the freed locator memory reads as family AF_INET (2) and the prefixlen (sid->locator->block_bits_length) is > 32, the assert is triggered in apply_mask() -> apply_mask_ipv4() -> masklen2ip()
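
The lifetime problem in the steps above can be modelled in a few lines of C. This is only a sketch of one possible mitigation, not the actual staticd code or an agreed fix: the `demo_*` types and functions are hypothetical stand-ins, and the idea is simply to clear the borrowed `sid->locator` pointer before the locator is freed, so that a later sid removal sees NULL instead of stale memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for staticd's locator and sid objects. */
struct demo_locator {
	unsigned char block_bits_length;
	unsigned char node_bits_length;
};

struct demo_sid {
	struct demo_locator *locator; /* borrowed pointer, not owned by the sid */
	bool installed;
};

/*
 * Locator removal: detach every sid that references the locator while the
 * locator is still valid, and only then free it.
 */
static void demo_locator_delete(struct demo_locator *loc,
				struct demo_sid *sids, size_t nsids)
{
	for (size_t i = 0; i < nsids; i++) {
		if (sids[i].locator == loc) {
			sids[i].installed = false; /* uninstall while loc is valid */
			sids[i].locator = NULL;	   /* drop the dangling reference */
		}
	}
	free(loc);
}

/*
 * Sid removal: returns true if an uninstall was performed, false if the
 * sid has no locator (for example because the locator was removed first).
 */
static bool demo_sid_uninstall(struct demo_sid *sid)
{
	if (sid->locator == NULL)
		return false;

	/* Real code would read sid->locator->block_bits_length here. */
	sid->installed = false;
	return true;
}
```

With this shape, removing the locator first and the sid second becomes a no-op on the second step instead of a read of freed memory. Whether staticd should tolerate that order at all is a separate question.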

Backtraces for the above.

Deleting the locator (the locator is valid since it's not freed yet)

(gdb) bt
#0  static_zebra_srv6_sid_uninstall (sid=sid@entry=0x55778d3eeeb0) at ../staticd/static_zebra.c:743
#1  0x00005577519f2c86 in static_zebra_process_srv6_locator_delete (cmd=<optimized out>, zclient=<optimized out>, length=<optimized out>, vrf_id=<optimized out>) at ../staticd/static_zebra.c:1141
#2  0x00007f8e7538505a in zclient_read (thread=<optimized out>) at ../lib/zclient.c:4624
#3  0x00007f8e7536f841 in event_call (thread=thread@entry=0x7ffd5e231ce0) at ../lib/event.c:2011
#4  0x00007f8e7531a540 in frr_run (master=0x55778d259e70) at ../lib/libfrr.c:1212
#5  0x00005577519efcfe in main (argc=3, argv=0x7ffd5e231f38, envp=<optimized out>) at ../staticd/static_main.c:192
(gdb) p *sid->locator
$25 = {name = "loc_bug", '\000' <repeats 248 times>, prefix = {family = 10 '\n', prefixlen = 48, prefix = {__in6_u = {__u6_addr8 = "0\000\000\001\000\020\000\000\000\000\000\000\000\000\000", __u6_addr16 = {48, 256, 4096, 0, 0, 0, 0, 0}, __u6_addr32 = {16777264,
          4096, 0, 0}}}}, block_bits_length = 32 ' ', node_bits_length = 16 '\020', function_bits_length = 16 '\020', argument_bits_length = 0 '\000', flags = 1 '\001'}

Deleting the sid (the locator is freed and invalid now)

(gdb) bt
#0  static_zebra_srv6_sid_uninstall (sid=sid@entry=0x55778d3eeeb0) at ../staticd/static_zebra.c:743
#1  0x00005577519fd979 in static_srv6_sid_del (sid=0x55778d3eeeb0) at ../staticd/static_srv6.c:166
#2  0x00005577519fd28c in routing_control_plane_protocols_control_plane_protocol_staticd_segment_routing_srv6_local_sids_sid_destroy (args=<optimized out>) at ../staticd/static_nb_config.c:1451
#3  0x00007f8e75338ecd in nb_callback_destroy (errmsg_len=2, errmsg=0x7ffd5e22faa0 "", dnode=0x55778d3eb360, event=NB_EV_APPLY, nb_node=<optimized out>, context=0x55778d3d75d0) at ../lib/northbound.c:1363
#4  nb_callback_configuration (context=context@entry=0x55778d3d75d0, event=event@entry=NB_EV_APPLY, change=change@entry=0x55778d3db870, errmsg=errmsg@entry=0x7ffd5e22faa0 "", errmsg_len=errmsg_len@entry=8191) at ../lib/northbound.c:1670
#5  0x00007f8e7533989e in nb_transaction_process (errmsg_len=8191, errmsg=0x7ffd5e22faa0 "", transaction=0x55778d3d75d0, event=NB_EV_APPLY) at ../lib/northbound.c:1794
#6  nb_candidate_commit_apply (transaction=0x55778d3d75d0, save_transaction=save_transaction@entry=true, transaction_id=transaction_id@entry=0x55778d3ee360, errmsg=errmsg@entry=0x7ffd5e22faa0 "", errmsg_len=errmsg_len@entry=8191) at ../lib/northbound.c:1131
#7  0x00007f8e753284a4 in mgmt_be_txn_proc_cfgapply (txn=txn@entry=0x55778d3ee300) at ../lib/mgmt_be_client.c:714
#8  0x00007f8e7532898d in mgmt_be_process_cfg_apply (txn_id=<optimized out>, client_ctx=0x55778d37e5a0) at ../lib/mgmt_be_client.c:754
#9  mgmt_be_client_handle_msg (be_msg=0x55778d3e9110, client_ctx=0x55778d37e5a0) at ../lib/mgmt_be_client.c:807
#10 mgmt_be_client_process_msg (version=<optimized out>, conn=0x55778d37e5a0, len=<optimized out>, data=<optimized out>) at ../lib/mgmt_be_client.c:1021
#11 mgmt_be_client_process_msg (version=<optimized out>, data=<optimized out>, len=<optimized out>, conn=0x55778d37e5a0) at ../lib/mgmt_be_client.c:993
#12 0x00007f8e7532bd55 in mgmt_msg_procbufs (ms=ms@entry=0x55778d37e5a8, handle_msg=0x7f8e75328720 <mgmt_be_client_process_msg>, user=user@entry=0x55778d37e5a0, debug=<optimized out>) at ../lib/mgmt_msg.c:193
#13 0x00007f8e7532be07 in msg_conn_proc_msgs (thread=<optimized out>) at ../lib/mgmt_msg.c:526
#14 0x00007f8e7536f841 in event_call (thread=thread@entry=0x7ffd5e231ce0) at ../lib/event.c:2011
#15 0x00007f8e7531a540 in frr_run (master=0x55778d259e70) at ../lib/libfrr.c:1212
#16 0x00005577519efcfe in main (argc=3, argv=0x7ffd5e231f38, envp=<optimized out>) at ../staticd/static_main.c:192
(gdb) p *sid->locator
$26 = {
  name = "\200\2704\215wU\000\000\000\000\000\000\000\000\000\000\220a2\215wU\000\0000\000\000\001\000\020\000\000\000\000\000\000\000\000\000\0000", '\000' <repeats 15 times>, "!\001\000\000\000\000\000\000\320\375\017u\216\177\000\000\320\375\017u\216\177\000\000 \000\000\000\000\000\000\000 \000\000\000\000\000\000\000\237#F\332rU\000\000\000\000\000\000\000\000\000\000@\000\000\000\000\000\000\000P\000\000\000\000\000\000\000_#F\332rU\000\000\300\374\017u\216\177", '\000' <repeats 58 times>..., prefix = {family = 64 '@',
    prefixlen = 0, prefix = {__in6_u = {__u6_addr8 = "P\000\000\000\000\000\000\000\237'F\332rU\000", __u6_addr16 = {80, 0, 0, 0, 10143, 55878, 21874, 0}, __u6_addr32 = {80, 0, 3662030751, 21874}}}}, block_bits_length = 0 '\000', node_bits_length = 0 '\000',
  function_bits_length = 0 '\000', argument_bits_length = 0 '\000', flags = 0 '\000'}

Crash example. This is from a separate core dump and is not related to the backtraces above.

Core was generated by `/usr/lib/frr/staticd -A 127.0.0.1'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f69dd9d5ebc in ?? () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007f69dd9d5ebc in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f69dd986fb2 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f69ddd7a8cc in core_handler (signo=6, siginfo=0x7fffe8c50470, context=<optimized out>) at ../lib/sigevent.c:248
#3  <signal handler called>
#4  0x00007f69dd9d5ebc in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x00007f69dd986fb2 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x00007f69dd971472 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#7  0x00007f69dddac3c9 in _zlog_assert_failed (xref=xref@entry=0x7f69dde544a0 <_xref.7>, extra=extra@entry=0x0) at ../lib/zlog.c:670
#8  0x00007f69ddd67234 in masklen2ip (masklen=<optimized out>, netmask=<optimized out>) at ../lib/prefix.c:707
#9  masklen2ip (masklen=<optimized out>, netmask=<optimized out>) at ../lib/prefix.c:705
#10 0x00007f69ddd67288 in apply_mask_ipv4 (p=0x7fffe8c50b40) at ../lib/prefix.c:737
#11 0x00007f69ddd675ed in apply_mask (pu=..., pu@entry=...) at ../lib/prefix.c:872
#12 0x0000558b837b087d in static_zebra_srv6_sid_uninstall (sid=sid@entry=0x558bb3e18570) at ../staticd/static_zebra.c:831
#13 0x0000558b837bb979 in static_srv6_sid_del (sid=0x558bb3e18570) at ../staticd/static_srv6.c:166
#14 0x0000558b837bb28c in routing_control_plane_protocols_control_plane_protocol_staticd_segment_routing_srv6_local_sids_sid_destroy (args=<optimized out>) at ../staticd/static_nb_config.c:1451
#15 0x00007f69ddd56ecd in nb_callback_destroy (errmsg_len=2, errmsg=0x7fffe8c511c0 "", dnode=0x558bb3e20bc0, event=NB_EV_APPLY, nb_node=<optimized out>, context=0x558bb3e0f610) at ../lib/northbound.c:1363
#16 nb_callback_configuration (context=context@entry=0x558bb3e0f610, event=event@entry=NB_EV_APPLY, change=change@entry=0x558bb3d95120, errmsg=errmsg@entry=0x7fffe8c511c0 "", errmsg_len=errmsg_len@entry=8191) at ../lib/northbound.c:1670
#17 0x00007f69ddd5789e in nb_transaction_process (errmsg_len=8191, errmsg=0x7fffe8c511c0 "", transaction=0x558bb3e0f610, event=NB_EV_APPLY) at ../lib/northbound.c:1794
#18 nb_candidate_commit_apply (transaction=0x558bb3e0f610, save_transaction=save_transaction@entry=true, transaction_id=transaction_id@entry=0x558bb3d95110, errmsg=errmsg@entry=0x7fffe8c511c0 "", errmsg_len=errmsg_len@entry=8191) at ../lib/northbound.c:1131
#19 0x00007f69ddd464a4 in mgmt_be_txn_proc_cfgapply (txn=txn@entry=0x558bb3d950b0) at ../lib/mgmt_be_client.c:714
#20 0x00007f69ddd4698d in mgmt_be_process_cfg_apply (txn_id=<optimized out>, client_ctx=0x558bb3d3b5a0) at ../lib/mgmt_be_client.c:754
#21 mgmt_be_client_handle_msg (be_msg=0x558bb3e0db20, client_ctx=0x558bb3d3b5a0) at ../lib/mgmt_be_client.c:807
#22 mgmt_be_client_process_msg (version=<optimized out>, conn=0x558bb3d3b5a0, len=<optimized out>, data=<optimized out>) at ../lib/mgmt_be_client.c:1021
#23 mgmt_be_client_process_msg (version=<optimized out>, data=<optimized out>, len=<optimized out>, conn=0x558bb3d3b5a0) at ../lib/mgmt_be_client.c:993
#24 0x00007f69ddd49d55 in mgmt_msg_procbufs (ms=ms@entry=0x558bb3d3b5a8, handle_msg=0x7f69ddd46720 <mgmt_be_client_process_msg>, user=user@entry=0x558bb3d3b5a0, debug=<optimized out>) at ../lib/mgmt_msg.c:193
#25 0x00007f69ddd49e07 in msg_conn_proc_msgs (thread=<optimized out>) at ../lib/mgmt_msg.c:526
#26 0x00007f69ddd8d841 in event_call (thread=thread@entry=0x7fffe8c53400) at ../lib/event.c:2011
#27 0x00007f69ddd38540 in frr_run (master=0x558bb3c16e70) at ../lib/libfrr.c:1212
#28 0x0000558b837adcfe in main (argc=3, argv=0x7fffe8c53658, envp=<optimized out>) at ../staticd/static_main.c:192
(gdb) f 12
#12 0x0000558b837b087d in static_zebra_srv6_sid_uninstall (sid=sid@entry=0x558bb3e18570) at ../staticd/static_zebra.c:831
831     ../staticd/static_zebra.c: No such file or directory.
(gdb) p locator_block
$1 = {family = 2 '\002', prefixlen = 255, prefix = {__in6_u = {__u6_addr8 = "\377\377\377\377\377\377\377\377\005\000\000\000\005\000\000", __u6_addr16 = {65535, 65535, 65535, 65535, 5, 0, 5, 0}, __u6_addr32 = {4294967295, 4294967295, 5, 5}}}}
(gdb)
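
To illustrate why the `locator_block` value printed above aborts while the stale locator in the earlier backtrace did not, here is a small model of the range check. It is a sketch under assumptions: the `demo_*` names are hypothetical, the family constants are the Linux values seen in the gdb output (2 for AF_INET, 10 for AF_INET6), and the function only reports whether a prefix length is in range for its family rather than asserting as the library does.

```c
#include <assert.h>
#include <stdbool.h>

/* Address family values as they appear in the gdb output (Linux). */
#define DEMO_AF_INET 2
#define DEMO_AF_INET6 10

#define DEMO_IPV4_MAX_BITLEN 32
#define DEMO_IPV6_MAX_BITLEN 128

/*
 * Returns true when prefixlen is within the valid range for the family.
 * Families other than IPv4/IPv6 are treated as unchecked in this model,
 * which matches the report that only the AF_INET case hits the assert.
 */
static bool demo_prefixlen_in_range(int family, unsigned int prefixlen)
{
	switch (family) {
	case DEMO_AF_INET:
		return prefixlen <= DEMO_IPV4_MAX_BITLEN;
	case DEMO_AF_INET6:
		return prefixlen <= DEMO_IPV6_MAX_BITLEN;
	default:
		return true;
	}
}
```

In this model, family 2 with prefixlen 255 (the core dump) is out of range, while family 10 with prefixlen 48 (the valid locator) and family 64 with prefixlen 0 (the stale locator from the second backtrace) pass, which is consistent with the crash depending on what happens to be in the freed memory.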

Steps to Reproduce

The crash is hard to reproduce because it depends on specific values being present in the freed memory, but the following sequence triggers the access to the freed locator and can be used for debugging:

sonic-db-cli CONFIG_DB HSET SRV6_MY_LOCATORS\|loc_bug prefix 3000:1:10::
sonic-db-cli CONFIG_DB HSET SRV6_MY_SIDS\|loc_bug\|3000:1:10::/48 action uN decap_dscp_mode pipe

sonic-db-cli CONFIG_DB DEL SRV6_MY_LOCATORS\|loc_bug
sonic-db-cli CONFIG_DB DEL SRV6_MY_SIDS\|loc_bug\|3000:1:10::/48

Note that it is only reproducible when the sid is removed after the locator.

Actual Behavior and Expected Behavior

staticd should not crash.
I don't know whether removing the sid after the locator is valid from FRR's point of view. If only the opposite order is valid, then bgpcfgd should be aligned with it.

Relevant log output

Output of show version, show techsupport

Attach files (if any)

No response
