-
Notifications
You must be signed in to change notification settings - Fork 23
address: fix use after free #342
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
📝 WalkthroughWalkthroughReplaces in-place nexthop vector growth with an RCU-safe clone-and-replace pattern in IPv4 and IPv6 address control paths. Adds gr_rcu.h and, where previously gr_vec_add modified the existing nh vector, the code now clones the vector, appends the new nexthop to the clone, swaps the pointer, performs an RCU QSBR synchronize, and then frees the old vector. No public APIs or exported declarations were changed; control flow and error handling remain otherwise unchanged. Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Pre-merge checks❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
fec83b9 to
5e85098
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
modules/ip/control/address.c (1)
112-120: Skip QSBR when there’s nothing old to retireIf
nhs_oldisNULL(first address on the iface), we wait for every datapath thread even though there is no buffer to reclaim. Wrap the synchronize/free pair so we only pay that price when an actual old vector exists.gr_vec_add(nhs_copy, nh); ifaddrs->nh = nhs_copy; - // Once all datapath workers have seen the new clone, free the old one. - rte_rcu_qsbr_synchronize(gr_datapath_rcu(), RTE_QSBR_THRID_INVALID); - gr_vec_free(nhs_old); + if (nhs_old != NULL) { + // Once all datapath workers have seen the new clone, free the old one. + rte_rcu_qsbr_synchronize(gr_datapath_rcu(), RTE_QSBR_THRID_INVALID); + gr_vec_free(nhs_old); + }modules/ip6/control/address.c (1)
142-150: Guard QSBR against the no-op caseSame note as IPv4: when the previous hoplist pointer is
NULL, we block on QSBR even though there is no buffer to retire. Add the simple guard around synchronize/free in both IPv6 blocks to skip the wait on first insert.gr_vec_add(nhs_copy, nh); maddrs->nh = nhs_copy; - // Once all datapath workers have seen the new clone, free the old one. - rte_rcu_qsbr_synchronize(gr_datapath_rcu(), RTE_QSBR_THRID_INVALID); - gr_vec_free(nhs_old); + if (nhs_old != NULL) { + // Once all datapath workers have seen the new clone, free the old one. + rte_rcu_qsbr_synchronize(gr_datapath_rcu(), RTE_QSBR_THRID_INVALID); + gr_vec_free(nhs_old); + }gr_vec_add(nhs_copy, nh); addrs->nh = nhs_copy; - // Once all datapath workers have seen the new clone, free the old one. - rte_rcu_qsbr_synchronize(gr_datapath_rcu(), RTE_QSBR_THRID_INVALID); - gr_vec_free(nhs_old); + if (nhs_old != NULL) { + // Once all datapath workers have seen the new clone, free the old one. + rte_rcu_qsbr_synchronize(gr_datapath_rcu(), RTE_QSBR_THRID_INVALID); + gr_vec_free(nhs_old); + }Also applies to: 240-248
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
modules/ip/control/address.c(2 hunks)modules/ip6/control/address.c(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{c,h}
⚙️ CodeRabbit configuration file
**/*.{c,h}: -gr_vec_*()functions cannot fail. No need to check their return value.
gr_vec_free(x)always setsx = NULL. There is no risk of double free.ec_node_*()functions consume theirec_nodearguments. No leaks on error.rte_node->ctxis an uint8_t array of size 16, not a pointer.- Don't suggest to replace
assert()with graceful error checking.- We compile with
-std=gnu2x. Unnamed parameters in function definitions are valid.
Files:
modules/ip/control/address.cmodules/ip6/control/address.c
🧬 Code graph analysis (2)
modules/ip/control/address.c (1)
modules/infra/datapath/main_loop.c (1)
gr_datapath_rcu(300-302)
modules/ip6/control/address.c (1)
modules/infra/datapath/main_loop.c (1)
gr_datapath_rcu(300-302)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
- GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
- GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
- GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
- GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
- GitHub Check: deb
- GitHub Check: rpm
5e85098 to
6490a48
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
modules/ip6/control/address.c (2)
141-153: Good: RCU-safe publish of multicast nh vectorClone/append/swap + QSBR then free-old is correct and addresses the UAF.
Consider publishing to readers only after installing the Ethernet filter, or rollback on failure. Today, if iface_add_eth_addr() fails, the membership is still published and refcount incremented. If you want, I can propose a small reordering patch.
242-253: Good: RCU-safe publish of unicast nh vectorPattern mirrors v4/mcast; ordering is correct. LGTM.
This clone/append/swap + QSBR/free sequence is duplicated across v4/v6. Consider a small helper (e.g., hoplist_append_rcu(hoplist*, struct nexthop*)) to reduce repetition and keep future fixes in one place.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
modules/ip/control/address.c(2 hunks)modules/ip6/control/address.c(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{c,h}
⚙️ CodeRabbit configuration file
**/*.{c,h}: -gr_vec_*()functions cannot fail. No need to check their return value.
gr_vec_free(x)always setsx = NULL. There is no risk of double free.ec_node_*()functions consume theirec_nodearguments. No leaks on error.rte_node->ctxis an uint8_t array of size 16, not a pointer.- Don't suggest to replace
assert()with graceful error checking.- We compile with
-std=gnu2x. Unnamed parameters in function definitions are valid.
Files:
modules/ip/control/address.cmodules/ip6/control/address.c
🧠 Learnings (1)
📓 Common learnings
Learnt from: rjarry
PR: DPDK/grout#312
File: modules/ip/control/address.c:102-110
Timestamp: 2025-09-22T09:21:51.749Z
Learning: The rib4_insert() function in modules/ip/control/route.c takes ownership of the nexthop reference by calling nexthop_incref(nh) at the beginning and properly calls nexthop_decref(nh) on any failure paths via the fail: label, so callers don't need to manually decrement the reference count when rib4_insert fails.
🧬 Code graph analysis (2)
modules/ip/control/address.c (1)
modules/infra/datapath/main_loop.c (1)
gr_datapath_rcu(300-302)
modules/ip6/control/address.c (1)
modules/infra/datapath/main_loop.c (1)
gr_datapath_rcu(300-302)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
- GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
- GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
- GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
- GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
- GitHub Check: deb
- GitHub Check: rpm
🔇 Additional comments (2)
modules/ip/control/address.c (1)
13-13: Include for QSBR usage is correctNeeded for gr_datapath_rcu() usage. LGTM.
modules/ip6/control/address.c (1)
13-13: QSBR include addedRequired for synchronize call. LGTM.
| // Clone it and append to the clone. | ||
| gr_vec struct nexthop **nhs_copy = gr_vec_clone(ifaddrs->nh); | ||
| gr_vec struct nexthop **nhs_old = ifaddrs->nh; | ||
| gr_vec_add(nhs_copy, nh); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gr_vec_clone will to do a malloc, gr_vec_add will do a realloc, it could be optimized:
gr_vec_grow(nhs_copy, gr_vec_len(nhs_old) +1) ;
gr_vec_extend(nhs_copy, nh_old);
gr_vec_add(nhs_copy, nh);
no ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could change this to:
gr_vec struct nexthop **nhs_copy = NULL;
gr_vec struct nexthop **nhs_old = ifaddrs->nh;
gr_vec_cap_set(nhs_copy, gr_vec_len(nhs_old) + 1);
gr_vec_extend(nhs_copy, ifaddrs->nh);
gr_vec_add(nhs_copy, nh);
ifaddrs->nh = nhs_copy;But it is less readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really mind. Let me know what you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer this one, it is still readable, more efficient
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| if (nhs_old != NULL) { | ||
| // Once all datapath workers have seen the new clone, free the old one. | ||
| rte_rcu_qsbr_synchronize(gr_datapath_rcu(), RTE_QSBR_THRID_INVALID); | ||
| gr_vec_free(nhs_old); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having a real gc, not doing synchronize barrier every where seems more and more needed for grout...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. We will soon need to implement such a thing.
When adding an address to an interface, we use gr_vec_add() which may
call realloc() internally and may free the old memory area. This can
cause use after free bugs as detected by libasan in the CI:
+ grcli interface add port gm2dgn1 devargs net_tap1,iface=gm2dgn1 mac f0:0d:ac:dc:00:01
...
DEBUG: GROUT: iface_event: iface event [0xacdc0001] POST_ADD triggered for iface gm2dgn1.
INFO: GROUT: mcast6_addr_add: gm2dgn1: joining multicast group ff02::1:ffdc:1
INFO: GROUT: mcast6_addr_add: gm2dgn1: joining multicast group ff01::1
INFO: GROUT: mcast6_addr_add: gm2dgn1: joining multicast group ff02::1
INFO: GROUT: mcast6_addr_add: gm2dgn1: joining multicast group ff01::2
NOTICE: GROUT: mcast6_addr_add: gm2dgn1: joining multicast group ff02::2
GROUT: trace_log_packet: [rx gm2dgn1] f0:0d:ac:dc:00:01 > 33:33:00:00:00:16 / IPv6 :: > ff02::16 ttl=1 proto=HOPOPT(0) / ICMPv6 type=143 code=0, (pkt_len=90)
=================================================================
==90746==ERROR: AddressSanitizer: heap-use-after-free on address 0x504000006390 at pc 0x560cd400d1b7 bp 0x7f6cbdfd7760 sp 0x7f6cbdfd7750
READ of size 8 at 0x504000006390 thread T4
#0 0x560cd400d1b6 in __gr_vec_hdr ../main/gr_vec.h:34
DPDK#1 0x560cd400d284 in gr_vec_len ../main/gr_vec.h:50
DPDK#2 0x560cd400dc98 in mcast6_get_member ../modules/ip6/control/address.c:92
DPDK#3 0x560cd404d30e in ip6_input_process ../modules/ip6/datapath/ip6_input.c:96
DPDK#4 0x560cd3f8ef05 in __rte_node_process ../subprojects/dpdk/lib/graph/rte_graph_worker_common.h:207
DPDK#5 0x560cd3f8ef05 in rte_graph_walk_rtc ../subprojects/dpdk/lib/graph/rte_graph_model_rtc.h:42
DPDK#6 0x560cd3f8f78f in rte_graph_walk ../subprojects/dpdk/lib/graph/rte_graph_worker.h:38
DPDK#7 0x560cd3f91c1c in gr_datapath_loop ../modules/infra/datapath/main_loop.c:252
...
0x504000006390 is located 0 bytes inside of 48-byte region [0x504000006390,0x5040000063c0)
freed by thread T0 here:
#0 0x7f6d498fc778 in realloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:85
DPDK#1 0x560cd400d364 in __gr_vec_grow ../main/gr_vec.h:69
DPDK#2 0x560cd400e68c in mcast6_addr_add ../modules/ip6/control/address.c:141
DPDK#3 0x560cd4010bc6 in ip6_iface_event_handler ../modules/ip6/control/address.c:373
DPDK#4 0x560cd3f0209e in gr_event_push ../main/event.c:24
DPDK#5 0x560cd3f2d9b0 in iface_create ../modules/infra/control/iface.c:134
DPDK#6 0x560cd3f08c1a in iface_add ../modules/infra/api/iface.c:33
...
Prevent this from happening by cloning the address vector before adding
anything to it. Add the new address to the clone and make it visible to
datapath threads.
Use the RCU to ensure all threads have seen the new vector and *then*
free the old vector.
Link: https://github.com/DPDK/grout/actions/runs/18442009960/job/52543656749#step:9:4363
Signed-off-by: Robin Jarry <[email protected]>
6490a48 to
f88e527
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
modules/ip6/control/address.c (1)
245-257: RCU-safe pattern correctly applied to interface addresses.The implementation is correct and consistent with the other occurrences.
Note: The same clone-append-replace pattern now appears in three locations (once in modules/ip/control/address.c, twice here). Consider extracting a helper function like
gr_vec_clone_and_append_rcu_safe(old_vec, new_element)to reduce duplication.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
modules/ip/control/address.c(2 hunks)modules/ip6/control/address.c(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{c,h}
⚙️ CodeRabbit configuration file
**/*.{c,h}: -gr_vec_*()functions cannot fail. No need to check their return value.
gr_vec_free(x)always setsx = NULL. There is no risk of double free.ec_node_*()functions consume theirec_nodearguments. No leaks on error.rte_node->ctxis an uint8_t array of size 16, not a pointer.- Don't suggest to replace
assert()with graceful error checking.- We compile with
-std=gnu2x. Unnamed parameters in function definitions are valid.
Files:
modules/ip6/control/address.cmodules/ip/control/address.c
🧠 Learnings (1)
📓 Common learnings
Learnt from: rjarry
PR: DPDK/grout#312
File: modules/ip/control/address.c:102-110
Timestamp: 2025-09-22T09:21:51.749Z
Learning: The rib4_insert() function in modules/ip/control/route.c takes ownership of the nexthop reference by calling nexthop_incref(nh) at the beginning and properly calls nexthop_decref(nh) on any failure paths via the fail: label, so callers don't need to manually decrement the reference count when rib4_insert fails.
🧬 Code graph analysis (2)
modules/ip6/control/address.c (2)
main/gr_vec.h (1)
gr_vec_len(47-51)modules/infra/datapath/main_loop.c (1)
gr_datapath_rcu(300-302)
modules/ip/control/address.c (2)
main/gr_vec.h (1)
gr_vec_len(47-51)modules/infra/datapath/main_loop.c (1)
gr_datapath_rcu(300-302)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
- GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
- GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
- GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
- GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
- GitHub Check: deb
- GitHub Check: rpm
🔇 Additional comments (4)
modules/ip/control/address.c (2)
13-13: Include required for RCU operations.The gr_rcu.h include is necessary for the gr_datapath_rcu() function used in the RCU-safe vector update pattern below.
112-124: RCU-safe clone-append-replace eliminates UAF.The implementation correctly fixes the heap-use-after-free by:
- Pre-allocating capacity to avoid double allocation
- Cloning via gr_vec_extend from the old vector
- Adding the new nexthop
- Publishing the new vector
- RCU synchronizing to ensure datapath threads observe the change
- Freeing the old vector only after synchronization
The optimization (gr_vec_cap_set + gr_vec_extend vs. gr_vec_clone + gr_vec_add) from past reviews has been properly applied.
modules/ip6/control/address.c (2)
13-13: Include required for RCU operations.The gr_rcu.h include is necessary for the gr_datapath_rcu() function used in the RCU-safe vector updates below.
143-155: RCU-safe pattern correctly applied to multicast addresses.The clone-append-replace pattern with RCU synchronization is correctly implemented, matching the IPv4 implementation. This eliminates the UAF risk when updating the multicast address vector.
When adding an address to an interface, we use
gr_vec_add()which may callrealloc()internally and may free the old memory area. This can cause use after free bugs as detected by libasan in the CI:Prevent this from happening by cloning the address vector before adding anything to it. Add the new address to the clone and make it visible to datapath threads.
Use the RCU to ensure all threads have seen the new vector and then free the old vector.
Link: https://github.com/DPDK/grout/actions/runs/18442009960/job/52543656749#step:9:4363
Summary by CodeRabbit
Bug Fixes
Refactor