Collaborator

@aharivel aharivel commented Dec 2, 2025

Summary

Fixes #403 - IPv4/IPv6 forwarding test packet loss with physical VFs

Problem

When multiple packets arrived simultaneously for the same on-link destination, they all triggered
nexthop resolution concurrently. This caused a race condition in which:

  1. Packet 1: nh4_lookup() returns NULL → creates nexthop A → rib4_insert() succeeds
  2. Packet 2: nh4_lookup() returns NULL (timing) → creates nexthop B → rib4_insert() returns EEXIST →
    packet dropped
  3. Packet 3: Similar behavior

Result: 33% packet loss and 1000+ ms latency on initial connection establishment.

Solution

Move rib4_insert() inside the nexthop creation block:

if (remote == NULL) {
    remote = nexthop_new(...);
    rib4_insert(...);  // Only first packet tries insertion
}

Now only the packet that creates the nexthop attempts route insertion. Subsequent packets find the existing
nexthop via lookup and skip creation/insertion entirely.
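
The ordering change can be illustrated with a toy model. This is a minimal sketch, not the grout code: `toy_nh_lookup`, `toy_nh_new`, `toy_rib_insert` and the fixed-size tables are hypothetical stand-ins, and packets are handled one after another so that later packets find the nexthop created by the first one.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SLOTS 8

/* Hypothetical stand-ins for the nexthop pool and the RIB; not the grout API. */
struct toy_nexthop { uint32_t addr; int used; };

static struct toy_nexthop nh_table[TOY_SLOTS];
static uint32_t route_table[TOY_SLOTS];
static int route_count;

static struct toy_nexthop *toy_nh_lookup(uint32_t addr) {
    for (size_t i = 0; i < TOY_SLOTS; i++)
        if (nh_table[i].used && nh_table[i].addr == addr)
            return &nh_table[i];
    return NULL;
}

static struct toy_nexthop *toy_nh_new(uint32_t addr) {
    for (size_t i = 0; i < TOY_SLOTS; i++) {
        if (!nh_table[i].used) {
            nh_table[i].used = 1;
            nh_table[i].addr = addr;
            return &nh_table[i];
        }
    }
    return NULL;
}

/* Returns 0 on success, -EEXIST if the host route exists, -ENOSPC if full. */
static int toy_rib_insert(uint32_t addr) {
    for (int i = 0; i < route_count; i++)
        if (route_table[i] == addr)
            return -EEXIST;
    if (route_count == TOY_SLOTS)
        return -ENOSPC;
    route_table[route_count++] = addr;
    return 0;
}

/* Old ordering: every packet attempts the insertion. <0 means "drop". */
static int toy_resolve_old(uint32_t dst) {
    struct toy_nexthop *remote = toy_nh_lookup(dst);
    if (remote == NULL) {
        remote = toy_nh_new(dst);
        if (remote == NULL)
            return -ENOMEM;
    }
    return toy_rib_insert(dst);
}

/* New ordering: only the packet that creates the nexthop inserts the route. */
static int toy_resolve_new(uint32_t dst) {
    struct toy_nexthop *remote = toy_nh_lookup(dst);
    if (remote == NULL) {
        remote = toy_nh_new(dst);
        if (remote == NULL)
            return -ENOMEM;
        int ret = toy_rib_insert(dst);
        if (ret < 0) {
            remote->used = 0; /* release the slot taken above */
            return ret;
        }
    }
    assert(remote != NULL);
    return 0;
}
```

In this model, a second call to `toy_resolve_old()` for the same destination returns `-EEXIST`, while repeated calls to `toy_resolve_new()` all return 0.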

Changes

  • Commit 1: ip: insert route only when creating nexthop - Fixes IPv4
  • Commit 2: ip6: insert route only when creating nexthop - Applies same fix to IPv6

Summary by CodeRabbit

  • Bug Fixes

    • Improved IPv4 and IPv6 route insertion error handling during nexthop creation to ensure failures are detected and handled immediately.
    • Ensured ARP probe behavior remains stable and uninterrupted.
  • Refactor

    • Consolidated duplicate route creation so associated IPv4 (/32) and IPv6 (/128) routes are created once during nexthop setup.



coderabbitai bot commented Dec 2, 2025

Caution

Review failed

The pull request is closed.

📝 Walkthrough

In nexthop handlers, route insertion is moved to immediately after creating a new remote nexthop: nh4_unreachable_cb now inserts the /32 via rib4_insert at remote creation time, and nh6_unreachable_cb inserts the /128 via rib6_insert at remote creation time, with allocation and insertion error handling. Duplicate later insertions and the intermediate remote_l3 variable were removed; after creation nh is set to the remote and l3 is derived via nexthop_info_l3(remote). arp_probe_input_cb behavior and public signatures are unchanged.

Pre-merge checks

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Resolution: You can run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (4 passed)

  • Description Check (✅ Passed): Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check (✅ Passed): The title directly describes the main fix: addressing 33% packet loss during concurrent nexthop resolution, which is the primary change across both IPv4 and IPv6 modules.
  • Linked Issues check (✅ Passed): The PR moves rib4/rib6_insert() inside nexthop-creation blocks to prevent race conditions causing EEXIST errors and packet loss, directly resolving issue #403's documented symptoms.
  • Out of Scope Changes check (✅ Passed): All changes are scoped to nexthop resolution race condition fix; no unrelated modifications detected in IPv4 and IPv6 control modules.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7ad268e and 3f5dcc8.

📒 Files selected for processing (2)
  • modules/ip/control/nexthop.c (1 hunks)
  • modules/ip6/control/nexthop.c (1 hunks)


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
modules/ip/control/nexthop.c (1)

71-71: rib4_insert error handling now fully ignored – consider documenting intent or distinguishing EEXIST

Dropping the error checks here fixes the EEXIST race/drop issue and is safe wrt nexthop refcounts given rib4_insert’s ownership semantics (it handles its own ref on failure; callers don’t need to decref). Based on learnings, this is correct.

However, all other failures (e.g., ENOMEM) also become silent and we still proceed as if the host route was installed, which may make future debugging harder.

Suggest either:

  • Add a short comment at each call saying that failures (including EEXIST) are intentionally ignored because the route is just a cache and forwarding can proceed with the nexthop, or
  • Optionally, keep a cheap branch to log non‑EEXIST errors while still treating EEXIST as success.

Not a blocker, but would improve clarity for future readers.

Also applies to: 145-145

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 668067a and b4819c7.

📒 Files selected for processing (1)
  • modules/ip/control/nexthop.c (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{c,h}

⚙️ CodeRabbit configuration file

**/*.{c,h}:

  • gr_vec_*() functions cannot fail. No need to check their return value.
  • gr_vec_free(x) always sets x = NULL. There is no risk of double free.
  • ec_node_*() functions consume their ec_node arguments. No leaks on error.
  • rte_node->ctx is an uint8_t array of size 16, not a pointer.
  • Don't suggest to replace assert() with graceful error checking.
  • We compile with -std=gnu2x. Unnamed parameters in function definitions are valid.

Files:

  • modules/ip/control/nexthop.c
🧠 Learnings (7)
📓 Common learnings
Learnt from: rjarry
Repo: DPDK/grout PR: 312
File: modules/ip/control/address.c:102-110
Timestamp: 2025-09-22T09:21:51.749Z
Learning: The rib4_insert() function in modules/ip/control/route.c takes ownership of the nexthop reference by calling nexthop_incref(nh) at the beginning and properly calls nexthop_decref(nh) on any failure paths via the fail: label, so callers don't need to manually decrement the reference count when rib4_insert fails.
Learnt from: rjarry
Repo: DPDK/grout PR: 312
File: modules/ip/control/address.c:102-110
Timestamp: 2025-09-22T09:21:51.749Z
Learning: The rib4_insert() function in the IP module takes ownership of the nexthop reference and automatically calls nexthop_decref(nh) on failure paths, so callers don't need to manually decrement the reference count when rib4_insert fails.
Learnt from: rjarry
Repo: DPDK/grout PR: 312
File: modules/ip/control/address.c:102-110
Timestamp: 2025-09-22T09:21:51.749Z
Learning: The rib4_insert() function takes ownership of the nexthop reference and automatically calls nexthop_decref(nh) on failure paths, so callers don't need to manually decrement the reference count when rib4_insert fails.
Learnt from: rjarry
Repo: DPDK/grout PR: 312
File: frr/rt_grout.c:355-361
Timestamp: 2025-09-25T07:52:17.403Z
Learning: The nexthop_new() function in FRR's zebra codebase cannot fail and return NULL - it will abort() the process if memory allocation fails, so null checks after calling nexthop_new() are unnecessary.
Learnt from: maxime-leroy
Repo: DPDK/grout PR: 309
File: modules/srv6/datapath/srv6_local.c:97-101
Timestamp: 2025-09-09T09:22:31.596Z
Learning: In SRv6 IPv6 extension header parsing, when is_ipv6_ext[proto] is true, rte_ipv6_get_next_ext() will not return a negative value, making error checking unnecessary. An assert can be used as a defensive measure for future DPDK compatibility.
Learnt from: rjarry
Repo: DPDK/grout PR: 305
File: modules/ip6/control/route.c:407-413
Timestamp: 2025-08-27T15:33:22.299Z
Learning: DPDK rte_rib6_get_nxt() with RTE_RIB6_GET_NXT_ALL flag does not yield the default route ::/0 if configured. The explicit rte_rib6_lookup_exact() call for the default route is necessary to ensure complete route enumeration.
🧬 Code graph analysis (1)
modules/ip/control/nexthop.c (1)
modules/ip/control/route.c (1)
  • rib4_insert (206-214)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
  • GitHub Check: deb
  • GitHub Check: rpm

Collaborator

@christophefontaine christophefontaine left a comment


Shouldn't the same modification be done for rib6_insert?

modules/ip6/control/address.c
modules/ip6/control/nexthop.c

Also, the Fixes trailer should be a full link instead of only the GitHub issue ID:
Fixes: https://github.com/DPDK/grout/issues/403

@aharivel aharivel force-pushed the fix-403-nexthop-race branch from b4819c7 to c37f741 Compare December 2, 2025 14:53

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
modules/ip/control/nexthop.c (1)

150-150: Consider handling rib4_insert errors instead of fully ignoring them

Here we now ignore the return value of rib4_insert(iface->vrf_id, sip, 32, GR_NH_ORIGIN_INTERNAL, nh);. While this avoids treating -EEXIST as fatal in ARP races, it also silently ignores other failures (e.g. OOM, RIB issues), giving no log signal and leaving the system without a host route in those cases.

A lighter approach would be to special‑case the benign race and still surface unexpected errors, e.g.:

-        rib4_insert(iface->vrf_id, sip, 32, GR_NH_ORIGIN_INTERNAL, nh);
+        int ret = rib4_insert(iface->vrf_id, sip, 32, GR_NH_ORIGIN_INTERNAL, nh);
+        if (ret < 0 && ret != -EEXIST) {
+            LOG(ERR, "ip4_route_insert: %s", strerror(errno));
+            goto free;
+        }

This preserves the fix for -EEXIST races while keeping visibility and fail‑fast behavior for real insertion errors. Based on learnings about rib4_insert ownership/cleanup behavior.
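
The idea of treating EEXIST as success while still surfacing other failures can be sketched as follows. This is a hedged illustration only: `toy_route_insert` is a hypothetical stand-in, and the sketch assumes the insertion function reports failure as a negative errno return value, which is why it formats the message from the return value rather than from `errno`.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for a route insertion. simulated_error is 0 for
 * success or a positive errno value to simulate a failure. The negative
 * errno return convention is an assumption of this sketch.
 */
static int toy_route_insert(int simulated_error) {
    return simulated_error ? -simulated_error : 0;
}

/*
 * Returns 0 when forwarding may proceed (route inserted, or already
 * present), or a negative errno value when the caller should bail out.
 */
static int toy_insert_tolerating_eexist(int simulated_error) {
    int ret = toy_route_insert(simulated_error);

    if (ret == 0 || ret == -EEXIST)
        return 0; /* benign: someone else already installed the route */

    assert(ret < 0);
    /* Unexpected failure: log it so it does not go unnoticed. */
    fprintf(stderr, "route insert failed: %s\n", strerror(-ret));
    return ret;
}
```

A caller would `goto free` (or equivalent cleanup) only when the wrapper returns a negative value, so the benign race no longer drops packets while out-of-memory and similar errors remain visible.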

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between b4819c7 and c37f741.

📒 Files selected for processing (2)
  • modules/ip/control/nexthop.c (2 hunks)
  • modules/ip6/control/nexthop.c (1 hunks)
🧬 Code graph analysis (2)
modules/ip/control/nexthop.c (1)
modules/ip/control/route.c (1)
  • rib4_insert (206-214)
modules/ip6/control/nexthop.c (1)
modules/ip6/control/route.c (1)
  • rib6_insert (220-229)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: rpm
  • GitHub Check: deb
🔇 Additional comments (2)
modules/ip/control/nexthop.c (1)

61-66: Host /32 install in nh4_unreachable_cb looks consistent

Inserting the internal /32 immediately after allocating remote (and bailing out on real rib4_insert errors) matches the IPv6 pattern and keeps the host route creation localized to the nexthop allocation path. This looks correct and should help avoid duplicate later inserts.

modules/ip6/control/nexthop.c (1)

62-75: IPv6 /128 insertion path is clean and matches existing NDP behavior

Creating the internal /128 route right after remote allocation, with rib6_insert error checking and goto free on failure, mirrors the NDP probe handler and keeps the host‑route insertion single‑sourced. Arguments (vrf_id, iface_id, dst, RTE_IPV6_MAX_DEPTH, origin, remote) look correct.

@aharivel aharivel force-pushed the fix-403-nexthop-race branch from c37f741 to f27ed19 Compare December 2, 2025 14:59

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
modules/ip/control/nexthop.c (2)

69-72: Unreachable code after moving insertion.

This NULL check for remote appears unreachable:

  • If the creation block (lines 47-67) was entered and succeeded, remote is non-NULL
  • If rib4_insert() failed, we already jumped to free at line 65
  • If the block was skipped, remote was already non-NULL from line 44's lookup

Consider removing this check or replacing with an assert(remote != NULL) if defensive validation is desired.


150-150: Consider handling non-EEXIST errors.

While ignoring EEXIST is correct (concurrent insertion is harmless), silently ignoring other errors like ENOMEM loses diagnostic information.

Previous reviewer suggested checking specifically for -EEXIST. Consider:

-		rib4_insert(iface->vrf_id, sip, 32, GR_NH_ORIGIN_INTERNAL, nh);
+		if (rib4_insert(iface->vrf_id, sip, 32, GR_NH_ORIGIN_INTERNAL, nh) < 0 && errno != EEXIST)
+			LOG(ERR, "failed to insert route: %s", strerror(errno));

Based on learnings, rib4_insert() handles nexthop cleanup on failure, so no manual reference management needed.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between c37f741 and f27ed19.

📒 Files selected for processing (2)
  • modules/ip/control/nexthop.c (2 hunks)
  • modules/ip6/control/nexthop.c (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • modules/ip6/control/nexthop.c
🧬 Code graph analysis (1)
modules/ip/control/nexthop.c (1)
modules/ip/control/route.c (1)
  • rib4_insert (206-214)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
  • GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
  • GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
  • GitHub Check: build-cross-aarch64
  • GitHub Check: deb
  • GitHub Check: rpm
🔇 Additional comments (1)
modules/ip/control/nexthop.c (1)

61-66: Race condition fix looks correct.

Moving rib4_insert() inside the nexthop-creation block ensures only the thread that successfully creates the nexthop attempts route insertion. Subsequent concurrent lookups will find the existing nexthop and skip creation. Error handling is preserved correctly.

Based on learnings, rib4_insert() takes ownership of the nexthop reference and handles cleanup on failure, so no manual decref is needed here.
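
The ownership convention referred to here (take a reference on entry, drop it on every failure path through a single `fail:` label) can be sketched in isolation. The names `toy_nh`, `toy_nh_incref`, `toy_nh_decref` and `toy_insert` are hypothetical, and the sketch only mirrors the behavior described in the learnings; it is not the actual rib4_insert() implementation.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical reference-counted nexthop; not the grout structure. */
struct toy_nh { int ref_count; };

static void toy_nh_incref(struct toy_nh *nh) {
    nh->ref_count++;
}

static void toy_nh_decref(struct toy_nh *nh) {
    assert(nh->ref_count > 0);
    nh->ref_count--;
}

/*
 * Takes its own reference on entry and releases it on failure, so the
 * caller's reference count is unchanged when an error is returned.
 * simulate_failure is a test knob for this sketch only.
 */
static int toy_insert(struct toy_nh *nh, int simulate_failure) {
    int ret;

    toy_nh_incref(nh);

    if (simulate_failure) {
        ret = -ENOMEM;
        goto fail;
    }

    return 0; /* success: the route keeps the reference taken above */

fail:
    toy_nh_decref(nh);
    return ret;
}
```

Because the failure path undoes the increment, a caller that sees an error from `toy_insert()` has nothing to decrement itself.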

@aharivel aharivel force-pushed the fix-403-nexthop-race branch 2 times, most recently from e3ab0d1 to 7ad268e Compare December 2, 2025 16:04

ip: insert route only when creating nexthop

When multiple packets arrive for the same on-link destination, the code
first performs a nexthop lookup. Only the first packet finds nothing
and creates a new nexthop. Subsequent packets reuse the existing one.

The original code attempted route insertion after the lookup/create
logic. This caused an error where the second packet would create its own
nexthop, attempt rib4_insert(), receive EEXIST, and drop the packet.

Moving rib4_insert() inside the creation block ensures only the packet
that creates the nexthop attempts route insertion. Other packets skip
the insertion entirely since they found an existing nexthop.

Closes: DPDK#403
Signed-off-by: Anthony Harivel <[email protected]>
Reviewed-by: Robin Jarry <[email protected]>

ip6: insert route only when creating nexthop

Apply the same fix as the previous commit for IPv6.

When multiple packets arrive for the same on-link destination, only the
first packet that creates the nexthop should attempt route insertion.
Subsequent packets reuse the existing nexthop without trying to insert
a duplicate route.

This prevents EEXIST errors and packet drops during concurrent nexthop
resolution.

Signed-off-by: Anthony Harivel <[email protected]>
Reviewed-by: Robin Jarry <[email protected]>
@grout-bot grout-bot force-pushed the fix-403-nexthop-race branch from 7ad268e to 3f5dcc8 Compare December 2, 2025 16:25
@rjarry rjarry merged commit a39b8ba into DPDK:main Dec 2, 2025
4 of 6 checks passed

Development

Successfully merging this pull request may close these issues.

IPv4 forwarding test packet loss with physical VFs

3 participants