@rjarry rjarry commented Nov 3, 2025

Optimize how port state changes are handled in the datapath and control plane.

Move the admin status check from eth_input/eth_output to port_rx/port_tx to avoid unnecessary packet reception and processing when interfaces are down.

Use the RCU-protected port->started flag to allow ports to be safely stopped and restarted without always requiring graph manipulation. Only unplug/plug ports during reconfiguration when actual configuration changes are needed.

Stop and restart the port when changing MTU, which is required by some DPDK drivers.

Summary by CodeRabbit

  • Bug Fixes

    • Safer port MTU and reconfiguration flow: device stop/start and synchronization reduce disruption and ensure consistent plug/unplug sequencing.
    • Packet input/output logic: deferred interface UP checks prevent premature drops; RX now skips interfaces/ports not ready.
  • New Features

    • Added a transmit "port down" state for clearer diagnostics and drop tracking.
  • Chores

    • Test/runtime support updated to expose datapath synchronization object for tests.

@rjarry rjarry requested a review from david-marchand November 3, 2025 10:32
coderabbitai bot commented Nov 3, 2025

📝 Walkthrough

RCU synchronization is introduced around MTU updates and port reconfiguration. Reconfiguration now clears the started flag, waits for an RCU quiescent state, stops the port, runs configuration, then plugs and starts the port and sets the started flag again. eth_input.c and eth_output.c drop their early interface-UP short-circuits and defer UP checks to VLAN or later logic. port_rx.c adds an early guard requiring the interface to be UP and the port to be started. port_tx.c adds a TX_DOWN edge, drop registration, and an early-return path for down ports. A gr_datapath_rcu accessor is added.

Pre-merge checks

✅ Passed checks (2 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title clearly summarizes the main optimization: moving port state handling checks from datapath to control plane and introducing RCU-protected state management.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5d31ec0 and 039cc03.

📒 Files selected for processing (6)
  • modules/infra/control/port.c (6 hunks)
  • modules/infra/control/worker_test.c (2 hunks)
  • modules/infra/datapath/eth_input.c (0 hunks)
  • modules/infra/datapath/eth_output.c (0 hunks)
  • modules/infra/datapath/port_rx.c (1 hunks)
  • modules/infra/datapath/port_tx.c (3 hunks)
💤 Files with no reviewable changes (2)
  • modules/infra/datapath/eth_input.c
  • modules/infra/datapath/eth_output.c
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{c,h}

⚙️ CodeRabbit configuration file

**/*.{c,h}:

  • gr_vec_*() functions cannot fail. No need to check their return value.
  • gr_vec_free(x) always sets x = NULL. There is no risk of double free.
  • ec_node_*() functions consume their ec_node arguments. No leaks on error.
  • rte_node->ctx is a uint8_t array of size 16, not a pointer.
  • Don't suggest replacing assert() with graceful error checking.
  • We compile with -std=gnu2x. Unnamed parameters in function definitions are valid.

Files:

  • modules/infra/control/worker_test.c
  • modules/infra/control/port.c
  • modules/infra/datapath/port_tx.c
  • modules/infra/datapath/port_rx.c
🧠 Learnings (5)
📚 Learning: 2025-10-02T07:42:42.135Z
Learnt from: rjarry
Repo: DPDK/grout PR: 326
File: modules/infra/control/graph.c:360-364
Timestamp: 2025-10-02T07:42:42.135Z
Learning: In the grout codebase, `gr_vec_free()` is a macro defined in `main/gr_vec.h` that automatically sets the vector pointer to NULL after freeing the memory, preventing double-free issues. The macro definition is: `#define gr_vec_free(v) ((v) ? free(__gr_vec_hdr(v)) : (void)0, (v) = NULL)`.

Applied to files:

  • modules/infra/control/worker_test.c
  • modules/infra/control/port.c
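
To make the double-free claim concrete, here is a minimal sketch that exercises the macro quoted in the learning above. Only the `gr_vec_free` macro body is taken from the text; the header layout `struct sketch_vec_hdr`, the stand-in `__gr_vec_hdr` helper and the allocator `sketch_vec_new` are assumptions made up for this illustration and are not the definitions from `main/gr_vec.h`.

```c
#include <stddef.h>
#include <stdlib.h>

// Hypothetical header layout; the real one lives in main/gr_vec.h.
struct sketch_vec_hdr {
	size_t len;
	size_t cap;
};

// Stand-in for the internal helper the macro relies on (assumption).
#define __gr_vec_hdr(v) ((struct sketch_vec_hdr *)(v) - 1)

// Macro body copied from the learning above.
#define gr_vec_free(v) ((v) ? free(__gr_vec_hdr(v)) : (void)0, (v) = NULL)

// Hypothetical allocator: header first, elements right after it.
int *sketch_vec_new(size_t cap) {
	struct sketch_vec_hdr *hdr = malloc(sizeof(*hdr) + cap * sizeof(int));
	if (hdr == NULL)
		return NULL;
	hdr->len = 0;
	hdr->cap = cap;
	return (int *)(hdr + 1);
}
```

Because the macro assigns NULL to its argument, a second `gr_vec_free()` on the same variable takes the `(void)0` branch and never reaches `free()`.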
📚 Learning: 2025-10-08T21:22:45.922Z
Learnt from: rjarry
Repo: DPDK/grout PR: 334
File: modules/infra/control/worker.c:278-281
Timestamp: 2025-10-08T21:22:45.922Z
Learning: In the codebase, `gr_vec_add` is a macro that does not return any value and cannot fail. Do not suggest checking its return value or adding error handling around it.

Applied to files:

  • modules/infra/control/worker_test.c
  • modules/infra/control/port.c
📚 Learning: 2025-09-05T08:58:53.047Z
Learnt from: rjarry
Repo: DPDK/grout PR: 294
File: modules/policy/control/conntrack.c:533-539
Timestamp: 2025-09-05T08:58:53.047Z
Learning: DPDK functions rte_mempool_free() and rte_hash_free() are NULL-safe - they explicitly handle NULL pointers and do nothing when passed NULL, unlike many C standard library functions.

Applied to files:

  • modules/infra/control/worker_test.c
📚 Learning: 2025-10-21T15:42:43.874Z
Learnt from: rjarry
Repo: DPDK/grout PR: 350
File: modules/ip/control/address.c:214-216
Timestamp: 2025-10-21T15:42:43.874Z
Learning: In C code compiled with `-std=gnu2x`, the gr_vec_foreach macro supports inline variable declarations (e.g., `gr_vec_foreach (struct nexthop *nh, vector)`). This is valid C2x syntax and does not require pre-declaring the loop variable.

Applied to files:

  • modules/infra/control/port.c
📚 Learning: 2025-11-03T13:28:19.470Z
Learnt from: rjarry
Repo: DPDK/grout PR: 379
File: modules/infra/datapath/port_tx.c:36-46
Timestamp: 2025-11-03T13:28:19.470Z
Learning: In DPDK graph node process callbacks, the return value is only used for statistics and does not affect packet flow or scheduling through the graph. Nodes can return 0 when they haven't processed packets (e.g., when dropping or redirecting due to port state).

Applied to files:

  • modules/infra/datapath/port_tx.c
  • modules/infra/datapath/port_rx.c
🧬 Code graph analysis (3)
modules/infra/control/worker_test.c (1)
modules/infra/datapath/main_loop.c (1)
  • gr_datapath_rcu (300-302)
modules/infra/control/port.c (3)
modules/infra/control/worker_test.c (1)
  • gr_datapath_rcu (40-43)
modules/infra/datapath/main_loop.c (1)
  • gr_datapath_rcu (300-302)
modules/infra/control/worker.c (2)
  • port_unplug (145-182)
  • port_plug (184-223)
modules/infra/datapath/port_tx.c (3)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/datapath/gr_mbuf.h (1)
  • gr_mbuf_is_traced (52-54)
modules/infra/datapath/trace.c (1)
  • gr_mbuf_trace_add (522-549)
🔇 Additional comments (7)
modules/infra/datapath/port_rx.c (1)

42-49: LGTM! Early guard optimizes packet processing.

The early-return guard correctly prevents packet reception when the interface is down or port not started. The read of port->started relies on RCU synchronization performed by the control path (verified in port.c).

modules/infra/datapath/port_tx.c (2)

29-46: LGTM! TX_DOWN path correctly handles down ports.

The early-return path properly checks port state, traces packets, and enqueues to TX_DOWN before returning 0. Returning 0 is correct here as noted in learnings—it's purely for statistics and doesn't affect packet scheduling.

Based on learnings


33-34: Review comment is overly defensive; batch invariant is architecturally enforced.

The code assumes all mbufs in a batch share the same iface. This is valid: port_output routes each packet individually to port_tx via ctx->edges[port->port_id], and separate port_tx node instances exist per port+queue (port_tx-pXqY). The DPDK scheduler batches packets destined for the same port_tx instance together, naturally grouping by port and thus by iface. No explicit validation is needed or found in similar nodes (eth_output, loop_output, etc.).

modules/infra/control/worker_test.c (1)

40-43: LGTM! Mock implementation is appropriate for testing.

The mock gr_datapath_rcu() provides a valid stub for unit tests, returning a static zero-initialized rte_rcu_qsbr structure.

modules/infra/control/port.c (3)

340-344: LGTM! RCU synchronization correctly protects port reconfiguration.

The sequence properly ensures datapath threads observe started=false before the device is stopped, preventing races during reconfiguration.


384-384: Return value change is correct.

Changing from return port_plug(p) to return 0 is appropriate since port_plug is now called earlier (line 372) within the conditional block, and the unconditional success return here reflects completion of the full reconfig flow.


347-373: No cleanup needed on port_plug failure; consistent with codebase error handling pattern.

The concern is technically valid—if port_plug fails, the port remains configured but unplugged from workers. However, this intermediate state reflects the codebase's intentional fail-fast design. Every error path in iface_port_reconfig returns immediately without cleanup: rte_eth_dev_stop, port_unplug, port_configure, worker_queue_distribute, and port_mac_set all follow the same pattern. The resulting state (hardware-configured but not worker-integrated) is consistent and recoverable if the caller retries.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cb413c0 and a088bde.

📒 Files selected for processing (5)
  • modules/infra/control/port.c (6 hunks)
  • modules/infra/datapath/eth_input.c (0 hunks)
  • modules/infra/datapath/eth_output.c (0 hunks)
  • modules/infra/datapath/port_rx.c (1 hunks)
  • modules/infra/datapath/port_tx.c (3 hunks)
💤 Files with no reviewable changes (2)
  • modules/infra/datapath/eth_output.c
  • modules/infra/datapath/eth_input.c
🧬 Code graph analysis (2)
modules/infra/datapath/port_tx.c (3)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/datapath/gr_mbuf.h (1)
  • gr_mbuf_is_traced (52-54)
modules/infra/datapath/trace.c (1)
  • gr_mbuf_trace_add (522-549)
modules/infra/control/port.c (2)
modules/infra/datapath/main_loop.c (1)
  • gr_datapath_rcu (300-302)
modules/infra/control/worker.c (2)
  • port_unplug (145-182)
  • port_plug (184-223)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
  • GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
  • GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
  • GitHub Check: deb
  • GitHub Check: rpm

Comment on lines +36 to +46
if (!(iface->flags & GR_IFACE_F_UP) || !port->started) {
	for (unsigned i = 0; i < nb_objs; i++) {
		if (gr_mbuf_is_traced(mbufs[i])) {
			struct port_queue *t;
			t = gr_mbuf_trace_add(mbufs[i], node, sizeof(*t));
			*t = *ctx;
		}
	}
	rte_node_enqueue(graph, node, TX_DOWN, objs, nb_objs);
	return 0;
}
@coderabbitai coderabbitai bot Nov 3, 2025

⚠️ Potential issue | 🔴 Critical

Return the consumed batch count here. Returning 0 after queueing everything on TX_DOWN tells the graph scheduler that nothing was consumed, so it will keep refeeding the same burst and you end up enqueueing duplicates forever. Hand back nb_objs just like the other paths in this function.

🤖 Prompt for AI Agents
In modules/infra/datapath/port_tx.c around lines 36 to 46, the code enqueues the
batch on TX_DOWN but returns 0 which signals no objects were consumed and causes
the scheduler to resend the same burst; change the return value to nb_objs so
the function reports the batch as consumed (i.e., after rte_node_enqueue(graph,
node, TX_DOWN, objs, nb_objs); return nb_objs;).

Collaborator Author

The return value of node process callbacks is only used for statistics. It does not affect the flow of packets through the graph. And in this case, the port_tx node didn't process anything. So I am returning 0 on purpose.
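
For readers unfamiliar with this behavior, the point can be illustrated with a toy model of a graph walk. This is only a sketch: `sketch_node`, `sketch_enqueue`, `sketch_run_node` and `sketch_tx_when_down` are hypothetical names invented here, and the real DPDK graph library is considerably more involved. The model only assumes what the comment above states, namely that the callback's return value feeds counters and nothing else.

```c
#include <stdint.h>

#define SKETCH_BURST 8

struct sketch_node;
typedef uint16_t (*sketch_process_t)(struct sketch_node *node, void **objs, uint16_t nb_objs);

// Minimal stand-in for a graph node: a pending queue plus statistics.
struct sketch_node {
	sketch_process_t process;
	struct sketch_node *next; // single outgoing edge, for brevity
	void *pending[SKETCH_BURST];
	uint16_t nb_pending;
	uint64_t total_objs;  // statistics only
	uint64_t total_calls; // statistics only
};

// Move a batch onto another node's pending queue (the "enqueue" step).
void sketch_enqueue(struct sketch_node *to, void **objs, uint16_t nb_objs) {
	for (uint16_t i = 0; i < nb_objs && to->nb_pending < SKETCH_BURST; i++)
		to->pending[to->nb_pending++] = objs[i];
}

// Run one node: the callback's return value only feeds the counters, and
// the pending queue is reset regardless of what the callback returned.
void sketch_run_node(struct sketch_node *node) {
	uint16_t n = node->nb_pending;
	if (n == 0)
		return;
	uint16_t rc = node->process(node, node->pending, n);
	node->total_objs += rc;
	node->total_calls++;
	node->nb_pending = 0;
}

// Callback modelled on the "port down" path: forward everything, return 0.
uint16_t sketch_tx_when_down(struct sketch_node *node, void **objs, uint16_t nb_objs) {
	sketch_enqueue(node->next, objs, nb_objs);
	return 0;
}
```

In this model, a batch handed to a node running `sketch_tx_when_down` ends up on the next node exactly once, and the only visible effect of returning 0 is that `total_objs` does not grow.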

@rjarry, thank you for the clarification! I understand now—the return value is purely for statistics and doesn't affect packet scheduling. Returning 0 here makes sense since the node isn't actually processing packets when the port is down. I appreciate you explaining the DPDK graph framework behavior.

✏️ Learnings added
Learnt from: rjarry
Repo: DPDK/grout PR: 379
File: modules/infra/datapath/port_tx.c:36-46
Timestamp: 2025-11-03T13:28:19.470Z
Learning: In DPDK graph node process callbacks, the return value is only used for statistics and does not affect packet flow or scheduling through the graph. Nodes can return 0 when they haven't processed packets (e.g., when dropping or redirecting due to port state).

@rjarry rjarry force-pushed the port-mtu-stop branch 2 times, most recently from fdc47f5 to 5d31ec0 November 3, 2025 13:35
 	port = iface_info_port(iface);
 
-	if (!(iface->flags & GR_IFACE_F_UP)) {
+	if (!(iface->flags & GR_IFACE_F_UP) || !port->started) {
Collaborator

Do you think it would be possible to merge iface->flags & GR_IFACE_F_UP and port->started?
With the same RCU, etc.?

Interface admin status was checked in eth_input and eth_output nodes,
after packets had already been received from hardware queues or after L3
processing had been performed.

Move these checks earlier to port_rx and port_tx nodes. In port_rx, skip
calling rte_eth_rx_burst() entirely when the interface is
administratively down. In port_tx, redirect packets to a dedicated
port_tx_down drop node without attempting transmission.

This avoids unnecessary packet reception and processing for interfaces
that are not supposed to forward traffic.

Signed-off-by: Robin Jarry <[email protected]>
Reviewed-by: Christophe Fontaine <[email protected]>
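
The effect of the commit above can be sketched in isolation. The types and functions below (`sketch_iface`, `sketch_port`, `sketch_rx_burst`, `sketch_port_rx`, `sketch_port_tx_edge`) are stand-ins invented for this illustration, not the grout or DPDK definitions; `sketch_rx_burst` merely plays the role of rte_eth_rx_burst() so the example can count how often the hardware queue would be polled.

```c
#include <stdint.h>

// Stand-in for GR_IFACE_F_UP; the real value is not taken from the source.
#define SKETCH_IFACE_F_UP (1u << 0)

struct sketch_iface {
	uint32_t flags;
};

struct sketch_port {
	unsigned rx_burst_calls; // how many times the fake RX burst ran
};

// Fake replacement for rte_eth_rx_burst(): pretends 4 packets arrived.
uint16_t sketch_rx_burst(struct sketch_port *port) {
	port->rx_burst_calls++;
	return 4;
}

// RX node body: return before touching the hardware queue when the
// interface is administratively down.
uint16_t sketch_port_rx(const struct sketch_iface *iface, struct sketch_port *port) {
	if (!(iface->flags & SKETCH_IFACE_F_UP))
		return 0;
	return sketch_rx_burst(port);
}

enum sketch_tx_edge { SKETCH_TX_SENT, SKETCH_TX_DOWN };

// TX node body: choose the edge the whole batch is steered to. The DOWN
// edge stands for the dedicated port_tx_down drop node.
enum sketch_tx_edge sketch_port_tx_edge(const struct sketch_iface *iface) {
	if (!(iface->flags & SKETCH_IFACE_F_UP))
		return SKETCH_TX_DOWN;
	return SKETCH_TX_SENT;
}
```

With the flag cleared, `sketch_port_rx()` never increments `rx_burst_calls`, which is the saving the commit message describes: no reception and no later processing for an interface that should not forward traffic.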
Port reconfiguration requires calling rte_eth_dev_stop(). Previously,
port_unplug() was called unconditionally at the beginning of
iface_port_reconfig(), even when no actual reconfiguration was needed.

Add a check for port->started in port_rx and port_tx nodes. When a port
needs to be stopped, set started to false and use RCU synchronization to
ensure all datapath workers observe this change before calling
rte_eth_dev_stop(). Workers will skip packet reception and transmission
on stopped ports.

Only call port_unplug() and port_plug() when actual reconfiguration is
needed (needs_configure is true). This avoids unnecessary graph
manipulation when simply stopping and restarting a port.

Signed-off-by: Robin Jarry <[email protected]>
Reviewed-by: Christophe Fontaine <[email protected]>
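
The ordering described in the commit above (clear the flag, wait for every worker to pass a quiescent state, only then stop the device) can be sketched with a toy quiescent-state tracker built on C11 atomics. Everything here is hypothetical: `sketch_qsbr` is not DPDK's rte_rcu_qsbr, and the blocking wait is split into a begin step and a non-blocking check so the sequence can be followed in a single thread.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_WORKERS 4

// Toy quiescent-state tracker standing in for the real RCU object.
struct sketch_qsbr {
	_Atomic uint64_t token;                    // bumped by the control plane
	_Atomic uint64_t seen[SKETCH_MAX_WORKERS]; // last token each worker acknowledged
	unsigned n_workers;
};

struct sketch_port {
	atomic_bool started;
	bool hw_running; // stands in for the device state changed by rte_eth_dev_stop()
};

// Worker side: called once per loop iteration, while not using the port.
void sketch_worker_quiescent(struct sketch_qsbr *q, unsigned worker) {
	uint64_t t = atomic_load_explicit(&q->token, memory_order_acquire);
	atomic_store_explicit(&q->seen[worker], t, memory_order_release);
}

// Worker side: the guard added to the RX and TX nodes.
bool sketch_worker_may_use(struct sketch_port *p) {
	return atomic_load_explicit(&p->started, memory_order_acquire);
}

// Control side, step 1: clear the flag and open a new grace period.
uint64_t sketch_stop_begin(struct sketch_qsbr *q, struct sketch_port *p) {
	atomic_store_explicit(&p->started, false, memory_order_release);
	return atomic_fetch_add_explicit(&q->token, 1, memory_order_acq_rel) + 1;
}

// Control side, step 2: true once every worker has acknowledged the token.
bool sketch_grace_elapsed(struct sketch_qsbr *q, uint64_t token) {
	for (unsigned i = 0; i < q->n_workers; i++) {
		if (atomic_load_explicit(&q->seen[i], memory_order_acquire) < token)
			return false;
	}
	return true;
}

// Control side, step 3: the device is stopped only after the grace period.
bool sketch_stop_finish(struct sketch_qsbr *q, struct sketch_port *p, uint64_t token) {
	if (!sketch_grace_elapsed(q, token))
		return false;
	p->hw_running = false;
	return true;
}
```

A worker that has reported a quiescent state after `sketch_stop_begin()` is guaranteed to read `started == false` on its next iteration, so by the time `sketch_stop_finish()` succeeds no worker can still be inside a receive or transmit call on that port.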
Some DPDK drivers require the port to be stopped before the MTU can be
changed. Without this, rte_eth_dev_set_mtu() may fail or behave
unpredictably depending on the driver implementation.

Stop the port before calling rte_eth_dev_set_mtu() and restart it
afterward. Use RCU synchronization to ensure datapath workers observe
the started flag change before the port is actually stopped, preventing
them from attempting to receive or transmit packets during the MTU
change operation.

Fixes: a1d2887 ("port: implement interface callbacks for hardware configuration")
Signed-off-by: Robin Jarry <[email protected]>
Reviewed-by: Christophe Fontaine <[email protected]>
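
A minimal sketch of the stop, set MTU, restart sequence from the commit above, written against a fake device rather than DPDK. `sketch_dev`, `sketch_dev_set_mtu`, `sketch_synchronize` and `sketch_port_set_mtu` are hypothetical names; the fake driver rejecting changes with -EBUSY while running is an assumption chosen to mimic the stricter drivers the commit message mentions. Restarting the port even when the MTU change fails is a choice of this sketch and may not match what port.c does.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

// Fake device standing in for a port whose driver refuses MTU changes
// while running. Not a real driver model.
struct sketch_dev {
	bool running;
	uint16_t mtu;
};

int sketch_dev_set_mtu(struct sketch_dev *d, uint16_t mtu) {
	if (d->running)
		return -EBUSY;
	d->mtu = mtu;
	return 0;
}

struct sketch_port {
	struct sketch_dev dev;
	bool started;   // flag read by datapath workers
	unsigned syncs; // counts grace-period waits
};

// Placeholder for the RCU grace-period wait; here it only counts calls.
void sketch_synchronize(struct sketch_port *p) {
	p->syncs++;
}

int sketch_port_set_mtu(struct sketch_port *p, uint16_t mtu) {
	bool was_started = p->started;

	if (was_started) {
		p->started = false;     // workers stop using the port...
		sketch_synchronize(p);  // ...and we wait until they all have
		p->dev.running = false; // only then stop the device
	}

	int ret = sketch_dev_set_mtu(&p->dev, mtu);

	if (was_started) {
		p->dev.running = true; // restart before re-enabling workers
		p->started = true;
	}
	return ret;
}
```

Calling `sketch_dev_set_mtu()` directly on a running device fails, while `sketch_port_set_mtu()` succeeds and leaves the port started again afterward.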
@christophefontaine christophefontaine merged commit 11b148f into DPDK:main Nov 4, 2025
7 checks passed
@rjarry rjarry deleted the port-mtu-stop branch November 4, 2025 18:09