Skip to content

Conversation

@rjarry
Copy link
Collaborator

@rjarry rjarry commented Oct 6, 2025

  • rx,tx: initialize node context when creating graph
  • vec: add gr_strvec_free
  • vec: add gr_vec_extend
  • infra: rename enum to name functions
  • infra: rename {rx,tx}.c with a port_ prefix
  • port: pass whole structure to plug,unplug
  • eth_output: make output dependent on interface type
  • rx,tx: use one node per port queue

Summary by CodeRabbit

  • New Features
    • Introduced port-based RX, TX, and Output nodes for the datapath.
    • Configurable routing by interface type and mode, including NO_MAC handling.
  • Refactor
    • Reworked graph construction to be port-aware and simplified interface naming in telemetry.
    • Replaced legacy RX/TX implementations with modular port_rx/port_tx/port_output nodes.
  • Bug Fixes
    • More robust stats reload and cleanup; prevents stale stats exposure.
    • Broader aggregation of “port_rx-*” counters.
    • Improved shutdown behavior; no crashes when removing/recreating ports.
  • Tests
    • Added unit tests for vector utilities and worker graph reloads.
    • Expanded smoke tests for affinity and port lifecycle.

@coderabbitai
Copy link

coderabbitai bot commented Oct 6, 2025

📝 Walkthrough

Walkthrough

This PR introduces vector utilities (gr_vec_extend, gr_strvec_free) and adds tests. It renames iface string helpers and updates consumers. It removes node data getters/setters from gr_graph.h and significantly refactors graph construction to be port-based, changing worker_graph_* signatures and propagating a ports vector throughout control paths. Port plug/unplug APIs now take iface_info_port*. Datapath is restructured: rx/tx split into port_rx, port_tx, and a new port_output node; eth_output gains type-based edge registration; xconnect updates edges. Build scripts, stats handling, and smoke tests are updated accordingly; legacy rx.c/tx.c are removed.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

  • Wide API churn across control and datapath layers (function renames/signature changes).
  • Replacement of RX/TX pipeline files with new nodes and edge routing.
  • Substantial logic changes in graph construction, worker paths, and stats setup.
  • Multiple new helpers and global state; heterogeneous edits across many files.
  • Tests and build system adjustments; smoke scripts modified.

Possibly related PRs

  • gr_vec improvements #300 — Also modifies gr_vec API; overlaps with added gr_vec_extend/gr_strvec_free.
  • Nexthop cleanup #305 — Touches register_interface_mode and its declaration placement, matching changes in gr_rxtx.h and port_rx.c.

Suggested reviewers

  • christophefontaine
  • maxime-leroy

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 11.36% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title clearly describes the change to the rx and tx datapath to use one node per port queue, which reflects a central component of the modifications and is specific enough to convey a meaningful aspect of the PR.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@rjarry
Copy link
Collaborator Author

rjarry commented Oct 6, 2025

@coderabbitai full review

@coderabbitai
Copy link

coderabbitai bot commented Oct 6, 2025

✅ Actions performed

Full review triggered.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
modules/infra/datapath/main_loop.c (1)

268-272: Release node_to_index on shutdown

ctx->node_to_index survives until thread exit, so we leak once per worker shutdown. Add rte_free(ctx->node_to_index); ctx->node_to_index = NULL; next to the ctx->w_stats free.

 if (ctx->stats)
 	rte_graph_cluster_stats_destroy(ctx->stats);
 rte_free(ctx->w_stats);
+rte_free(ctx->node_to_index);
 ctx->w_stats = NULL;
+ctx->node_to_index = NULL;
modules/infra/control/worker.c (1)

140-152: ctx->iface ends up bogus

ports only holds struct iface_info_port * coming from iface->info (see the enumerations in this file). That storage is separate from the owning struct iface, so subtracting offsetof(struct iface, info) from p does not recover the parent object. The ctx ends up pointing at garbage and any later deref in the RX node blows up. Carry the real struct iface * down (e.g. store it alongside the info pointer or look it up via iface_from_id()/similar) and assign that instead of this pointer arithmetic.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 794434b and 2e1890b.

⛔ Files ignored due to path filters (1)
  • docs/graph.svg is excluded by !**/*.svg
📒 Files selected for processing (26)
  • main/gr_vec.h (2 hunks)
  • main/vec_test.c (1 hunks)
  • meson.build (1 hunks)
  • modules/infra/api/gr_infra.h (2 hunks)
  • modules/infra/api/stats.c (3 hunks)
  • modules/infra/control/gr_graph.h (0 hunks)
  • modules/infra/control/graph.c (11 hunks)
  • modules/infra/control/graph_priv.h (1 hunks)
  • modules/infra/control/port.c (2 hunks)
  • modules/infra/control/worker.c (7 hunks)
  • modules/infra/control/worker_priv.h (1 hunks)
  • modules/infra/control/worker_test.c (1 hunks)
  • modules/infra/datapath/eth_output.c (3 hunks)
  • modules/infra/datapath/gr_eth.h (1 hunks)
  • modules/infra/datapath/gr_icmp6.h (5 hunks)
  • modules/infra/datapath/gr_rxtx.h (1 hunks)
  • modules/infra/datapath/l1_xconnect.c (4 hunks)
  • modules/infra/datapath/main_loop.c (2 hunks)
  • modules/infra/datapath/meson.build (1 hunks)
  • modules/infra/datapath/port_output.c (1 hunks)
  • modules/infra/datapath/port_rx.c (1 hunks)
  • modules/infra/datapath/port_tx.c (1 hunks)
  • modules/infra/datapath/rx.c (0 hunks)
  • modules/infra/datapath/tx.c (0 hunks)
  • smoke/affinity_test.sh (1 hunks)
  • subprojects/dpdk.wrap (1 hunks)
💤 Files with no reviewable changes (3)
  • modules/infra/control/gr_graph.h
  • modules/infra/datapath/tx.c
  • modules/infra/datapath/rx.c
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-02T07:42:42.135Z
Learnt from: rjarry
PR: DPDK/grout#326
File: modules/infra/control/graph.c:360-364
Timestamp: 2025-10-02T07:42:42.135Z
Learning: In the grout codebase, `gr_vec_free()` is a macro defined in `main/gr_vec.h` that automatically sets the vector pointer to NULL after freeing the memory, preventing double-free issues. The macro definition is: `#define gr_vec_free(v) ((v) ? free(__gr_vec_hdr(v)) : (void)0, (v) = NULL)`.

Applied to files:

  • main/gr_vec.h
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: rpm
  • GitHub Check: deb
  • GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
  • GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
  • GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
🔇 Additional comments (20)
smoke/affinity_test.sh (1)

30-34: Nice addition to cover port churn.

Recreating the ports after teardown is a solid smoke check for the new plug/unplug flow.

modules/infra/datapath/l1_xconnect.c (3)

14-18: LGTM! Enum renaming improves clarity.

OUTPUT is more descriptive than TX for this edge case.

Also applies to: 39-39


77-77: LGTM! Drop registration consistent with node mapping.

Name change to "xconnect_no_port" aligns with the next_nodes entry at line 63.


57-65: Approve port_output node registration
The “port_output” node is defined in modules/infra/datapath/port_output.c, confirming the mapping is correct.

main/gr_vec.h (3)

142-154: LGTM!

The __gr_vec_extend implementation correctly grows the destination vector, copies items from the source, and updates the length. The logic handles empty source vectors appropriately.


156-162: LGTM!

The __gr_strvec_free implementation correctly frees each dynamically allocated string before freeing the vector header. Returning NULL allows the macro to set the pointer to NULL, consistent with gr_vec_free behavior.

Based on learnings.


171-188: LGTM!

Both new macros (gr_strvec_free and gr_vec_extend) are properly defined and delegate to their internal helpers. The macro definitions follow the established pattern in this file.

main/vec_test.c (3)

89-97: LGTM!

The test correctly exercises gr_strvec_free by creating a vector of dynamically allocated strings and cleaning up properly.


99-126: LGTM!

The test thoroughly validates gr_vec_extend by creating two vectors, extending one with the other, and verifying the combined contents. The assertions cover both length and content correctness.


132-133: LGTM!

New tests are properly registered in the test suite.

modules/infra/datapath/gr_icmp6.h (1)

38-131: LGTM!

The change from __rte_packed to __attribute__((__packed__)) is a stylistic update to use standard GCC attributes. The DPDK macro __rte_packed expands to the same attribute, so this change is functionally equivalent. The combination of __rte_aligned(2) and __attribute__((__packed__)) on line 131 is valid.

modules/infra/datapath/gr_rxtx.h (4)

9-9: LGTM!

Added include for <rte_graph.h> to support the new rte_edge_t type used in port_output_edges.


14-17: LGTM!

Node naming macros provide a consistent convention for RX/TX node names with port and queue identifiers.


19-32: LGTM!

The new structures (port_queue, rx_node_ctx, port_output_edges) are well-designed for the port-queue-based architecture. The naming is clear and the fields are appropriately sized.


36-36: LGTM!

The new register_interface_mode API enables dynamic routing based on interface mode, which aligns with the broader refactoring toward per-port nodes.

modules/infra/datapath/meson.build (1)

15-17: LGTM!

The source file list correctly reflects the refactoring from rx.c/tx.c to the new port-based implementations (port_output.c, port_rx.c, port_tx.c).

modules/infra/datapath/gr_eth.h (1)

39-39: LGTM!

The new eth_output_register_interface_type API provides a mechanism to register next nodes for specific interface types, complementing the interface-mode registration in gr_rxtx.h.

modules/infra/api/gr_infra.h (2)

322-338: LGTM!

The rename from iface_type_to_str to gr_iface_type_name improves naming consistency with the module prefix. The function logic remains unchanged.


341-351: LGTM!

The rename from iface_mode_to_str to gr_iface_mode_name improves naming consistency with the module prefix. The function logic remains unchanged.

subprojects/dpdk.wrap (1)

2-3: DPDK v25.07 tag found; review the 25.07 release notes for breaking changes and security advisories.

@rjarry rjarry force-pushed the graph-rxtx branch 4 times, most recently from ab180b7 to a31b1b8 Compare October 6, 2025 15:27
@rjarry rjarry marked this pull request as ready for review October 6, 2025 15:27
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
modules/infra/datapath/main_loop.c (1)

269-273: Free ctx.node_to_index during shutdown

We still allocate ctx.node_to_index on every reload, but the shutdown path only frees ctx.w_stats. That leaves the last buffer leaked each time a datapath thread exits. Free it alongside ctx.w_stats (and null it like we do on reload failure).

 	if (ctx.stats)
 		rte_graph_cluster_stats_destroy(ctx.stats);
 	rte_free(ctx.w_stats);
+	rte_free(ctx.node_to_index);
+	ctx.node_to_index = NULL;
modules/infra/control/graph.c (1)

163-194: Memory leak if graph creation fails after rte_malloc.

At line 163, port_output_edges is allocated via rte_malloc. If graph creation fails before line 194 (where the pointer is assigned), the allocated memory is never freed. The out label at line 196 doesn't handle this cleanup.

Add cleanup for the allocated out structure:

 	struct port_output_edges *out = rte_malloc(__func__, sizeof(*out), RTE_CACHE_LINE_SIZE);
+	if (out == NULL) {
+		ret = -ENOMEM;
+		goto out;
+	}
 	for (unsigned i = 0; i < ARRAY_DIM(out->edges); i++)
 		out->edges[i] = RTE_EDGE_ID_INVALID;
 
 	gr_vec_foreach_ref (qmap, worker->txqs) {
 		if (!qmap->enabled)
 			continue;
 		LOG(DEBUG,
 		    "[CPU %d] -> port %u txq %u",
 		    worker->cpu_id,
 		    qmap->port_id,
 		    qmap->queue_id);
 		// find the corresponding port_tx clone for this port-txq pair
 		snprintf(node_name, sizeof(node_name), TX_NODE_FMT, qmap->port_id, qmap->queue_id);
 		node = rte_graph_node_get_by_name(graph_name, node_name);
+		if (node == NULL) {
+			ret = -ENOENT;
+			rte_free(out);
+			goto out;
+		}
 		// and update its context data to correct values
 		struct port_queue *ctx = (struct port_queue *)node->ctx;

Also ensure out is freed in the error path before line 196.

🧹 Nitpick comments (5)
modules/infra/control/worker.c (1)

145-182: Verify memory cleanup on early returns.

The ports vector is allocated but not freed if changed == 0 at line 167-168. While the vector may be empty at that point, explicitly freeing it ensures consistency with the cleanup pattern used elsewhere.

Consider this adjustment:

 	}
 	if (changed == 0)
+		gr_vec_free(ports);
 		return 0;
modules/infra/datapath/gr_rxtx.h (2)

19-22: Consider adding documentation for struct port_queue.

The structure is straightforward, but adding a brief comment explaining its purpose (e.g., "Identifies a specific port and queue pair") would improve readability for future maintainers.


36-36: Missing documentation for register_interface_mode.

The function declaration lacks parameter descriptions. Adding a brief comment explaining the purpose of mode and next_node would improve API usability.

modules/infra/control/graph.c (2)

255-298: Resource leak if worker_graph_nodes_add_missing fails mid-execution.

The function allocates multiple vectors (old_rx, old_tx, unused_nodes) and clones nodes. If any operation fails (e.g., memory allocation, node cloning), the cleanup at lines 294-295 may not execute, leading to leaks.

Add comprehensive error handling with cleanup labels to ensure all allocated resources are freed on failure.


282-287: Verify edge update and shrink operations.

Lines 282-287 update and shrink edges for port_output_node. The assertions assume both operations succeed. If they fail (e.g., due to invalid node state), the program will abort. Consider logging errors and handling failures gracefully.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2e1890b and 5e9e510.

⛔ Files ignored due to path filters (1)
  • docs/graph.svg is excluded by !**/*.svg
📒 Files selected for processing (26)
  • main/gr_vec.h (2 hunks)
  • main/vec_test.c (1 hunks)
  • meson.build (1 hunks)
  • modules/infra/api/gr_infra.h (2 hunks)
  • modules/infra/api/stats.c (3 hunks)
  • modules/infra/control/gr_graph.h (0 hunks)
  • modules/infra/control/graph.c (11 hunks)
  • modules/infra/control/graph_priv.h (1 hunks)
  • modules/infra/control/port.c (2 hunks)
  • modules/infra/control/worker.c (7 hunks)
  • modules/infra/control/worker_priv.h (1 hunks)
  • modules/infra/control/worker_test.c (1 hunks)
  • modules/infra/datapath/eth_output.c (3 hunks)
  • modules/infra/datapath/gr_eth.h (1 hunks)
  • modules/infra/datapath/gr_icmp6.h (5 hunks)
  • modules/infra/datapath/gr_rxtx.h (1 hunks)
  • modules/infra/datapath/l1_xconnect.c (4 hunks)
  • modules/infra/datapath/main_loop.c (2 hunks)
  • modules/infra/datapath/meson.build (1 hunks)
  • modules/infra/datapath/port_output.c (1 hunks)
  • modules/infra/datapath/port_rx.c (1 hunks)
  • modules/infra/datapath/port_tx.c (1 hunks)
  • modules/infra/datapath/rx.c (0 hunks)
  • modules/infra/datapath/tx.c (0 hunks)
  • smoke/affinity_test.sh (1 hunks)
  • subprojects/dpdk.wrap (1 hunks)
💤 Files with no reviewable changes (3)
  • modules/infra/datapath/tx.c
  • modules/infra/control/gr_graph.h
  • modules/infra/datapath/rx.c
🚧 Files skipped from review as they are similar to previous changes (6)
  • main/vec_test.c
  • main/gr_vec.h
  • modules/infra/api/stats.c
  • modules/infra/datapath/port_tx.c
  • modules/infra/datapath/meson.build
  • modules/infra/api/gr_infra.h
🧰 Additional context used
🧬 Code graph analysis (11)
modules/infra/control/worker_priv.h (1)
modules/infra/control/worker.c (2)
  • port_unplug (145-182)
  • port_plug (184-223)
modules/infra/datapath/port_output.c (3)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/control/gr_graph.h (1)
  • rte_node_enqueue_x1 (22-25)
modules/infra/datapath/eth_output.c (1)
  • eth_output_register_interface_type (25-32)
modules/infra/datapath/eth_output.c (4)
modules/infra/api/gr_infra.h (1)
  • gr_iface_type_name (322-338)
modules/infra/control/graph.c (1)
  • gr_node_attach_parent (36-64)
modules/infra/control/iface.c (1)
  • iface_from_id (215-222)
modules/infra/control/gr_graph.h (1)
  • rte_node_enqueue_x1 (22-25)
modules/infra/datapath/port_rx.c (2)
modules/infra/control/graph.c (1)
  • gr_node_attach_parent (36-64)
modules/infra/datapath/trace.c (2)
  • gr_mbuf_trace_add (520-547)
  • trace_log_packet (354-504)
modules/infra/control/graph.c (3)
api/string.c (1)
  • astrcat (15-50)
main/gr_vec.h (1)
  • gr_vec_len (47-51)
api/gr_errno.h (1)
  • errno_set (9-12)
modules/infra/control/worker_test.c (1)
modules/infra/control/graph.c (2)
  • worker_graph_reload (203-226)
  • worker_graph_reload_all (300-318)
modules/infra/datapath/main_loop.c (1)
main/sort.c (1)
  • topo_sort (53-97)
modules/infra/datapath/gr_eth.h (1)
modules/infra/datapath/eth_output.c (1)
  • eth_output_register_interface_type (25-32)
modules/infra/control/worker.c (3)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/control/iface.c (1)
  • iface_next (198-213)
modules/infra/control/graph.c (2)
  • worker_graph_reload_all (300-318)
  • worker_graph_reload (203-226)
modules/infra/control/port.c (1)
modules/infra/control/worker.c (2)
  • port_unplug (145-182)
  • port_plug (184-223)
modules/infra/control/graph_priv.h (1)
modules/infra/control/graph.c (3)
  • worker_graph_reload (203-226)
  • worker_graph_reload_all (300-318)
  • rte_node_t (255-279)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
  • GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
  • GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
  • GitHub Check: build-cross-aarch64
  • GitHub Check: deb
  • GitHub Check: rpm
🔇 Additional comments (21)
smoke/affinity_test.sh (1)

29-34: LGTM!

The port deletion and recreation test appropriately validates the graph reload mechanism under port lifecycle changes.

modules/infra/datapath/eth_output.c (2)

23-32: LGTM!

The interface-type registration logic correctly validates bounds, prevents duplicate registrations, and attaches the parent edge.


34-105: LGTM!

The unified enqueue path via the next label simplifies error handling and consolidates the final enqueue operation. The edge-based routing correctly handles VLAN and PORT interface types.

modules/infra/datapath/gr_icmp6.h (1)

38-131: LGTM!

Standardizing to __attribute__((__packed__)) improves portability and consistency across the codebase.

modules/infra/datapath/gr_eth.h (1)

39-39: LGTM!

The function declaration correctly exposes the interface-type registration API.

meson.build (1)

70-70: LGTM!

The driver paths correctly use the intel/ prefix for Intel drivers. The common/intel entry is properly included.

modules/infra/control/port.c (2)

163-163: LGTM!

Passing the port structure directly aligns with the updated API and eliminates unnecessary dereference.


288-288: LGTM!

Consistent with the updated port_plug API signature.

modules/infra/datapath/port_output.c (2)

46-48: LGTM!

The registration callback correctly associates GR_IFACE_TYPE_PORT with the port_output node.


17-32: port_id bounds guaranteed: port->port_id is vetted by rte_eth_dev_is_valid_port (ensuring < RTE_MAX_ETHPORTS) before assignment, matching the edges array size.

subprojects/dpdk.wrap (1)

2-3: DPDK tag v25.07 exists upstream.

modules/infra/control/worker.c (4)

170-174: Inverted logic: ports vector contains all ports except the unplugged one.

The condition at line 172 filters out the target port (p->port_id != p->port_id is always false if the condition were direct comparison). The vector should contain all ports except the one being unplugged. However, the logic appears correct on second inspection—line 172 adds ports not matching p->port_id.


208-222: Potential inconsistency: found flag unused after loop.

The found variable tracks whether the target port already exists in the interface list. However, after the loop, the code always adds p if !found (line 217-218), which is correct. The logic ensures the target port is included in the reload.


278-299: Verify error path cleanup for ports vector.

If worker_graph_reload fails at line 284, the function jumps to end, which properly frees the ports vector. The cleanup pattern is consistent.


320-323: Consistent use of ports parameter in worker_graph_reload.

The call at line 320 properly passes the ports vector to worker_graph_reload. Error handling propagates correctly, and the vector is freed at line 422 regardless of the return path.

modules/infra/datapath/gr_rxtx.h (3)

14-17: LGTM! Clear naming convention for per-port/queue nodes.

The node naming macros establish a consistent pattern for dynamically created RX/TX nodes. The format strings will generate names like "port_rx-p0q1" for port 0, queue 1.


24-28: Verify burst_size calculation doesn't cause division by zero.

The burst_size field in rx_node_ctx is calculated in graph.c:151 as RTE_GRAPH_BURST_SIZE / gr_vec_len(worker->rxqs). Ensure that gr_vec_len(worker->rxqs) cannot be zero at that point (it should be protected by the check at line 94-97 in graph.c, but worth verifying).


30-32: Potential size concern: RTE_MAX_ETHPORTS may be large.

The port_output_edges structure allocates an array sized RTE_MAX_ETHPORTS (typically 512 or 1024). This could consume significant memory if allocated per graph. Verify that this structure is allocated once per worker graph and not per port.

Looking at graph.c:163-194, the structure is allocated once per graph via rte_malloc, which is appropriate. Memory usage is acceptable.

modules/infra/control/graph.c (3)

78-201: Verify cleanup of dynamically allocated node name at line 113.

The name variable allocated via astrcat at line 113 is added to both rx_nodes and graph_nodes. However, rx_nodes is freed at line 198 using gr_strvec_free, which should free the individual strings. Ensure that graph_nodes does not retain pointers to freed memory.

Looking at the code flow:

  • Line 113: name allocated
  • Line 114: added to rx_nodes
  • Line 115: added to graph_nodes (shallow copy of pointer)
  • Line 197: graph_nodes freed (does not free individual strings)
  • Line 198: rx_nodes freed (frees individual strings via gr_strvec_free)

This is safe because graph_nodes is freed before rx_nodes, and after use in graph creation (line 127).


228-253: Verify node name uniqueness and collision handling.

The ensure_queue_node function clones nodes dynamically. If the same port-queue pair is processed multiple times (e.g., due to race conditions or incorrect caller logic), duplicate nodes could be created. Verify that the caller guarantees uniqueness.


488-490: LGTM! Proper cleanup of global node name vectors.

The graph_fini function correctly frees the three global vectors using appropriate free functions. The use of gr_strvec_free for vectors containing allocated strings is correct.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
modules/infra/control/worker.c (1)

278-299: Need add error handling
gr_vec_add can fail here as well; if it does we fall through with ports == NULL and still call worker_graph_reload, leaking prior entries. Please detect the failure, go to cleanup, and return the error.

♻️ Duplicate comments (2)
modules/infra/control/graph.c (2)

135-152: Replace assert(ctx->iface) with error handling and fix burst_size calculation

  • Abort on missing iface is unsafe; return an error instead.
  • burst_size uses all RXQs, not enabled ones; can be 0 when rxqs > RTE_GRAPH_BURST_SIZE or too small when many disabled.

Apply this diff:

-        }
-        assert(ctx->iface != NULL);
+        }
+        if (ctx->iface == NULL) {
+            ret = -ENODEV;
+            goto err_destroy;
+        }
         ctx->rxq.port_id = qmap->port_id;
         ctx->rxq.queue_id = qmap->queue_id;
-        ctx->burst_size = RTE_GRAPH_BURST_SIZE / gr_vec_len(worker->rxqs);
+        unsigned enabled = n_rxqs ? n_rxqs : 1;
+        ctx->burst_size = RTE_GRAPH_BURST_SIZE / enabled;
+        if (ctx->burst_size == 0)
+            ctx->burst_size = 1;

Add this cleanup label in worker_graph_new to destroy the just-created graph on failure:

err_destroy:
    if (worker->graph[index] != NULL) {
        int _ret = rte_graph_destroy(worker->graph[index]->id);
        if (_ret < 0)
            LOG(ERR, "rte_graph_destroy: %s", rte_strerror(-_ret));
        worker->graph[index] = NULL;
    }
    goto out;

285-293: Avoid asserts on edge updates; handle failures gracefully

Replace asserts on rte_node_edge_update/shrink with error handling (see previous diff). Asserting here can abort in low-memory or inconsistent graph states.

🧹 Nitpick comments (3)
modules/infra/datapath/gr_icmp6.h (1)

131-131: Inconsistent attribute style.

Line 131 mixes __rte_aligned(2) (RTE-specific) with __attribute__((__packed__)) (standard). For consistency, consider replacing __rte_aligned(2) with __attribute__((__aligned__(2))).

Apply this diff:

-} __rte_aligned(2) __attribute__((__packed__));
+} __attribute__((__aligned__(2))) __attribute__((__packed__));
modules/infra/api/stats.c (1)

60-61: LGTM – prefix match verified

RX_NODE_BASE is defined as "port_rx" and RX_NODE_FMT yields names like "port_rx-p…" so the literal "port_rx-" check correctly matches all port RX nodes.

Optional: replace strlen("port_rx-") with sizeof("port_rx-") - 1 to avoid the runtime strlen.

modules/infra/control/graph.c (1)

171-195: Bounds-check port_output edges by port_id

out->edges[ctx->port_id] assumes edges array is large enough for all possible DPDK port IDs. Verify struct port_output_edges capacity (e.g., >= RTE_MAX_ETHPORTS) or use a mapping (port_id → compact index) to avoid OOB writes.

Would you confirm port_output_edges has at least RTE_MAX_ETHPORTS entries or otherwise add a check:

if (ctx->port_id >= ARRAY_DIM(out->edges)) {
    ret = -ERANGE;
    goto err_destroy;
}
out->edges[ctx->port_id] = edge;
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e9e510 and 2bb562a.

⛔ Files ignored due to path filters (1)
  • docs/graph.svg is excluded by !**/*.svg
📒 Files selected for processing (26)
  • main/gr_vec.h (2 hunks)
  • main/vec_test.c (1 hunks)
  • meson.build (1 hunks)
  • modules/infra/api/gr_infra.h (2 hunks)
  • modules/infra/api/stats.c (3 hunks)
  • modules/infra/control/gr_graph.h (0 hunks)
  • modules/infra/control/graph.c (11 hunks)
  • modules/infra/control/graph_priv.h (1 hunks)
  • modules/infra/control/port.c (2 hunks)
  • modules/infra/control/worker.c (7 hunks)
  • modules/infra/control/worker_priv.h (1 hunks)
  • modules/infra/control/worker_test.c (1 hunks)
  • modules/infra/datapath/eth_output.c (3 hunks)
  • modules/infra/datapath/gr_eth.h (1 hunks)
  • modules/infra/datapath/gr_icmp6.h (5 hunks)
  • modules/infra/datapath/gr_rxtx.h (1 hunks)
  • modules/infra/datapath/l1_xconnect.c (4 hunks)
  • modules/infra/datapath/main_loop.c (2 hunks)
  • modules/infra/datapath/meson.build (1 hunks)
  • modules/infra/datapath/port_output.c (1 hunks)
  • modules/infra/datapath/port_rx.c (1 hunks)
  • modules/infra/datapath/port_tx.c (1 hunks)
  • modules/infra/datapath/rx.c (0 hunks)
  • modules/infra/datapath/tx.c (0 hunks)
  • smoke/affinity_test.sh (1 hunks)
  • subprojects/dpdk.wrap (1 hunks)
💤 Files with no reviewable changes (3)
  • modules/infra/datapath/tx.c
  • modules/infra/datapath/rx.c
  • modules/infra/control/gr_graph.h
🚧 Files skipped from review as they are similar to previous changes (11)
  • main/vec_test.c
  • modules/infra/datapath/gr_eth.h
  • modules/infra/api/gr_infra.h
  • main/gr_vec.h
  • modules/infra/datapath/l1_xconnect.c
  • modules/infra/control/port.c
  • modules/infra/datapath/port_output.c
  • meson.build
  • modules/infra/control/worker_test.c
  • modules/infra/datapath/main_loop.c
  • subprojects/dpdk.wrap
🧰 Additional context used
🧬 Code graph analysis (7)
modules/infra/datapath/port_tx.c (4)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/datapath/trace.c (3)
  • trace_log_packet (354-504)
  • gr_mbuf_trace_add (520-547)
  • gr_mbuf_trace_finish (549-565)
modules/infra/datapath/gr_mbuf.h (1)
  • gr_mbuf_is_traced (52-54)
modules/infra/datapath/port_rx.c (1)
  • rxtx_trace_format (25-28)
modules/infra/datapath/eth_output.c (4)
modules/infra/control/graph.c (1)
  • gr_node_attach_parent (36-64)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/control/iface.c (1)
  • iface_from_id (215-222)
modules/infra/control/gr_graph.h (1)
  • rte_node_enqueue_x1 (22-25)
modules/infra/control/worker_priv.h (1)
modules/infra/control/worker.c (2)
  • port_unplug (145-182)
  • port_plug (184-223)
modules/infra/control/graph.c (2)
api/string.c (1)
  • astrcat (15-50)
api/gr_errno.h (1)
  • errno_set (9-12)
modules/infra/control/graph_priv.h (1)
modules/infra/control/graph.c (3)
  • worker_graph_reload (207-230)
  • worker_graph_reload_all (304-322)
  • rte_node_t (259-283)
modules/infra/control/worker.c (3)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/control/iface.c (1)
  • iface_next (198-213)
modules/infra/control/graph.c (2)
  • worker_graph_reload_all (304-322)
  • worker_graph_reload (207-230)
modules/infra/datapath/port_rx.c (2)
modules/infra/control/graph.c (1)
  • gr_node_attach_parent (36-64)
modules/infra/datapath/trace.c (2)
  • gr_mbuf_trace_add (520-547)
  • trace_log_packet (354-504)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
  • GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
  • GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
  • GitHub Check: build-cross-aarch64
  • GitHub Check: deb
  • GitHub Check: rpm
🔇 Additional comments (11)
modules/infra/datapath/gr_icmp6.h (1)

38-122: LGTM!

Migration from __rte_packed to standard __attribute__((__packed__)) is correct and maintains struct layout.

smoke/affinity_test.sh (1)

30-34: Keep the re-created port configuration consistent.

Earlier in the script we only exercised two RX queues per port and mapped them explicitly. Re-adding p1 with rxqs 3 leaves queue 2 without any assignment, which can cause the CLI to fail or silently hit the default path instead of the affinity logic we’re trying to cover. Please either keep rxqs 2 on the re-add or extend the test to map/show the third queue as well to prove it works.

modules/infra/datapath/eth_output.c (4)

17-21: LGTM: Explicit edge enum values.

Clear initialization with INVAL = 0 for unregistered interface types.


23-32: LGTM: Well-validated registration function.

Good defensive checks (bounds, duplicates) with clear logging. ABORT on errors is appropriate for initialization-time failures.


38-85: LGTM: Value-based src_mac handling.

Switching from pointer to value eliminates potential dangling pointer issues.


100-101: LGTM: Unified enqueue path reduces duplication.

Single enqueue point via next label improves maintainability.

modules/infra/api/stats.c (1)

359-359: LGTM! Function renames applied correctly.

The updated function names align with the infra API refactor.

Also applies to: 376-376

modules/infra/control/graph.c (1)

197-199: ctx_ptr vs ctx: confirm expected field for port_output node

You set ctx_ptr for "port_output" while other nodes use ctx. Ensure port_output.c reads ctx_ptr accordingly; otherwise assign to node->ctx.

Would you confirm port_output.c expects ctx_ptr?

modules/infra/datapath/meson.build (1)

15-19: Build sources updated appropriately

Replacing rx/tx with port_output/port_rx/port_tx matches the refactor. LGTM.

modules/infra/datapath/port_rx.c (1)

30-36: Interface-mode edge registration is clean

Dynamic attach with a default unknown edge is sound. LGTM.

Note: rx burst size depends on worker_graph_new’s ctx->burst_size; fixing that calculation (see graph.c comment) will affect RX throughput distribution.

modules/infra/control/worker_priv.h (1)

9-11: All port_ call sites updated to use struct iface_info_port; approving change.**

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (8)
modules/infra/datapath/port_tx.c (1)

30-50: Tracing still touches mbufs after handing them to the driver

We’re again walking mbufs[i] post‐rte_eth_tx_burst, which is racy. Please finalize tracing before the burst and drop the post-loop.

@@
-	tx_ok = rte_eth_tx_burst(ctx->port_id, ctx->queue_id, mbufs, nb_objs);
-	if (tx_ok < nb_objs)
-		rte_node_enqueue(graph, node, TX_ERROR, &objs[tx_ok], nb_objs - tx_ok);
-
-	for (unsigned i = 0; i < tx_ok; i++) {
-		if (gr_mbuf_is_traced(mbufs[i])) {
-			struct port_queue *t;
-			t = gr_mbuf_trace_add(mbufs[i], node, sizeof(*t));
-			*t = *ctx;
-			gr_mbuf_trace_finish(mbufs[i]);
-		}
-	}
+	for (unsigned i = 0; i < nb_objs; i++) {
+		if (gr_mbuf_is_traced(mbufs[i])) {
+			struct port_queue *t = gr_mbuf_trace_add(mbufs[i], node, sizeof(*t));
+			*t = *ctx;
+			gr_mbuf_trace_finish(mbufs[i]);
+		}
+	}
+
+	tx_ok = rte_eth_tx_burst(ctx->port_id, ctx->queue_id, mbufs, nb_objs);
+	if (tx_ok < nb_objs)
+		rte_node_enqueue(graph, node, TX_ERROR, &objs[tx_ok], nb_objs - tx_ok);
modules/infra/control/graph_priv.h (1)

14-14: Remove unused declaration.

worker_graph_nodes_del_unused is declared but never defined or called. The unused nodes returned by worker_graph_nodes_add_missing are freed inline in worker_graph_reload_all.

Apply this diff:

 int worker_graph_reload(struct worker *, gr_vec struct iface_info_port **);
 int worker_graph_reload_all(gr_vec struct iface_info_port **);
-void worker_graph_nodes_del_unused(gr_vec rte_node_t *);
 void worker_graph_free(struct worker *);
modules/infra/control/worker.c (3)

170-178: Handle gr_vec_add failures to prevent leaks.

If gr_vec_add fails (OOM), the partially-built ports vector leaks and worker_graph_reload_all receives incomplete data. Check each gr_vec_add return value, free ports on failure, and abort before calling reload.


208-222: Check gr_vec_add return values.

Same issue as port_unplug: if gr_vec_add returns an error, ports is incomplete or partially allocated, leading to leaks and inconsistent state. Validate each gr_vec_add call, free ports on failure, and propagate the error instead of calling reload.


278-281: Missing error handling for gr_vec_add.

Building the ports vector without checking gr_vec_add return values risks leaking memory or passing incomplete data to worker_graph_reload if allocation fails.

Check gr_vec_add and handle failures:

 	gr_vec struct iface_info_port **ports = NULL;
 	struct iface *iface = NULL;
-	while ((iface = iface_next(GR_IFACE_TYPE_PORT, iface)) != NULL)
-		gr_vec_add(ports, (struct iface_info_port *)iface->info);
+	while ((iface = iface_next(GR_IFACE_TYPE_PORT, iface)) != NULL) {
+		if (gr_vec_add(ports, (struct iface_info_port *)iface->info) < 0) {
+			ret = errno_set(errno);
+			goto end;
+		}
+	}
modules/infra/control/graph.c (3)

148-148: Replace assertion with error handling.

If the ports vector is incomplete or stale, ctx->iface remains NULL and the assertion aborts. Return an error instead.

Apply this diff:

 		}
-		assert(ctx->iface != NULL);
+		if (ctx->iface == NULL) {
+			ret = -ENODEV;
+			goto out;
+		}
 		ctx->rxq.port_id = qmap->port_id;

239-257: Handle allocation and clone failures.

The function asserts on astrcat returning NULL (line 240) and rte_node_clone returning RTE_NODE_ID_INVALID (line 253). These should propagate errors to the caller.

Apply this diff:

 	char *name = astrcat(NULL, name_fmt, port_id, queue_id);
-	assert(name != NULL);
+	if (name == NULL)
+		return NULL;
 
 	// remove node from the old list so that only unused nodes remain
 	for (unsigned i = 0; i < gr_vec_len(old_list); i++) {
 		if (strcmp(old_list[i], name) == 0) {
 			free(old_list[i]);
 			gr_vec_del(old_list, i);
 			break;
 		}
 	}
 	if (rte_node_from_name(name) == RTE_NODE_ID_INVALID) {
 		// node does not exist yet, clone it from the base
 		rte_node_t node = rte_node_clone(base_node, strstr(name, "-") + 1);
-		assert(node != RTE_NODE_ID_INVALID);
+		if (node == RTE_NODE_ID_INVALID) {
+			free(name);
+			return NULL;
+		}
 	}

259-302: Propagate errors instead of aborting.

worker_graph_nodes_add_missing asserts on edge update failures (lines 289, 291) and assumes ensure_queue_node cannot fail. Return NULL on error and update the caller to handle it.

Apply this diff:

 	gr_vec_foreach (struct iface_info_port *port, ports) {
 		for (uint16_t rxq = 0; rxq < port->n_rxq; rxq++) {
 			name = ensure_queue_node(
 				port_rx_node, RX_NODE_FMT, port->port_id, rxq, old_rx
 			);
+			if (name == NULL)
+				goto err;
 			gr_vec_add(rx_node_names, name);
 		}
 		for (uint16_t txq = 0; txq < port->n_txq; txq++) {
 			name = ensure_queue_node(
 				port_tx_node, TX_NODE_FMT, port->port_id, txq, old_tx
 			);
+			if (name == NULL)
+				goto err;
 			gr_vec_add(tx_node_names, name);
 		}
 	}
 
 	// update the port_output edges to allow reaching all possible port_tx clones
 	edge = rte_node_edge_update(
 		port_output_node, 0, (const char **)tx_node_names, gr_vec_len(tx_node_names)
 	);
-	assert(edge != RTE_EDGE_ID_INVALID);
+	if (edge == RTE_EDGE_ID_INVALID)
+		goto err;
 	edge = rte_node_edge_shrink(port_output_node, gr_vec_len(tx_node_names));
-	assert(edge != RTE_EDGE_ID_INVALID);
+	if (edge == RTE_EDGE_ID_INVALID)
+		goto err;
 
 	// store all unused node_ids in a list to be returned to the caller
 	gr_vec_foreach (name, old_rx)
 		gr_vec_add(unused_nodes, rte_node_from_name(name));
 	gr_vec_foreach (name, old_tx)
 		gr_vec_add(unused_nodes, rte_node_from_name(name));
 	gr_strvec_free(old_rx);
 	gr_strvec_free(old_tx);
 
 	return unused_nodes;
+err:
+	gr_strvec_free(old_rx);
+	gr_strvec_free(old_tx);
+	gr_vec_free(unused_nodes);
+	errno = (errno != 0) ? errno : EINVAL;
+	return NULL;

Update the caller:

 	gr_vec rte_node_t *unused_nodes = worker_graph_nodes_add_missing(ports);
+	if (unused_nodes == NULL)
+		return errno_set(errno ? errno : EINVAL);
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2bb562a and 4803e5e.

⛔ Files ignored due to path filters (1)
  • docs/graph.svg is excluded by !**/*.svg
📒 Files selected for processing (15)
  • modules/infra/api/stats.c (3 hunks)
  • modules/infra/control/graph.c (11 hunks)
  • modules/infra/control/graph_priv.h (1 hunks)
  • modules/infra/control/worker.c (7 hunks)
  • modules/infra/control/worker_test.c (1 hunks)
  • modules/infra/datapath/eth_output.c (3 hunks)
  • modules/infra/datapath/gr_eth.h (1 hunks)
  • modules/infra/datapath/gr_rxtx.h (1 hunks)
  • modules/infra/datapath/l1_xconnect.c (4 hunks)
  • modules/infra/datapath/main_loop.c (2 hunks)
  • modules/infra/datapath/meson.build (1 hunks)
  • modules/infra/datapath/port_output.c (1 hunks)
  • modules/infra/datapath/port_rx.c (1 hunks)
  • modules/infra/datapath/port_tx.c (1 hunks)
  • smoke/affinity_test.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • modules/infra/api/stats.c
  • modules/infra/datapath/eth_output.c
  • modules/infra/datapath/gr_eth.h
  • modules/infra/control/worker_test.c
  • modules/infra/datapath/meson.build
🧰 Additional context used
🧬 Code graph analysis (7)
modules/infra/control/worker.c (3)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/control/iface.c (1)
  • iface_next (198-213)
modules/infra/control/graph.c (2)
  • worker_graph_reload_all (304-322)
  • worker_graph_reload (207-230)
modules/infra/datapath/main_loop.c (1)
main/sort.c (1)
  • topo_sort (53-97)
modules/infra/datapath/port_tx.c (4)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/datapath/trace.c (3)
  • trace_log_packet (354-504)
  • gr_mbuf_trace_add (520-547)
  • gr_mbuf_trace_finish (549-565)
modules/infra/datapath/gr_mbuf.h (1)
  • gr_mbuf_is_traced (52-54)
modules/infra/datapath/port_rx.c (1)
  • rxtx_trace_format (25-28)
modules/infra/datapath/port_rx.c (2)
modules/infra/control/graph.c (1)
  • gr_node_attach_parent (36-64)
modules/infra/datapath/trace.c (2)
  • gr_mbuf_trace_add (520-547)
  • trace_log_packet (354-504)
modules/infra/control/graph_priv.h (1)
modules/infra/control/graph.c (3)
  • worker_graph_reload (207-230)
  • worker_graph_reload_all (304-322)
  • rte_node_t (259-283)
modules/infra/datapath/port_output.c (5)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/datapath/gr_mbuf.h (1)
  • gr_mbuf_is_traced (52-54)
modules/infra/datapath/trace.c (1)
  • gr_mbuf_trace_add (520-547)
modules/infra/control/gr_graph.h (1)
  • rte_node_enqueue_x1 (22-25)
modules/infra/datapath/eth_output.c (1)
  • eth_output_register_interface_type (25-32)
modules/infra/control/graph.c (3)
api/string.c (1)
  • astrcat (15-50)
main/gr_vec.h (1)
  • gr_vec_len (47-51)
api/gr_errno.h (1)
  • errno_set (9-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
  • GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
  • GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
  • GitHub Check: deb
  • GitHub Check: rpm
🔇 Additional comments (3)
smoke/affinity_test.sh (1)

30-34: Good teardown/recreate coverage.

Deleting then re-adding the ports (with a queue-count change) hardens the smoke test against regressions in the new per-port node flow. Nicely done.

modules/infra/datapath/gr_rxtx.h (1)

1-36: LGTM!

The header changes cleanly define the new port-based node naming and structures. The node format macros and updated context structures align with the refactored RX/TX path.

modules/infra/control/graph.c (1)

118-119: gr_vec_extend handles allocation failures by aborting
__gr_vec_grow calls realloc() and invokes ABORT() on NULL, so gr_vec_extend never silently fails or returns NULL.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
modules/infra/datapath/main_loop.c (1)

268-274: Free ctx.node_to_index at shutdown to avoid a leak.

Currently only ctx.w_stats is freed.

Apply this fix:

  shutdown:
  	log(NOTICE, "shutting down tid=%d", w->tid);
  	atomic_store(&w->stats, NULL);
  	if (ctx.stats)
  		rte_graph_cluster_stats_destroy(ctx.stats);
  	rte_free(ctx.w_stats);
+ 	rte_free(ctx.node_to_index);
+ 	ctx.node_to_index = NULL;
  	rte_rcu_qsbr_thread_unregister(rcu, rte_lcore_id());
♻️ Duplicate comments (11)
modules/infra/datapath/main_loop.c (1)

118-131: Realloc path fixed; good cleanup before (re)alloc.

Free-before-alloc for node_to_index and w_stats looks correct; failure path resets state. Thanks for addressing the leak on reconfig.

modules/infra/datapath/port_tx.c (1)

38-50: Don’t mutate mbufs after tx_burst; move tracing before transmit.

Annotating after handing mbufs to the driver is racy. Trace before tx_burst and drop the post-loop.

Apply:

-	tx_ok = rte_eth_tx_burst(ctx->port_id, ctx->queue_id, mbufs, nb_objs);
-	if (tx_ok < nb_objs)
-		rte_node_enqueue(graph, node, TX_ERROR, &objs[tx_ok], nb_objs - tx_ok);
-
-	for (unsigned i = 0; i < tx_ok; i++) {
-		// FIXME racy: we are operating on mbufs already passed to driver
-		if (gr_mbuf_is_traced(mbufs[i])) {
-			struct port_queue *t;
-			t = gr_mbuf_trace_add(mbufs[i], node, sizeof(*t));
-			*t = *ctx;
-			gr_mbuf_trace_finish(mbufs[i]);
-		}
-	}
+	for (unsigned i = 0; i < nb_objs; i++) {
+		if (gr_mbuf_is_traced(mbufs[i])) {
+			struct port_queue *t = gr_mbuf_trace_add(mbufs[i], node, sizeof(*t));
+			*t = *ctx;
+			gr_mbuf_trace_finish(mbufs[i]);
+		}
+	}
+
+	tx_ok = rte_eth_tx_burst(ctx->port_id, ctx->queue_id, mbufs, nb_objs);
+	if (tx_ok < nb_objs)
+		rte_node_enqueue(graph, node, TX_ERROR, &objs[tx_ok], nb_objs - tx_ok);
modules/infra/control/worker.c (3)

278-299: Check ports vector construction in rxq assign

The rxq reassignment path also builds ports via unchecked gr_vec_add calls; if allocation fails we’ll jump into reload with garbage and ret is uninitialised on the way out. Add proper error handling and initialise ret before the loop.


170-178: Handle gr_vec_add failure

gr_vec_add(ports, port) can fail; today we ignore the error, leak any partial vector, and still call worker_graph_reload_all. Please capture the return, abort the loop, free ports, and propagate the error instead of proceeding.


208-222: Same missing error handling path

Same story here: every gr_vec_add (including the final gr_vec_add(ports, p)) needs error checking with proper cleanup and early return; otherwise OOM yields a bogus reload.

modules/infra/datapath/eth_output.c (1)

52-78: Still missing VLAN parent/type validation

We still deref iface_from_id(sub->parent_id) without checking and then index iface_type_edges with an unchecked priv->iface->type; both lead to crashes on bad data. Please bail out (setting edge) when the parent lookup fails and validate the type before indexing.

 		if (priv->iface->type == GR_IFACE_TYPE_VLAN) {
 			sub = (struct iface_info_vlan *)priv->iface->info;
 			vlan = (struct rte_vlan_hdr *)rte_pktmbuf_prepend(mbuf, sizeof(*vlan));
 			if (unlikely(vlan == NULL)) {
 				edge = NO_HEADROOM;
 				goto next;
 			}
 			vlan->vlan_tci = rte_cpu_to_be_16(sub->vlan_id);
 			vlan->eth_proto = priv->ether_type;
 			priv->ether_type = RTE_BE16(RTE_ETHER_TYPE_VLAN);
-			priv->iface = iface_from_id(sub->parent_id);
+			const struct iface *parent = iface_from_id(sub->parent_id);
+			if (unlikely(parent == NULL)) {
+				edge = INVAL;
+				goto next;
+			}
+			priv->iface = parent;
 		}
 
-		edge = iface_type_edges[priv->iface->type];
+		if (unlikely(priv->iface->type >= ARRAY_DIM(iface_type_edges))) {
+			edge = INVAL;
+			goto next;
+		}
+		edge = iface_type_edges[priv->iface->type];
 		if (edge == INVAL)
 			goto next;
modules/infra/control/graph.c (5)

113-115: Check astrcat return value for NULL.

The astrcat call can fail and return NULL on OOM. Dereferencing or adding NULL to vectors will cause undefined behavior.

This issue was previously flagged. See past review comments.


148-148: Replace assertion with error handling.

The assertion assumes ctx->iface is always found, but if the ports vector is incomplete, this will abort the program.

This issue was previously flagged. See past review comments.


240-240: Replace allocation and clone assertions with error returns.

Assertions on astrcat and rte_node_clone will abort on failure. Return NULL to let callers handle errors gracefully.

These issues were previously flagged. See past review comments.

Also applies to: 253-253


289-291: Replace edge update assertions with error handling.

Assertions on rte_node_edge_update and rte_node_edge_shrink will abort on failure. Return NULL and propagate errors to the caller.

This issue was previously flagged. See past review comments.


308-308: Check worker_graph_nodes_add_missing for NULL.

If the asserts in worker_graph_nodes_add_missing are replaced with proper error returns (as previously suggested), this call must check for NULL and propagate the error.

This issue was previously flagged as part of error propagation improvements. See past review comments.

🧹 Nitpick comments (4)
modules/infra/datapath/main_loop.c (1)

141-145: Use gr_vec_len(nodes) for qsort length (avoid relying on macro side-effects).

Safer and clearer to sort using the actual vector length.

Apply this refactor:

- // sort by name first to ensure stable topo_sort
- qsort(nodes, count, sizeof(void *), node_name_cmp);
+ // sort by name first to ensure stable topo_sort
+ size_t n = gr_vec_len(nodes);
+ qsort(nodes, n, sizeof(void *), node_name_cmp);
modules/infra/datapath/port_output.c (1)

31-31: Validate next edge index (nb_edges=1 vs per-port mapping).

If ctx->edges[port_id] can be > 0, enqueue will hit an invalid edge since nb_edges=1. Either ensure mapping is always 0 for this node or add a bound check/assert during wiring.

Also applies to: 45-47

modules/infra/control/graph.c (2)

252-252: Guard strstr result before pointer arithmetic.

While the format string ensures "-" exists, defensive programming suggests checking strstr for NULL:

-	rte_node_t node = rte_node_clone(base_node, strstr(name, "-") + 1);
+	const char *suffix = strstr(name, "-");
+	if (suffix == NULL) {
+		free(name);
+		return NULL;
+	}
+	rte_node_t node = rte_node_clone(base_node, suffix + 1);

163-169: Document port_output_edges ownership and cleanup. The out struct allocated via rte_malloc in modules/infra/control/graph.c is stored in the “port_output” node’s ctx_ptr and freed by port_output_fini() in modules/infra/datapath/port_output.c; add a clarifying comment here or in the API docs to describe this lifecycle.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4803e5e and 7711493.

⛔ Files ignored due to path filters (1)
  • docs/graph.svg is excluded by !**/*.svg
📒 Files selected for processing (15)
  • modules/infra/api/stats.c (3 hunks)
  • modules/infra/control/graph.c (11 hunks)
  • modules/infra/control/graph_priv.h (1 hunks)
  • modules/infra/control/worker.c (7 hunks)
  • modules/infra/control/worker_test.c (1 hunks)
  • modules/infra/datapath/eth_output.c (3 hunks)
  • modules/infra/datapath/gr_eth.h (1 hunks)
  • modules/infra/datapath/gr_rxtx.h (1 hunks)
  • modules/infra/datapath/l1_xconnect.c (4 hunks)
  • modules/infra/datapath/main_loop.c (2 hunks)
  • modules/infra/datapath/meson.build (1 hunks)
  • modules/infra/datapath/port_output.c (1 hunks)
  • modules/infra/datapath/port_rx.c (1 hunks)
  • modules/infra/datapath/port_tx.c (1 hunks)
  • smoke/affinity_test.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • modules/infra/datapath/meson.build
  • smoke/affinity_test.sh
🧰 Additional context used
🧬 Code graph analysis (11)
modules/infra/datapath/port_output.c (5)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/datapath/gr_mbuf.h (1)
  • gr_mbuf_is_traced (52-54)
modules/infra/datapath/trace.c (1)
  • gr_mbuf_trace_add (520-547)
modules/infra/control/gr_graph.h (1)
  • rte_node_enqueue_x1 (22-25)
modules/infra/datapath/eth_output.c (1)
  • eth_output_register_interface_type (26-33)
modules/infra/datapath/eth_output.c (4)
modules/infra/api/gr_infra.h (1)
  • gr_iface_type_name (322-338)
modules/infra/control/graph.c (1)
  • gr_node_attach_parent (36-64)
modules/infra/control/iface.c (2)
  • iface_from_id (215-222)
  • iface_get_eth_addr (224-237)
modules/infra/control/gr_graph.h (1)
  • rte_node_enqueue_x1 (22-25)
modules/infra/control/graph.c (3)
api/string.c (1)
  • astrcat (15-50)
main/gr_vec.h (1)
  • gr_vec_len (47-51)
api/gr_errno.h (1)
  • errno_set (9-12)
modules/infra/datapath/port_rx.c (2)
modules/infra/control/graph.c (1)
  • gr_node_attach_parent (36-64)
modules/infra/datapath/trace.c (2)
  • gr_mbuf_trace_add (520-547)
  • trace_log_packet (354-504)
modules/infra/control/graph_priv.h (1)
modules/infra/control/graph.c (3)
  • worker_graph_reload (207-230)
  • worker_graph_reload_all (304-322)
  • rte_node_t (259-283)
modules/infra/control/worker_test.c (1)
modules/infra/control/graph.c (2)
  • worker_graph_reload (207-230)
  • worker_graph_reload_all (304-322)
modules/infra/control/worker.c (3)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/control/iface.c (1)
  • iface_next (198-213)
modules/infra/control/graph.c (2)
  • worker_graph_reload_all (304-322)
  • worker_graph_reload (207-230)
modules/infra/datapath/main_loop.c (1)
main/sort.c (1)
  • topo_sort (53-97)
modules/infra/api/stats.c (2)
modules/infra/api/gr_infra.h (2)
  • gr_iface_type_name (322-338)
  • gr_iface_mode_name (341-351)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/datapath/port_tx.c (4)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/datapath/trace.c (3)
  • trace_log_packet (354-504)
  • gr_mbuf_trace_add (520-547)
  • gr_mbuf_trace_finish (549-565)
modules/infra/datapath/gr_mbuf.h (1)
  • gr_mbuf_is_traced (52-54)
modules/infra/datapath/port_rx.c (1)
  • rxtx_trace_format (25-28)
modules/infra/datapath/gr_eth.h (1)
modules/infra/datapath/eth_output.c (1)
  • eth_output_register_interface_type (26-33)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
  • GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
  • GitHub Check: rpm
  • GitHub Check: deb
🔇 Additional comments (7)
modules/infra/api/stats.c (1)

60-61: Prefix match correct; RX_NODE_FMT defines all queue nodes starting with "port_rx-".

modules/infra/datapath/gr_eth.h (1)

39-39: API addition looks good.

Signature matches eth_output.c and includes cover types.

modules/infra/control/worker_test.c (1)

55-56: Mocks updated to new signatures — OK.

Aligns with graph.c reload APIs taking ports vectors.

modules/infra/control/graph.c (4)

187-194: Verify edge assignment completeness.

If the loop doesn't find a matching tx_node_name, out->edges[ctx->port_id] remains RTE_EDGE_ID_INVALID. Confirm this is intentional for disabled queues or add error handling if all enabled txqs must have valid edges.


358-359: LGTM: port_output_node special handling.

The special cases for port_output_node correctly handle its dynamic edge count and edge retrieval.

Also applies to: 386-389, 407-407


464-472: LGTM: Node tracking initialization.

Correctly identifies and tracks port_rx_node, port_tx_node, and port_output_node while populating base_node_names for other nodes.


492-494: LGTM: Vector cleanup.

Proper cleanup of the new node name vectors.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
modules/infra/datapath/eth_output.c (1)

80-83: Use updated iface pointer for stats.

iface still references the pre-VLAN iface. After we rewrite priv->iface, stats should index priv->iface->id; otherwise VLAN packets keep updating the sub-interface stats. Use the refreshed pointer.

♻️ Duplicate comments (11)
modules/infra/control/graph_priv.h (1)

14-14: Remove unused declaration worker_graph_nodes_del_unused.

No definition or usage exists. Drop it.

As noted in previous review:

 int worker_graph_reload(struct worker *, gr_vec struct iface_info_port **);
 int worker_graph_reload_all(gr_vec struct iface_info_port **);
-void worker_graph_nodes_del_unused(gr_vec rte_node_t *);
 void worker_graph_free(struct worker *);
modules/infra/datapath/port_output.c (2)

19-21: Use node->ctx (not ctx_ptr) and free the correct pointer.

rte_node exposes ctx; ctx_ptr will not compile and fini frees the wrong member.

As flagged in previous review: apply this fix:

-	const struct port_output_edges *ctx = node->ctx_ptr;
+	const struct port_output_edges *ctx = node->ctx;

37-39: Free node->ctx, not ctx_ptr.

As flagged in previous review:

-static void port_output_fini(const struct rte_graph *, struct rte_node *node) {
-	rte_free(node->ctx_ptr);
+static void port_output_fini(const struct rte_graph *, struct rte_node *node) {
+	rte_free(node->ctx);
modules/infra/datapath/port_tx.c (1)

42-50: Don't mutate mbufs after tx_burst; move tracing before transmit.

Accessing/annotating mbufs after handing them to the driver is racy (acknowledged by FIXME).

As flagged in previous review, add and finish the trace before tx_burst:

+	for (unsigned i = 0; i < nb_objs; i++) {
+		if (gr_mbuf_is_traced(mbufs[i])) {
+			struct port_queue *t = gr_mbuf_trace_add(mbufs[i], node, sizeof(*t));
+			*t = *ctx;
+			gr_mbuf_trace_finish(mbufs[i]);
+		}
+	}
+
 	tx_ok = rte_eth_tx_burst(ctx->port_id, ctx->queue_id, mbufs, nb_objs);
 	if (tx_ok < nb_objs)
 		rte_node_enqueue(graph, node, TX_ERROR, &objs[tx_ok], nb_objs - tx_ok);
-
-	for (unsigned i = 0; i < tx_ok; i++) {
-		// FIXME racy: we are operating on mbufs already passed to driver
-		if (gr_mbuf_is_traced(mbufs[i])) {
-			struct port_queue *t;
-			t = gr_mbuf_trace_add(mbufs[i], node, sizeof(*t));
-			*t = *ctx;
-			gr_mbuf_trace_finish(mbufs[i]);
-		}
-	}
modules/infra/control/worker.c (2)

170-178: Leak ports on gr_vec_add failure.

If gr_vec_add fails (OOM), you leak any already-added ports and call worker_graph_reload_all with NULL/partial ports.

As flagged in previous review, check gr_vec_add return and free ports on failure:

 	while ((iface = iface_next(GR_IFACE_TYPE_PORT, iface)) != NULL) {
 		struct iface_info_port *port = (struct iface_info_port *)iface->info;
-		if (port->port_id != p->port_id)
-			gr_vec_add(ports, port);
+		if (port->port_id != p->port_id) {
+			if (gr_vec_add(ports, port) < 0) {
+				gr_vec_free(ports);
+				return errno_set(ENOMEM);
+			}
+		}
 	}

208-222: Missing cleanup when gr_vec_add fails.

Same as unplug: if gr_vec_add returns <0, you leak and pass incomplete ports to worker_graph_reload_all.

As flagged in previous review, check every gr_vec_add and bail on failure:

 	while ((iface = iface_next(GR_IFACE_TYPE_PORT, iface)) != NULL) {
 		struct iface_info_port *port = (struct iface_info_port *)iface->info;
 		if (port->port_id == p->port_id)
 			found = true;
-		gr_vec_add(ports, port);
+		if (gr_vec_add(ports, port) < 0) {
+			gr_vec_free(ports);
+			return errno_set(ENOMEM);
+		}
 	}
-	if (!found)
-		gr_vec_add(ports, p);
+	if (!found) {
+		if (gr_vec_add(ports, p) < 0) {
+			gr_vec_free(ports);
+			return errno_set(ENOMEM);
+		}
+	}
modules/infra/control/graph.c (5)

113-115: Check astrcat and gr_vec_add return values.

Lines 113-115 build rx_nodes without checking if astrcat returns NULL (OOM) or if gr_vec_add fails. Leaks memory.

As flagged in previous review:

 		char *name = astrcat(NULL, RX_NODE_FMT, qmap->port_id, qmap->queue_id);
+		if (name == NULL) {
+			ret = -ENOMEM;
+			goto out;
+		}
-		gr_vec_add(rx_nodes, name);
-		gr_vec_add(graph_nodes, name);
+		if (gr_vec_add(rx_nodes, name) < 0 || gr_vec_add(graph_nodes, name) < 0) {
+			free(name);
+			ret = -ENOMEM;
+			goto out;
+		}

141-141: Null-check rte_graph_node_get_by_name before dereferencing.

Line 140 gets node; line 141 dereferences without NULL check → segfault if node doesn't exist.

As flagged in previous review, add NULL check:

 	node = rte_graph_node_get_by_name(graph_name, node_name);
+	if (node == NULL) {
+		ret = -ENOENT;
+		goto out;
+	}
 	struct rx_node_ctx *ctx = (struct rx_node_ctx *)node->ctx;

Apply similar checks at lines 156, 181, 198.


148-148: Assertion failure risk: ctx->iface could be NULL.

If ports vector is incomplete or stale, this assertion aborts. Return an error instead.

As flagged in previous review:

-		assert(ctx->iface != NULL);
+		if (ctx->iface == NULL) {
+			ret = -ENODEV;
+			goto out;
+		}

239-257: Handle allocation/clone failures instead of asserts in ensure_queue_node.

Abort on allocation/clone failures is brittle. Return NULL on failure and let the caller propagate an error.

As flagged in previous review:

 	char *name = astrcat(NULL, name_fmt, port_id, queue_id);
-	assert(name != NULL);
+	if (name == NULL)
+		return NULL;
 
 	// ... search old_list ...
 
 	if (rte_node_from_name(name) == RTE_NODE_ID_INVALID) {
 		rte_node_t node = rte_node_clone(base_node, strstr(name, "-") + 1);
-		assert(node != RTE_NODE_ID_INVALID);
+		if (node == RTE_NODE_ID_INVALID) {
+			free(name);
+			return NULL;
+		}
 	}

259-302: Propagate errors from worker_graph_nodes_add_missing; replace asserts on edge updates.

If ensure_queue_node fails or edge updates fail, return an error instead of aborting.

As flagged in previous review, check ensure_queue_node for NULL and handle edge update failures:

 	gr_vec_foreach (struct iface_info_port *port, ports) {
 		for (uint16_t rxq = 0; rxq < port->n_rxq; rxq++) {
 			name = ensure_queue_node(
 				port_rx_node, RX_NODE_FMT, port->port_id, rxq, old_rx
 			);
+			if (name == NULL)
+				goto err;
 			gr_vec_add(rx_node_names, name);
 		}
 		// similar for txq...
 	}
 
 	edge = rte_node_edge_update(
 		port_output_node, 0, (const char **)tx_node_names, gr_vec_len(tx_node_names)
 	);
-	assert(edge != RTE_EDGE_ID_INVALID);
+	if (edge == RTE_EDGE_ID_INVALID)
+		goto err;
 	edge = rte_node_edge_shrink(port_output_node, gr_vec_len(tx_node_names));
-	assert(edge != RTE_EDGE_ID_INVALID);
+	if (edge == RTE_EDGE_ID_INVALID)
+		goto err;
 
 	return unused_nodes;
+err:
+	gr_strvec_free(old_rx);
+	gr_strvec_free(old_tx);
+	gr_vec_free(unused_nodes);
+	errno = (errno != 0) ? errno : EINVAL;
+	return NULL;

Update worker_graph_reload_all to handle NULL as error.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7711493 and 0ba39fa.

⛔ Files ignored due to path filters (1)
  • docs/graph.svg is excluded by !**/*.svg
📒 Files selected for processing (15)
  • modules/infra/api/stats.c (3 hunks)
  • modules/infra/control/graph.c (11 hunks)
  • modules/infra/control/graph_priv.h (1 hunks)
  • modules/infra/control/worker.c (7 hunks)
  • modules/infra/control/worker_test.c (1 hunks)
  • modules/infra/datapath/eth_output.c (3 hunks)
  • modules/infra/datapath/gr_eth.h (1 hunks)
  • modules/infra/datapath/gr_rxtx.h (1 hunks)
  • modules/infra/datapath/l1_xconnect.c (4 hunks)
  • modules/infra/datapath/main_loop.c (2 hunks)
  • modules/infra/datapath/meson.build (1 hunks)
  • modules/infra/datapath/port_output.c (1 hunks)
  • modules/infra/datapath/port_rx.c (1 hunks)
  • modules/infra/datapath/port_tx.c (1 hunks)
  • smoke/affinity_test.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • modules/infra/datapath/meson.build
  • modules/infra/datapath/gr_eth.h
  • modules/infra/control/worker_test.c
  • modules/infra/api/stats.c
  • modules/infra/datapath/port_rx.c
  • modules/infra/datapath/gr_rxtx.h
🧰 Additional context used
🧬 Code graph analysis (7)
modules/infra/datapath/eth_output.c (4)
modules/infra/api/gr_infra.h (1)
  • gr_iface_type_name (322-338)
modules/infra/control/graph.c (1)
  • gr_node_attach_parent (36-64)
modules/infra/control/iface.c (2)
  • iface_from_id (215-222)
  • iface_get_eth_addr (224-237)
modules/infra/control/gr_graph.h (1)
  • rte_node_enqueue_x1 (22-25)
modules/infra/datapath/main_loop.c (1)
main/sort.c (1)
  • topo_sort (53-97)
modules/infra/control/worker.c (3)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/control/iface.c (1)
  • iface_next (198-213)
modules/infra/control/graph.c (2)
  • worker_graph_reload_all (304-322)
  • worker_graph_reload (207-230)
modules/infra/control/graph_priv.h (1)
modules/infra/control/graph.c (3)
  • worker_graph_reload (207-230)
  • worker_graph_reload_all (304-322)
  • rte_node_t (259-283)
modules/infra/datapath/port_tx.c (4)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/datapath/trace.c (3)
  • trace_log_packet (354-504)
  • gr_mbuf_trace_add (520-547)
  • gr_mbuf_trace_finish (549-565)
modules/infra/datapath/gr_mbuf.h (1)
  • gr_mbuf_is_traced (52-54)
modules/infra/datapath/port_rx.c (1)
  • rxtx_trace_format (25-28)
modules/infra/datapath/port_output.c (5)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/datapath/gr_mbuf.h (1)
  • gr_mbuf_is_traced (52-54)
modules/infra/datapath/trace.c (1)
  • gr_mbuf_trace_add (520-547)
modules/infra/control/gr_graph.h (1)
  • rte_node_enqueue_x1 (22-25)
modules/infra/datapath/eth_output.c (1)
  • eth_output_register_interface_type (26-33)
modules/infra/control/graph.c (3)
api/string.c (1)
  • astrcat (15-50)
main/gr_vec.h (1)
  • gr_vec_len (47-51)
api/gr_errno.h (1)
  • errno_set (9-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
  • GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
  • GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
  • GitHub Check: deb
  • GitHub Check: rpm
🔇 Additional comments (2)
smoke/affinity_test.sh (1)

33-34: Keep the rxq count consistent (or justify the bump).

We end the test with datapath CPUs set to 2,3 (two CPUs) yet we recreate p1 with rxqs 3. That leaves an extra queue with no corresponding CPU mapping and changes the final topology compared to the start of the script. Please either restore rxqs 2 or add follow-up steps to map the new queue so the test leaves the system in a well-defined state.

modules/infra/datapath/main_loop.c (1)

222-222: LGTM!

Setting w->stats to NULL before reload prevents readers from accessing stale data during reconfiguration.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (17)
modules/infra/datapath/main_loop.c (3)

111-117: Still leaking ctx->stats on w_stats allocation failure.

If rte_zmalloc_socket fails, ctx->stats (allocated at line 105) is not freed. This was flagged in a previous review but remains unaddressed.

Apply this diff:

 	ctx->w_stats = rte_zmalloc_socket(__func__, len, RTE_CACHE_LINE_SIZE, graph->socket);
 	if (ctx->w_stats == NULL) {
+		rte_graph_cluster_stats_destroy(ctx->stats);
+		ctx->stats = NULL;
 		LOG(ERR, "rte_zmalloc_socket: %s", rte_strerror(rte_errno));
 		return -rte_errno;
 	}

118-131: Good fix for node_to_index, but ctx->stats still leaks on failure.

Line 118 properly addresses the previous review concern. However, if rte_calloc_socket fails, ctx->stats is not freed. This was flagged in a previous review but remains unaddressed.

Apply this diff:

 	if (ctx->node_to_index == NULL) {
 		rte_free(ctx->w_stats);
 		ctx->w_stats = NULL;
+		rte_graph_cluster_stats_destroy(ctx->stats);
+		ctx->stats = NULL;
 		LOG(ERR, "rte_calloc_socket: %s", rte_strerror(rte_errno));
 		return -rte_errno;
 	}

134-154: Missing null check, topo_sort error handling, and resource cleanup.

Three issues remain from the previous review:

  1. No null check after gr_vec_add at line 139. If allocation fails, nodes is NULL and qsort at line 142 will crash.
  2. topo_sort errors are logged at line 144 but execution continues with corrupted topology.
  3. If errors occur, allocated resources (ctx->stats, ctx->w_stats, ctx->node_to_index) are not freed.

Apply this diff:

 	rte_graph_foreach_node (count, off, graph, node)
 		gr_vec_add(nodes, node);
+	if (nodes == NULL) {
+		rte_free(ctx->node_to_index);
+		ctx->node_to_index = NULL;
+		rte_free(ctx->w_stats);
+		ctx->w_stats = NULL;
+		rte_graph_cluster_stats_destroy(ctx->stats);
+		ctx->stats = NULL;
+		LOG(ERR, "gr_vec_add failed");
+		return errno_set(ENOMEM);
+	}
 
 	// sort by name first to ensure stable topo_sort
 	qsort(nodes, count, sizeof(void *), node_name_cmp);
-	if (topo_sort((gr_vec const void **)nodes, node_is_child) < 0)
+	if (topo_sort((gr_vec const void **)nodes, node_is_child) < 0) {
 		LOG(ERR, "topo_sort failed: %s", strerror(errno));
+		gr_vec_free(nodes);
+		rte_free(ctx->node_to_index);
+		ctx->node_to_index = NULL;
+		rte_free(ctx->w_stats);
+		ctx->w_stats = NULL;
+		rte_graph_cluster_stats_destroy(ctx->stats);
+		ctx->stats = NULL;
+		return -errno;
+	}
modules/infra/datapath/port_output.c (1)

19-21: Use node->ctx and free node->ctx (not ctx_ptr)

ctx_ptr is invalid; will not compile and fini frees wrong member.

Apply:

- const struct port_output_edges *ctx = node->ctx_ptr;
+ const struct port_output_edges *ctx = node->ctx;
...
-static void port_output_fini(const struct rte_graph *, struct rte_node *node) {
-	rte_free(node->ctx_ptr);
+static void port_output_fini(const struct rte_graph *, struct rte_node *node) {
+	rte_free(node->ctx);

Also applies to: 37-39

modules/infra/datapath/port_tx.c (1)

38-50: Move tracing before tx_burst; don’t mutate after handing to driver

Annotating mbufs post tx_burst is racy. Record and finish traces before transmit.

Apply:

-    tx_ok = rte_eth_tx_burst(ctx->port_id, ctx->queue_id, mbufs, nb_objs);
-    if (tx_ok < nb_objs)
-        rte_node_enqueue(graph, node, TX_ERROR, &objs[tx_ok], nb_objs - tx_ok);
-
-    for (unsigned i = 0; i < tx_ok; i++) {
-        // FIXME racy: we are operating on mbufs already passed to driver
-        if (gr_mbuf_is_traced(mbufs[i])) {
-            struct port_queue *t;
-            t = gr_mbuf_trace_add(mbufs[i], node, sizeof(*t));
-            *t = *ctx;
-            gr_mbuf_trace_finish(mbufs[i]);
-        }
-    }
+    for (unsigned i = 0; i < nb_objs; i++) {
+        if (gr_mbuf_is_traced(mbufs[i])) {
+            struct port_queue *t = gr_mbuf_trace_add(mbufs[i], node, sizeof(*t));
+            *t = *ctx;
+            gr_mbuf_trace_finish(mbufs[i]);
+        }
+    }
+
+    tx_ok = rte_eth_tx_burst(ctx->port_id, ctx->queue_id, mbufs, nb_objs);
+    if (tx_ok < nb_objs)
+        rte_node_enqueue(graph, node, TX_ERROR, &objs[tx_ok], nb_objs - tx_ok);
modules/infra/control/graph_priv.h (1)

12-14: Remove unused declaration worker_graph_nodes_del_unused

No definition or usage; avoid stale API.

Apply:

 int worker_graph_reload(struct worker *, gr_vec struct iface_info_port **);
 int worker_graph_reload_all(gr_vec struct iface_info_port **);
-void worker_graph_nodes_del_unused(gr_vec rte_node_t *);
 void worker_graph_free(struct worker *);
modules/infra/datapath/eth_output.c (1)

61-67: Guard against NULL parent iface and invalid type before edge lookup

iface_from_id may return NULL; also validate type bounds before indexing iface_type_edges.

Apply:

-            priv->iface = iface_from_id(sub->parent_id);
+            priv->iface = iface_from_id(sub->parent_id);
+            if (unlikely(priv->iface == NULL)) {
+                edge = INVAL;
+                goto next;
+            }
         }
 
-        edge = iface_type_edges[priv->iface->type];
-        if (edge == INVAL)
+        if (unlikely(priv->iface->type >= GR_IFACE_TYPE_COUNT)) {
+            edge = INVAL;
+            goto next;
+        }
+        edge = iface_type_edges[priv->iface->type];
+        if (edge == INVAL)
             goto next;
modules/infra/control/graph.c (7)

141-141: Check rte_graph_node_get_by_name for NULL.

If the node doesn't exist in the graph, this returns NULL, causing a segfault when dereferencing node->ctx.

Add NULL check:

 	node = rte_graph_node_get_by_name(graph_name, node_name);
+	if (node == NULL) {
+		ret = -ENOENT;
+		goto out;
+	}
 	struct rx_node_ctx *ctx = (struct rx_node_ctx *)node->ctx;

Apply similar checks at lines 156, 181, and 198.


286-291: Replace assertions with error handling.

rte_node_edge_update and rte_node_edge_shrink can fail. Return NULL instead of aborting.

Replace the assertions:

 	edge = rte_node_edge_update(
 		port_output_node, 0, (const char **)tx_node_names, gr_vec_len(tx_node_names)
 	);
-	assert(edge != RTE_EDGE_ID_INVALID);
+	if (edge == RTE_EDGE_ID_INVALID)
+		goto err;
 	edge = rte_node_edge_shrink(port_output_node, gr_vec_len(tx_node_names));
-	assert(edge != RTE_EDGE_ID_INVALID);
+	if (edge == RTE_EDGE_ID_INVALID)
+		goto err;

The err label should be added as suggested in the previous comment.


250-254: Replace assertion with error handling.

rte_node_clone can fail (e.g., resource exhaustion). Return NULL instead of aborting.

Replace the assertion:

 		// node does not exist yet, clone it from the base
 		rte_node_t node = rte_node_clone(base_node, strstr(name, "-") + 1);
-		assert(node != RTE_NODE_ID_INVALID);
+		if (node == RTE_NODE_ID_INVALID) {
+			free(name);
+			return NULL;
+		}

240-240: Replace assertion with error handling.

astrcat can return NULL on OOM. Aborting is brittle; return NULL to let the caller handle the failure.

Replace the assertion:

 	char *name = astrcat(NULL, name_fmt, port_id, queue_id);
-	assert(name != NULL);
+	if (name == NULL)
+		return NULL;

148-148: Replace assertion with error handling.

If the ports vector is incomplete or stale, ctx->iface could remain NULL, causing an abort. Return an error instead.

Replace the assertion:

-	assert(ctx->iface != NULL);
+	if (ctx->iface == NULL) {
+		ret = -ENODEV;
+		goto out;
+	}

304-308: Check worker_graph_nodes_add_missing for NULL.

If node management fails, it returns NULL. Proceeding without a check causes undefined behavior.

Add error handling:

 	gr_vec rte_node_t *unused_nodes = worker_graph_nodes_add_missing(ports);
+	if (unused_nodes == NULL)
+		return errno_set(errno ? errno : EINVAL);

113-115: Check astrcat and gr_vec_add return values.

astrcat returns NULL on OOM, and gr_vec_add can fail. Dereferencing or proceeding without checks risks crashes or memory leaks.

Add error handling:

 		char *name = astrcat(NULL, RX_NODE_FMT, qmap->port_id, qmap->queue_id);
+		if (name == NULL) {
+			ret = -ENOMEM;
+			goto out;
+		}
-		gr_vec_add(rx_nodes, name);
-		gr_vec_add(graph_nodes, name);
+		if (gr_vec_add(rx_nodes, name) < 0) {
+			free(name);
+			ret = -ENOMEM;
+			goto out;
+		}
+		if (gr_vec_add(graph_nodes, name) < 0) {
+			ret = -ENOMEM;
+			goto out;
+		}
modules/infra/control/worker.c (3)

211-218: Check gr_vec_add return values.

Same issue as port_unplug: if gr_vec_add fails, we leak or pass an incomplete ports vector to the reload function.

Add error handling:

 	while ((iface = iface_next(GR_IFACE_TYPE_PORT, iface)) != NULL) {
 		struct iface_info_port *port = (struct iface_info_port *)iface->info;
 		if (port->port_id == p->port_id)
 			found = true;
-		gr_vec_add(ports, port);
+		if (gr_vec_add(ports, port) < 0) {
+			gr_vec_free(ports);
+			return errno_set(ENOMEM);
+		}
 	}
 	if (!found)
-		gr_vec_add(ports, p);
+		if (gr_vec_add(ports, p) < 0) {
+			gr_vec_free(ports);
+			return errno_set(ENOMEM);
+		}

170-174: Check gr_vec_add return value.

gr_vec_add can fail (return < 0) on OOM. If it fails here, we proceed with a partially-built or NULL ports and still call worker_graph_reload_all(ports), which expects a complete vector.

Check the return value and handle failure:

 	while ((iface = iface_next(GR_IFACE_TYPE_PORT, iface)) != NULL) {
 		struct iface_info_port *port = (struct iface_info_port *)iface->info;
 		if (port->port_id != p->port_id)
-			gr_vec_add(ports, port);
+			if (gr_vec_add(ports, port) < 0) {
+				ret = errno_set(ENOMEM);
+				goto end;
+			}
 	}
+
+end:
+	if (ret < 0) {
+		gr_vec_free(ports);
+		return ret;
+	}

Also adjust the existing return ret; at line 181 to use the end label.


278-281: Check gr_vec_add return value.

If gr_vec_add fails here, we continue with a partially-built ports vector and pass it to worker_graph_reload.

Add error handling:

 	gr_vec struct iface_info_port **ports = NULL;
 	struct iface *iface = NULL;
 	while ((iface = iface_next(GR_IFACE_TYPE_PORT, iface)) != NULL)
-		gr_vec_add(ports, (struct iface_info_port *)iface->info);
+		if (gr_vec_add(ports, (struct iface_info_port *)iface->info) < 0) {
+			ret = errno_set(ENOMEM);
+			goto end;
+		}
🧹 Nitpick comments (1)
modules/infra/datapath/eth_output.c (1)

80-83: Confirm stats target (pre-translation vs parent iface)

You count tx on original iface (pre-VLAN) while tracing uses parent (priv->iface). Is that intentional? If not, switch to priv->iface->id.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ba39fa and 7f43f1d.

⛔ Files ignored due to path filters (1)
  • docs/graph.svg is excluded by !**/*.svg
📒 Files selected for processing (15)
  • modules/infra/api/stats.c (3 hunks)
  • modules/infra/control/graph.c (11 hunks)
  • modules/infra/control/graph_priv.h (1 hunks)
  • modules/infra/control/worker.c (7 hunks)
  • modules/infra/control/worker_test.c (1 hunks)
  • modules/infra/datapath/eth_output.c (4 hunks)
  • modules/infra/datapath/gr_eth.h (1 hunks)
  • modules/infra/datapath/gr_rxtx.h (1 hunks)
  • modules/infra/datapath/l1_xconnect.c (4 hunks)
  • modules/infra/datapath/main_loop.c (2 hunks)
  • modules/infra/datapath/meson.build (1 hunks)
  • modules/infra/datapath/port_output.c (1 hunks)
  • modules/infra/datapath/port_rx.c (1 hunks)
  • modules/infra/datapath/port_tx.c (1 hunks)
  • smoke/affinity_test.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • smoke/affinity_test.sh
  • modules/infra/control/worker_test.c
  • modules/infra/datapath/l1_xconnect.c
  • modules/infra/datapath/gr_eth.h
  • modules/infra/datapath/gr_rxtx.h
  • modules/infra/datapath/port_rx.c
  • modules/infra/api/stats.c
🧰 Additional context used
🧬 Code graph analysis (7)
modules/infra/datapath/main_loop.c (1)
main/sort.c (1)
  • topo_sort (53-97)
modules/infra/control/worker.c (3)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/control/iface.c (1)
  • iface_next (198-213)
modules/infra/control/graph.c (2)
  • worker_graph_reload_all (304-322)
  • worker_graph_reload (207-230)
modules/infra/datapath/eth_output.c (4)
modules/infra/api/gr_infra.h (1)
  • gr_iface_type_name (322-338)
modules/infra/control/graph.c (1)
  • gr_node_attach_parent (36-64)
modules/infra/control/iface.c (2)
  • iface_from_id (215-222)
  • iface_get_eth_addr (224-237)
modules/infra/control/gr_graph.h (1)
  • rte_node_enqueue_x1 (22-25)
modules/infra/control/graph.c (3)
api/string.c (1)
  • astrcat (15-50)
main/gr_vec.h (1)
  • gr_vec_len (47-51)
api/gr_errno.h (1)
  • errno_set (9-12)
modules/infra/datapath/port_tx.c (4)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/datapath/trace.c (3)
  • trace_log_packet (354-504)
  • gr_mbuf_trace_add (520-547)
  • gr_mbuf_trace_finish (549-565)
modules/infra/datapath/gr_mbuf.h (1)
  • gr_mbuf_is_traced (52-54)
modules/infra/datapath/port_rx.c (1)
  • rxtx_trace_format (25-28)
modules/infra/datapath/port_output.c (5)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/datapath/gr_mbuf.h (1)
  • gr_mbuf_is_traced (52-54)
modules/infra/datapath/trace.c (1)
  • gr_mbuf_trace_add (520-547)
modules/infra/control/gr_graph.h (1)
  • rte_node_enqueue_x1 (22-25)
modules/infra/datapath/eth_output.c (1)
  • eth_output_register_interface_type (26-33)
modules/infra/control/graph_priv.h (1)
modules/infra/control/graph.c (3)
  • worker_graph_reload (207-230)
  • worker_graph_reload_all (304-322)
  • rte_node_t (259-283)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-cross-aarch64
  • GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
  • GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
  • GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
  • GitHub Check: rpm
  • GitHub Check: deb
🔇 Additional comments (2)
modules/infra/datapath/main_loop.c (1)

222-222: LGTM!

Properly nulling the stats pointer before reload ensures safe concurrent access patterns.

modules/infra/datapath/meson.build (1)

15-17: Build sources updated correctly

New port_* sources wired; matches graph changes.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (15)
modules/infra/datapath/main_loop.c (3)

111-117: Memory leak on w_stats allocation failure.

ctx->stats (allocated at line 105) is not freed when rte_zmalloc_socket fails, leaking memory until the next reload or shutdown. This issue was flagged in previous reviews.

Apply this diff:

 	ctx->w_stats = rte_zmalloc_socket(__func__, len, RTE_CACHE_LINE_SIZE, graph->socket);
 	if (ctx->w_stats == NULL) {
+		rte_graph_cluster_stats_destroy(ctx->stats);
+		ctx->stats = NULL;
 		LOG(ERR, "rte_zmalloc_socket: %s", rte_strerror(rte_errno));
 		return -rte_errno;
 	}

118-131: Memory leak on node_to_index allocation failure.

While the free at line 118 properly addresses a previous concern, ctx->stats (allocated at line 105) is not freed when rte_calloc_socket fails, leaking memory until the next reload or shutdown. This issue was flagged in previous reviews.

Apply this diff:

 	if (ctx->node_to_index == NULL) {
 		rte_free(ctx->w_stats);
 		ctx->w_stats = NULL;
+		rte_graph_cluster_stats_destroy(ctx->stats);
+		ctx->stats = NULL;
 		LOG(ERR, "rte_calloc_socket: %s", rte_strerror(rte_errno));
 		return -rte_errno;
 	}

134-154: Missing error handling for allocation failures and topo_sort.

Two critical issues flagged in previous reviews remain unaddressed:

  1. No null check after gr_vec_add at line 139. If allocation fails, nodes is NULL and qsort will crash.
  2. topo_sort errors at line 143 are logged but execution continues, resulting in incorrect topology ordering.

Apply this diff:

 	rte_graph_foreach_node (count, off, graph, node)
 		gr_vec_add(nodes, node);
+	if (nodes == NULL) {
+		rte_free(ctx->node_to_index);
+		ctx->node_to_index = NULL;
+		rte_free(ctx->w_stats);
+		ctx->w_stats = NULL;
+		rte_graph_cluster_stats_destroy(ctx->stats);
+		ctx->stats = NULL;
+		LOG(ERR, "gr_vec_add failed");
+		return errno_set(ENOMEM);
+	}
 
 	// sort by name first to ensure stable topo_sort
 	qsort(nodes, count, sizeof(void *), node_name_cmp);
-	if (topo_sort((gr_vec const void **)nodes, node_is_child) < 0)
+	if (topo_sort((gr_vec const void **)nodes, node_is_child) < 0) {
 		LOG(ERR, "topo_sort failed: %s", strerror(errno));
+		gr_vec_free(nodes);
+		rte_free(ctx->node_to_index);
+		ctx->node_to_index = NULL;
+		rte_free(ctx->w_stats);
+		ctx->w_stats = NULL;
+		rte_graph_cluster_stats_destroy(ctx->stats);
+		ctx->stats = NULL;
+		return -errno;
+	}
modules/infra/datapath/eth_output.c (2)

61-66: Critical: NULL dereference after VLAN parent lookup.

iface_from_id(sub->parent_id) can return NULL. Line 64 dereferences priv->iface->type without checking, causing a segfault if parent is invalid.

Add NULL check after Line 61:

 		priv->iface = iface_from_id(sub->parent_id);
+		if (unlikely(priv->iface == NULL)) {
+			edge = INVAL;
+			goto next;
+		}
 		src_mac = sub->mac;

64-66: Major: Bounds check missing for iface_type array access.

Line 64 indexes iface_type_edges with priv->iface->type without verifying it's within bounds (< GR_IFACE_TYPE_COUNT).

Add bounds check:

 	edge = iface_type_edges[priv->iface->type];
+	if (unlikely(priv->iface->type >= GR_IFACE_TYPE_COUNT))
+		edge = INVAL;
 	if (edge == INVAL)
modules/infra/control/graph_priv.h (1)

14-14: Minor: Unused declaration.

This declaration appears unused based on past review comments. Remove it or provide an implementation to avoid confusion.

Apply this diff:

 int worker_graph_reload(struct worker *, gr_vec struct iface_info_port **);
 int worker_graph_reload_all(gr_vec struct iface_info_port **);
-void worker_graph_nodes_del_unused(gr_vec rte_node_t *);
 void worker_graph_free(struct worker *);
modules/infra/control/worker.c (2)

170-174: Check gr_vec_add return value.

gr_vec_add at line 173 can fail (OOM). If it returns an error, ports will be incomplete and the subsequent reload may fail silently or behave unexpectedly. Check the return value, free ports, and return the error.


208-218: Check gr_vec_add return values.

Lines 215 and 218 call gr_vec_add without checking for failure. If either fails, ports will be incomplete or NULL, leading to incorrect graph reload. Add error checks, free ports, and propagate the error.

Apply this diff:

 	gr_vec struct iface_info_port **ports = NULL;
 	struct iface *iface = NULL;
 	bool found = false;
 	while ((iface = iface_next(GR_IFACE_TYPE_PORT, iface)) != NULL) {
 		struct iface_info_port *port = (struct iface_info_port *)iface->info;
 		if (port->port_id == p->port_id)
 			found = true;
-		gr_vec_add(ports, port);
+		if (gr_vec_add(ports, port) < 0) {
+			gr_vec_free(ports);
+			return errno_set(ENOMEM);
+		}
 	}
 	if (!found)
-		gr_vec_add(ports, p);
+		if (gr_vec_add(ports, p) < 0) {
+			gr_vec_free(ports);
+			return errno_set(ENOMEM);
+		}
modules/infra/control/graph.c (7)

113-115: Check astrcat and gr_vec_add return values.

Line 113 calls astrcat which can return NULL on OOM. Lines 114-115 call gr_vec_add without checking for failure. This can lead to NULL dereference or incomplete node lists.


136-152: NULL-check rte_graph_node_get_by_name and replace assertion.

Line 140 calls rte_graph_node_get_by_name which can return NULL. Line 141 dereferences without checking. Line 148 has an assertion that will abort if ctx->iface is NULL. These should be proper error checks.


154-160: NULL-check rte_graph_node_get_by_name.

Line 156 calls rte_graph_node_get_by_name and line 157 dereferences the result without checking for NULL.


179-195: NULL-check rte_graph_node_get_by_name.

Line 181 calls rte_graph_node_get_by_name and line 183 dereferences without NULL check. If the node doesn't exist in the graph, this will segfault.


198-198: NULL-check rte_graph_node_get_by_name.

Line 198 calls rte_graph_node_get_by_name and immediately dereferences ->ctx_ptr without checking for NULL.


232-257: Replace assertions with error handling in ensure_queue_node.

Line 240 asserts that astrcat succeeded. Line 252 uses strstr result without checking for NULL. Line 253 asserts rte_node_clone succeeded. All should return NULL on failure instead of aborting.


259-302: Propagate errors instead of assertions in worker_graph_nodes_add_missing.

Lines 272-282 call ensure_queue_node and gr_vec_add without checking for failure. Lines 289 and 291 use assertions on edge operations. Function should return NULL on error and let callers handle it.

🧹 Nitpick comments (3)
modules/infra/control/worker.c (3)

343-343: Unchecked gr_vec_add for cpus vector.

Line 343 calls gr_vec_add without checking for failure. While less critical than port-related failures, consistency in error handling is good practice.


400-400: Unchecked gr_vec_add for rxqs vector.

Line 400 adds to worker->rxqs without checking for failure. Consider adding error handling for consistency.


416-416: Unchecked gr_vec_add for txqs vector.

Line 416 adds to worker->txqs without checking for failure. Consider adding error handling for consistency.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7f43f1d and 18eabea.

⛔ Files ignored due to path filters (1)
  • docs/graph.svg is excluded by !**/*.svg
📒 Files selected for processing (24)
  • main/gr_vec.h (2 hunks)
  • main/vec_test.c (1 hunks)
  • modules/infra/api/gr_infra.h (2 hunks)
  • modules/infra/api/stats.c (3 hunks)
  • modules/infra/control/gr_graph.h (0 hunks)
  • modules/infra/control/graph.c (11 hunks)
  • modules/infra/control/graph_priv.h (1 hunks)
  • modules/infra/control/port.c (2 hunks)
  • modules/infra/control/worker.c (7 hunks)
  • modules/infra/control/worker_priv.h (1 hunks)
  • modules/infra/control/worker_test.c (1 hunks)
  • modules/infra/datapath/eth_output.c (4 hunks)
  • modules/infra/datapath/gr_eth.h (1 hunks)
  • modules/infra/datapath/gr_rxtx.h (1 hunks)
  • modules/infra/datapath/l1_xconnect.c (4 hunks)
  • modules/infra/datapath/main_loop.c (4 hunks)
  • modules/infra/datapath/meson.build (1 hunks)
  • modules/infra/datapath/port_output.c (1 hunks)
  • modules/infra/datapath/port_rx.c (1 hunks)
  • modules/infra/datapath/port_tx.c (1 hunks)
  • modules/infra/datapath/rx.c (0 hunks)
  • modules/infra/datapath/tx.c (0 hunks)
  • smoke/_init.sh (1 hunks)
  • smoke/affinity_test.sh (1 hunks)
💤 Files with no reviewable changes (3)
  • modules/infra/control/gr_graph.h
  • modules/infra/datapath/rx.c
  • modules/infra/datapath/tx.c
🚧 Files skipped from review as they are similar to previous changes (9)
  • modules/infra/datapath/l1_xconnect.c
  • main/vec_test.c
  • modules/infra/datapath/port_tx.c
  • modules/infra/datapath/meson.build
  • main/gr_vec.h
  • modules/infra/datapath/port_rx.c
  • modules/infra/datapath/port_output.c
  • modules/infra/datapath/gr_eth.h
  • smoke/affinity_test.sh
🧰 Additional context used
🧬 Code graph analysis (9)
modules/infra/control/worker_priv.h (1)
modules/infra/control/worker.c (2)
  • port_unplug (145-182)
  • port_plug (184-223)
modules/infra/api/stats.c (1)
modules/infra/api/gr_infra.h (2)
  • gr_iface_type_name (322-338)
  • gr_iface_mode_name (341-351)
modules/infra/datapath/main_loop.c (1)
main/sort.c (1)
  • topo_sort (53-97)
modules/infra/control/worker.c (3)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/control/iface.c (1)
  • iface_next (198-213)
modules/infra/control/graph.c (2)
  • worker_graph_reload_all (304-322)
  • worker_graph_reload (207-230)
modules/infra/datapath/eth_output.c (3)
modules/infra/api/gr_infra.h (1)
  • gr_iface_type_name (322-338)
modules/infra/control/graph.c (1)
  • gr_node_attach_parent (36-64)
modules/infra/control/iface.c (2)
  • iface_from_id (215-222)
  • iface_get_eth_addr (224-237)
modules/infra/control/port.c (1)
modules/infra/control/worker.c (2)
  • port_unplug (145-182)
  • port_plug (184-223)
modules/infra/control/worker_test.c (1)
modules/infra/control/graph.c (2)
  • worker_graph_reload (207-230)
  • worker_graph_reload_all (304-322)
modules/infra/control/graph_priv.h (1)
modules/infra/control/graph.c (3)
  • worker_graph_reload (207-230)
  • worker_graph_reload_all (304-322)
  • rte_node_t (259-283)
modules/infra/control/graph.c (3)
api/string.c (1)
  • astrcat (15-50)
main/gr_vec.h (1)
  • gr_vec_len (47-51)
api/gr_errno.h (1)
  • errno_set (9-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
  • GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
  • GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
  • GitHub Check: rpm
  • GitHub Check: deb
🔇 Additional comments (13)
modules/infra/api/gr_infra.h (1)

322-351: LGTM: Improved naming and enum coverage.

The namespace prefix clarifies these are public helpers, and explicit COUNT case handling is good defensive practice.

modules/infra/api/stats.c (3)

60-62: Correct adaptation for per-port-queue nodes.

The prefix match correctly captures stats from nodes named port_rx-p0q0, port_rx-p1q0, etc., aligning with the new one-node-per-queue architecture.


359-359: LGTM: Updated to renamed helper.


376-376: LGTM: Updated to renamed helper.

modules/infra/datapath/eth_output.c (2)

26-33: LGTM: Clean interface-type routing registration.

The registration function properly validates type bounds and prevents double-registration.


74-77: LGTM: Proper error handling for MAC address lookup.

The NO_MAC edge correctly handles the failure case.

modules/infra/datapath/gr_rxtx.h (1)

14-36: LGTM: Clean API for per-port-queue architecture.

The macros, structures, and function declaration support the new one-node-per-queue design cleanly.

modules/infra/control/port.c (2)

163-163: LGTM: Updated to new signature.

Passing the entire port structure instead of just the ID aligns with the updated API.


288-288: LGTM: Updated to new signature.

modules/infra/control/worker_test.c (1)

55-56: LGTM: Mocks updated to match new signatures.

The mock declarations correctly reflect the updated API signatures with ports context.

modules/infra/control/worker_priv.h (1)

9-10: LGTM: Signature change supports richer port context.

Passing the entire port structure instead of just the ID enables more flexible port operations.

modules/infra/control/graph_priv.h (1)

12-13: LGTM: Updated signatures for port-aware graph reload.

The ports parameter enables per-port node creation and management.

modules/infra/control/graph.c (1)

118-119: No additional error handling needed for gr_vec_extend. It aborts on allocation failure internally.

Likely an incorrect or invalid review comment.

@rjarry rjarry force-pushed the graph-rxtx branch 2 times, most recently from 84715ae to 2368d6d Compare October 8, 2025 21:17
@rjarry rjarry requested a review from aharivel October 8, 2025 22:30
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
modules/infra/datapath/eth_output.c (1)

52-97: Guard the VLAN parent lookup

iface_from_id(sub->parent_id) can return NULL (stale VLAN, unplugged port, bad config). We immediately dereference it, which will crash the datapath. Add a fast drop when the parent is missing and skip further processing. For example:

 			priv->iface = iface_from_id(sub->parent_id);
+			if (unlikely(priv->iface == NULL)) {
+				edge = INVAL;
+				goto next;
+			}
 			src_mac = sub->mac;
♻️ Duplicate comments (2)
modules/infra/datapath/port_tx.c (1)

38-49: Finish tracing before handing mbufs to the driver

We still touch mbufs after rte_eth_tx_burst. Once the driver owns them, this is racy and can scribble over freed buffers. Record and finish any tracing data before the burst, then enqueue leftovers. Suggested fix:

-	tx_ok = rte_eth_tx_burst(ctx->port_id, ctx->queue_id, mbufs, nb_objs);
-	if (tx_ok < nb_objs)
-		rte_node_enqueue(graph, node, TX_ERROR, &objs[tx_ok], nb_objs - tx_ok);
-
-	for (unsigned i = 0; i < tx_ok; i++) {
-		// FIXME racy: we are operating on mbufs already passed to driver
-		if (gr_mbuf_is_traced(mbufs[i])) {
-			struct port_queue *t;
-			t = gr_mbuf_trace_add(mbufs[i], node, sizeof(*t));
-			*t = *ctx;
-			gr_mbuf_trace_finish(mbufs[i]);
-		}
-	}
+	for (unsigned i = 0; i < nb_objs; i++) {
+		if (gr_mbuf_is_traced(mbufs[i])) {
+			struct port_queue *t = gr_mbuf_trace_add(mbufs[i], node, sizeof(*t));
+			*t = *ctx;
+			gr_mbuf_trace_finish(mbufs[i]);
+		}
+	}
+
+	tx_ok = rte_eth_tx_burst(ctx->port_id, ctx->queue_id, mbufs, nb_objs);
+	if (tx_ok < nb_objs)
+		rte_node_enqueue(graph, node, TX_ERROR, &objs[tx_ok], nb_objs - tx_ok);
modules/infra/control/graph.c (1)

113-302: Address previously flagged issues.

Past review comments have identified multiple critical and major issues in this file that remain unaddressed:

  • Critical: Missing NULL checks for rte_graph_node_get_by_name at lines 141, 156, 181, 198 can cause segfaults.
  • Critical: Line 252 dereferences strstr result without checking for NULL.
  • Major: Assertions at lines 148, 240, 253, 289-291 should be replaced with proper error handling.
  • Major: Missing return value checks for astrcat, gr_vec_add, ensure_queue_node at lines 113-115, 272-282, 308.

Refer to the detailed past review comments for specific fix suggestions.

🧹 Nitpick comments (1)
modules/infra/control/graph.c (1)

464-472: Verify critical node IDs were initialized.

If RX_NODE_BASE, TX_NODE_BASE, or "port_output" are not found in the node_infos list, the corresponding static variables remain uninitialized (0). Later operations (e.g., lines 286, 290, 358) rely on these IDs being valid.

After the loop, verify the IDs were set:

 		if (strcmp(reg->name, "port_output") == 0)
 			port_output_node = reg->id;
 	}
+
+	if (port_rx_node == RTE_NODE_ID_INVALID)
+		ABORT("port_rx base node not found");
+	if (port_tx_node == RTE_NODE_ID_INVALID)
+		ABORT("port_tx base node not found");
+	if (port_output_node == RTE_NODE_ID_INVALID)
+		ABORT("port_output node not found");
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 84715ae and 7169af2.

⛔ Files ignored due to path filters (1)
  • docs/graph.svg is excluded by !**/*.svg
📒 Files selected for processing (15)
  • modules/infra/api/stats.c (3 hunks)
  • modules/infra/control/graph.c (11 hunks)
  • modules/infra/control/graph_priv.h (1 hunks)
  • modules/infra/control/worker.c (7 hunks)
  • modules/infra/control/worker_test.c (1 hunks)
  • modules/infra/datapath/eth_output.c (4 hunks)
  • modules/infra/datapath/gr_eth.h (1 hunks)
  • modules/infra/datapath/gr_rxtx.h (1 hunks)
  • modules/infra/datapath/l1_xconnect.c (4 hunks)
  • modules/infra/datapath/main_loop.c (5 hunks)
  • modules/infra/datapath/meson.build (1 hunks)
  • modules/infra/datapath/port_output.c (1 hunks)
  • modules/infra/datapath/port_rx.c (1 hunks)
  • modules/infra/datapath/port_tx.c (1 hunks)
  • smoke/affinity_test.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • modules/infra/datapath/meson.build
  • modules/infra/datapath/gr_eth.h
  • modules/infra/api/stats.c
  • modules/infra/datapath/port_rx.c
  • smoke/affinity_test.sh
  • modules/infra/datapath/gr_rxtx.h
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-08T21:22:45.894Z
Learnt from: rjarry
PR: DPDK/grout#334
File: modules/infra/control/worker.c:278-281
Timestamp: 2025-10-08T21:22:45.894Z
Learning: In the codebase, `gr_vec_add` is a macro that does not return any value and cannot fail. Do not suggest checking its return value or adding error handling around it.

Applied to files:

  • modules/infra/control/worker.c
🧬 Code graph analysis (8)
modules/infra/control/worker_test.c (1)
modules/infra/control/graph.c (2)
  • worker_graph_reload (207-230)
  • worker_graph_reload_all (304-322)
modules/infra/datapath/port_output.c (5)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/datapath/gr_mbuf.h (1)
  • gr_mbuf_is_traced (52-54)
modules/infra/datapath/trace.c (1)
  • gr_mbuf_trace_add (520-547)
modules/infra/control/gr_graph.h (1)
  • rte_node_enqueue_x1 (22-25)
modules/infra/datapath/eth_output.c (1)
  • eth_output_register_interface_type (26-33)
modules/infra/control/worker.c (3)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/control/iface.c (1)
  • iface_next (198-213)
modules/infra/control/graph.c (2)
  • worker_graph_reload_all (304-322)
  • worker_graph_reload (207-230)
modules/infra/datapath/eth_output.c (4)
modules/infra/api/gr_infra.h (1)
  • gr_iface_type_name (322-338)
modules/infra/control/graph.c (1)
  • gr_node_attach_parent (36-64)
modules/infra/control/iface.c (2)
  • iface_from_id (215-222)
  • iface_get_eth_addr (224-237)
modules/infra/control/gr_graph.h (1)
  • rte_node_enqueue_x1 (22-25)
modules/infra/control/graph_priv.h (1)
modules/infra/control/graph.c (2)
  • worker_graph_reload (207-230)
  • worker_graph_reload_all (304-322)
modules/infra/datapath/main_loop.c (1)
main/sort.c (1)
  • topo_sort (53-97)
modules/infra/control/graph.c (3)
api/string.c (1)
  • astrcat (15-50)
main/gr_vec.h (1)
  • gr_vec_len (47-51)
api/gr_errno.h (1)
  • errno_set (9-12)
modules/infra/datapath/port_tx.c (4)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/datapath/trace.c (3)
  • trace_log_packet (354-504)
  • gr_mbuf_trace_add (520-547)
  • gr_mbuf_trace_finish (549-565)
modules/infra/datapath/gr_mbuf.h (1)
  • gr_mbuf_is_traced (52-54)
modules/infra/datapath/port_rx.c (1)
  • rxtx_trace_format (25-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
  • GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
  • GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
  • GitHub Check: build-cross-aarch64
  • GitHub Check: deb
  • GitHub Check: rpm
🔇 Additional comments (5)
modules/infra/control/graph.c (1)

29-34: Clean refactoring from hash-based to vector-based node storage.

The shift from hash-backed node data to explicit vectors (base_node_names, rx_node_names, tx_node_names) and tracked node IDs is straightforward and improves clarity. Proper cleanup is present in graph_fini.

modules/infra/datapath/main_loop.c (4)

112-130: Good: Previous memory leak issues resolved.

The two-stage allocation properly frees existing buffers before reallocation (lines 113, 119), and error paths now use centralized cleanup via goto err. This addresses the previous review concerns about memory leaks on reload.


141-144: Good: topo_sort errors now properly abort.

The error handling correctly uses goto err instead of just logging, ensuring proper cleanup and preventing execution with invalid topology.


157-167: Solid error cleanup logic.

The centralized err label properly frees all allocated resources and nulls the pointers, addressing previous review concerns about incomplete cleanup on error paths.


177-182: Proper initialization and shutdown cleanup.

Explicit null initialization (lines 177-182) is good defensive programming. The atomic store at line 227 ensures safe state during reconfiguration, and line 289 properly frees the node_to_index buffer on shutdown.

Also applies to: 227-227, 289-289

@rjarry rjarry force-pushed the graph-rxtx branch 3 times, most recently from 6a260b2 to 8bf42d1 Compare October 9, 2025 11:54
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (9)
modules/infra/datapath/main_loop.c (1)

136-140: Check for gr_vec_add allocation failure before qsort.

If gr_vec_add fails, nodes could be NULL or incomplete, causing qsort to crash or exhibit undefined behavior.

Apply this diff to add the check:

 	rte_graph_foreach_node (count, off, graph, node)
 		gr_vec_add(nodes, node);
+	if (nodes == NULL || gr_vec_len(nodes) != count) {
+		LOG(ERR, "gr_vec_add failed or incomplete node collection");
+		goto err;
+	}
 
 	// sort by name first to ensure stable topo_sort
 	qsort(nodes, count, sizeof(void *), node_name_cmp);
modules/infra/datapath/port_rx.c (1)

32-36: Validate mode before indexing edges

We still index edges[mode] without ensuring mode < GR_IFACE_MODE_COUNT. Out-of-range values scribble past the array. Guard first, then touch the slot.

 void register_interface_mode(gr_iface_mode_t mode, const char *next_node) {
-	if (edges[mode] != IFACE_MODE_UNKNOWN)
+	if (unlikely(mode >= GR_IFACE_MODE_COUNT))
+		ABORT("invalid interface mode=%u", mode);
+	if (edges[mode] != IFACE_MODE_UNKNOWN)
 		ABORT("next node already registered for interface mode %u", mode);
modules/infra/control/graph.c (7)

139-152: Handle missing iface instead of asserting

If ports is stale or omits a port, ctx->iface stays NULL. The assert will abort the process. Resolve gracefully (return error, log). Drop the assert.

-		assert(ctx->iface != NULL);
+		if (unlikely(ctx->iface == NULL)) {
+			ret = -ENODEV;
+			goto out;
+		}

141-141: Check rte_graph_node_get_by_name before dereference

rte_graph_node_get_by_name can return NULL; dereferencing node->ctx will segfault. Validate and error out.

-		node = rte_graph_node_get_by_name(graph_name, node_name);
-		struct rx_node_ctx *ctx = (struct rx_node_ctx *)node->ctx;
+		node = rte_graph_node_get_by_name(graph_name, node_name);
+		if (unlikely(node == NULL)) {
+			ret = -ENOENT;
+			goto out;
+		}
+		struct rx_node_ctx *ctx = (struct rx_node_ctx *)node->ctx;

198-198: Validate port_output node lookup

Same story: NULL from rte_graph_node_get_by_name makes ->ctx_ptr write crash. Guard it.

-	rte_graph_node_get_by_name(graph_name, "port_output")->ctx_ptr = out;
+	node = rte_graph_node_get_by_name(graph_name, "port_output");
+	if (unlikely(node == NULL)) {
+		ret = -ENOENT;
+		goto out;
+	}
+	node->ctx_ptr = out;

239-254: Replace asserts in ensure_queue_node with real error handling

astrcat/rte_node_clone can fail; assert is compiled out in release builds, leaving NULL deref. Return NULL and let callers handle the failure.

-	char *name = astrcat(NULL, name_fmt, port_id, queue_id);
-	assert(name != NULL);
+	char *name = astrcat(NULL, name_fmt, port_id, queue_id);
+	if (unlikely(name == NULL))
+		return NULL;
@@
-	if (rte_node_from_name(name) == RTE_NODE_ID_INVALID) {
-		// node does not exist yet, clone it from the base
-		rte_node_t node = rte_node_clone(base_node, strstr(name, "-") + 1);
-		assert(node != RTE_NODE_ID_INVALID);
+	if (rte_node_from_name(name) == RTE_NODE_ID_INVALID) {
+		const char *suffix = strstr(name, "-");
+		if (unlikely(suffix == NULL)) {
+			free(name);
+			return NULL;
+		}
+		rte_node_t node = rte_node_clone(base_node, suffix + 1);
+		if (node == RTE_NODE_ID_INVALID) {
+			free(name);
+			return NULL;
+		}

270-292: Propagate errors from ensure_queue_node / edge updates

We still assume success: NULL names and failed edge updates crash later. Handle failures, unwind, and return NULL so callers can abort cleanly.

 		for (uint16_t rxq = 0; rxq < port->n_rxq; rxq++) {
 			name = ensure_queue_node(
 				port_rx_node, RX_NODE_FMT, port->port_id, rxq, old_rx
 			);
-			gr_vec_add(rx_node_names, name);
+			if (unlikely(name == NULL))
+				goto err;
+			gr_vec_add(rx_node_names, name);
@@
 			name = ensure_queue_node(
 				port_tx_node, TX_NODE_FMT, port->port_id, txq, old_tx
 			);
-			gr_vec_add(tx_node_names, name);
+			if (unlikely(name == NULL))
+				goto err;
+			gr_vec_add(tx_node_names, name);
 		}
 	}
@@
-	edge = rte_node_edge_update(
+	edge = rte_node_edge_update(
 		port_output_node, 0, (const char **)tx_node_names, gr_vec_len(tx_node_names)
 	);
-	assert(edge != RTE_EDGE_ID_INVALID);
+	if (edge == RTE_EDGE_ID_INVALID)
+		goto err;
 	edge = rte_node_edge_shrink(port_output_node, gr_vec_len(tx_node_names));
-	assert(edge != RTE_EDGE_ID_INVALID);
+	if (edge == RTE_EDGE_ID_INVALID)
+		goto err;
@@
 	return unused_nodes;
+err:
+	gr_strvec_free(old_rx);
+	gr_strvec_free(old_tx);
+	gr_vec_free(unused_nodes);
+	errno = (errno != 0) ? errno : EINVAL;
+	return NULL;

308-319: Abort reload-all when node cloning fails

If worker_graph_nodes_add_missing returns NULL, we plough ahead and dereference it. Check and propagate the error.

-	gr_vec rte_node_t *unused_nodes = worker_graph_nodes_add_missing(ports);
+	gr_vec rte_node_t *unused_nodes = worker_graph_nodes_add_missing(ports);
+	if (unlikely(unused_nodes == NULL))
+		return errno_set(errno ? errno : EINVAL);

183-194: Bounds-check port_id before writing edge slot

out->edges is sized for RTE_MAX_ETHPORTS. Guard ctx->port_id before writing; otherwise a rogue port id overruns the array.

-				out->edges[ctx->port_id] = edge;
+				if (unlikely(ctx->port_id >= ARRAY_DIM(out->edges))) {
+					ret = -ERANGE;
+					goto out;
+				}
+				out->edges[ctx->port_id] = edge;
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7169af2 and 8bf42d1.

📒 Files selected for processing (11)
  • modules/infra/api/stats.c (3 hunks)
  • modules/infra/control/graph.c (11 hunks)
  • modules/infra/control/graph_priv.h (1 hunks)
  • modules/infra/control/worker.c (7 hunks)
  • modules/infra/control/worker_test.c (1 hunks)
  • modules/infra/datapath/gr_rxtx.h (1 hunks)
  • modules/infra/datapath/main_loop.c (5 hunks)
  • modules/infra/datapath/port_output.c (1 hunks)
  • modules/infra/datapath/port_rx.c (1 hunks)
  • modules/infra/datapath/port_tx.c (1 hunks)
  • smoke/affinity_test.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • smoke/affinity_test.sh
  • modules/infra/control/worker_test.c
  • modules/infra/datapath/gr_rxtx.h
  • modules/infra/datapath/port_tx.c
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-08T21:22:45.894Z
Learnt from: rjarry
PR: DPDK/grout#334
File: modules/infra/control/worker.c:278-281
Timestamp: 2025-10-08T21:22:45.894Z
Learning: In the codebase, `gr_vec_add` is a macro that does not return any value and cannot fail. Do not suggest checking its return value or adding error handling around it.

Applied to files:

  • modules/infra/control/worker.c
🧬 Code graph analysis (7)
modules/infra/datapath/main_loop.c (1)
main/sort.c (1)
  • topo_sort (53-97)
modules/infra/api/stats.c (2)
modules/infra/api/gr_infra.h (2)
  • gr_iface_type_name (322-338)
  • gr_iface_mode_name (341-351)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/datapath/port_output.c (5)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/datapath/gr_mbuf.h (1)
  • gr_mbuf_is_traced (52-54)
modules/infra/datapath/trace.c (1)
  • gr_mbuf_trace_add (520-547)
modules/infra/control/gr_graph.h (1)
  • rte_node_enqueue_x1 (22-25)
modules/infra/datapath/eth_output.c (1)
  • eth_output_register_interface_type (26-33)
modules/infra/control/graph_priv.h (1)
modules/infra/control/graph.c (2)
  • worker_graph_reload (207-230)
  • worker_graph_reload_all (304-322)
modules/infra/control/graph.c (3)
api/string.c (1)
  • astrcat (15-50)
main/gr_vec.h (1)
  • gr_vec_len (47-51)
api/gr_errno.h (1)
  • errno_set (9-12)
modules/infra/control/worker.c (3)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/control/iface.c (1)
  • iface_next (198-213)
modules/infra/control/graph.c (2)
  • worker_graph_reload_all (304-322)
  • worker_graph_reload (207-230)
modules/infra/datapath/port_rx.c (2)
modules/infra/control/graph.c (1)
  • gr_node_attach_parent (36-64)
modules/infra/datapath/trace.c (2)
  • gr_mbuf_trace_add (520-547)
  • trace_log_packet (354-504)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
  • GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
  • GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
  • GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-cross-aarch64
  • GitHub Check: rpm
  • GitHub Check: deb
🔇 Additional comments (11)
modules/infra/api/stats.c (2)

60-61: LGTM!

Correctly broadens stats accumulation to include all per-queue port_rx nodes (e.g., "port_rx-0", "port_rx-1") using prefix matching, aligning with the one-node-per-port-queue architecture.


359-359: LGTM!

Function renames (gr_iface_type_name and gr_iface_mode_name) correctly applied, consistent with the broader infra API refactoring.

Also applies to: 376-376

modules/infra/datapath/main_loop.c (9)

88-88: LGTM!

Clean declaration for collecting graph nodes.


109-109: LGTM!

Proper use of centralized error handling.


112-131: LGTM!

Memory management properly prevents leaks by freeing old allocations before creating new ones. Error paths correctly jump to centralized cleanup.


141-144: LGTM!

Proper error handling now aborts on topo_sort failure rather than continuing with bad topology.


154-154: LGTM!

Proper cleanup of the nodes vector.


157-167: LGTM!

Centralized error handling properly frees all allocated resources on any failure path.


177-182: LGTM!

Explicit initialization prevents undefined state.


227-229: LGTM!

Properly hides stale stats from control plane during reconfiguration.


292-292: LGTM!

Proper cleanup of node_to_index on shutdown.

Copy link
Collaborator

@aharivel aharivel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (13)
modules/infra/datapath/eth_output.c (1)

52-71: Handle missing parent interface before dereferencing.

iface_from_id(sub->parent_id) can still yield NULL; we then touch priv->iface->type and index iface_type_edges, triggering a crash. Please bail out to INVAL before those dereferences.

 		priv->iface = iface_from_id(sub->parent_id);
+		if (unlikely(priv->iface == NULL)) {
+			edge = INVAL;
+			goto next;
+		}
 		src_mac = sub->mac;
modules/infra/datapath/port_tx.c (1)

38-50: Fix tracing race after transmit.

You still walk the array and mutate traced mbufs after rte_eth_tx_burst() hands them to the driver. That ownership transfer makes the post-loop racy (the FIXME even calls it out). Finalize the trace before calling the driver and drop the after-burst loop.

-	tx_ok = rte_eth_tx_burst(ctx->port_id, ctx->queue_id, mbufs, nb_objs);
-	if (tx_ok < nb_objs)
-		rte_node_enqueue(graph, node, TX_ERROR, &objs[tx_ok], nb_objs - tx_ok);
-
-	for (unsigned i = 0; i < tx_ok; i++) {
-		// FIXME racy: we are operating on mbufs already passed to driver
-		if (gr_mbuf_is_traced(mbufs[i])) {
-			struct port_queue *t;
-			t = gr_mbuf_trace_add(mbufs[i], node, sizeof(*t));
-			*t = *ctx;
-			gr_mbuf_trace_finish(mbufs[i]);
-		}
-	}
+	for (unsigned i = 0; i < nb_objs; i++) {
+		if (gr_mbuf_is_traced(mbufs[i])) {
+			struct port_queue *t = gr_mbuf_trace_add(mbufs[i], node, sizeof(*t));
+			*t = *ctx;
+			gr_mbuf_trace_finish(mbufs[i]);
+		}
+	}
+
+	tx_ok = rte_eth_tx_burst(ctx->port_id, ctx->queue_id, mbufs, nb_objs);
+	if (tx_ok < nb_objs)
+		rte_node_enqueue(graph, node, TX_ERROR, &objs[tx_ok], nb_objs - tx_ok);
modules/infra/control/graph.c (11)

141-141: NULL dereference: check rte_graph_node_get_by_name result.

Dereferencing node without NULL-check causes segfault if node not found in graph.

Add NULL check:

 	snprintf(node_name, sizeof(node_name), RX_NODE_FMT, qmap->port_id, qmap->queue_id);
 	node = rte_graph_node_get_by_name(graph_name, node_name);
+	if (node == NULL) {
+		ret = -ENOENT;
+		goto out;
+	}
 	struct rx_node_ctx *ctx = (struct rx_node_ctx *)node->ctx;

148-148: Replace assert with error handling.

Assertion aborts if port not found; return error instead.

Replace assert:

-	assert(ctx->iface != NULL);
+	if (ctx->iface == NULL) {
+		ret = -ENODEV;
+		goto out;
+	}

156-156: NULL dereference: check rte_graph_node_get_by_name result.

Same NULL-check issue as line 141.

Add NULL check:

 	gr_vec_foreach (const char *name, tx_node_names) {
 		node = rte_graph_node_get_by_name(graph_name, name);
+		if (node == NULL)
+			continue;
 		struct port_queue *ctx = (struct port_queue *)node->ctx;

181-181: NULL dereference: check rte_graph_node_get_by_name result.

Same NULL-check issue as line 141.

Add NULL check:

 		snprintf(node_name, sizeof(node_name), TX_NODE_FMT, qmap->port_id, qmap->queue_id);
 		node = rte_graph_node_get_by_name(graph_name, node_name);
+		if (node == NULL)
+			continue;
 		struct port_queue *ctx = (struct port_queue *)node->ctx;

191-191: Buffer overflow: bounds-check port_id before indexing.

Writing to out->edges[ctx->port_id] without validating port_id < ARRAY_DIM(out->edges) risks overflow.

Add bounds check:

-			out->edges[ctx->port_id] = edge;
+			if (ctx->port_id < ARRAY_DIM(out->edges))
+				out->edges[ctx->port_id] = edge;

198-198: NULL dereference: check rte_graph_node_get_by_name result.

Same NULL-check issue as line 141.

Add NULL check:

-	rte_graph_node_get_by_name(graph_name, "port_output")->ctx_ptr = out;
+	node = rte_graph_node_get_by_name(graph_name, "port_output");
+	if (node == NULL) {
+		ret = -ENOENT;
+		goto out;
+	}
+	node->ctx_ptr = out;

240-240: Replace assert with error handling.

Return NULL on allocation failure instead of aborting.

Replace assert:

 	char *name = astrcat(NULL, name_fmt, port_id, queue_id);
-	assert(name != NULL);
+	if (name == NULL)
+		return NULL;

252-253: NULL dereference and unsafe assertion.

strstr can return NULL if no "-" found, causing crash. Assert aborts on clone failure.

Replace with error handling:

-		rte_node_t node = rte_node_clone(base_node, strstr(name, "-") + 1);
-		assert(node != RTE_NODE_ID_INVALID);
+		const char *suffix = strstr(name, "-");
+		if (suffix == NULL) {
+			free(name);
+			return NULL;
+		}
+		rte_node_t node = rte_node_clone(base_node, suffix + 1);
+		if (node == RTE_NODE_ID_INVALID) {
+			free(name);
+			return NULL;
+		}

272-291: Handle failures in ensure_queue_node and edge operations.

Function ignores NULL from ensure_queue_node and asserts on edge failures. Return NULL on error so callers can propagate failure.

Add error handling:

 		for (uint16_t rxq = 0; rxq < port->n_rxq; rxq++) {
 			name = ensure_queue_node(
 				port_rx_node, RX_NODE_FMT, port->port_id, rxq, old_rx
 			);
+			if (name == NULL)
+				goto err;
 			gr_vec_add(rx_node_names, name);
 		}
 		for (uint16_t txq = 0; txq < port->n_txq; txq++) {
 			name = ensure_queue_node(
 				port_tx_node, TX_NODE_FMT, port->port_id, txq, old_tx
 			);
+			if (name == NULL)
+				goto err;
 			gr_vec_add(tx_node_names, name);
 		}
 	}
 
 	edge = rte_node_edge_update(
 		port_output_node, 0, (const char **)tx_node_names, gr_vec_len(tx_node_names)
 	);
-	assert(edge != RTE_EDGE_ID_INVALID);
+	if (edge == RTE_EDGE_ID_INVALID)
+		goto err;
 	edge = rte_node_edge_shrink(port_output_node, gr_vec_len(tx_node_names));
-	assert(edge != RTE_EDGE_ID_INVALID);
+	if (edge == RTE_EDGE_ID_INVALID)
+		goto err;
+
+	// ... rest of function ...
+
+err:
+	gr_strvec_free(old_rx);
+	gr_strvec_free(old_tx);
+	gr_vec_free(unused_nodes);
+	errno = (errno != 0) ? errno : EINVAL;
+	return NULL;

113-115: Check astrcat and gr_vec_add return values.

Both can fail (OOM), but results not checked. Could dereference NULL or leak memory.

Add error handling:

 		char *name = astrcat(NULL, RX_NODE_FMT, qmap->port_id, qmap->queue_id);
+		if (name == NULL) {
+			ret = -ENOMEM;
+			goto out;
+		}
-		gr_vec_add(rx_nodes, name);
-		gr_vec_add(graph_nodes, name);
+		if (gr_vec_add(rx_nodes, name) < 0 || gr_vec_add(graph_nodes, name) < 0) {
+			free(name);
+			ret = -ENOMEM;
+			goto out;
+		}

308-308: Check worker_graph_nodes_add_missing for NULL.

Function can return NULL on error, but result not checked before use.

Add error check:

 	gr_vec rte_node_t *unused_nodes = worker_graph_nodes_add_missing(ports);
+	if (unused_nodes == NULL)
+		return errno_set(errno ? errno : EINVAL);
🧹 Nitpick comments (1)
main/vec_test.c (1)

89-97: Add assertions to verify the test behavior.

The test lacks any assertions to verify that the strings were allocated and the vector was populated correctly. Consider adding checks for vector length and non-NULL strings before calling gr_strvec_free.

Apply this diff to add basic verification:

 static void dyn_str_vec(void **) {
 	gr_vec char **vec = NULL;
 
 	gr_vec_add(vec, strdup("foo"));
 	gr_vec_add(vec, strdup("bar"));
 	gr_vec_add(vec, strdup("baz"));
+
+	assert_int_equal(gr_vec_len(vec), 3);
+	assert_non_null(vec[0]);
+	assert_non_null(vec[1]);
+	assert_non_null(vec[2]);
 
 	gr_strvec_free(vec);
 }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 263dd93 and b270101.

⛔ Files ignored due to path filters (1)
  • docs/graph.svg is excluded by !**/*.svg
📒 Files selected for processing (25)
  • main/gr_vec.h (2 hunks)
  • main/vec_test.c (1 hunks)
  • meson.build (2 hunks)
  • modules/infra/api/gr_infra.h (2 hunks)
  • modules/infra/api/stats.c (3 hunks)
  • modules/infra/control/gr_graph.h (0 hunks)
  • modules/infra/control/graph.c (11 hunks)
  • modules/infra/control/graph_priv.h (1 hunks)
  • modules/infra/control/port.c (2 hunks)
  • modules/infra/control/worker.c (7 hunks)
  • modules/infra/control/worker_priv.h (1 hunks)
  • modules/infra/control/worker_test.c (1 hunks)
  • modules/infra/datapath/eth_output.c (4 hunks)
  • modules/infra/datapath/gr_eth.h (1 hunks)
  • modules/infra/datapath/gr_rxtx.h (1 hunks)
  • modules/infra/datapath/l1_xconnect.c (4 hunks)
  • modules/infra/datapath/main_loop.c (5 hunks)
  • modules/infra/datapath/meson.build (1 hunks)
  • modules/infra/datapath/port_output.c (1 hunks)
  • modules/infra/datapath/port_rx.c (1 hunks)
  • modules/infra/datapath/port_tx.c (1 hunks)
  • modules/infra/datapath/rx.c (0 hunks)
  • modules/infra/datapath/tx.c (0 hunks)
  • smoke/_init.sh (1 hunks)
  • smoke/affinity_test.sh (1 hunks)
💤 Files with no reviewable changes (3)
  • modules/infra/control/gr_graph.h
  • modules/infra/datapath/tx.c
  • modules/infra/datapath/rx.c
🚧 Files skipped from review as they are similar to previous changes (11)
  • main/gr_vec.h
  • meson.build
  • modules/infra/datapath/gr_eth.h
  • modules/infra/control/worker_priv.h
  • smoke/affinity_test.sh
  • modules/infra/control/port.c
  • modules/infra/datapath/port_rx.c
  • modules/infra/datapath/meson.build
  • smoke/_init.sh
  • modules/infra/api/gr_infra.h
  • modules/infra/datapath/gr_rxtx.h
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-08T21:22:45.894Z
Learnt from: rjarry
PR: DPDK/grout#334
File: modules/infra/control/worker.c:278-281
Timestamp: 2025-10-08T21:22:45.894Z
Learning: In the codebase, `gr_vec_add` is a macro that does not return any value and cannot fail. Do not suggest checking its return value or adding error handling around it.

Applied to files:

  • modules/infra/datapath/main_loop.c
  • modules/infra/control/worker.c
🧬 Code graph analysis (8)
modules/infra/control/worker_test.c (1)
modules/infra/control/graph.c (2)
  • worker_graph_reload (207-230)
  • worker_graph_reload_all (304-322)
modules/infra/control/graph_priv.h (1)
modules/infra/control/graph.c (2)
  • worker_graph_reload (207-230)
  • worker_graph_reload_all (304-322)
modules/infra/datapath/main_loop.c (1)
main/sort.c (1)
  • topo_sort (53-97)
modules/infra/control/worker.c (3)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/control/iface.c (1)
  • iface_next (198-213)
modules/infra/control/graph.c (2)
  • worker_graph_reload_all (304-322)
  • worker_graph_reload (207-230)
modules/infra/datapath/eth_output.c (3)
modules/infra/control/graph.c (1)
  • gr_node_attach_parent (36-64)
modules/infra/control/iface.c (2)
  • iface_from_id (215-222)
  • iface_get_eth_addr (224-237)
modules/infra/control/gr_graph.h (1)
  • rte_node_enqueue_x1 (22-25)
modules/infra/control/graph.c (2)
api/string.c (1)
  • astrcat (15-50)
api/gr_errno.h (1)
  • errno_set (9-12)
modules/infra/datapath/port_output.c (5)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/datapath/gr_mbuf.h (1)
  • gr_mbuf_is_traced (52-54)
modules/infra/datapath/trace.c (1)
  • gr_mbuf_trace_add (520-547)
modules/infra/control/gr_graph.h (1)
  • rte_node_enqueue_x1 (22-25)
modules/infra/datapath/eth_output.c (1)
  • eth_output_register_interface_type (26-33)
modules/infra/datapath/port_tx.c (4)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/datapath/trace.c (3)
  • trace_log_packet (354-504)
  • gr_mbuf_trace_add (520-547)
  • gr_mbuf_trace_finish (549-565)
modules/infra/datapath/gr_mbuf.h (1)
  • gr_mbuf_is_traced (52-54)
modules/infra/datapath/port_rx.c (1)
  • rxtx_trace_format (25-28)
🔇 Additional comments (5)
main/vec_test.c (2)

99-126: LGTM!

The test comprehensively validates the gr_vec_extend functionality with proper assertions for initial state, operation result, and element ordering.


132-133: LGTM!

Test registrations are correct and follow the existing pattern.

modules/infra/datapath/l1_xconnect.c (1)

15-15: LGTM! Renaming improves semantic clarity.

The TX → OUTPUT rename is more descriptive for cross-connect logic. All changes are internally consistent: enum value (line 15), edge assignment (line 39), node mappings (lines 62-63), and drop registration (line 77) align correctly. The simplified "xconnect_no_port" name is clearer than "xconnect_no_tx_port".

Also applies to: 39-39, 62-63, 77-77

modules/infra/api/stats.c (2)

359-359: Function renames look correct.

The updated calls to gr_iface_type_name and gr_iface_mode_name align with the API naming convention changes. Both functions should return equivalent string representations as their predecessors.

Also applies to: 376-376


60-61: All port RX nodes use the “port_rx-” prefix
RX_NODE_FMT expands to “port_rx-…”, so every registered RX queue node is matched by the new prefix logic. The original “port_rx” node is unreferenced in the graph and won’t be missed.

Copy link
Collaborator

@maxime-leroy maxime-leroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explain rte_node_free what is it ? why it's coming in new version of dpdk. Why we need it.

@rjarry
Copy link
Collaborator Author

rjarry commented Oct 10, 2025

Explain rte_node_free what is it ? why it's coming in new version of dpdk. Why we need it.

It allows to free resources allocated by __rte_node_register and rte_node_clone. It was added only in DPDK 25.07 DPDK/dpdk@16b7191.

We need this because the last commit of this series calls rte_node_clone extensively to match the configured ports RX and TX queues. When removing a port or reducing their number of queues, node clones must be removed/freed. Without this, we will have unused nodes.

If bumping the DPDK version is problematic, we could live with the fact that unused nodes are left hanging until a new stable LTS version is released.

rjarry added 10 commits October 10, 2025 10:58
The stats_context.node_to_index array is dynamically allocated using
rte_malloc(). Call rte_free() when exiting the main loop to avoid
leaking memory.

Explicitly initialize all pointers in the stats_context to avoid
ambiguities.

Fixes: e535cfd ("stats: avoid crash with node_id greater than 255")
Signed-off-by: Robin Jarry <[email protected]>
Grout runs as a background process and is killed in the cleanup trap
function, which runs with errexit disabled. If grout crashes during
a test, the error from `wait %?grout` is silently ignored.

Move the cleanup before removing temp files and check the exit status
to ensure smoke tests fail when grout crashes.

Signed-off-by: Robin Jarry <[email protected]>
Use rte_graph_node_get_by_name() to get a direct reference to the
instantiated rte_node object after the graph has been created and set
the node context pointer directly, eliminating the hash table-based
indirection on graph and node names.

Remove the .init callbacks from port_rx and port_tx nodes which are now
unnecessary. Keep the rte_free() calls in .fini callbacks to ensure
context pointers are properly freed.

Signed-off-by: Robin Jarry <[email protected]>
When dealing with vectors of dynamically allocated strings, it is useful
to be able to free all strings and the vectors with a single function
call.

Signed-off-by: Robin Jarry <[email protected]>
Add a function to concatenate a vector to the end of another one.

Signed-off-by: Robin Jarry <[email protected]>
All enums that have inline functions to get their human readable string
representation use the same naming scheme. Follow this for interface
types and modes.

Signed-off-by: Robin Jarry <[email protected]>
Rename the files so that they match the node names.

Signed-off-by: Robin Jarry <[email protected]>
In the next patches, we will need access to the iface_info_port
structures.

Signed-off-by: Robin Jarry <[email protected]>
Do not hardcode the default edge after eth_output. Make it dependent on
interface type.

Add a new port_output node and register it for GR_IFACE_TYPE_PORT.

Use iface_get_eth_addr to fill in the source mac address and error out
if the output interface does not report its mac address.

Signed-off-by: Robin Jarry <[email protected]>
Instead of dealing with a list of queues to poll packets from,
instantiate one port_rx node clone for each configured port RX queue.

Do the same thing for port_tx and each configured port TX queue.

Depending on worker RX queue mapping, include the associated
port_rx-p*q* clones and initialize their context data accordingly.

Always include *all* port_tx-p*q* clones in all graphs, regardless of
worker TX queue mappings. Instead, configure the port_output node
context to steer packets to the proper port_tx-p*q* clone.

All of this requires delicate juggling with multiple node names lists to
ensure that nodes are cloned when needed before they are affected to
a graph. Secondly, we need to call rte_node_free() on those which are
not used in any graph after all of them have been reloaded.

Before: single port_rx and port_tx nodes handling multiple queues:

  ┌──────────────────────────────────────────────────┐
  │             port_rx (SOURCE)                     │
  │  ctx_ptr → rx_node_queues:                       │
  │         [{port:0,q:0}, {port:0,q:1}, ...]        │
  │  Loop all queues, call rte_eth_rx_burst() each   │
  └────────────────────────┬─────────────────────────┘
                           v
                    ┌─────────────┐
                    │  eth_input  │
                    └──────┬──────┘
                           v
                        [ ... ]  (L3/L4 nodes)
                           v
                    ┌─────────────┐
                    │ eth_output  │
                    └──────┬──────┘
                           v
                    ┌─────────────┐
                    │ port_output │
                    └──────┬──────┘
                           v
  ┌───────────────────────────────────────────────────┐
  │              port_tx (SINK)                       │
  │  ctx_ptr → tx_node_queues:                        │
  │    txq_ids[]: {[0]:0, [1]:1, [2]:0xffff, ...}     │
  │  Group mbufs by port, rte_eth_tx_burst() per port │
  └───────────────────────────────────────────────────┘

After: one clone per port-queue pair:

Only the nodes associated to RX queues assigned to a given worker are
part of its graph. Single rte_eth_rx_burst() per node.

  ┌──────────────┐ ┌──────────────┐           ┌─────────────┐
  │ port_rx-p0q0 │ │ port_rx-p0q1 │           │port_rx-p1q0 │
  │ ctx:         │ │ ctx:         │  [ ... ]  │ctx:         │
  │  rxq: {0, 0} │ │  rxq: {0, 1} │           │ rxq: {1, 0} │
  └────────┬─────┘ └────────┬─────┘           └──────┬──────┘
           │                │                        │
           └────────────────┼────────────────────────┘
                            v
                     ┌─────────────┐
                     │  eth_input  │
                     └──────┬──────┘
                            v
                         [ ... ]  (L3/L4 nodes)
                            v
                     ┌─────────────┐
                     │ eth_output  │
                     └──────┬──────┘
                            v
              ┌────────────────────────────┐
              │     port_output            │
              │ ctx_ptr → edges[port_id]:  │
              │   [0]: edge → port_tx-p0q0 │
              │   [1]: edge → port_tx-p1q0 │
              └────────────┬───────────────┘
                           │
        ┌────────────┬─────┴────┬─────────────────────┐
        v            v          v                     v
  ┌──────────┐ ┌─────────┐ ┌─────────┐           ┌─────────┐
  │port_tx-  │ │port_tx- │ │port_tx- │           │port_tx- │
  │  p0q0    │ │  p0q1   │ │  p1q0   │  [ ... ]  │  pXqY   │
  │ctx:{0,0} │ │{MAX,MAX}│ │ctx:{1,0}│           │{MAX,MAX}│
  │          │ │ UNUSED  │ │         │           │ UNUSED  │
  └──────────┘ └─────────┘ └─────────┘           └─────────┘

All TX nodes are part of all graphs. Only the port_output context
differs per-worker and steers packets to the correct TX queue node.
Single rte_eth_tx_burst() per node.

Signed-off-by: Robin Jarry <[email protected]>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (3)
modules/infra/datapath/port_tx.c (1)

42-50: Don’t touch mbufs after tx_burst; move tracing before transmit

Post-TX loop is racy; mbufs are owned by the driver. Finalize trace before calling rte_eth_tx_burst and drop the post-loop.

-    tx_ok = rte_eth_tx_burst(ctx->port_id, ctx->queue_id, mbufs, nb_objs);
-    if (tx_ok < nb_objs)
-        rte_node_enqueue(graph, node, TX_ERROR, &objs[tx_ok], nb_objs - tx_ok);
-
-    for (unsigned i = 0; i < tx_ok; i++) {
-        // FIXME racy: we are operating on mbufs already passed to driver
-        if (gr_mbuf_is_traced(mbufs[i])) {
-            struct port_queue *t;
-            t = gr_mbuf_trace_add(mbufs[i], node, sizeof(*t));
-            *t = *ctx;
-            gr_mbuf_trace_finish(mbufs[i]);
-        }
-    }
+    for (unsigned i = 0; i < nb_objs; i++) {
+        if (gr_mbuf_is_traced(mbufs[i])) {
+            struct port_queue *t = gr_mbuf_trace_add(mbufs[i], node, sizeof(*t));
+            *t = *ctx;
+            gr_mbuf_trace_finish(mbufs[i]);
+        }
+    }
+
+    tx_ok = rte_eth_tx_burst(ctx->port_id, ctx->queue_id, mbufs, nb_objs);
+    if (tx_ok < nb_objs)
+        rte_node_enqueue(graph, node, TX_ERROR, &objs[tx_ok], nb_objs - tx_ok);

If you need “only on success”, add a compensating trace in port_tx_error.

modules/infra/control/graph.c (2)

139-145: NULL-check graph nodes before dereferencing

rte_graph_node_get_by_name can return NULL; guard before using node->ctx / ctx_ptr to avoid crashes.

         snprintf(node_name, sizeof(node_name), RX_NODE_FMT, qmap->port_id, qmap->queue_id);
-        node = rte_graph_node_get_by_name(graph_name, node_name);
-        struct rx_node_ctx *ctx = (struct rx_node_ctx *)node->ctx;
+        node = rte_graph_node_get_by_name(graph_name, node_name);
+        if (node == NULL) { ret = -ENOENT; goto out; }
+        struct rx_node_ctx *ctx = (struct rx_node_ctx *)node->ctx;
@@
-    node = rte_graph_node_get_by_name(graph_name, name);
-    struct port_queue *ctx = (struct port_queue *)node->ctx;
+    node = rte_graph_node_get_by_name(graph_name, name);
+    if (node == NULL) { ret = -ENOENT; goto out; }
+    struct port_queue *ctx = (struct port_queue *)node->ctx;
@@
-    // finally, set the port_output context data
-    rte_graph_node_get_by_name(graph_name, "port_output")->ctx_ptr = out;
+    // finally, set the port_output context data
+    node = rte_graph_node_get_by_name(graph_name, "port_output");
+    if (node == NULL) { ret = -ENOENT; goto out; }
+    node->ctx_ptr = out;

Also applies to: 156-160, 181-186, 197-199


187-194: Bounds-check port_id when indexing edges

Protect out->edges[] from overflow on unexpected port_id values.

         for (rte_edge_t edge = 0; edge < gr_vec_len(tx_node_names); edge++) {
             if (strcmp(tx_node_names[edge], node_name) == 0) {
                 // update the port_output context data to map this port to the
                 // correct edge
-                out->edges[ctx->port_id] = edge;
+                if (ctx->port_id < ARRAY_DIM(out->edges))
+                    out->edges[ctx->port_id] = edge;
                 break;
             }
         }
🧹 Nitpick comments (4)
modules/infra/control/worker.c (1)

179-182: Log on success; log error on failure

Avoid “unplugged” message when reload fails.

-LOG(INFO, "port %u unplugged", p->port_id);
-
-return ret;
+if (ret == 0)
+    LOG(INFO, "port %u unplugged", p->port_id);
+else
+    errno_log(-ret, "port %u unplug failed", p->port_id);
+return ret;
modules/infra/datapath/gr_rxtx.h (1)

24-28: Guard ctx struct sizes at compile-time

Add static_asserts so ctx fits in rte_node->ctx (16 bytes).

 struct rx_node_ctx {
     const struct iface *iface;
     struct port_queue rxq;
     uint16_t burst_size;
 };
+
+static_assert(sizeof(struct rx_node_ctx) <= 16, "rx_node_ctx exceeds rte_node ctx size");
+static_assert(sizeof(struct port_queue) <= 16, "port_queue exceeds rte_node ctx size");

Also applies to: 30-32

modules/infra/control/graph.c (2)

151-152: Use enabled RXQ count for burst sizing

Divide by enabled RXQs, not total entries in worker->rxqs.

-        ctx->burst_size = RTE_GRAPH_BURST_SIZE / gr_vec_len(worker->rxqs);
+        ctx->burst_size = RTE_GRAPH_BURST_SIZE / n_rxqs;

315-319: Free unused cloned nodes now that DPDK 25.07 provides rte_node_free

Switch from dropping the list to actually freeing clones.

-    // these port_rx and port_tx clones are now not referenced in any graph
-    // we can safely delete them
-    // FIXME: call rte_node_free on each one of them when updating to DPDK 25.11
-    gr_vec_free(unused_nodes);
+    // these port_rx and port_tx clones are now not referenced in any graph
+    // we can safely delete them
+    if (unused_nodes != NULL) {
+        for (unsigned i = 0; i < gr_vec_len(unused_nodes); i++)
+            rte_node_free(unused_nodes[i]);
+        gr_vec_free(unused_nodes);
+    }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b270101 and 3bbd6a0.

⛔ Files ignored due to path filters (1)
  • docs/graph.svg is excluded by !**/*.svg
📒 Files selected for processing (24)
  • main/gr_vec.h (2 hunks)
  • main/vec_test.c (1 hunks)
  • modules/infra/api/gr_infra.h (2 hunks)
  • modules/infra/api/stats.c (3 hunks)
  • modules/infra/control/gr_graph.h (0 hunks)
  • modules/infra/control/graph.c (11 hunks)
  • modules/infra/control/graph_priv.h (1 hunks)
  • modules/infra/control/port.c (2 hunks)
  • modules/infra/control/worker.c (7 hunks)
  • modules/infra/control/worker_priv.h (1 hunks)
  • modules/infra/control/worker_test.c (1 hunks)
  • modules/infra/datapath/eth_output.c (4 hunks)
  • modules/infra/datapath/gr_eth.h (1 hunks)
  • modules/infra/datapath/gr_rxtx.h (1 hunks)
  • modules/infra/datapath/l1_xconnect.c (4 hunks)
  • modules/infra/datapath/main_loop.c (5 hunks)
  • modules/infra/datapath/meson.build (1 hunks)
  • modules/infra/datapath/port_output.c (1 hunks)
  • modules/infra/datapath/port_rx.c (1 hunks)
  • modules/infra/datapath/port_tx.c (1 hunks)
  • modules/infra/datapath/rx.c (0 hunks)
  • modules/infra/datapath/tx.c (0 hunks)
  • smoke/_init.sh (1 hunks)
  • smoke/affinity_test.sh (1 hunks)
💤 Files with no reviewable changes (3)
  • modules/infra/datapath/rx.c
  • modules/infra/control/gr_graph.h
  • modules/infra/datapath/tx.c
🚧 Files skipped from review as they are similar to previous changes (8)
  • modules/infra/datapath/gr_eth.h
  • modules/infra/datapath/port_rx.c
  • modules/infra/datapath/port_output.c
  • modules/infra/control/worker_test.c
  • modules/infra/control/worker_priv.h
  • main/vec_test.c
  • modules/infra/datapath/eth_output.c
  • smoke/affinity_test.sh
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{c,h}

⚙️ CodeRabbit configuration file

**/*.{c,h}: - gr_vec_*() functions cannot fail. No need to check their return value.

  • gr_vec_free(x) always sets x = NULL. There is no risk of double free.
  • ec_node_*() functions consume their ec_node arguments. No leaks on error.
  • rte_node->ctx is an uint8_t array of size 16, not a pointer.
  • Don't suggest to replace assert() with graceful error checking.
  • We compile with -std=gnu2x. Unnamed parameters in function definitions are valid.

Files:

  • modules/infra/api/gr_infra.h
  • main/gr_vec.h
  • modules/infra/api/stats.c
  • modules/infra/datapath/main_loop.c
  • modules/infra/control/worker.c
  • modules/infra/control/port.c
  • modules/infra/datapath/gr_rxtx.h
  • modules/infra/datapath/port_tx.c
  • modules/infra/datapath/l1_xconnect.c
  • modules/infra/control/graph_priv.h
  • modules/infra/control/graph.c
**/*.sh

⚙️ CodeRabbit configuration file

**/*.sh: - Don't bother about unquoted shell variables.

Files:

  • smoke/_init.sh
🧠 Learnings (2)
📚 Learning: 2025-10-02T07:42:42.135Z
Learnt from: rjarry
PR: DPDK/grout#326
File: modules/infra/control/graph.c:360-364
Timestamp: 2025-10-02T07:42:42.135Z
Learning: In the grout codebase, `gr_vec_free()` is a macro defined in `main/gr_vec.h` that automatically sets the vector pointer to NULL after freeing the memory, preventing double-free issues. The macro definition is: `#define gr_vec_free(v) ((v) ? free(__gr_vec_hdr(v)) : (void)0, (v) = NULL)`.

Applied to files:

  • main/gr_vec.h
📚 Learning: 2025-10-08T21:22:45.894Z
Learnt from: rjarry
PR: DPDK/grout#334
File: modules/infra/control/worker.c:278-281
Timestamp: 2025-10-08T21:22:45.894Z
Learning: In the codebase, `gr_vec_add` is a macro that does not return any value and cannot fail. Do not suggest checking its return value or adding error handling around it.

Applied to files:

  • modules/infra/datapath/main_loop.c
  • modules/infra/control/worker.c
🧬 Code graph analysis (7)
modules/infra/api/stats.c (1)
modules/infra/api/gr_infra.h (2)
  • gr_iface_type_name (322-338)
  • gr_iface_mode_name (341-351)
modules/infra/datapath/main_loop.c (1)
main/sort.c (1)
  • topo_sort (53-97)
modules/infra/control/worker.c (3)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/control/iface.c (1)
  • iface_next (198-213)
modules/infra/control/graph.c (2)
  • worker_graph_reload_all (304-321)
  • worker_graph_reload (207-230)
modules/infra/control/port.c (1)
modules/infra/control/worker.c (2)
  • port_unplug (145-182)
  • port_plug (184-223)
modules/infra/datapath/port_tx.c (4)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/datapath/trace.c (3)
  • trace_log_packet (354-504)
  • gr_mbuf_trace_add (520-547)
  • gr_mbuf_trace_finish (549-565)
modules/infra/datapath/gr_mbuf.h (1)
  • gr_mbuf_is_traced (52-54)
modules/infra/datapath/port_rx.c (1)
  • rxtx_trace_format (25-28)
modules/infra/control/graph_priv.h (1)
modules/infra/control/graph.c (2)
  • worker_graph_reload (207-230)
  • worker_graph_reload_all (304-321)
modules/infra/control/graph.c (3)
api/string.c (1)
  • astrcat (15-50)
main/gr_vec.h (1)
  • gr_vec_len (47-51)
api/gr_errno.h (1)
  • errno_set (9-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
  • GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
  • GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
  • GitHub Check: build-cross-aarch64
  • GitHub Check: rpm
  • GitHub Check: deb
🔇 Additional comments (18)
smoke/_init.sh (1)

53-57: LGTM!

The cleanup reordering and explicit SIGTERM handling are correct. When grout exits cleanly with status 0 after receiving SIGTERM, the || fail clause won't trigger.

main/gr_vec.h (2)

142-154: LGTM!

The vector extension logic is correct: grows capacity as needed, appends items via memcpy, and updates the length.


156-162: LGTM!

String vector cleanup is correct: frees each string, frees the header, returns NULL (as per learnings, this prevents double-free issues).

modules/infra/datapath/l1_xconnect.c (1)

14-18: LGTM!

The renaming from TX to OUTPUT and node name updates (port_tx → port_output, xconnect_no_tx_port → xconnect_no_port) are consistent throughout.

Also applies to: 39-39, 62-63, 77-77

modules/infra/api/gr_infra.h (2)

322-338: LGTM!

Function rename to gr_iface_type_name is clear. The added GR_IFACE_TYPE_COUNT case correctly falls through to the default "?" return for unknown values.


341-351: LGTM!

Function rename to gr_iface_mode_name is clear. The added GR_IFACE_MODE_COUNT case correctly falls through to the default "?" return for unknown values.

modules/infra/datapath/meson.build (1)

15-17: LGTM!

Replacing rx.c/tx.c with port_output.c, port_rx.c, and port_tx.c aligns with the PR's per-port-queue node architecture.

modules/infra/control/port.c (1)

163-164: LGTM!

The updated calls to port_unplug(p) and port_plug(p) correctly match the new signatures that take the full port structure instead of just the port_id.

Also applies to: 288-288

modules/infra/control/graph_priv.h (1)

6-11: LGTM!

The added includes and updated function signatures to accept gr_vec struct iface_info_port **ports are correct and match the implementations in graph.c.

modules/infra/api/stats.c (2)

60-62: LGTM!

Switching to prefix match strncmp(name, "port_rx-", ...) correctly handles the new per-queue port_rx nodes (e.g., port_rx-0, port_rx-1).


359-360: LGTM!

The function renames to gr_iface_type_name and gr_iface_mode_name are consistent with the API changes in gr_infra.h.

Also applies to: 376-377

modules/infra/datapath/main_loop.c (3)

112-130: Good cleanup and allocation ordering

Freeing previous w_stats/node_to_index before (re)alloc and unified err path look solid.


227-230: Hiding stale stats during reload is correct

Setting w->stats to NULL until stats_reload completes avoids control-plane races.


292-293: Shutdown frees look complete

Freeing ctx.node_to_index alongside w_stats/stats prevents leaks on exit.

modules/infra/control/worker.c (3)

170-177: Ports vector build for unplug looks correct

Collecting all other ports and passing to worker_graph_reload_all is consistent with new per-port graph wiring.


206-223: Plug path correctly reconstructs ports set

Ensures target port is present; frees vector after reload.


278-299: RXQ reassign reload sequencing is sound

Reload source worker first (to release), then assign and reload destination; ports vector reuse is fine.

modules/infra/control/graph.c (1)

162-170: ctx_ptr freed by port_output_fini

The .fini callback in modules/infra/datapath/port_output.c:41 invokes rte_free(node->ctx_ptr).

@christophefontaine christophefontaine merged commit 587d5e4 into DPDK:main Oct 10, 2025
11 checks passed
@rjarry rjarry deleted the graph-rxtx branch October 10, 2025 09:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants