feat: ForwardPassMetrics dynamo event plane integration by tedzhouhk · Pull Request #7250 · ai-dynamo/dynamo

tedzhouhk · 2026-03-11T22:35:41Z

FPM Event Plane Integration

Summary

Adds a Dynamo event plane relay for ForwardPassMetrics, enabling automatic discovery and transport-agnostic delivery (NATS Core / ZMQ). Follows the same two-layer bridge architecture as KV events.

Motivation

The initial FPM implementation (merged in the prior PR) uses raw ZMQ PUB/SUB with manual port configuration. This works for local testing but lacks:

- Discovery: subscribers must know the engine's IP and port
- Transport abstraction: hard-coded to ZMQ, no NATS support
- Dynamic scaling: adding/removing engines requires reconfiguring subscribers

This PR bridges the gap by adding a relay in the parent process that subscribes to the child's raw ZMQ socket and republishes each message to the Dynamo event plane, exactly as KV events do.

Architecture

Child (EngineCore):  InstrumentedScheduler -> _FpmPublisherThread -> ZMQ PUB (localhost)
                                                                        |
Parent (dynamo.vllm): FpmEventRelay (ZMQ SUB) -> EventPublisher -> Event Plane (NATS/ZMQ)
                                                                        |
Consumer:            FpmEventSubscriber (auto-discovered) -> decode() -> ForwardPassMetrics

The raw ZMQ hop is needed because the scheduler runs in a forked child process without access to the Dynamo runtime. The FpmEventRelay bridge in the parent process handles event plane transport and discovery registration automatically.

Why two layers (same as KV events)

vLLM's EngineCore runs in a forked child process. After fork(), the child has no access to the parent's Dynamo runtime: the tokio executor, NATS connections, etcd leases, and K8s watchers do not survive the fork. The child can only open new raw sockets, so the parent process, which holds the runtime, acts as the bridge.

Changes

Rust: `dynamo-llm` crate

lib/llm/src/fpm_publisher.rs (new)

FpmEventRelay -- subscribes to a local ZMQ PUB socket and relays raw bytes to EventPublisher::for_component("forward-pass-metrics"). Simpler than KvEventPublisher (no event transformation, no batching, no local indexer). Includes:

- Consecutive error tracking with max-error exit
- Warning on malformed ZMQ frames
- CancellationToken for graceful shutdown
- Drop impl that cancels the relay task
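For readers who want the control flow without reading the Rust, here is a rough Python sketch of a relay loop with the properties listed above. It is not the code in fpm_publisher.rs: `relay_loop`, `MAX_CONSECUTIVE_ERRORS`, the queue standing in for the ZMQ SUB socket, and the `threading.Event` standing in for the CancellationToken are all hypothetical. It also assumes a three-frame message with the payload in the last frame, and that malformed frames are only warned about rather than counted as errors.

```python
import logging
import queue
import threading

logger = logging.getLogger("fpm_relay_sketch")

MAX_CONSECUTIVE_ERRORS = 3  # hypothetical threshold, not taken from the PR


def relay_loop(frames_in, publish, cancel, max_errors=MAX_CONSECUTIVE_ERRORS):
    """Forward payloads from frames_in to publish() until cancelled.

    frames_in: queue.Queue of multipart messages (lists of bytes),
               standing in for the ZMQ SUB socket.
    publish:   callable taking the raw payload bytes.
    cancel:    threading.Event, standing in for a CancellationToken.
    Returns "cancelled" or "too_many_errors".
    """
    consecutive_errors = 0
    while not cancel.is_set():
        try:
            frames = frames_in.get(timeout=0.05)
        except queue.Empty:
            continue  # nothing to relay yet; re-check the cancel flag
        if len(frames) != 3:
            # Assumed layout: exactly three frames, payload last.
            logger.warning("malformed message: expected 3 frames, got %d", len(frames))
            continue
        try:
            publish(frames[2])
            consecutive_errors = 0  # any success resets the counter
        except Exception:
            consecutive_errors += 1
            logger.warning(
                "publish failed (%d/%d)", consecutive_errors, max_errors, exc_info=True
            )
            if consecutive_errors >= max_errors:
                return "too_many_errors"
    return "cancelled"
```

Resetting the counter on every successful publish means only an unbroken run of failures ends the loop, which matches the "consecutive" wording above.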

Rust: `dynamo-py3` crate (Python bindings)

lib/bindings/python/rust/llm/fpm.rs (new)

- FpmEventRelay -- thin #[pyclass] wrapper around dynamo_llm::fpm_publisher::FpmEventRelay
- FpmEventSubscriber -- wraps EventSubscriber::for_component for the consumer side. Background tokio task feeds an mpsc channel; recv() releases the GIL while blocking. Exits cleanly when the receiver is dropped.
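The subscriber's shape (a background producer feeding a channel, with a blocking recv() on the consumer side) can be mimicked in pure Python. The sketch below is only an analogy for the binding, not its API: `SubscriberSketch`, the iterable `source`, and the `None` end-of-stream sentinel are hypothetical, and a thread plus `queue.Queue` stand in for the tokio task and mpsc channel.

```python
import queue
import threading


class SubscriberSketch:
    """Background thread feeds a queue; recv() blocks until a message arrives."""

    def __init__(self, source):
        self._queue = queue.Queue()
        self._stop = threading.Event()
        self._thread = threading.Thread(
            target=self._pump, args=(source,), daemon=True
        )
        self._thread.start()

    def _pump(self, source):
        # Producer side: copy messages from the source into the queue.
        for message in source:
            if self._stop.is_set():
                break
            self._queue.put(message)
        self._queue.put(None)  # sentinel: the stream has ended

    def recv(self):
        """Block for the next message; return None once the stream has ended."""
        item = self._queue.get()
        if item is None:
            self._queue.put(None)  # keep the sentinel for later callers
        return item

    def shutdown(self):
        # Ask the producer to stop after the message it is handling.
        self._stop.set()
```

A consumer would call `recv()` in a loop and stop on `None`.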

Python

components/src/dynamo/vllm/main.py

- Added setup_fpm_relay() that creates one FpmEventRelay per dp_rank (parallel to setup_kv_event_publisher())
- Called in both init_prefill() and init() paths
- Opt-in: only activates when DYN_VLLM_FORWARDPASS_METRIC_PORT is set
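To make the opt-in gating concrete, here is a small sketch of how a per-rank endpoint list could be derived from the environment variable. Only the variable name comes from this PR. The helper `fpm_relay_endpoints` is hypothetical, and the `base_port + dp_rank` mapping and the `tcp://127.0.0.1` address are assumptions made for illustration; the real setup_fpm_relay() may compute endpoints differently.

```python
import os

ENV_VAR = "DYN_VLLM_FORWARDPASS_METRIC_PORT"


def fpm_relay_endpoints(data_parallel_size, environ=None):
    """Return one local ZMQ endpoint per dp_rank, or [] when FPM is not enabled."""
    environ = os.environ if environ is None else environ
    raw = environ.get(ENV_VAR)
    if not raw:
        return []  # opt-in: unset (or empty) means no relays are created
    base_port = int(raw)
    # Assumption for illustration: rank N listens on base_port + N.
    return [
        f"tcp://127.0.0.1:{base_port + dp_rank}"
        for dp_rank in range(data_parallel_size)
    ]
```

With the variable unset the function returns an empty list, so callers can create relays in a plain loop without a separate feature flag.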

components/src/dynamo/vllm/handlers.py

- Added self.fpm_relays field on handler class
- Added relay shutdown in cleanup() to prevent resource leaks

components/src/dynamo/common/recv_forward_pass_metrics.py (new)

Standalone consumer that uses FpmEventSubscriber with the Dynamo runtime for auto-discovery. Usage:

python -m dynamo.common.recv_forward_pass_metrics \
  --namespace dynamo --component backend --endpoint generate

lib/bindings/python/src/dynamo/llm/__init__.py

Re-exports FpmEventRelay and FpmEventSubscriber from _core.

Files not changed

- instrumented_scheduler.py -- stays as-is (raw ZMQ PUB from child process)
- forward_pass_metrics.py -- schema unchanged
- envs.py -- DYN_VLLM_FORWARDPASS_METRIC_PORT still needed for child-to-parent ZMQ
- args.py -- injection logic unchanged

Discovery

Each FpmEventRelay creates an EventPublisher::for_component("forward-pass-metrics") which automatically:

- Binds a ZMQ PUB on an auto-assigned port (or connects to NATS)
- Registers as DiscoveryInstance::EventChannel with topic: "forward-pass-metrics" in the discovery plane

Subscribers use EventSubscriber::for_component("forward-pass-metrics") which watches discovery and dynamically connects/disconnects as engines come and go. Works across all discovery backends (K8s CRD, etcd, file, mem).
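The connect/disconnect behaviour described above amounts to reconciling the subscriber's current connections against each discovery snapshot. The following is a minimal in-memory sketch of that idea, not the EventSubscriber implementation: `SubscriberConnections`, `reconcile`, and the instance-id-to-endpoint mapping are hypothetical names chosen for illustration.

```python
class SubscriberConnections:
    """Tracks which publisher endpoints a subscriber is connected to."""

    def __init__(self):
        self.connected = {}  # instance_id -> endpoint

    def reconcile(self, discovered):
        """Apply one discovery snapshot; return (added, removed) instance ids."""
        added = sorted(set(discovered) - set(self.connected))
        removed = sorted(set(self.connected) - set(discovered))
        for instance_id in added:
            # A real subscriber would open a connection here.
            self.connected[instance_id] = discovered[instance_id]
        for instance_id in removed:
            # A real subscriber would close the connection here.
            del self.connected[instance_id]
        return added, removed
```

Calling `reconcile()` with each new snapshot from the discovery backend means engines can come and go without any subscriber-side configuration.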

Testing

# Terminal 1: Start engine with FPM
DYN_VLLM_FORWARDPASS_METRIC_PORT=20380 bash examples/backends/vllm/launch/agg.sh

# Terminal 2: Receive via event plane (auto-discovery)
python -m dynamo.common.recv_forward_pass_metrics \
  --namespace dynamo --component backend --endpoint generate

# Terminal 3: Load test
aiperf profile --model Qwen/Qwen3-0.6B --endpoint-type chat \
  --endpoint /v1/chat/completions --streaming --url http://localhost:8000 \
  --synthetic-input-tokens-mean 500 --output-tokens-mean 50 \
  --concurrency 50 --request-count 500

Summary by CodeRabbit

Release Notes

New Features
- Added forward pass metrics collection and consumption capabilities via new APIs for event-based metric streaming.
- Introduced utility tools to subscribe to and display forward pass metrics.
Chores
- Enhanced documentation describing forward pass metrics architecture and data flow.

Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

coderabbitai · 2026-03-11T22:38:35Z

Walkthrough

This change introduces a complete forward-pass metrics (FPM) infrastructure that bridges local ZMQ metrics from vLLM to the Dynamo event plane. The implementation includes Rust bindings for relay and subscriber components, Python utilities for metrics consumption, vLLM integration for lifecycle management, and architectural documentation describing the two-layer event flow system.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Architectural Documentation & Utilities `components/src/dynamo/common/forward_pass_metrics.py`, `components/src/dynamo/common/recv_forward_pass_metrics.py` | Updated module docstring detailing FPM architecture, data flow, and relay design. Added new consumer utility that subscribes to ForwardPassMetrics via event plane, auto-discovers publishers, and prints metrics as JSON with CLI configuration options. |
| vLLM Integration `components/src/dynamo/vllm/handlers.py`, `components/src/dynamo/vllm/main.py` | Added `fpm_relays` attribute to BaseWorkerHandler with shutdown cleanup. Introduced `setup_fpm_relay()` function that creates FpmEventRelay instances per data-parallel rank, gated by environment variable, integrated into both prefill and standard engine initialization paths. |
| Python Bindings & Type Stubs `lib/bindings/python/src/dynamo/_core.pyi`, `lib/bindings/python/src/dynamo/llm/__init__.py` | Added type stubs for FpmEventRelay and FpmEventSubscriber classes with lifecycle and message reception methods. Exposed both classes via llm module public imports. |
| Rust Module Wiring `lib/bindings/python/rust/lib.rs`, `lib/bindings/python/rust/llm.rs` | Registered FpmEventRelay and FpmEventSubscriber as Python-accessible classes. Added fpm module declarations alongside model_card and preprocessor modules. |
| Rust Implementation `lib/bindings/python/rust/llm/fpm.rs`, `lib/llm/src/fpm_publisher.rs`, `lib/llm/src/lib.rs` | Implemented FpmEventRelay bridging ZMQ SUB socket to event plane with background relay task, multipart frame parsing, consecutive error tracking, and graceful shutdown. Implemented FpmEventSubscriber with async message reception, background spawning, and GIL-aware synchronous recv() method. Exported both as public modules. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A relay hops through ZMQ's dance,
Metrics flow through Dynamo's expanse,
With Rust bindings spun and Python compiled,
Forward-pass wisdom flows undefiled! 🌟

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'feat: ForwardPassMetrics dynamo event plane integration' accurately and specifically describes the main feature being added: integration of ForwardPassMetrics with the Dynamo event plane. |
| Description check | ✅ Passed | The PR description is comprehensive and well-structured, covering Summary, Motivation, Architecture, Changes, Discovery, and Testing. It exceeds the repository template requirements with detailed technical context. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


PeaBrane

Clean integration that follows the existing KV event relay patterns well. One minor nit on cleanup consistency.

components/src/dynamo/vllm/handlers.py

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (1)

components/src/dynamo/vllm/handlers.py (1)
879-882: Keep the shutdown traceback here.

Lines 881-882 log only str(e), which makes relay teardown failures much harder to debug across the Python/Rust boundary. Prefer logger.exception(...) or exc_info=True so the full stack is preserved.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/src/dynamo/vllm/handlers.py` around lines 879 - 882, The current
except block around relay.shutdown() only logs str(e) which loses the traceback;
update the handler catching Exception in the shutdown try/except so it logs the
full exception context (either call logger.exception("Failed to shut down FPM
relay") or logger.warning("Failed to shut down FPM relay", exc_info=True))
instead of logger.warning(f"Failed to shut down FPM relay: {e}"), leaving the
message text intact and ensuring the stacktrace from relay.shutdown() is
preserved for debugging.
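The difference the nitpick describes can be seen in a few lines of standard-library Python. This is a sketch of the suggested pattern, not the code in handlers.py: `shutdown_relays` is a hypothetical helper, and each relay is assumed only to expose a `shutdown()` method.

```python
import logging


def shutdown_relays(relays, logger):
    """Shut down every relay, logging failures with the full traceback."""
    for relay in relays:
        try:
            relay.shutdown()
        except Exception:
            # exc_info=True attaches the active exception and its traceback
            # to the record; formatting only str(e) would discard the stack.
            logger.warning("Failed to shut down FPM relay", exc_info=True)
```

`logger.exception(...)` behaves the same way but logs at ERROR level, so `warning(..., exc_info=True)` is the option that keeps the existing severity.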

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/src/dynamo/common/recv_forward_pass_metrics.py`:
- Around line 60-62: The runtime is being created with enable_nats forcibly set
to False which breaks topologies where the event plane uses NATS; update the
DistributedRuntime instantiation (the runtime variable / DistributedRuntime
constructor call) to stop forcing enable_nats=False — either remove that
argument so the binding uses its default behavior, or wire an explicit
CLI/runtime flag into the constructor (e.g., pass args.enable_nats) so callers
can control NATS instead of hardcoding False.

In `@components/src/dynamo/vllm/main.py`:
- Around line 724-727: The FPM relays are being started for engines built from
snapshots without a seeded worker ID, causing blank worker_id in
ForwardPassMetrics; update the checkpoint/snapshot path in worker()—where
snapshot_engine is created via setup_vllm_engine()—to either seed fpm_worker_id
before engine construction (pass the fpm_worker_id into setup_vllm_engine) or
skip calling setup_fpm_relay entirely for snapshot-created engines;
specifically, ensure the code that currently calls
setup_fpm_relay(generate_endpoint, vllm_config) and assigns handler.fpm_relays
only runs when a valid fpm_worker_id is present (or when the engine was not
created from a snapshot), or alternatively inject fpm_worker_id into
setup_vllm_engine so subsequent setup_fpm_relay sees a non-empty worker id
(affecting fpm_relays, setup_fpm_relay, handler.fpm_relays, snapshot_engine,
worker(), setup_vllm_engine, and fpm_worker_id).

In `@lib/bindings/python/rust/llm/fpm.rs`:
- Around line 68-69: Change the unbounded handoff to a bounded channel to
prevent unbounded memory growth: replace the rx field type
Arc<std::sync::Mutex<tokio::sync::mpsc::UnboundedReceiver<Vec<u8>>>> with
Arc<std::sync::Mutex<tokio::sync::mpsc::Receiver<Vec<u8>>>> (or the equivalent
bounded receiver type), create the channel with
tokio::sync::mpsc::channel(CAPACITY) instead of unbounded_channel in the
producer/initialization code (the site currently calling unbounded_channel
around line 86), and update the producer send logic to use .send().await
(backpressure) or an explicit overflow policy (e.g., call try_send() and drop
oldest when Err(TrySendError::Full) if you prefer a drop policy); also adjust
any recv/send call sites and error handling to handle the bounded channel
semantics and closed-channel errors while keeping the CancellationToken usage
unchanged.

In `@lib/llm/src/fpm_publisher.rs`:
- Around line 89-103: The code currently accepts any ZMQ multipart message with
frames.len() >= 3 and silently discards extra frames; update the validation so
only messages with exactly 3 frames are accepted: check frames.len() == 3,
extract the payload only when that is true (instead of swap_remove on index 2
for >=3), and treat any other frame count as a malformed message (log/warn and
skip publishing via publisher.publish_bytes). Ensure you reference the existing
frames Vec<Vec<u8>> handling and the call to
publisher.publish_bytes(payload).await when implementing this strict check.
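As an illustration of the drop-oldest overflow policy mentioned in the fpm.rs comment above, here is a sketch using Python's bounded `queue.Queue`. It demonstrates the policy only and says nothing about how tokio's mpsc channel would implement it; `put_drop_oldest` is a hypothetical helper.

```python
import queue


def put_drop_oldest(q, item):
    """Enqueue item; if q is full, discard the oldest entry first.

    Returns True if at least one older entry was dropped.
    """
    dropped = False
    while True:
        try:
            q.put_nowait(item)
            return dropped
        except queue.Full:
            try:
                q.get_nowait()  # make room by discarding the oldest entry
                dropped = True
            except queue.Empty:
                pass  # a consumer drained the queue in between; just retry
```

For metrics like these, keeping the newest samples is usually the more useful choice when a consumer falls behind, and memory stays capped at the queue's `maxsize`.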

---

Nitpick comments:
In `@components/src/dynamo/vllm/handlers.py`:
- Around line 879-882: The current except block around relay.shutdown() only
logs str(e) which loses the traceback; update the handler catching Exception in
the shutdown try/except so it logs the full exception context (either call
logger.exception("Failed to shut down FPM relay") or logger.warning("Failed to
shut down FPM relay", exc_info=True)) instead of logger.warning(f"Failed to shut
down FPM relay: {e}"), leaving the message text intact and ensuring the
stacktrace from relay.shutdown() is preserved for debugging.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 507f52c2-8883-4eb3-8ab5-fc1e5c9740ff

📥 Commits

Reviewing files that changed from the base of the PR and between 5178a4a and 327e70f.

📒 Files selected for processing (11)

components/src/dynamo/common/forward_pass_metrics.py
components/src/dynamo/common/recv_forward_pass_metrics.py
components/src/dynamo/vllm/handlers.py
components/src/dynamo/vllm/main.py
lib/bindings/python/rust/lib.rs
lib/bindings/python/rust/llm.rs
lib/bindings/python/rust/llm/fpm.rs
lib/bindings/python/src/dynamo/_core.pyi
lib/bindings/python/src/dynamo/llm/__init__.py
lib/llm/src/fpm_publisher.rs
lib/llm/src/lib.rs


Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

components/src/dynamo/common/recv_forward_pass_metrics.py

components/src/dynamo/common/forward_pass_metrics.py

lib/bindings/python/rust/lib.rs

biswapanda · 2026-03-12T00:44:16Z

On naming: FPM is not a well-understood acronym yet.
We should either use the full name (ForwardPassMetrics) or add a note that FPM stands for ForwardPassMetrics, e.g.:
FPM = ForwardPassMetrics

components/src/dynamo/common/forward_pass_metrics.py

components/src/dynamo/vllm/main.py

Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

feat: ForwardPassMetrics dynamo event plane integration

60d5713

Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

tedzhouhk requested review from a team as code owners March 11, 2026 22:35

pull-request-size bot added the size/XL label Mar 11, 2026

github-actions bot added feat backend::vllm Relates to the vllm backend labels Mar 11, 2026

pc

327e70f

Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

copy-pr-bot bot temporarily deployed to GITLAB March 11, 2026 22:37 Inactive

PeaBrane approved these changes Mar 11, 2026

components/src/dynamo/vllm/handlers.py (outdated, resolved)

coderabbitai bot reviewed Mar 11, 2026

components/src/dynamo/common/recv_forward_pass_metrics.py (resolved)

components/src/dynamo/vllm/main.py (outdated, resolved)

lib/bindings/python/rust/llm/fpm.rs (resolved)

lib/llm/src/fpm_publisher.rs (resolved)

address comment

da60330

Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

copy-pr-bot bot temporarily deployed to GITLAB March 11, 2026 23:09 Inactive

move to common

fcc95fb

Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

copy-pr-bot bot temporarily deployed to GITLAB March 11, 2026 23:16 Inactive

nnshah1 reviewed Mar 11, 2026

components/src/dynamo/common/recv_forward_pass_metrics.py (resolved)

tedzhouhk enabled auto-merge (squash) March 11, 2026 23:54

nnshah1 reviewed Mar 11, 2026

components/src/dynamo/common/forward_pass_metrics.py (resolved)

tedzhouhk disabled auto-merge March 12, 2026 00:05

biswapanda reviewed Mar 12, 2026

lib/bindings/python/rust/lib.rs (resolved)

biswapanda reviewed Mar 12, 2026

components/src/dynamo/common/forward_pass_metrics.py (resolved)

biswapanda reviewed Mar 12, 2026

components/src/dynamo/vllm/main.py (outdated, resolved)

copy-pr-bot bot temporarily deployed to GITLAB March 12, 2026 01:30 Inactive

biswapanda approved these changes Mar 12, 2026

comment

af12adb

Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

copy-pr-bot bot temporarily deployed to GITLAB March 12, 2026 17:59 Inactive

copy-pr-bot bot temporarily deployed to GITLAB March 12, 2026 18:44 Inactive

Merge branch 'main' into hzhou/fpm-event-plane

cd7fe29

tedzhouhk enabled auto-merge (squash) March 12, 2026 20:20

copy-pr-bot bot temporarily deployed to GITLAB March 12, 2026 20:20 Inactive

copy-pr-bot bot temporarily deployed to GITLAB March 12, 2026 20:33 Inactive

tedzhouhk merged commit cd4773f into main Mar 12, 2026
153 checks passed

tedzhouhk deleted the hzhou/fpm-event-plane branch March 12, 2026 20:58

This was referenced Mar 14, 2026

feat: emit per-iteration forward pass metrics via ZMQ PUB sgl-project/sglang#20567

Closed

feat(sglang): relay forward pass metrics to event plane #7376

Closed

feat(sglang): relay forward pass metrics to event plane #7386

Draft

ShounakRay pushed a commit to ShounakRay/fuzzy-dynamo that referenced this pull request Mar 20, 2026

feat: ForwardPassMetrics dynamo event plane integration (ai-dynamo#7250)

0766a0a

Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

feat: ForwardPassMetrics dynamo event plane integration #7250
tedzhouhk merged 6 commits into main from hzhou/fpm-event-plane

4 participants
