
feat: ForwardPassMetrics dynamo event plane integration #7250

Merged: tedzhouhk merged 6 commits into main from hzhou/fpm-event-plane on Mar 12, 2026


@tedzhouhk (Contributor) commented Mar 11, 2026

FPM Event Plane Integration

Summary

Adds a Dynamo event plane relay for ForwardPassMetrics, enabling automatic discovery and transport-agnostic delivery (NATS Core / ZMQ). Follows the same two-layer bridge architecture as KV events.

Motivation

The initial FPM implementation (merged in the prior PR) uses raw ZMQ PUB/SUB with manual port configuration. This works for local testing but lacks:

  • Discovery: subscribers must know the engine's IP and port
  • Transport abstraction: hard-coded to ZMQ, no NATS support
  • Dynamic scaling: adding/removing engines requires reconfiguring subscribers

This PR bridges the gap by adding a relay in the parent process that subscribes to the child's raw ZMQ and re-publishes to the Dynamo event plane, exactly like KV events do.

Architecture

Child (EngineCore):  InstrumentedScheduler -> _FpmPublisherThread -> ZMQ PUB (localhost)
                                                                        |
Parent (dynamo.vllm): FpmEventRelay (ZMQ SUB) -> EventPublisher -> Event Plane (NATS/ZMQ)
                                                                        |
Consumer:            FpmEventSubscriber (auto-discovered) -> decode() -> ForwardPassMetrics

The raw ZMQ hop is needed because the scheduler runs in a forked child process without access to the Dynamo runtime. The FpmEventRelay bridge in the parent process handles event plane transport and discovery registration automatically.

Why two layers (same as KV events)

vLLM's EngineCore runs in a forked child process. After fork(), the child has no access to the parent's Dynamo runtime (tokio executor, NATS connections, etcd leases, K8s watchers don't survive fork). The child can only open new raw sockets. The parent process has the runtime and acts as the bridge.

Changes

Rust: dynamo-llm crate

lib/llm/src/fpm_publisher.rs (new)

FpmEventRelay -- subscribes to a local ZMQ PUB socket and relays raw bytes to EventPublisher::for_component("forward-pass-metrics"). Simpler than KvEventPublisher (no event transformation, no batching, no local indexer). Includes:

  • Consecutive error tracking with max-error exit
  • Warning on malformed ZMQ frames
  • CancellationToken for graceful shutdown
  • Drop impl that cancels the relay task
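The relay loop described above can be modelled in plain Python. This is a minimal, hypothetical sketch, not the Rust implementation: `run_relay`, the `recv_frames`/`publish_bytes` callables and the default error limit are illustrative, a `threading.Event` stands in for the CancellationToken, and the payload is assumed to be the third of exactly three frames (the layout the review thread below refers to).

```python
import logging
import threading
from typing import Callable, List, Optional

logger = logging.getLogger("fpm_relay_sketch")


def run_relay(
    recv_frames: Callable[[float], Optional[List[bytes]]],
    publish_bytes: Callable[[bytes], None],
    cancel: threading.Event,
    max_consecutive_errors: int = 10,  # hypothetical limit
) -> int:
    """Forward raw payload bytes until cancelled or too many errors in a row.

    Returns the number of payloads published.
    """
    published = 0
    consecutive_errors = 0
    while not cancel.is_set():
        try:
            # Short receive timeout so the loop notices cancellation promptly.
            frames = recv_frames(0.1)
            if frames is None:
                continue  # timed out, nothing received
            if len(frames) != 3:
                logger.warning("malformed message with %d frames; skipping", len(frames))
                continue
            publish_bytes(frames[2])  # relay raw bytes: no decoding, no batching
        except Exception:
            consecutive_errors += 1
            logger.warning("relay error (%d in a row)", consecutive_errors, exc_info=True)
            if consecutive_errors >= max_consecutive_errors:
                logger.error("too many consecutive errors; relay exiting")
                break
            continue
        consecutive_errors = 0  # a successful publish resets the streak
        published += 1
    return published
```

In the real relay the cancel signal would be triggered from the owner's shutdown path (the PR mentions a Drop impl), so the background task cannot outlive the object that started it.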

Rust: dynamo-py3 crate (Python bindings)

lib/bindings/python/rust/llm/fpm.rs (new)

  • FpmEventRelay -- thin #[pyclass] wrapper around dynamo_llm::fpm_publisher::FpmEventRelay
  • FpmEventSubscriber -- wraps EventSubscriber::for_component for the consumer side. Background tokio task feeds an mpsc channel; recv() releases the GIL while blocking. Exits cleanly when the receiver is dropped.
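The subscriber's threading shape (a background producer feeding a channel, a blocking recv(), and a clean exit once the consumer goes away) can be sketched with a thread and `queue.Queue` in place of the tokio task and mpsc channel. `SubscriberSketch` and its `source` iterable are hypothetical; the queue is bounded here, in line with the later review suggestion. As with the binding's GIL-releasing recv(), a blocking `queue.Queue.get` does not hold up other Python threads.

```python
import queue
import threading
from typing import Iterable, Optional


class SubscriberSketch:
    """Hypothetical model of a background producer feeding a blocking recv()."""

    _END = object()  # sentinel marking end of stream

    def __init__(self, source: Iterable[bytes], capacity: int = 1024) -> None:
        self._q: queue.Queue = queue.Queue(maxsize=capacity)
        self._closed = threading.Event()
        self._thread = threading.Thread(target=self._pump, args=(source,), daemon=True)
        self._thread.start()

    def _put(self, item: object) -> bool:
        # Retry with a timeout so a full queue never blocks shutdown forever.
        while not self._closed.is_set():
            try:
                self._q.put(item, timeout=0.1)
                return True
            except queue.Full:
                pass
        return False

    def _pump(self, source: Iterable[bytes]) -> None:
        for msg in source:
            if not self._put(msg):
                return  # consumer closed: stop producing
        self._put(self._END)

    def recv(self, timeout: Optional[float] = None) -> Optional[bytes]:
        """Block for the next message; None on timeout or end of stream."""
        try:
            item = self._q.get(timeout=timeout)
        except queue.Empty:
            return None
        return None if item is self._END else item

    def close(self) -> None:
        self._closed.set()
        self._thread.join()
```

Note that the end-of-stream sentinel is delivered once; a later recv() without a timeout would block indefinitely.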

Python

components/src/dynamo/vllm/main.py

  • Added setup_fpm_relay() that creates one FpmEventRelay per dp_rank (parallel to setup_kv_event_publisher())
  • Called in both init_prefill() and init() paths
  • Opt-in: only activates when DYN_VLLM_FORWARDPASS_METRIC_PORT is set
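The opt-in, one-relay-per-rank setup can be sketched as a planning step. `plan_fpm_relays` and `RelaySpec` are hypothetical, and the `base_port + dp_rank` endpoint scheme is an assumption for illustration only; the PR does not say how child ZMQ ports are assigned per rank. The real setup_fpm_relay() would construct FpmEventRelay objects rather than return specs.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional

ENV_VAR = "DYN_VLLM_FORWARDPASS_METRIC_PORT"


@dataclass(frozen=True)
class RelaySpec:
    dp_rank: int
    zmq_endpoint: str  # local endpoint the child process publishes on


def plan_fpm_relays(
    data_parallel_size: int, env: Optional[Mapping[str, str]] = None
) -> List[RelaySpec]:
    """Return one relay spec per dp_rank, or nothing when the feature is off."""
    env = os.environ if env is None else env
    raw = env.get(ENV_VAR)
    if not raw:
        return []  # opt-in: no env var, no relays
    base_port = int(raw)
    # ASSUMPTION: each rank's child publishes on base_port + dp_rank.
    return [
        RelaySpec(rank, f"tcp://127.0.0.1:{base_port + rank}")
        for rank in range(data_parallel_size)
    ]
```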

components/src/dynamo/vllm/handlers.py

  • Added self.fpm_relays field on handler class
  • Added relay shutdown in cleanup() to prevent resource leaks

components/src/dynamo/common/recv_forward_pass_metrics.py (new)

Standalone consumer that uses FpmEventSubscriber with the Dynamo runtime for auto-discovery. Usage:

python -m dynamo.common.recv_forward_pass_metrics \
  --namespace dynamo --component backend --endpoint generate
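The consumer's command-line surface and output step can be sketched with `argparse` and `json`. Only the three flag names come from the usage line above; the defaults (copied from that example), the `print_metrics` helper, and the dict-shaped messages standing in for decoded ForwardPassMetrics are assumptions.

```python
import argparse
import json
from typing import Iterable, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Print ForwardPassMetrics as JSON")
    # Defaults mirror the example invocation; they are illustrative only.
    parser.add_argument("--namespace", default="dynamo")
    parser.add_argument("--component", default="backend")
    parser.add_argument("--endpoint", default="generate")
    return parser.parse_args(argv)


def print_metrics(messages: Iterable[dict]) -> List[str]:
    """Print each decoded message as one JSON line and return the lines."""
    lines = []
    for msg in messages:
        line = json.dumps(msg, sort_keys=True)  # stable key order for diffing
        print(line)
        lines.append(line)
    return lines
```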

lib/bindings/python/src/dynamo/llm/__init__.py

Re-exports FpmEventRelay and FpmEventSubscriber from _core.

Files not changed

  • instrumented_scheduler.py -- stays as-is (raw ZMQ PUB from child process)
  • forward_pass_metrics.py -- schema unchanged
  • envs.py -- DYN_VLLM_FORWARDPASS_METRIC_PORT still needed for child-to-parent ZMQ
  • args.py -- injection logic unchanged

Discovery

Each FpmEventRelay creates an EventPublisher::for_component("forward-pass-metrics") which automatically:

  1. Binds a ZMQ PUB on an auto-assigned port (or connects to NATS)
  2. Registers as DiscoveryInstance::EventChannel with topic: "forward-pass-metrics" in the discovery plane

Subscribers use EventSubscriber::for_component("forward-pass-metrics") which watches discovery and dynamically connects/disconnects as engines come and go. Works across all discovery backends (K8s CRD, etcd, file, mem).
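The dynamic connect/disconnect behaviour can be modelled as a small state machine keyed by instance. `DiscoveryEvent`, its `kind` values, and `DynamicSubscriber` are hypothetical and do not mirror the real discovery types; only the topic string comes from the text.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class DiscoveryEvent:
    kind: str          # "added" or "removed" (hypothetical event kinds)
    instance_id: str
    topic: str
    address: str = ""  # e.g. a publisher endpoint; unused on removal


@dataclass
class DynamicSubscriber:
    topic: str
    connected: Dict[str, str] = field(default_factory=dict)

    def on_event(self, ev: DiscoveryEvent) -> None:
        if ev.topic != self.topic:
            return  # ignore other event channels
        if ev.kind == "added":
            # A real subscriber would open a connection to ev.address here.
            self.connected[ev.instance_id] = ev.address
        elif ev.kind == "removed":
            # ...and close it here; unknown ids are ignored.
            self.connected.pop(ev.instance_id, None)
```

Keying by instance ID means an engine that restarts under a new address simply replaces its old entry.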

Testing

# Terminal 1: Start engine with FPM
DYN_VLLM_FORWARDPASS_METRIC_PORT=20380 bash examples/backends/vllm/launch/agg.sh

# Terminal 2: Receive via event plane (auto-discovery)
python -m dynamo.common.recv_forward_pass_metrics \
  --namespace dynamo --component backend --endpoint generate

# Terminal 3: Load test
aiperf profile --model Qwen/Qwen3-0.6B --endpoint-type chat \
  --endpoint /v1/chat/completions --streaming --url http://localhost:8000 \
  --synthetic-input-tokens-mean 500 --output-tokens-mean 50 \
  --concurrency 50 --request-count 500

Summary by CodeRabbit

Release Notes

  • New Features

    • Added forward pass metrics collection and consumption capabilities via new APIs for event-based metric streaming.
    • Introduced utility tools to subscribe to and display forward pass metrics.
  • Chores

    • Enhanced documentation describing forward pass metrics architecture and data flow.

Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
@tedzhouhk tedzhouhk requested review from a team as code owners March 11, 2026 22:35
@github-actions github-actions bot added feat backend::vllm Relates to the vllm backend labels Mar 11, 2026
@coderabbitai (Contributor) bot commented Mar 11, 2026

Walkthrough

This change introduces a complete forward-pass metrics (FPM) infrastructure that bridges local ZMQ metrics from vLLM to the Dynamo event plane. The implementation includes Rust bindings for relay and subscriber components, Python utilities for metrics consumption, vLLM integration for lifecycle management, and architectural documentation describing the two-layer event flow system.

Changes

Cohort / File(s) Summary
Architectural Documentation & Utilities
components/src/dynamo/common/forward_pass_metrics.py, components/src/dynamo/common/recv_forward_pass_metrics.py
Updated module docstring detailing FPM architecture, data flow, and relay design. Added new consumer utility that subscribes to ForwardPassMetrics via event plane, auto-discovers publishers, and prints metrics as JSON with CLI configuration options.
vLLM Integration
components/src/dynamo/vllm/handlers.py, components/src/dynamo/vllm/main.py
Added fpm_relays attribute to BaseWorkerHandler with shutdown cleanup. Introduced setup_fpm_relay() function that creates FpmEventRelay instances per data-parallel rank, gated by environment variable, integrated into both prefill and standard engine initialization paths.
Python Bindings & Type Stubs
lib/bindings/python/src/dynamo/_core.pyi, lib/bindings/python/src/dynamo/llm/__init__.py
Added type stubs for FpmEventRelay and FpmEventSubscriber classes with lifecycle and message reception methods. Exposed both classes via llm module public imports.
Rust Module Wiring
lib/bindings/python/rust/lib.rs, lib/bindings/python/rust/llm.rs
Registered FpmEventRelay and FpmEventSubscriber as Python-accessible classes. Added fpm module declarations alongside model_card and preprocessor modules.
Rust Implementation
lib/bindings/python/rust/llm/fpm.rs, lib/llm/src/fpm_publisher.rs, lib/llm/src/lib.rs
Implemented FpmEventRelay bridging ZMQ SUB socket to event plane with background relay task, multipart frame parsing, consecutive error tracking, and graceful shutdown. Implemented FpmEventSubscriber with async message reception, background spawning, and GIL-aware synchronous recv() method. Exported both as public modules.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A relay hops through ZMQ's dance,
Metrics flow through Dynamo's expanse,
With Rust bindings spun and Python compiled,
Forward-pass wisdom flows undefiled! 🌟

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat: ForwardPassMetrics dynamo event plane integration' accurately and specifically describes the main feature being added: integration of ForwardPassMetrics with the Dynamo event plane.
Description check ✅ Passed The PR description is comprehensive and well-structured, covering Summary, Motivation, Architecture, Changes, Discovery, and Testing. It exceeds the repository template requirements with detailed technical context.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.


@PeaBrane (Contributor) left a comment

Clean integration that follows the existing KV event relay patterns well. One minor nit on cleanup consistency.

@coderabbitai (Contributor) bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (1)
components/src/dynamo/vllm/handlers.py (1)

879-882: Keep the shutdown traceback here.

Lines 881-882 log only str(e), which makes relay teardown failures much harder to debug across the Python/Rust boundary. Prefer logger.exception(...) or exc_info=True so the full stack is preserved.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/src/dynamo/vllm/handlers.py` around lines 879 - 882, The current
except block around relay.shutdown() only logs str(e) which loses the traceback;
update the handler catching Exception in the shutdown try/except so it logs the
full exception context (either call logger.exception("Failed to shut down FPM
relay") or logger.warning("Failed to shut down FPM relay", exc_info=True))
instead of logger.warning(f"Failed to shut down FPM relay: {e}"), leaving the
message text intact and ensuring the stacktrace from relay.shutdown() is
preserved for debugging.
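A minimal sketch of the suggested fix, assuming each relay exposes a `shutdown()` method as the comment implies; `shutdown_fpm_relays` is a hypothetical helper, not the handler's actual `cleanup()`. Passing `exc_info=True` attaches the traceback to the log record, and one failing relay does not prevent the others from being shut down.

```python
import logging
from typing import List, Protocol

logger = logging.getLogger("fpm_cleanup_sketch")


class Shutdownable(Protocol):
    def shutdown(self) -> None: ...


def shutdown_fpm_relays(relays: List[Shutdownable]) -> int:
    """Shut down every relay, logging failures with tracebacks; return failure count."""
    failures = 0
    for relay in relays:
        try:
            relay.shutdown()
        except Exception:
            failures += 1
            # exc_info=True keeps the stack trace raised from shutdown().
            logger.warning("Failed to shut down FPM relay", exc_info=True)
    relays.clear()  # drop references so nothing is shut down twice
    return failures
```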
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/src/dynamo/common/recv_forward_pass_metrics.py`:
- Around lines 60-62: The runtime is being created with enable_nats forcibly set
to False which breaks topologies where the event plane uses NATS; update the
DistributedRuntime instantiation (the runtime variable / DistributedRuntime
constructor call) to stop forcing enable_nats=False — either remove that
argument so the binding uses its default behavior, or wire an explicit
CLI/runtime flag into the constructor (e.g., pass args.enable_nats) so callers
can control NATS instead of hardcoding False.

In `@components/src/dynamo/vllm/main.py`:
- Around lines 724-727: FPM relays are started for engines built from
snapshots that have no seeded worker ID, which leaves worker_id blank in
ForwardPassMetrics. In the checkpoint/snapshot path in worker(), where
snapshot_engine is created via setup_vllm_engine(), either seed fpm_worker_id
before engine construction (pass it into setup_vllm_engine so the later
setup_fpm_relay call sees a non-empty worker ID) or skip
setup_fpm_relay(generate_endpoint, vllm_config) for snapshot-created engines.
Either way, handler.fpm_relays should only be populated when a valid
fpm_worker_id is present.

In `@lib/bindings/python/rust/llm/fpm.rs`:
- Around lines 68-69: Change the unbounded handoff to a bounded channel to
prevent unbounded memory growth: replace the rx field type
Arc<std::sync::Mutex<tokio::sync::mpsc::UnboundedReceiver<Vec<u8>>>> with
Arc<std::sync::Mutex<tokio::sync::mpsc::Receiver<Vec<u8>>>>, create the
channel with tokio::sync::mpsc::channel(CAPACITY) instead of unbounded_channel
(the call site around line 86), and update the producer to use .send().await
(backpressure) or an explicit overflow policy (e.g., call try_send() and drop
the incoming message on Err(TrySendError::Full); the sender cannot evict the
oldest queued item). Also adjust the recv/send call sites and error handling
for bounded-channel and closed-channel semantics, keeping the
CancellationToken usage unchanged.
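The bounded-with-drop policy can be illustrated with Python's `queue.Queue` standing in for a tokio bounded mpsc channel. `DropNewestChannel` is hypothetical; what it shows is that a non-blocking send on a full channel can only reject the incoming item, so a "drop oldest" policy would need cooperation from the receiving side.

```python
import queue
from typing import Optional


class DropNewestChannel:
    """Hypothetical bounded channel that rejects new items when full."""

    def __init__(self, capacity: int) -> None:
        self._q: queue.Queue = queue.Queue(maxsize=capacity)
        self.dropped = 0  # count of rejected sends, useful as a metric

    def try_send(self, item: bytes) -> bool:
        try:
            self._q.put_nowait(item)
            return True
        except queue.Full:
            self.dropped += 1  # the incoming item is the one discarded
            return False

    def try_recv(self) -> Optional[bytes]:
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None
```

For a metrics stream, dropping under overload keeps memory flat at the cost of gaps, whereas `.send().await` would instead slow the relay down.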

In `@lib/llm/src/fpm_publisher.rs`:
- Around lines 89-103: The code currently accepts any ZMQ multipart message with
frames.len() >= 3 and silently discards extra frames; update the validation so
only messages with exactly 3 frames are accepted: check frames.len() == 3,
extract the payload only when that is true (instead of swap_remove on index 2
for >=3), and treat any other frame count as a malformed message (log/warn and
skip publishing via publisher.publish_bytes). Ensure you reference the existing
frames Vec<Vec<u8>> handling and the call to
publisher.publish_bytes(payload).await when implementing this strict check.


ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 5178a4a and 327e70f.

📒 Files selected for processing (11)
  • components/src/dynamo/common/forward_pass_metrics.py
  • components/src/dynamo/common/recv_forward_pass_metrics.py
  • components/src/dynamo/vllm/handlers.py
  • components/src/dynamo/vllm/main.py
  • lib/bindings/python/rust/lib.rs
  • lib/bindings/python/rust/llm.rs
  • lib/bindings/python/rust/llm/fpm.rs
  • lib/bindings/python/src/dynamo/_core.pyi
  • lib/bindings/python/src/dynamo/llm/__init__.py
  • lib/llm/src/fpm_publisher.rs
  • lib/llm/src/lib.rs

@tedzhouhk tedzhouhk enabled auto-merge (squash) March 11, 2026 23:54
@tedzhouhk tedzhouhk disabled auto-merge March 12, 2026 00:05
@biswapanda (Contributor) commented

On naming: FPM is not yet a well-understood acronym. We should either use the full name (ForwardPassMetrics) or add a note that FPM = ForwardPassMetrics.
FPM = ForwardPassMetrics

@tedzhouhk tedzhouhk enabled auto-merge (squash) March 12, 2026 20:20
@tedzhouhk tedzhouhk merged commit cd4773f into main Mar 12, 2026
153 checks passed
@tedzhouhk tedzhouhk deleted the hzhou/fpm-event-plane branch March 12, 2026 20:58
ShounakRay pushed a commit to ShounakRay/fuzzy-dynamo that referenced this pull request Mar 20, 2026
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

Labels

backend::vllm Relates to the vllm backend feat size/XL


4 participants