feat: emit per-iteration forward pass metrics via ZMQ PUB by ishandhanani · Pull Request #20569 · sgl-project/sglang

ishandhanani · 2026-03-14T05:14:34Z

Summary

Add ForwardPassMetrics emission from the scheduler on every forward pass iteration via ZMQ PUB socket
Enables external consumers (planners, routers) to observe real-time scheduling behavior without polling Prometheus
Controlled via --forward-pass-metrics-port server arg -- zero overhead when not set
Background publisher thread keeps serialization off the scheduler hot path
Wire format uses msgspec msgpack for zero-copy serialization

Motivation

External orchestration systems need per-iteration scheduling telemetry to make informed routing decisions. The existing KV metrics (from #6721) provide block-level cache occupancy, but planners also need request-level scheduling data: how many prefill/decode requests ran, token counts, KV context lengths, queue depth, and iteration wall time.

This uses sglang's existing SchedulerMetricsMixin -- no scheduler subclass needed.

Data emitted per iteration

Field	Description
`wall_time`	Schedule + forward + output processing duration
`num_prefill_requests`	Prefill requests in this batch
`sum_prefill_tokens`	Freshly computed prefill tokens
`sum_prefill_kv_tokens`	Prefix cache hits (read, not computed)
`var_prefill_length`	Variance of prompt lengths
`num_decode_requests`	Decode requests in this batch
`sum_decode_kv_tokens`	Total KV context length across decode requests
`var_decode_kv_tokens`	Variance of decode KV lengths
Queued request equivalents	Same fields for waiting queue snapshot

Architecture

Scheduler process:
  SchedulerMetricsMixin._emit_forward_pass_metrics()
    -> _FpmPublisherThread (daemon) -> ZMQ PUB (localhost:{port + dp_rank})

External consumer:
  ZMQ SUB -> deserialize ForwardPassMetrics

How to enable

python -m sglang.launch_server --model-path <model> --forward-pass-metrics-port 20380

Files changed

New: python/sglang/srt/observability/forward_pass_metrics.py -- ForwardPassMetrics schema, WelfordAccumulator, _FpmPublisherThread
Modified: python/sglang/srt/observability/scheduler_metrics_mixin.py -- init FPM publisher, _emit_forward_pass_metrics(), _shutdown_fpm()
Modified: python/sglang/srt/managers/scheduler.py -- record batch start time, emit FPM from process_batch_result()
Modified: python/sglang/srt/server_args.py -- add forward_pass_metrics_port field and --forward-pass-metrics-port CLI flag
New: test/manual/test_forward_pass_metrics.py -- schema roundtrip, ZMQ PUB/SUB e2e, heartbeat

Test plan

Unit tests: schema encode/decode roundtrip, WelfordAccumulator correctness
ZMQ PUB/SUB e2e: publisher thread -> subscriber receives correct frames
Heartbeat: idle publisher emits heartbeat within interval
Live server: launched sglang with --forward-pass-metrics-port 20380, sent requests, verified prefill/decode metrics arrive with correct values

gemini-code-assist · 2026-03-14T05:14:38Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Add ForwardPassMetrics emission from the scheduler on every forward pass iteration via ZMQ PUB socket. Enables external consumers (planners, routers) to observe real-time scheduling behavior without polling Prometheus. Controlled via --forward-pass-metrics-port server arg. Zero overhead when not set. Background publisher thread keeps serialization off the scheduler hot path. Idle heartbeats emitted every 1s. Data emitted per iteration: - Scheduled requests: prefill/decode counts, token sums, KV lengths, variance - Queued requests: waiting queue depth and token distribution - wall_time: time.monotonic() at emit time for inter-iteration cadence

tedzhouhk · 2026-03-16T17:57:33Z

python/sglang/srt/managers/scheduler.py

    ) -> Union[GenerationBatchResult, EmbeddingBatchResult]:
        """Run a batch."""
        self.forward_ct += 1
+        batch.fpm_start_time = time.monotonic()


maybe it's better to include scheduling time and log fpm_start_time before get_next_batch_to_run?

tedzhouhk · 2026-03-16T18:02:14Z

seems there are repeated files (FMP struct definitions, var calaculation, etc), is it better to import them from dynamo, or we want to make this feature generic to all sglang users?

ishandhanani · 2026-03-16T18:49:07Z

seems there are repeated files (FMP struct definitions, var calaculation, etc), is it better to import them from dynamo, or we want to make this feature generic to all sglang users?

We don't want a dependency on Dynamo here (even lazy). I'm ok with it being duplicated for now....ideally we could publish the FPM spec properly somewhere. This and events are currently duplicated across FWs

ishandhanani force-pushed the idhanani/forward-pass-metrics branch 2 times, most recently from df1e02d to 61362cc Compare March 14, 2026 05:41

ishandhanani force-pushed the idhanani/forward-pass-metrics branch from 61362cc to 44b59a5 Compare March 14, 2026 05:44

ishandhanani mentioned this pull request Mar 14, 2026

feat(sglang): relay forward pass metrics to event plane ai-dynamo/dynamo#7386

Draft

3 tasks

fix: align forward pass metrics contract

37726ca

tedzhouhk reviewed Mar 16, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: emit per-iteration forward pass metrics via ZMQ PUB#20569

feat: emit per-iteration forward pass metrics via ZMQ PUB#20569
ishandhanani wants to merge 2 commits intomainfrom
idhanani/forward-pass-metrics

ishandhanani commented Mar 14, 2026

Uh oh!

gemini-code-assist bot commented Mar 14, 2026

Uh oh!

tedzhouhk Mar 16, 2026

Uh oh!

tedzhouhk commented Mar 16, 2026

Uh oh!

ishandhanani commented Mar 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ishandhanani commented Mar 14, 2026

Summary

Motivation

Data emitted per iteration

Architecture

How to enable

Files changed

Test plan

Uh oh!

gemini-code-assist bot commented Mar 14, 2026

Uh oh!

tedzhouhk Mar 16, 2026

Choose a reason for hiding this comment

Uh oh!

tedzhouhk commented Mar 16, 2026

Uh oh!

ishandhanani commented Mar 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants