feat: emit per-iteration forward pass metrics via ZMQ PUB by ishandhanani · Pull Request #20567 · sgl-project/sglang

ishandhanani · 2026-03-14T04:27:38Z

Summary

- Add `ForwardPassMetrics` emission from the scheduler on every forward-pass iteration via a ZMQ PUB socket
- Enables external consumers (planners, routers) to observe real-time scheduling behavior without polling Prometheus
- Controlled via the `--forward-pass-metrics-port` server arg -- zero overhead when not set
- A background publisher thread keeps serialization off the scheduler hot path
- The wire format uses msgspec msgpack for zero-copy serialization

Motivation

External orchestration systems need per-iteration scheduling telemetry to make informed routing decisions. The existing KV metrics (from #6721) provide block-level cache occupancy, but planners also need request-level scheduling data: how many prefill/decode requests ran, token counts, KV context lengths, queue depth, and iteration wall time.

This uses sglang's existing SchedulerMetricsMixin -- no scheduler subclass needed.
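A minimal sketch of how a mixin can keep emission free when it is disabled. Only `SchedulerMetricsMixin` and `_emit_forward_pass_metrics` come from this PR; `SchedulerMetricsMixinSketch`, `ToyScheduler`, `_init_fpm`, `_fpm_record_batch_start` and the `publish` callback are hypothetical, and the payload is cut down to three fields. With no port configured the publisher stays `None`, so each iteration pays only one attribute check:

```python
import time
from typing import Any, Callable, Optional


class SchedulerMetricsMixinSketch:
    """Hypothetical stand-in for SchedulerMetricsMixin; not the PR's code."""

    def _init_fpm(self, port: Optional[int], publish: Callable[[dict], Any]) -> None:
        # No port configured: keep the publisher as None so emission is a no-op.
        self._fpm_publish = publish if port is not None else None
        self._fpm_batch_start = 0.0

    def _fpm_record_batch_start(self) -> None:
        # Hypothetical hook called before scheduling the next batch.
        if self._fpm_publish is not None:
            self._fpm_batch_start = time.perf_counter()

    def _emit_forward_pass_metrics(self, num_prefill: int, num_decode: int) -> None:
        if self._fpm_publish is None:
            return  # disabled: a single attribute check per iteration
        # Wall time covers schedule + forward + output processing.
        wall_time = time.perf_counter() - self._fpm_batch_start
        self._fpm_publish({
            "wall_time": wall_time,
            "num_prefill_requests": num_prefill,
            "num_decode_requests": num_decode,
        })


class ToyScheduler(SchedulerMetricsMixinSketch):
    """Stands in for a scheduler that already inherits the metrics mixin."""

    def __init__(self, port: Optional[int], publish: Callable[[dict], Any]) -> None:
        self._init_fpm(port, publish)

    def run_one_iteration(self, num_prefill: int, num_decode: int) -> None:
        self._fpm_record_batch_start()
        # ... schedule the batch, run the forward pass, process outputs ...
        self._emit_forward_pass_metrics(num_prefill, num_decode)
```

Because the hook lives on an existing mixin, the scheduler class itself needs no new subclass; enabling or disabling the feature only changes what `_init_fpm` stores.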

Data emitted per iteration

| Field | Description |
|---|---|
| `wall_time` | Schedule + forward + output processing duration |
| `num_prefill_requests` | Prefill requests in this batch |
| `sum_prefill_tokens` | Freshly computed prefill tokens |
| `sum_prefill_kv_tokens` | Prefix cache hits (read, not computed) |
| `var_prefill_length` | Variance of prompt lengths |
| `num_decode_requests` | Decode requests in this batch |
| `sum_decode_kv_tokens` | Total KV context length across decode requests |
| `var_decode_kv_tokens` | Variance of decode KV lengths |
| Queued request equivalents | Same fields for the waiting-queue snapshot |
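The variance fields can be computed in a single pass without storing every length, which is what Welford's online algorithm does. The sketch below is an illustration under assumptions, not the PR's `WelfordAccumulator`: it reports population variance (the PR may use sample variance), treats prompt length as new tokens plus cached prefix tokens, takes a hypothetical `(new_tokens, cached_prefix_tokens)` pair per prefill request, and omits the queued-request fields.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


class WelfordAccumulator:
    """Online count/sum/mean/variance via Welford's algorithm (sketch)."""

    def __init__(self) -> None:
        self.count = 0
        self.total = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x: int) -> None:
        self.count += 1
        self.total += x
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Population variance; assumption, the PR may divide by count - 1.
        return self.m2 / self.count if self.count else 0.0


@dataclass
class ForwardPassMetricsSketch:
    wall_time: float
    num_prefill_requests: int
    sum_prefill_tokens: int
    sum_prefill_kv_tokens: int
    var_prefill_length: float
    num_decode_requests: int
    sum_decode_kv_tokens: int
    var_decode_kv_tokens: float


def summarize_batch(
    wall_time: float,
    prefill: Iterable[Tuple[int, int]],
    decode_kv_lens: Iterable[int],
) -> ForwardPassMetricsSketch:
    """Build one iteration's metrics from hypothetical per-request inputs."""
    prompt_lens = WelfordAccumulator()
    new_tokens = cached_tokens = 0
    for new, cached in prefill:
        new_tokens += new  # freshly computed tokens
        cached_tokens += cached  # prefix-cache hits
        prompt_lens.add(new + cached)  # assumption: prompt length = new + cached
    decode = WelfordAccumulator()
    for kv_len in decode_kv_lens:
        decode.add(kv_len)
    return ForwardPassMetricsSketch(
        wall_time=wall_time,
        num_prefill_requests=prompt_lens.count,
        sum_prefill_tokens=new_tokens,
        sum_prefill_kv_tokens=cached_tokens,
        var_prefill_length=prompt_lens.variance,
        num_decode_requests=decode.count,
        sum_decode_kv_tokens=decode.total,
        var_decode_kv_tokens=decode.variance,
    )
```

For instance, decode KV lengths of 10, 20 and 30 give `sum_decode_kv_tokens == 60` and a population variance of 200/3.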

Architecture

```
Scheduler process:
  SchedulerMetricsMixin._emit_forward_pass_metrics()
    -> _FpmPublisherThread (daemon) -> ZMQ PUB (localhost:{port + dp_rank})

External consumer:
  ZMQ SUB -> deserialize ForwardPassMetrics
```
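A minimal sketch of the publisher-thread pattern, assuming a queue hand-off between the scheduler and a daemon thread. `FpmPublisherSketch` is hypothetical and is not the PR's `_FpmPublisherThread`; the `send` callback stands in for "msgpack-encode and send on the ZMQ PUB socket", and a heartbeat is represented here as `None`, since the real heartbeat payload is not described in this PR. The scheduler only enqueues; the thread does the sending, and it emits a heartbeat whenever nothing arrives for `heartbeat_interval` seconds (1s by default, matching the idle-heartbeat interval stated in the commit message).

```python
import queue
import threading
import time
from typing import Any, Callable, Optional

_SHUTDOWN = object()  # sentinel that wakes the thread for a clean exit


class FpmPublisherSketch:
    """Hypothetical queue-backed publisher with idle heartbeats."""

    def __init__(
        self,
        send: Callable[[Optional[Any]], None],
        heartbeat_interval: float = 1.0,
    ) -> None:
        self._send = send
        self._interval = heartbeat_interval
        self._queue = queue.SimpleQueue()
        self._thread = threading.Thread(
            target=self._run, name="fpm-publisher", daemon=True
        )

    def start(self) -> None:
        self._thread.start()

    def publish(self, metrics: Any) -> None:
        # Scheduler hot path: a non-blocking enqueue, no serialization or I/O.
        self._queue.put(metrics)

    def shutdown(self) -> None:
        self._queue.put(_SHUTDOWN)
        self._thread.join()

    def _run(self) -> None:
        while True:
            try:
                item = self._queue.get(timeout=self._interval)
            except queue.Empty:
                self._send(None)  # idle for a full interval: heartbeat
                continue
            if item is _SHUTDOWN:
                return
            self._send(item)  # real code would encode and send on the PUB socket
```

The sentinel lets `shutdown()` return promptly instead of waiting out a heartbeat timeout.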

How to enable

```bash
python -m sglang.launch_server --model-path <model> --forward-pass-metrics-port 20380
```
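A sketch of an external consumer, assuming pyzmq and msgspec are installed and that the last frame of each ZMQ message carries the msgpack-encoded `ForwardPassMetrics` (the exact framing, and how heartbeats are distinguished, are not specified here). The helper names `fpm_endpoints` and `iter_forward_pass_metrics` are hypothetical. Following the architecture above, each data-parallel rank publishes on `port + dp_rank`, and one SUB socket can connect to all of them:

```python
from typing import Any, Iterator, List


def fpm_endpoints(base_port: int, dp_size: int, host: str = "localhost") -> List[str]:
    # One PUB endpoint per data-parallel rank, on base_port + dp_rank.
    return [f"tcp://{host}:{base_port + rank}" for rank in range(dp_size)]


def iter_forward_pass_metrics(base_port: int, dp_size: int = 1) -> Iterator[Any]:
    """Yield decoded metrics from every DP rank (needs pyzmq and msgspec)."""
    import msgspec  # imported lazily so fpm_endpoints works without them
    import zmq

    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.SUB)
    sock.setsockopt(zmq.SUBSCRIBE, b"")  # no topic filter: receive everything
    for endpoint in fpm_endpoints(base_port, dp_size):
        sock.connect(endpoint)
    try:
        while True:
            frames = sock.recv_multipart()
            # Assumption: the last frame is the msgpack payload.
            yield msgspec.msgpack.decode(frames[-1])
    finally:
        sock.close(linger=0)
```

With the server above running, `for m in iter_forward_pass_metrics(20380): print(m)` would print each decoded frame as it arrives. Decoding without a `type=` argument returns plain Python objects; passing the schema struct instead would validate each frame.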

Files changed

- New: `python/sglang/srt/observability/forward_pass_metrics.py` -- `ForwardPassMetrics` schema, `WelfordAccumulator`, `_FpmPublisherThread`
- Modified: `python/sglang/srt/observability/scheduler_metrics_mixin.py` -- init FPM publisher, `_emit_forward_pass_metrics()`, `_shutdown_fpm()`
- Modified: `python/sglang/srt/managers/scheduler.py` -- record batch start time, emit FPM from `process_batch_result()`
- Modified: `python/sglang/srt/server_args.py` -- add `forward_pass_metrics_port` field and `--forward-pass-metrics-port` CLI flag
- New: `test/manual/test_forward_pass_metrics.py` -- schema roundtrip, ZMQ PUB/SUB e2e, heartbeat

Test plan

- Unit tests: schema encode/decode roundtrip, `WelfordAccumulator` correctness
- ZMQ PUB/SUB e2e: publisher thread -> subscriber receives correct frames
- Heartbeat: an idle publisher emits a heartbeat within the interval
- Live server: launched sglang with `--forward-pass-metrics-port 20380`, sent requests, and verified that prefill/decode metrics arrive with correct values

gemini-code-assist · 2026-03-14T04:27:42Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Add ForwardPassMetrics emission from the scheduler on every forward pass iteration via ZMQ PUB socket. Enables external consumers (planners, routers) to observe real-time scheduling behavior without polling Prometheus. Controlled via --forward-pass-metrics-port server arg. Zero overhead when not set. Background publisher thread keeps serialization off the scheduler hot path. Idle heartbeats emitted every 1s.

Data emitted per iteration:
- Scheduled requests: prefill/decode counts, token sums, KV lengths, variance
- Queued requests: waiting queue depth and token distribution
- Wall time: schedule + forward + output processing duration

ishandhanani mentioned this pull request Mar 14, 2026

feat(sglang): relay forward pass metrics to event plane ai-dynamo/dynamo#7376 (Closed, 3 tasks)

ishandhanani closed this Mar 14, 2026

ishandhanani reopened this Mar 14, 2026

ishandhanani force-pushed the idhanani/forward-pass-metrics branch from a0fa49a to 48ccb9a on March 14, 2026 05:13

ishandhanani closed this Mar 14, 2026

ishandhanani wants to merge 1 commit into sgl-project:main from ishandhanani:idhanani/forward-pass-metrics
