feat: emit per-iteration forward pass metrics via ZMQ PUB #20567

Closed
ishandhanani wants to merge 1 commit into sgl-project:main from ishandhanani:idhanani/forward-pass-metrics

ishandhanani (Collaborator) commented Mar 14, 2026

Summary

  • Add ForwardPassMetrics emission from the scheduler on every forward pass iteration via ZMQ PUB socket
  • Enables external consumers (planners, routers) to observe real-time scheduling behavior without polling Prometheus
  • Controlled via --forward-pass-metrics-port server arg -- zero overhead when not set
  • Background publisher thread keeps serialization off the scheduler hot path
  • Wire format uses msgspec msgpack for zero-copy serialization
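The background-thread pattern in the summary can be sketched with the standard library alone. This is a minimal illustration, not the PR's `_FpmPublisherThread`: the class name `BackgroundPublisher`, the `send` callable, and the heartbeat payload are all hypothetical. In a real setup `send` would presumably serialize the item and write it to the ZMQ PUB socket.

```python
import queue
import threading
import time
from typing import Any, Callable

_STOP = object()  # private sentinel that tells the worker to exit


class BackgroundPublisher(threading.Thread):
    """Drain a queue on a daemon thread; emit a heartbeat when idle (hypothetical sketch)."""

    def __init__(self, send: Callable[[Any], None], heartbeat_s: float = 1.0) -> None:
        super().__init__(daemon=True)
        self._send = send
        self._heartbeat_s = heartbeat_s
        self._queue: "queue.Queue[Any]" = queue.Queue()

    def publish(self, item: Any) -> None:
        # Called from the hot path: only an enqueue, no serialization or I/O.
        self._queue.put_nowait(item)

    def stop(self) -> None:
        self._queue.put(_STOP)
        self.join()

    def run(self) -> None:
        while True:
            try:
                item = self._queue.get(timeout=self._heartbeat_s)
            except queue.Empty:
                # Nothing arrived within the interval: send a heartbeat instead.
                self._send({"heartbeat": True})  # hypothetical payload shape
                continue
            if item is _STOP:
                return
            self._send(item)  # serialization and socket I/O would happen here


sent = []
pub = BackgroundPublisher(sent.append, heartbeat_s=0.05)
pub.publish({"num_decode_requests": 4})  # enqueued before the worker starts
pub.start()
time.sleep(0.2)  # stay idle long enough for at least one heartbeat
pub.stop()
```

The scheduler side only pays for a queue insert; everything slower runs on the worker thread, which is one plausible reading of "off the scheduler hot path".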

Motivation

External orchestration systems need per-iteration scheduling telemetry to make informed routing decisions. The existing KV metrics (from #6721) provide block-level cache occupancy, but planners also need request-level scheduling data: how many prefill/decode requests ran, token counts, KV context lengths, queue depth, and iteration wall time.

This uses sglang's existing SchedulerMetricsMixin -- no scheduler subclass needed.

Data emitted per iteration

| Field | Description |
| --- | --- |
| wall_time | Schedule + forward + output processing duration |
| num_prefill_requests | Prefill requests in this batch |
| sum_prefill_tokens | Freshly computed prefill tokens |
| sum_prefill_kv_tokens | Prefix cache hits (read, not computed) |
| var_prefill_length | Variance of prompt lengths |
| num_decode_requests | Decode requests in this batch |
| sum_decode_kv_tokens | Total KV context length across decode requests |
| var_decode_kv_tokens | Variance of decode KV lengths |
| Queued request equivalents | Same fields for waiting queue snapshot |
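The variance fields can be computed in a single pass with Welford's algorithm, which is presumably what the PR's WelfordAccumulator does. The sketch below is an illustration under that assumption: the class name `Welford` and its methods are hypothetical, and the PR does not state whether it reports population or sample variance (population variance is used here).

```python
from dataclasses import dataclass


@dataclass
class Welford:
    """Streaming mean and variance via Welford's algorithm (illustrative only)."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the current mean

    def add(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        # Uses the deviation from the old mean times the deviation from the new mean.
        self.m2 += delta * (x - self.mean)

    def variance(self) -> float:
        # Population variance (assumption); returns 0.0 for an empty accumulator.
        return self.m2 / self.count if self.count else 0.0


# Accumulate prompt lengths for one batch, e.g. to fill var_prefill_length.
acc = Welford()
for prompt_len in (128, 256, 512):
    acc.add(prompt_len)
batch_variance = acc.variance()
```

A single-pass accumulator avoids materializing a list of lengths per iteration, which matters when the computation runs on every forward pass.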

Architecture

Scheduler process:
  SchedulerMetricsMixin._emit_forward_pass_metrics()
    -> _FpmPublisherThread (daemon) -> ZMQ PUB (localhost:{port + dp_rank})

External consumer:
  ZMQ SUB -> deserialize ForwardPassMetrics
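A consumer for the flow above might look like the following sketch. It assumes pyzmq and msgspec are installed, that each message carries its msgpack payload in the last frame, and that an untyped decode is acceptable; the helper names `fpm_endpoint` and `iter_metrics` are hypothetical. A real consumer would more likely decode into the PR's ForwardPassMetrics type, whose definition is not shown here.

```python
from typing import Any, Iterator


def fpm_endpoint(base_port: int, dp_rank: int = 0, host: str = "localhost") -> str:
    """Endpoint for one DP rank, following the port + dp_rank layout above."""
    return f"tcp://{host}:{base_port + dp_rank}"


def iter_metrics(base_port: int, dp_rank: int = 0) -> Iterator[Any]:
    """Yield decoded messages from one scheduler's PUB socket (blocks while waiting)."""
    # Third-party imports are done lazily so fpm_endpoint has no dependencies.
    import msgspec
    import zmq

    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.SUB)
    sock.connect(fpm_endpoint(base_port, dp_rank))
    sock.setsockopt(zmq.SUBSCRIBE, b"")  # empty prefix: receive every message
    try:
        while True:
            frames = sock.recv_multipart()
            # Assumption: the msgpack payload is the last frame of each message.
            yield msgspec.msgpack.decode(frames[-1])
    finally:
        sock.close()


# With --forward-pass-metrics-port 20380, DP rank 2 would publish here:
endpoint = fpm_endpoint(20380, dp_rank=2)
```

Because PUB/SUB drops messages for subscribers that connect late or fall behind, a consumer built this way should treat the stream as best-effort telemetry rather than a complete log.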

How to enable

python -m sglang.launch_server --model-path <model> --forward-pass-metrics-port 20380

Files changed

  • New: python/sglang/srt/observability/forward_pass_metrics.py -- ForwardPassMetrics schema, WelfordAccumulator, _FpmPublisherThread
  • Modified: python/sglang/srt/observability/scheduler_metrics_mixin.py -- init FPM publisher, _emit_forward_pass_metrics(), _shutdown_fpm()
  • Modified: python/sglang/srt/managers/scheduler.py -- record batch start time, emit FPM from process_batch_result()
  • Modified: python/sglang/srt/server_args.py -- add forward_pass_metrics_port field and --forward-pass-metrics-port CLI flag
  • New: test/manual/test_forward_pass_metrics.py -- schema roundtrip, ZMQ PUB/SUB e2e, heartbeat

Test plan

  • Unit tests: schema encode/decode roundtrip, WelfordAccumulator correctness
  • ZMQ PUB/SUB e2e: publisher thread -> subscriber receives correct frames
  • Heartbeat: idle publisher emits heartbeat within interval
  • Live server: launched sglang with --forward-pass-metrics-port 20380, sent requests, verified prefill/decode metrics arrive with correct values

Add ForwardPassMetrics emission from the scheduler on every forward
pass iteration via ZMQ PUB socket. Enables external consumers (planners,
routers) to observe real-time scheduling behavior without polling
Prometheus.

Controlled via --forward-pass-metrics-port server arg. Zero overhead
when not set. Background publisher thread keeps serialization off the
scheduler hot path. Idle heartbeats emitted every 1s.

Data emitted per iteration:
- Scheduled requests: prefill/decode counts, token sums, KV lengths, variance
- Queued requests: waiting queue depth and token distribution
- Wall time: schedule + forward + output processing duration

ishandhanani force-pushed the idhanani/forward-pass-metrics branch from a0fa49a to 48ccb9a on March 14, 2026 05:13