feat: emit per-iteration forward pass metrics via ZMQ PUB #20567
Closed
ishandhanani wants to merge 1 commit into sgl-project:main from
Add ForwardPassMetrics emission from the scheduler on every forward pass iteration via ZMQ PUB socket. Enables external consumers (planners, routers) to observe real-time scheduling behavior without polling Prometheus.

Controlled via --forward-pass-metrics-port server arg. Zero overhead when not set. Background publisher thread keeps serialization off the scheduler hot path. Idle heartbeats emitted every 1s.

Data emitted per iteration:
- Scheduled requests: prefill/decode counts, token sums, KV lengths, variance
- Queued requests: waiting queue depth and token distribution
- Wall time: schedule + forward + output processing duration
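The variance fields can be computed in a single pass over the scheduled requests. The PR's file list names a WelfordAccumulator, which suggests Welford's online algorithm. The following is a minimal sketch of that technique, not the PR's code: the class name `WelfordSketch`, the helper `length_stats`, and the choice of population variance (dividing by the count) are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class WelfordSketch:
    """One-pass running count, sum, and variance (Welford's algorithm).

    Hypothetical stand-in for the PR's WelfordAccumulator.
    """

    count: int = 0
    total: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def add(self, x: int) -> None:
        self.count += 1
        self.total += x
        delta = x - self.mean
        self.mean += delta / self.count
        # Uses the updated mean, which keeps the update numerically stable.
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Assumption: population variance; the PR may use sample variance.
        return self.m2 / self.count if self.count else 0.0


def length_stats(lengths):
    """Return (count, sum, variance) for an iterable of token lengths."""
    acc = WelfordSketch()
    for n in lengths:
        acc.add(n)
    return acc.count, acc.total, acc.variance
```

A single pass like this avoids building an intermediate list per batch, which matters if the statistics are gathered on the scheduler side for every iteration.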
Branch updated from a0fa49a to 48ccb9a
Summary
- Add `ForwardPassMetrics` emission from the scheduler on every forward pass iteration via ZMQ PUB socket
- Controlled via `--forward-pass-metrics-port` server arg -- zero overhead when not set

Motivation
External orchestration systems need per-iteration scheduling telemetry to make informed routing decisions. The existing KV metrics (from #6721) provide block-level cache occupancy, but planners also need request-level scheduling data: how many prefill/decode requests ran, token counts, KV context lengths, queue depth, and iteration wall time.
This uses sglang's existing `SchedulerMetricsMixin` -- no scheduler subclass needed.

Data emitted per iteration
- `wall_time`
- `num_prefill_requests`
- `sum_prefill_tokens`
- `sum_prefill_kv_tokens`
- `var_prefill_length`
- `num_decode_requests`
- `sum_decode_kv_tokens`
- `var_decode_kv_tokens`

Architecture
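The background publisher described in this PR can be pictured as a queue drained by a separate thread: the scheduler only enqueues, while serialization and sending happen off the hot path, with a heartbeat emitted when the queue stays idle. The sketch below illustrates that pattern using only the standard library. It is not the PR's `_FpmPublisherThread`: the class name `PublisherSketch`, the JSON encoding, the `{"heartbeat": true}` payload, the drop-on-full policy, and the injected `send` callable (which would wrap a ZMQ PUB socket's send in a real deployment) are all assumptions.

```python
import json
import queue
import threading
import time


class PublisherSketch:
    """Hypothetical queue-plus-thread publisher with idle heartbeats."""

    def __init__(self, send, heartbeat_interval=1.0, maxsize=1024):
        self._send = send  # callable taking bytes, e.g. a PUB socket's send
        self._interval = heartbeat_interval
        self._queue = queue.Queue(maxsize=maxsize)
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def publish(self, metrics: dict) -> None:
        """Called from the hot path: enqueue only, never block."""
        try:
            self._queue.put_nowait(metrics)
        except queue.Full:
            pass  # assumption: drop rather than stall the scheduler

    def _run(self):
        while not self._stop_event.is_set():
            try:
                item = self._queue.get(timeout=self._interval)
            except queue.Empty:
                item = {"heartbeat": True}  # hypothetical idle payload
            # Serialization happens here, on the background thread.
            self._send(json.dumps(item).encode())

    def stop(self):
        self._stop_event.set()
        self._thread.join()
```

Injecting `send` keeps the sketch testable without a socket; the timeout on `queue.get` doubles as the heartbeat timer, so no second timer thread is needed.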
How to enable
Files changed
- `python/sglang/srt/observability/forward_pass_metrics.py` -- ForwardPassMetrics schema, WelfordAccumulator, _FpmPublisherThread
- `python/sglang/srt/observability/scheduler_metrics_mixin.py` -- init FPM publisher, `_emit_forward_pass_metrics()`, `_shutdown_fpm()`
- `python/sglang/srt/managers/scheduler.py` -- record batch start time, emit FPM from `process_batch_result()`
- `python/sglang/srt/server_args.py` -- add `forward_pass_metrics_port` field and `--forward-pass-metrics-port` CLI flag
- `test/manual/test_forward_pass_metrics.py` -- schema roundtrip, ZMQ PUB/SUB e2e, heartbeat

Test plan
- Ran the server with `--forward-pass-metrics-port 20380`, sent requests, and verified that prefill/decode metrics arrive with correct values