
@joerunde (Collaborator) commented Oct 9, 2025

Description

This PR adds a debug-mode performance logger that prints the timing stats for each individual request. These are the stats the engine already collects and aggregates into Prometheus metrics. The logger splits the timing info into end-to-end (e2e) time, queue time, prefill time, and decode time, which gives a better picture of how time is spent inside vLLM.
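As an illustration of how such a breakdown can be derived, here is a minimal sketch that computes the per-request durations from a handful of timestamps. The `RequestTimestamps` class and `timing_breakdown` function are hypothetical, and the exact points at which vLLM records each timestamp are an assumption here; only the output field names come from this PR.

```python
from dataclasses import dataclass


@dataclass
class RequestTimestamps:
    # Hypothetical per-request timestamps, in seconds (assumed, not vLLM's API)
    arrival: float      # request received by the engine
    queued: float       # request placed in the waiting queue
    scheduled: float    # request first scheduled into a batch (prefill starts)
    first_token: float  # first output token produced (prefill finished)
    last_token: float   # final output token produced
    num_generation_tokens: int


def timing_breakdown(ts: RequestTimestamps, finished: float) -> dict:
    """Split one request's lifetime into the phases reported by the logger."""
    decode_time = ts.last_token - ts.first_token
    # N generated tokens leave N - 1 gaps between them during decode
    gaps = ts.num_generation_tokens - 1
    return {
        "e2e_latency_seconds": finished - ts.arrival,
        "queued_time_seconds": ts.scheduled - ts.queued,
        "prefill_time_seconds": ts.first_token - ts.scheduled,
        "inference_time_seconds": ts.last_token - ts.scheduled,
        "decode_time_seconds": decode_time,
        "mean_time_per_output_token_seconds": decode_time / gaps if gaps > 0 else 0.0,
    }
```

Under these assumptions, inference time is simply prefill time plus decode time, so the two finer-grained numbers are what reveal where a slow request actually spent its time.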

Additionally, for each request, the logger attempts to calculate:

  • the amount of time the request spent interrupted, waiting for a newly arrived request to be prefilled and enter the batch
  • the mean inter-token latency (ITL) over decode passes only, excluding the time the request was interrupted

These are reported as the prefill_interrupt_seconds and decode_only_itl_seconds fields.
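One plausible way to compute these two values is sketched below; it is not necessarily how this PR does it. The sketch assumes the prefills of other requests are recorded as (start, end) intervals that do not overlap each other (prefills run one at a time), and that the request's decode window runs from its first to its last output token. `Interval` and `prefill_interrupt_and_itl` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    # Hypothetical record of when another request's prefill ran, in seconds
    start: float
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def prefill_interrupt_and_itl(first_token_ts, last_token_ts, num_generation_tokens, other_prefills):
    """Return (prefill_interrupt_seconds, decode_only_itl_seconds) for one request."""
    # Time inside this request's decode window taken up by other requests' prefills
    interrupt = sum(
        overlap(first_token_ts, last_token_ts, p.start, p.end) for p in other_prefills
    )
    decode_time = last_token_ts - first_token_ts
    gaps = num_generation_tokens - 1
    # Mean ITL with the interrupted time removed, i.e. decode passes only
    itl = (decode_time - interrupt) / gaps if gaps > 0 else 0.0
    return interrupt, itl
```

If prefill intervals could overlap each other, they would need to be merged first, otherwise the interrupt time would be double-counted.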

This uses the existing VLLM_SPYRE_PERF_METRIC_LOGGING_ENABLED and VLLM_SPYRE_PERF_METRIC_LOGGING_DIR configs and writes the results to a .jsonl file, one JSON object per request, with the following fields:

{"timestamp": x, "prefill_interrupt_seconds": x, "decode_only_itl_seconds": x, "finish_reason": x, "num_prompt_tokens": x, "num_generation_tokens": x, "max_tokens_param": x, "e2e_latency_seconds": x, "queued_time_seconds": x, "prefill_time_seconds": x, "inference_time_seconds": x, "decode_time_seconds": x, "mean_time_per_output_token_seconds": x}
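A JSON Lines file is just one serialized object per line, appended as each request finishes. The sketch below shows that pattern with the standard library; the `append_request_record` helper and the `request_metrics.jsonl` file name are hypothetical, and `log_dir` stands in for whatever VLLM_SPYRE_PERF_METRIC_LOGGING_DIR is set to.

```python
import json
import os
import tempfile
import time


def append_request_record(log_dir: str, record: dict, filename: str = "request_metrics.jsonl") -> str:
    """Append one request's stats as a single JSON line (hypothetical helper)."""
    os.makedirs(log_dir, exist_ok=True)
    path = os.path.join(log_dir, filename)
    # json.dumps without indent emits no newlines, so each record stays on one line
    line = json.dumps({"timestamp": time.time(), **record})
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
    return path


# Example usage with a throwaway directory standing in for the logging dir
example_dir = tempfile.mkdtemp()
log_path = append_request_record(example_dir, {"finish_reason": "stop", "num_generation_tokens": 8})
append_request_record(example_dir, {"finish_reason": "length", "num_generation_tokens": 16})
```

Opening in append mode keeps earlier records intact, and the file can later be loaded line by line with `json.loads` or, for example, `pandas.read_json(path, lines=True)`.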

Extending the vLLM StatLoggers could also let us create custom Prometheus metrics, if any of the extra information about prefill interrupt time would be helpful on a dashboard.

Signed-off-by: Joe Runde <[email protected]>
github-actions bot commented Oct 9, 2025

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure that your code passes all the linting checks; otherwise, your PR won't be able to be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. The requirements can be installed with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

@joerunde (Collaborator, Author)

bot:test
MARKERS="spyre and cb and not multi and not quantized"

@tjohnson31415 (Collaborator) left a comment


A couple of tiny nits, but this can be merged as is.

Co-authored-by: Travis Johnson <[email protected]>
@joerunde joerunde merged commit dff277b into main Oct 10, 2025
19 checks passed
@joerunde joerunde deleted the perf-logger branch October 10, 2025 23:22
