✨ add debug perf logger #515
Conversation
tjohnson31415 left a comment:
Couple of tiny nits, but can merge as is.
Description
This PR adds a debug-mode performance logger that prints timing stats for each individual request. These are the stats already collected by the engine and aggregated into Prometheus metrics. The logger splits the timing info into e2e time, queue time, prefill time, and decode time, giving a better picture of how time is spent inside vLLM.
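As a rough illustration, the breakdown slices a request's end-to-end latency at the points where it was first scheduled and where its first token was produced. The sketch below is not this PR's implementation; the timestamp names are hypothetical placeholders for the stats the engine collects.

```python
# Hypothetical sketch of the per-request timing breakdown described above.
# The timestamp names (arrival_time, scheduled_time, first_token_time,
# finished_time) are illustrative, not the exact attributes used in vLLM.
from dataclasses import dataclass


@dataclass
class RequestTimestamps:
    arrival_time: float       # request received by the engine
    scheduled_time: float     # request first scheduled for prefill
    first_token_time: float   # first output token produced
    finished_time: float      # last output token produced


def timing_breakdown(ts: RequestTimestamps) -> dict[str, float]:
    """Split a request's end-to-end latency into queue/prefill/decode time."""
    return {
        "e2e_time": ts.finished_time - ts.arrival_time,
        "queue_time": ts.scheduled_time - ts.arrival_time,
        "prefill_time": ts.first_token_time - ts.scheduled_time,
        "decode_time": ts.finished_time - ts.first_token_time,
    }
```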
Additionally, for each request this will attempt to calculate the time that decoding was interrupted by prefills of other requests, and the resulting decode-only inter-token latency. These are included as the `prefill_interrupt` and `decode_only_itl` fields.

This uses the existing `VLLM_SPYRE_PERF_METRIC_LOGGING_ENABLED` and `VLLM_SPYRE_PERF_METRIC_LOGGING_DIR` configs, and writes the results to a .jsonl file with the fields described above.

Extending the vLLM `StatLogger`s could allow us to create custom Prometheus metrics as well, if any of the extra info about prefill interrupt time would be helpful on a dashboard.
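For reference, here is a minimal sketch of how such a logger could be gated by the two configs and append one JSON record per request. The file name, the `maybe_log_request` helper, and every field except `prefill_interrupt` and `decode_only_itl` are assumptions for illustration, not the exact output format of this PR.

```python
# Hypothetical sketch: append one JSON record per finished request,
# gated by the existing VLLM_SPYRE_PERF_METRIC_LOGGING_* configs.
import json
import os
from pathlib import Path


def maybe_log_request(record: dict) -> None:
    # Skip entirely unless debug perf logging is enabled.
    if os.environ.get("VLLM_SPYRE_PERF_METRIC_LOGGING_ENABLED", "0") != "1":
        return
    log_dir = Path(os.environ.get("VLLM_SPYRE_PERF_METRIC_LOGGING_DIR", "."))
    log_dir.mkdir(parents=True, exist_ok=True)
    # File name is illustrative only.
    with open(log_dir / "request_perf_metrics.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")


# Example record combining the timing breakdown with the two derived fields:
maybe_log_request({
    "request_id": "req-0",
    "e2e_time": 1.83,
    "queue_time": 0.02,
    "prefill_time": 0.31,
    "decode_time": 1.50,
    "prefill_interrupt": 0.07,   # decode time lost to interleaved prefills
    "decode_only_itl": 0.011,    # inter-token latency excluding interruptions
})
```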