From 8070ad563d3017e95959248d96700211183a0117 Mon Sep 17 00:00:00 2001
From: Mark McLoughlin
Date: Wed, 12 Mar 2025 14:17:08 -0400
Subject: [PATCH] [V1][Metrics] Updated list of deprecated metrics in v0.8

These are already marked as deprecated in the code, but they should
be listed here too.

Signed-off-by: Mark McLoughlin
---
 docs/source/serving/metrics.md | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/docs/source/serving/metrics.md b/docs/source/serving/metrics.md
index 1d55f201503c..647ece3f85f0 100644
--- a/docs/source/serving/metrics.md
+++ b/docs/source/serving/metrics.md
@@ -39,7 +39,16 @@ The following metrics are exposed:
 
 The following metrics are deprecated and due to be removed in a future version:
 
-- *(No metrics are currently deprecated)*
+- `vllm:num_requests_swapped`, `vllm:cpu_cache_usage_perc`, and
+  `vllm:cpu_prefix_cache_hit_rate` because KV cache offloading is not
+  used in V1.
+- `vllm:gpu_prefix_cache_hit_rate` is replaced by queries+hits
+  counters in V1.
+- `vllm:time_in_queue_requests` because it duplicates
+  `vllm:request_queue_time_seconds`.
+- `vllm:model_forward_time_milliseconds` and
+  `vllm:model_execute_time_milliseconds` because
+  prefill/decode/inference time metrics should be used instead.
 
 Note: when metrics are deprecated in version `X.Y`, they are hidden in version `X.Y+1`
 but can be re-enabled using the `--show-hidden-metrics-for-version=X.Y` escape hatch,