Commit afd51dc
committed
[WIP][Metrics] Re-work approach to LoRA metrics
The current `vllm:lora_requests_info` Gauge is somewhat similar to an
Info metric (like cache_config_info) except the value is the current
wall-clock time, and is updated every iteration.
The label names used are:
- running_lora_adapters: a list of adapters with running requests,
formatted as a comma-separated string.
- waiting_lora_adapters: similar, except listing adapters with
requests waiting to be scheduled.
- max_lora - the static "max number of LoRAs in a single batch."
configuration.
It looks like this:
```
vllm:lora_requests_info{max_lora="1",running_lora_adapters="",waiting_lora_adapters=""} 1.7395575657589855e+09
vllm:lora_requests_info{max_lora="1",running_lora_adapters="test-lora",waiting_lora_adapters=""} 1.7395575723949368e+09
vllm:lora_requests_info{max_lora="1",running_lora_adapters="test-lora",waiting_lora_adapters="test-lora"} 1.7395575717647147e+09
```
I can't really make much sense of this. Encoding a running/waiting
status for multiple adapters in a comma-separated string seems quite
misguided - we should use labels to distinguish between per-adapter
counts instead:
```
vllm:num_lora_requests_running{lora_name="test-lora",model_name="meta-llama/Llama-3.1-8B-Instruct"} 8.0
vllm:num_lora_requests_waiting{lora_name="test-lora",model_name="meta-llama/Llama-3.1-8B-Instruct"} 7.0
```
This was added in #9477 and there is at least one known user. If we
revisit this design and deprecate the old metric, we should reduce the
need for a significant deprecation period by making the change in v0
also and asking this project to move to the new metric.
Signed-off-by: Mark McLoughlin <[email protected]>1 parent c9e2d64 commit afd51dc
3 files changed
+29
-6
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1812 | 1812 | | |
1813 | 1813 | | |
1814 | 1814 | | |
1815 | | - | |
1816 | | - | |
| 1815 | + | |
| 1816 | + | |
1817 | 1817 | | |
1818 | 1818 | | |
1819 | 1819 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
53 | 53 | | |
54 | 54 | | |
55 | 55 | | |
| 56 | + | |
| 57 | + | |
56 | 58 | | |
57 | 59 | | |
58 | 60 | | |
| |||
65 | 67 | | |
66 | 68 | | |
67 | 69 | | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
68 | 81 | | |
69 | 82 | | |
70 | 83 | | |
| |||
517 | 530 | | |
518 | 531 | | |
519 | 532 | | |
520 | | - | |
| 533 | + | |
| 534 | + | |
521 | 535 | | |
522 | | - | |
| 536 | + | |
| 537 | + | |
523 | 538 | | |
524 | 539 | | |
525 | 540 | | |
| |||
561 | 576 | | |
562 | 577 | | |
563 | 578 | | |
| 579 | + | |
| 580 | + | |
| 581 | + | |
| 582 | + | |
| 583 | + | |
| 584 | + | |
| 585 | + | |
| 586 | + | |
564 | 587 | | |
565 | 588 | | |
566 | 589 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
63 | 63 | | |
64 | 64 | | |
65 | 65 | | |
66 | | - | |
67 | | - | |
| 66 | + | |
| 67 | + | |
68 | 68 | | |
69 | 69 | | |
70 | 70 | | |
| |||
0 commit comments