docs/benchmarking/sweeps.md
7 additions & 5 deletions
@@ -2,7 +2,7 @@
## Online Benchmark

-[`vllm/benchmarks/sweep/serve.py`](../../vllm/benchmarks/sweep/serve.py) automatically starts `vllm serve` and runs `vllm bench serve` to evaluate vLLM over multiple configurations.
+`vllm bench sweep serve` automatically starts `vllm serve` and runs `vllm bench serve` to evaluate vLLM over multiple configurations.
Follow these steps to run the script:
@@ -91,7 +91,7 @@ vllm bench sweep serve \
## SLA Auto-Tuner

-[`vllm/benchmarks/sweep/serve_sla.py`](../../vllm/benchmarks/sweep/serve_sla.py) is a wrapper over [`vllm/benchmarks/sweep/serve.py`](../../vllm/benchmarks/sweep/serve.py) that tunes either the request rate or concurrency (choose using `--sla-variable`) in order to satisfy the SLA constraints given by `--sla-params`.
+`vllm bench sweep serve_sla` is a wrapper over `vllm bench sweep serve` that tunes either the request rate or concurrency (choose using `--sla-variable`) in order to satisfy the SLA constraints given by `--sla-params`.
For example, to ensure E2E latency within different target values for 99% of requests:
@@ -137,9 +137,11 @@ The algorithm for adjusting the SLA variable is as follows:
For a given combination of `--serve-params` and `--bench-params`, we share the benchmark results across `--sla-params` to avoid rerunning benchmarks with the same SLA variable value.

-## Visualizer
+## Visualization

-[`vllm/benchmarks/sweep/plot.py`](../../vllm/benchmarks/sweep/plot.py) can be used to plot performance curves from parameter sweep results.
+### Basic
+
+`vllm bench sweep plot` can be used to plot performance curves from parameter sweep results.
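
Note on the result sharing mentioned in the context line of the last hunk (benchmark results are shared across `--sla-params`): the idea can be sketched as a cache keyed by the SLA variable value. This is only an illustration, not the code in `serve_sla.py`. The function names, the linear scan over candidate values, and the fake latency model are all assumptions made for the example; the actual search algorithm is not shown in this diff.

```python
from typing import Callable, Dict, List, Optional


def tune_with_shared_results(
    sla_targets_ms: List[float],
    candidate_values: List[int],
    run_benchmark: Callable[[int], float],
) -> Dict[float, Optional[int]]:
    """For each SLA target, pick the largest candidate value whose latency meets it.

    Hypothetical helper: the scan over candidates stands in for whatever
    search the real tool performs.
    """
    # SLA variable value -> measured latency (ms), shared across all targets
    cache: Dict[int, float] = {}

    def measure(value: int) -> float:
        # Run the benchmark only the first time a value is requested
        if value not in cache:
            cache[value] = run_benchmark(value)
        return cache[value]

    best: Dict[float, Optional[int]] = {}
    for target in sla_targets_ms:
        passing = [v for v in candidate_values if measure(v) <= target]
        best[target] = max(passing) if passing else None
    return best


calls: List[int] = []


def fake_benchmark(value: int) -> float:
    # Hypothetical latency model: latency grows linearly with the value
    calls.append(value)
    return 50.0 * value


result = tune_with_shared_results([100.0, 200.0], [1, 2, 4, 8], fake_benchmark)
print(result)
print(calls)
```

With two SLA targets and four candidate values, `fake_benchmark` runs once per candidate rather than once per (target, candidate) pair, which is the saving the documentation describes.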