Commit 2062055

Clean
Signed-off-by: DarkLight1337 <[email protected]>
1 parent eb13a5a commit 2062055

1 file changed, +7 -5 lines changed

docs/benchmarking/sweeps.md

Lines changed: 7 additions & 5 deletions
@@ -2,7 +2,7 @@

 ## Online Benchmark

-[`vllm/benchmarks/sweep/serve.py`](../../vllm/benchmarks/sweep/serve.py) automatically starts `vllm serve` and runs `vllm bench serve` to evaluate vLLM over multiple configurations.
+`vllm bench sweep serve` automatically starts `vllm serve` and runs `vllm bench serve` to evaluate vLLM over multiple configurations.

 Follow these steps to run the script:

@@ -91,7 +91,7 @@ vllm bench sweep serve \

 ## SLA Auto-Tuner

-[`vllm/benchmarks/sweep/serve_sla.py`](../../vllm/benchmarks/sweep/serve_sla.py) is a wrapper over [`vllm/benchmarks/sweep/serve.py`](../../vllm/benchmarks/sweep/serve.py) that tunes either the request rate or concurrency (choose using `--sla-variable`) in order to satisfy the SLA constraints given by `--sla-params`.
+`vllm bench sweep serve_sla` is a wrapper over `vllm bench sweep serve` that tunes either the request rate or concurrency (choose using `--sla-variable`) in order to satisfy the SLA constraints given by `--sla-params`.

 For example, to ensure E2E latency within different target values for 99% of requests:

@@ -137,9 +137,11 @@ The algorithm for adjusting the SLA variable is as follows:

 For a given combination of `--serve-params` and `--bench-params`, we share the benchmark results across `--sla-params` to avoid rerunning benchmarks with the same SLA variable value.

-## Visualizer
+## Visualization

-[`vllm/benchmarks/sweep/plot.py`](../../vllm/benchmarks/sweep/plot.py) can be used to plot performance curves from parameter sweep results.
+### Basic
+
+`vllm bench sweep plot` can be used to plot performance curves from parameter sweep results.

 Example command:

@@ -155,7 +157,7 @@ vllm bench sweep plot benchmarks/results/<timestamp> \
 !!! tip
     You can use `--dry-run` to preview the figures to be plotted.

-## Pareto chart
+### Pareto chart

 `vllm bench sweep plot_pareto` helps pick configurations that balance per-user and per-GPU throughput.
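
Reviewer note: the unchanged context line in the SLA Auto-Tuner hunk says benchmark results are shared across `--sla-params` so that the same SLA variable value is never benchmarked twice. A minimal sketch of that idea follows, assuming a simple in-memory cache keyed by the SLA variable value. It is not vLLM's implementation: `make_shared_runner`, `fake_benchmark`, the `p99_e2el_ms` key, and the linear scan over candidate rates are all hypothetical stand-ins, and the actual adjustment algorithm is not shown in this diff.

```python
from typing import Callable


def make_shared_runner(run_benchmark: Callable[[float], dict]) -> Callable[[float], dict]:
    """Wrap a benchmark function so each SLA variable value runs at most once."""
    cache: dict = {}

    def run(sla_value: float) -> dict:
        # Only invoke the (expensive) benchmark on a cache miss.
        if sla_value not in cache:
            cache[sla_value] = run_benchmark(sla_value)
        return cache[sla_value]

    return run


calls = []  # records every real benchmark invocation


def fake_benchmark(request_rate: float) -> dict:
    # Hypothetical stand-in for one benchmark run: latency grows with the rate.
    calls.append(request_rate)
    return {"p99_e2el_ms": 100.0 * request_rate}


run = make_shared_runner(fake_benchmark)

sla_targets_ms = [200.0, 500.0]          # hypothetical SLA constraints
candidate_rates = [1.0, 2.0, 4.0, 8.0]   # hypothetical SLA variable values

best = {}
for target in sla_targets_ms:
    # Both targets reuse the same cached results for each candidate rate.
    passing = [r for r in candidate_rates if run(r)["p99_e2el_ms"] <= target]
    best[target] = max(passing) if passing else None
```

With two SLA targets and four candidate rates, the wrapped runner invokes `fake_benchmark` four times rather than eight, which is the saving the documentation sentence describes.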
