@hnts03-moreh hnts03-moreh commented Dec 8, 2025

Purpose

Support for Multi-Process Benchmark Operation

The benchmark normally runs as a single process with a single thread. During execution it receives each request's output and computes the inter-token latency (ITL). When many requests are in flight simultaneously, however, output processing becomes a bottleneck and the ITL can no longer be measured accurately.

This PR adds a --max-connections-per-worker option to the benchmark. Based on this value, multiple worker processes are spawned to run the benchmark.

Each process handles num_requests / process_cnt requests and computes its own ITLs. The metrics from each process are then aggregated into the final benchmark output (Serving Benchmark Result).
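The split-and-aggregate scheme above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual implementation; the names (`run_worker`, `split_requests`, `benchmark`) are invented, and real ITL measurement is replaced with placeholder values.

```python
import multiprocessing as mp
import statistics

def run_worker(request_ids, result_queue):
    # Each worker benchmarks its own slice of requests and records
    # per-request inter-token latencies (ITLs). Real measurement is
    # replaced here with a deterministic placeholder per request id.
    itls = [float(rid % 7 + 1) for rid in request_ids]
    result_queue.put(itls)

def split_requests(num_requests, num_workers):
    # Spread num_requests as evenly as possible across workers
    # (roughly num_requests / process_cnt each, remainder to the first few).
    base, rem = divmod(num_requests, num_workers)
    chunks, start = [], 0
    for w in range(num_workers):
        count = base + (1 if w < rem else 0)
        chunks.append(list(range(start, start + count)))
        start += count
    return chunks

def benchmark(num_requests, num_workers):
    ctx = mp.get_context("fork")  # POSIX-only; keeps this sketch self-contained
    queue = ctx.Queue()
    procs = []
    for chunk in split_requests(num_requests, num_workers):
        p = ctx.Process(target=run_worker, args=(chunk, queue))
        p.start()
        procs.append(p)
    # Aggregate every worker's ITL samples into one distribution
    # before computing the final summary statistics.
    all_itls = []
    for _ in procs:
        all_itls.extend(queue.get())
    for p in procs:
        p.join()
    return statistics.mean(all_itls), statistics.median(all_itls)

if __name__ == "__main__":
    mean_itl, median_itl = benchmark(num_requests=8, num_workers=2)
    print(f"mean ITL = {mean_itl} ms, median ITL = {median_itl} ms")
```

The key design point is that raw per-request samples, not per-worker summaries, are merged before computing percentiles; averaging per-worker percentiles would give a different (wrong) result.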

Test Plan

vllm bench serve \
  --backend vllm \
  --model "deepseek-ai/DeepSeek-R1" \
  --metric-percentiles "90" \
  --percentile-metrics "itl,tps,ttft,e2el" \
  --host "mif-istio.cluster.svc.cluster.local" \
  --port 80 \
  --num-prompts 32400 \
  --max-concurrency 10800 \
  --request-rate 78 \
  --ignore-eos \
  --ready-check-timeout-sec 0 \
  --max-connections-per-worker 1296 \
  --dataset-name sharegpt \
  --dataset-path /app/dataset/ShareGPT_V3_unfiltered_cleaned_split.json \
  --sharegpt-input-len 1000 \
  --sharegpt-output-len 1000
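For reference, the 25 worker processes reported in the result below are consistent with dividing the prompt count by `--max-connections-per-worker`. Note that this sizing rule is an assumption inferred from the reported numbers, not confirmed from the code:

```python
import math

num_prompts = 32400
max_connections_per_worker = 1296

# Assumed rule: spawn enough workers that no worker handles more
# requests than max_connections_per_worker.
num_workers = math.ceil(num_prompts / max_connections_per_worker)
print(num_workers)  # 25, matching the benchmark result
```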

Test Result

  1. The benchmark result now reports the number of worker processes:
============ Serving Benchmark Result ============
Number of worker processes:              25
Successful requests:                     32400
Maximum request concurrency:             10800
Request rate configured (RPS):           78.00
Benchmark duration (s):                  651.09
Total input tokens:                      32400000
Total generated tokens:                  32400000
Request throughput (req/s):              49.76
Output token throughput (tok/s):         49762.71
Peak output token throughput (tok/s):    66969.00
Peak concurrent requests:                11028.00
Total Token throughput (tok/s):          99525.41
---------------Time to First Token----------------
Mean TTFT (ms):                          2931.03
Median TTFT (ms):                        2162.61
P90 TTFT (ms):                           5907.13
---------------Inter-token Latency----------------
Mean ITL (ms):                           170.31
Median ITL (ms):                         166.72
P90 ITL (ms):                            213.08
----------------End-to-end Latency----------------
Mean E2EL (ms):                          173070.05
Median E2EL (ms):                        176897.12
P90 E2EL (ms):                           180849.78
==================================================
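As a sanity check, the headline figures above are internally consistent: each throughput value equals the corresponding total divided by the benchmark duration.

```python
# Figures copied from the Serving Benchmark Result above.
duration_s = 651.09
num_requests = 32400
input_tokens = 32_400_000
output_tokens = 32_400_000

req_per_s = num_requests / duration_s
out_tok_per_s = output_tokens / duration_s
total_tok_per_s = (input_tokens + output_tokens) / duration_s

print(f"{req_per_s:.2f} req/s, {out_tok_per_s:.2f} out tok/s, "
      f"{total_tok_per_s:.2f} total tok/s")
```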
  2. Running the `top` command confirms the benchmark uses multiple processes:
top - 14:20:50 up 7 days, 19:57, 27 users,  load average: 34.16, 31.91, 29.55
Tasks: 1495 total,   9 running, 1482 sleeping,   3 stopped,   1 zombie
%Cpu(s): 11.1 us, 15.6 sy,  0.0 ni, 72.5 id,  0.0 wa,  0.0 hi,  0.8 si,  0.0 st
MiB Mem : 2321988.+total,  60780.8 free,  97886.2 used, 2163321.+buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used. 2211563.+avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 188548 root      20   0  267.6g   2.4g  11444 S   5.6   0.1   1:29.51 vLLM Benchmark_
 188593 root      20   0  267.6g   2.4g  11444 S   5.6   0.1   1:30.11 vLLM Benchmark_
 188342 root      20   0  267.6g   2.4g  11444 S   5.3   0.1   1:29.94 vLLM Benchmark_
 188412 root      20   0  267.6g   2.4g  11444 S   5.3   0.1   1:30.06 vLLM Benchmark_
 188513 root      20   0  267.6g   2.4g  11444 S   5.3   0.1   1:30.40 vLLM Benchmark_
 188514 root      20   0  267.7g   2.4g  11444 S   5.3   0.1   1:29.46 vLLM Benchmark_
 188531 root      20   0  267.5g   2.4g  11444 S   5.3   0.1   1:29.09 vLLM Benchmark_
 188576 root      20   0  267.5g   2.4g  11444 S   5.3   0.1   1:29.02 vLLM Benchmark_
 188609 root      20   0  267.5g   2.4g  11444 S   5.3   0.1   1:28.79 vLLM Benchmark_
 188617 root      20   0  267.6g   2.4g  11444 S   5.3   0.1   1:28.78 vLLM Benchmark_
 188278 root      20   0  267.6g   2.4g  11444 S   5.0   0.1   1:30.17 vLLM Benchmark_
 188283 root      20   0  267.6g   2.4g  11444 S   5.0   0.1   1:30.36 vLLM Benchmark_
 188295 root      20   0  267.6g   2.4g  11444 S   5.0   0.1   1:29.43 vLLM Benchmark_
 188303 root      20   0  267.5g   2.4g  11444 S   5.0   0.1   1:30.36 vLLM Benchmark_
 188404 root      20   0  267.6g   2.4g  11444 S   5.0   0.1   1:29.99 vLLM Benchmark_
 188415 root      20   0  267.7g   2.4g  11444 S   5.0   0.1   1:29.92 vLLM Benchmark_
 188521 root      20   0  267.5g   2.4g  11444 S   5.0   0.1   1:29.93 vLLM Benchmark_
 188525 root      20   0  267.7g   2.4g  11444 S   5.0   0.1   1:29.23 vLLM Benchmark_
 188528 root      20   0  267.6g   2.4g  11444 S   5.0   0.1   1:29.84 vLLM Benchmark_
 188582 root      20   0  267.5g   2.4g  11444 S   5.0   0.1   1:30.11 vLLM Benchmark_
 188585 root      20   0  267.6g   2.4g  11444 S   5.0   0.1   1:30.25 vLLM Benchmark_
 188601 root      20   0  267.5g   2.4g  11444 S   5.0   0.1   1:30.71 vLLM Benchmark_
 188614 root      20   0  267.6g   2.4g  11444 S   5.0   0.1   1:30.06 vLLM Benchmark_
 188515 root      20   0  267.5g   2.4g  11444 S   4.6   0.1   1:29.69 vLLM Benchmark_
 ...

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@jiminpark-moreh jiminpark-moreh changed the title Support multi-process benchmark [Feat] Support multi-process benchmark Dec 8, 2025
