@hnts03-moreh hnts03-moreh commented Dec 8, 2025

Purpose

Support for Multi-Process Benchmark Operation

The benchmark normally runs as a single process with a single thread. During execution it receives each request's output and computes the inter-token latency (ITL). When many requests are in flight simultaneously, however, output processing becomes a bottleneck and the ITL can no longer be measured accurately.

This PR adds a --max-connections-per-worker option to the benchmark. Based on this value, multiple worker processes are spawned to run the benchmark.

Each process handles num_requests / process_cnt requests and computes its own ITLs. The metrics from each process are then aggregated into the final benchmark output (Serving Benchmark Result).
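The split-and-aggregate scheme above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual implementation; the names (`run_worker`, `split_requests`, `benchmark`) are invented, and real ITL measurement is replaced with placeholder values.

```python
import multiprocessing as mp
import statistics

def run_worker(request_ids, result_queue):
    # Each worker benchmarks its own slice of requests and records
    # per-request inter-token latencies (ITLs). Real measurement is
    # replaced here with a deterministic placeholder per request id.
    itls = [float(rid % 7 + 1) for rid in request_ids]
    result_queue.put(itls)

def split_requests(num_requests, num_workers):
    # Spread num_requests as evenly as possible across workers
    # (roughly num_requests / process_cnt each, remainder to the first few).
    base, rem = divmod(num_requests, num_workers)
    chunks, start = [], 0
    for w in range(num_workers):
        count = base + (1 if w < rem else 0)
        chunks.append(list(range(start, start + count)))
        start += count
    return chunks

def benchmark(num_requests, num_workers):
    ctx = mp.get_context("fork")  # POSIX-only; keeps this sketch self-contained
    queue = ctx.Queue()
    procs = []
    for chunk in split_requests(num_requests, num_workers):
        p = ctx.Process(target=run_worker, args=(chunk, queue))
        p.start()
        procs.append(p)
    # Aggregate every worker's ITL samples into one distribution
    # before computing the final summary statistics.
    all_itls = []
    for _ in procs:
        all_itls.extend(queue.get())
    for p in procs:
        p.join()
    return statistics.mean(all_itls), statistics.median(all_itls)

if __name__ == "__main__":
    mean_itl, median_itl = benchmark(num_requests=8, num_workers=2)
    print(f"mean ITL = {mean_itl} ms, median ITL = {median_itl} ms")
```

The key design point is that raw per-request samples, not per-worker summaries, are merged before computing percentiles; averaging per-worker percentiles would give a different (wrong) result.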

Test Plan

vllm bench serve \
  --backend vllm \
  --model "deepseek-ai/DeepSeek-R1" \
  --metric-percentiles "90" \
  --percentile-metrics "itl,tps,ttft,e2el" \
  --host "mif-istio.cluster.svc.cluster.local" \
  --port 80 \
  --num-prompts 32400 \
  --max-concurrency 10800 \
  --request-rate 78 \
  --ignore-eos \
  --ready-check-timeout-sec 0 \
  --max-connections-per-worker 1296 \
  --dataset-name sharegpt \
  --dataset-path /app/dataset/ShareGPT_V3_unfiltered_cleaned_split.json \
  --sharegpt-input-len 1000 \
  --sharegpt-output-len 1000
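For reference, the 25 worker processes reported in the result below are consistent with dividing the prompt count by `--max-connections-per-worker`. Note that this sizing rule is an assumption inferred from the reported numbers, not confirmed from the code:

```python
import math

num_prompts = 32400
max_connections_per_worker = 1296

# Assumed rule: spawn enough workers that no worker handles more
# requests than max_connections_per_worker.
num_workers = math.ceil(num_prompts / max_connections_per_worker)
print(num_workers)  # 25, matching the benchmark result
```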

Test Result

  1. The benchmark result now reports the number of worker processes:
============ Serving Benchmark Result ============
Number of worker processes:              25
Successful requests:                     32400
Maximum request concurrency:             10800
Request rate configured (RPS):           78.00
Benchmark duration (s):                  651.09
Total input tokens:                      32400000
Total generated tokens:                  32400000
Request throughput (req/s):              49.76
Output token throughput (tok/s):         49762.71
Peak output token throughput (tok/s):    66969.00
Peak concurrent requests:                11028.00
Total Token throughput (tok/s):          99525.41
---------------Time to First Token----------------
Mean TTFT (ms):                          2931.03
Median TTFT (ms):                        2162.61
P90 TTFT (ms):                           5907.13
---------------Inter-token Latency----------------
Mean ITL (ms):                           170.31
Median ITL (ms):                         166.72
P90 ITL (ms):                            213.08
----------------End-to-end Latency----------------
Mean E2EL (ms):                          173070.05
Median E2EL (ms):                        176897.12
P90 E2EL (ms):                           180849.78
==================================================
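As a sanity check, the headline figures above are internally consistent: each throughput value equals the corresponding total divided by the benchmark duration.

```python
# Figures copied from the Serving Benchmark Result above.
duration_s = 651.09
num_requests = 32400
input_tokens = 32_400_000
output_tokens = 32_400_000

req_per_s = num_requests / duration_s
out_tok_per_s = output_tokens / duration_s
total_tok_per_s = (input_tokens + output_tokens) / duration_s

print(f"{req_per_s:.2f} req/s, {out_tok_per_s:.2f} out tok/s, "
      f"{total_tok_per_s:.2f} total tok/s")
```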
  2. Running the `top` command confirms the benchmark uses multiple processes:
top - 14:20:50 up 7 days, 19:57, 27 users,  load average: 34.16, 31.91, 29.55
Tasks: 1495 total,   9 running, 1482 sleeping,   3 stopped,   1 zombie
%Cpu(s): 11.1 us, 15.6 sy,  0.0 ni, 72.5 id,  0.0 wa,  0.0 hi,  0.8 si,  0.0 st
MiB Mem : 2321988.+total,  60780.8 free,  97886.2 used, 2163321.+buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used. 2211563.+avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 188548 root      20   0  267.6g   2.4g  11444 S   5.6   0.1   1:29.51 vLLM Benchmark_
 188593 root      20   0  267.6g   2.4g  11444 S   5.6   0.1   1:30.11 vLLM Benchmark_
 188342 root      20   0  267.6g   2.4g  11444 S   5.3   0.1   1:29.94 vLLM Benchmark_
 188412 root      20   0  267.6g   2.4g  11444 S   5.3   0.1   1:30.06 vLLM Benchmark_
 188513 root      20   0  267.6g   2.4g  11444 S   5.3   0.1   1:30.40 vLLM Benchmark_
 188514 root      20   0  267.7g   2.4g  11444 S   5.3   0.1   1:29.46 vLLM Benchmark_
 188531 root      20   0  267.5g   2.4g  11444 S   5.3   0.1   1:29.09 vLLM Benchmark_
 188576 root      20   0  267.5g   2.4g  11444 S   5.3   0.1   1:29.02 vLLM Benchmark_
 188609 root      20   0  267.5g   2.4g  11444 S   5.3   0.1   1:28.79 vLLM Benchmark_
 188617 root      20   0  267.6g   2.4g  11444 S   5.3   0.1   1:28.78 vLLM Benchmark_
 188278 root      20   0  267.6g   2.4g  11444 S   5.0   0.1   1:30.17 vLLM Benchmark_
 188283 root      20   0  267.6g   2.4g  11444 S   5.0   0.1   1:30.36 vLLM Benchmark_
 188295 root      20   0  267.6g   2.4g  11444 S   5.0   0.1   1:29.43 vLLM Benchmark_
 188303 root      20   0  267.5g   2.4g  11444 S   5.0   0.1   1:30.36 vLLM Benchmark_
 188404 root      20   0  267.6g   2.4g  11444 S   5.0   0.1   1:29.99 vLLM Benchmark_
 188415 root      20   0  267.7g   2.4g  11444 S   5.0   0.1   1:29.92 vLLM Benchmark_
 188521 root      20   0  267.5g   2.4g  11444 S   5.0   0.1   1:29.93 vLLM Benchmark_
 188525 root      20   0  267.7g   2.4g  11444 S   5.0   0.1   1:29.23 vLLM Benchmark_
 188528 root      20   0  267.6g   2.4g  11444 S   5.0   0.1   1:29.84 vLLM Benchmark_
 188582 root      20   0  267.5g   2.4g  11444 S   5.0   0.1   1:30.11 vLLM Benchmark_
 188585 root      20   0  267.6g   2.4g  11444 S   5.0   0.1   1:30.25 vLLM Benchmark_
 188601 root      20   0  267.5g   2.4g  11444 S   5.0   0.1   1:30.71 vLLM Benchmark_
 188614 root      20   0  267.6g   2.4g  11444 S   5.0   0.1   1:30.06 vLLM Benchmark_
 188515 root      20   0  267.5g   2.4g  11444 S   4.6   0.1   1:29.69 vLLM Benchmark_
 ...

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@jiminpark-moreh jiminpark-moreh changed the title Support multi-process benchmark [Feat] Support multi-process benchmark Dec 8, 2025
