
Conversation


@reaganjlee reaganjlee commented Nov 20, 2025

Resolves #24171.

Purpose

This PR adds a benchmark for the multimodal processor. It measures the MM processor's performance on a single vLLM instance on a single GPU.

Originally, I planned to additionally test (1) DP on multiple GPUs and (2) multiple instances on a single GPU. On second thought, testing (1) DP seems equivalent to just running the benchmark multiple times. For (2) multiple instances on a single GPU, I would like direction on which type of load balancing to use: nginx round-robin, or is the request router from the production_stack repo recommended? This also seems like a good stopping point, since the PR is already around 800 lines.

Test Plan

# Using just HuggingFaceTB/SmolVLM-Instruct to save on compute during development, and the random-mm dataset for simplicity
vllm bench multimodal-processor \
    --model HuggingFaceTB/SmolVLM-Instruct \
    --dataset-name random-mm \
    --num-prompts 100 \
    --max-concurrency 10

# Higher concurrency
vllm bench multimodal-processor \
    --model HuggingFaceTB/SmolVLM-Instruct \
    --dataset-name random-mm \
    --num-prompts 100 \
    --max-concurrency 50

Test Result

# Example result
# vllm bench multimodal-processor --model HuggingFaceTB/SmolVLM-Instruct \
#     --dataset-name random-mm --num-prompts 100 --max-concurrency 10
================================================================================
Multimodal Processor Benchmark Results
================================================================================

MM Processor Timing (ms):
  hf_processor_time:
    Mean: 204.14
    Median: 201.04
    Std: 48.78
    P99.0: 319.08
  hashing_time:
    Mean: 1.58
    Median: 1.94
    Std: 1.25
    P99.0: 5.57
  cache_lookup_time:
    Mean: 0.16
    Median: 0.14
    Std: 0.10
    P99.0: 0.44
  prompt_update_time:
    Mean: 7.30
    Median: 7.02
    Std: 0.96
    P99.0: 8.80
  total_time:
    Mean: 232.79
    Median: 211.02
    Std: 208.11
    P99.0: 349.57

End-to-End Latency (ms):
  Mean: 2271.75
  Median: 2274.77
  Std: 656.52

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which executes a small, essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify

mergify bot commented Nov 20, 2025

Documentation preview: https://vllm--29105.org.readthedocs.build/en/29105/

@mergify mergify bot added documentation Improvements or additions to documentation frontend multi-modality Related to multi-modality (#4194) performance Performance-related issues v1 labels Nov 20, 2025
@reaganjlee reaganjlee changed the title Add Multimodal Processor Benchmark [MM processor]: Add Multimodal Processor Benchmark Nov 21, 2025
@reaganjlee reaganjlee changed the title [MM processor]: Add Multimodal Processor Benchmark Add Multimodal Processor Benchmark Nov 26, 2025
@reaganjlee reaganjlee force-pushed the mm-processor branch 3 times, most recently from 14fbc67 to ff81b6c Compare November 26, 2025 23:22
Signed-off-by: Reagan Lee <[email protected]>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

https://github.com/vllm-project/vllm/blob/c4bab7adcbcd8c179e064ae9d8620426523c55d2/benchmarks/multimodal_processor.py#L335-L337
P1 Badge Clearing stats before collection wipes benchmark results

The multimodal benchmark clears the server’s MM processor stats registry right after the run and before fetching stats (should_clear_registry defaults to True). When running with VLLM_ENABLE_MM_PROCESSOR_STATS=1 and VLLM_SERVER_DEV_MODE=1 (required for the /clear_mm_processor_stats endpoint the client calls), this POST removes the just-recorded measurements so collect_mm_processor_stats immediately reads an empty registry and the benchmark prints zeros/warnings instead of the run’s timings. The registry should be cleared before starting a run to drop stale data, not immediately before retrieval.
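A minimal sketch of the ordering the review suggests, using hypothetical stand-in names (not the PR's actual registry or endpoint signatures): clear stale stats before the run starts, then collect without clearing.

```python
# Hypothetical stand-in for the server-side MM processor stats registry;
# this illustrates the clear-before-run ordering only, not the PR's code.
class FakeStatsRegistry:
    def __init__(self) -> None:
        self.records: list[float] = []

    def clear(self) -> None:
        self.records.clear()

    def record(self, ms: float) -> None:
        self.records.append(ms)


def run_benchmark(registry: FakeStatsRegistry) -> list[float]:
    # 1. Clear *before* the run, so stale data from earlier runs is dropped.
    registry.clear()
    # 2. The workload runs; the server records per-request timings.
    for ms in (204.1, 201.0, 319.1):
        registry.record(ms)
    # 3. Collect stats without clearing first, so the just-recorded
    #    measurements are still present when they are read back.
    return list(registry.records)
```

With this ordering, stale records injected before the run are discarded, while the run's own timings survive collection.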



@Isotr0py Isotr0py left a comment


Thanks for implementing this feature! I leave some initial comments. PTAL!

Comment on lines +77 to +80
@dataclass
class MultiModalProcessorTimingStats:
    """Per-request timing statistics for multimodal processor stages."""


Let's move the stats objects to a new file multimodal/stats.py.


I feel that making the stats container a global is a bit hacky. Perhaps it's better to have it as a member of InputProcessingContext.
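A rough sketch of that suggestion, with illustrative field names rather than the actual vLLM classes: each InputProcessingContext instance owns its own stats container instead of all requests sharing a module-level global.

```python
from dataclasses import dataclass, field


@dataclass
class MultiModalProcessorTimingStats:
    # Field names are illustrative; the PR tracks more stages than these.
    hf_processor_time_ms: float = 0.0
    hashing_time_ms: float = 0.0


@dataclass
class InputProcessingContext:
    # The stats container lives on the context, so separate contexts
    # (e.g. separate engine instances) do not share mutable global state.
    timing_stats: MultiModalProcessorTimingStats = field(
        default_factory=MultiModalProcessorTimingStats
    )
```

Using `default_factory` gives every context a fresh stats object, which avoids the shared-mutable-default pitfall a global container would reintroduce.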

Comment on lines +772 to +774
"VLLM_ENABLE_MM_PROCESSOR_STATS": lambda: bool(
    int(os.getenv("VLLM_ENABLE_MM_PROCESSOR_STATS", "0"))
),

Hmmm, our environment variables are quite bloated. Perhaps we can move this to MultimodalConfig?
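One possible shape for that migration, sketched with a hypothetical field name (not the actual MultimodalConfig API); the env var is kept only as a default-value fallback so existing workflows keep working:

```python
import os
from dataclasses import dataclass, field


def _stats_default() -> bool:
    # Fallback: honor the existing env var when no config value is given.
    return bool(int(os.getenv("VLLM_ENABLE_MM_PROCESSOR_STATS", "0")))


@dataclass
class MultimodalConfig:
    # Illustrative field; callers would set this via config instead of
    # exporting an environment variable.
    enable_mm_processor_stats: bool = field(default_factory=_stats_default)
```

Because the default is computed per instantiation, the flag can still be flipped via the environment during a transition period, then the env fallback can be dropped later.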

Comment on lines +53 to +72
def _get_instance_urls(
    args: argparse.Namespace | None,
    base_url: str | None,
) -> list[str]:
    """
    Get the instance URL for stats collection.

    Args:
        args: Command-line arguments (may be None)
        base_url: Base URL for the server (may be None)

    Returns:
        List containing a single instance URL
    """
    if base_url:
        return [base_url]

    host = getattr(args, "host", "127.0.0.1") if args else "127.0.0.1"
    port = getattr(args, "port", 8000) if args else 8000
    return [f"http://{host}:{port}"]

IIRC, all bench commands except vllm bench serve are using offline inference?

I think we can extend the mm processor benchmark functionality to vllm bench serve for online serving, and leave vllm bench multimodal-processor to use offline inference. 🤔

Comment on lines +420 to +431
if "mm_processor_stats" in result:
    print("\nMM Processor Timing (ms):")
    selected_percentiles = [
        float(p) for p in getattr(args, "metric_percentiles", "99").split(",")
    ]
    for stage, metrics in result["mm_processor_stats"].items():
        print(f"  {stage}:")
        print(f"    Mean: {metrics['mean']:.2f}")
        print(f"    Median: {metrics['median']:.2f}")
        print(f"    Std: {metrics['std']:.2f}")
        for p in selected_percentiles:
            print(f"    P{p}: {metrics.get(f'p{p}', 0.0):.2f}")

I think we can format the output's display as a table using pandas's dataframe. Current output is a bit long with duplicate fields:

  hf_processor_time:
    Mean: 204.14
    Median: 201.04
    Std: 48.78
    P99.0: 319.08
  hashing_time:
    Mean: 1.58
    Median: 1.94
    Std: 1.25
    P99.0: 5.57
  cache_lookup_time:
    Mean: 0.16
    Median: 0.14
    Std: 0.10
    P99.0: 0.44
  prompt_update_time:
    Mean: 7.30
    Median: 7.02
    Std: 0.96
    P99.0: 8.80
  total_time:
    Mean: 232.79
    Median: 211.02
    Std: 208.11
    P99.0: 349.57
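A sketch of the suggested tabular formatting, assuming pandas is available; the numbers are copied from the example output above, and the column names are illustrative:

```python
import pandas as pd

# Per-stage stats as the benchmark might collect them (values taken from
# the example run above; "p99" is a stand-in for the percentile columns).
stats = {
    "hf_processor_time": {"mean": 204.14, "median": 201.04, "std": 48.78, "p99": 319.08},
    "hashing_time": {"mean": 1.58, "median": 1.94, "std": 1.25, "p99": 5.57},
    "cache_lookup_time": {"mean": 0.16, "median": 0.14, "std": 0.10, "p99": 0.44},
    "prompt_update_time": {"mean": 7.30, "median": 7.02, "std": 0.96, "p99": 8.80},
    "total_time": {"mean": 232.79, "median": 211.02, "std": 208.11, "p99": 349.57},
}

# orient="index" makes each stage a row and each metric a column, so all
# stages print as one compact table instead of repeated key/value blocks.
df = pd.DataFrame.from_dict(stats, orient="index")
print(df.to_string(float_format="%.2f"))
```

This collapses the five repeated Mean/Median/Std/P99 blocks into a single five-row table with one column per metric.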
