
[RFC]: Generalized Multimodal Testing Framework for vllm-omni #1117

@zzhuoxin1508

Description


Motivation.

Currently, the vllm-omni test suite consists primarily of model-specific end-to-end (E2E) offline inference tests. As Omni supports an increasing number of model architectures, this approach has revealed several limitations:

  • Lack of Automated Accuracy Comparison: Existing tests only verify that the process is runnable, lacking automated numerical consistency checks against HuggingFace (HF) reference implementations.
  • Insufficient Cache Logic Coverage: There is a lack of standardized "Cache Consistency Tests." In multimodal scenarios, placeholders (e.g., <image_pad>) are expanded into a large number of Vision/Audio tokens. If the expansion logic differs between "Cache ON" and "Cache OFF" states, it leads to unpredictable precision drift.
  • Poor Extensibility and High Maintenance Burden: Adding a new model requires writing redundant boilerplate code, and there is no declarative configuration mechanism through which a model automatically receives test coverage upon registration.
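The cache-consistency concern above can be sketched in a few lines: the token sequence produced by expanding a multimodal placeholder must be byte-identical whether the processing cache is enabled or not. All names below (`expand_placeholders`, `<vision>`, the `use_cache` flag) are illustrative, not the actual vllm-omni API.

```python
# Hypothetical sketch of a cache-consistency check. A real test would
# invoke the model's multimodal processor twice (cache ON / cache OFF)
# and compare the resulting token sequences.

IMAGE_PAD = "<image_pad>"


def expand_placeholders(prompt: str, num_vision_tokens: int,
                        use_cache: bool) -> list[str]:
    """Expand each placeholder into `num_vision_tokens` vision tokens.

    The `use_cache` flag stands in for the cached code path; both paths
    must produce the same expansion.
    """
    tokens: list[str] = []
    for piece in prompt.split():
        if piece == IMAGE_PAD:
            tokens.extend(["<vision>"] * num_vision_tokens)
        else:
            tokens.append(piece)
    return tokens


def test_cache_consistency() -> None:
    prompt = f"Describe {IMAGE_PAD} briefly"
    cached = expand_placeholders(prompt, num_vision_tokens=4, use_cache=True)
    uncached = expand_placeholders(prompt, num_vision_tokens=4, use_cache=False)
    # Any divergence here surfaces downstream as unpredictable precision drift.
    assert cached == uncached


test_cache_consistency()
```

If the cached path ever expanded a different number of vision tokens, the assertion would pinpoint the divergence at the preprocessing layer rather than as a hard-to-debug accuracy regression.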

Therefore, it is necessary to introduce a generalized multimodal comparison testing framework, similar to the one in the main vllm repository, to improve the engineering quality and development efficiency of the Omni-mode inference engine.

This framework defines the testing strategy required to implement the CI system described in #400.

Proposed Change.

This proposal suggests migrating vllm's common testing logic to vllm-omni, with deep adaptations for Omni-mode.

2.1 Directory Structure

A three-layer architecture is recommended under tests/models/multimodal/:

tests/models/multimodal/
├── vlm_utils/              # Core Utilities: Type definitions and automated parameter allocation
├── generation/             # Generation Alignment: Extracting Hidden States and comparing with HF
│   ├── test_common.py      # Common test entry point
│   └── runners/            # Execution engines supporting multi-stage outputs (VllmRunner, HfRunner)
└── processing/             # Preprocessing Tests: Validating input pipeline and cache consistency
    └── test_common.py      # Cache consistency validation logic

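The generation/ layer's core idea is that model alignment can be validated numerically without decoding outputs into video or audio files: the runners extract hidden states from both code paths and compare them within a tolerance. The helper names and tolerance below are assumptions for illustration, not the framework's actual API.

```python
# Illustrative sketch of hidden-state comparison between a vLLM runner
# and an HF reference runner. Real code would operate on torch tensors;
# plain lists keep the sketch self-contained.

def max_abs_diff(a: list[float], b: list[float]) -> float:
    """Largest element-wise absolute difference between two vectors."""
    assert len(a) == len(b), "hidden states must have the same shape"
    return max(abs(x - y) for x, y in zip(a, b))


def assert_hidden_states_close(vllm_hs: list[float],
                               hf_hs: list[float],
                               atol: float = 1e-3) -> None:
    """Fail the test if the two representations diverge beyond `atol`."""
    diff = max_abs_diff(vllm_hs, hf_hs)
    assert diff <= atol, f"hidden states diverged: max |diff| = {diff}"


# Small numerical noise within tolerance passes; a real divergence fails.
assert_hidden_states_close([0.10, -0.52, 1.30], [0.1002, -0.5199, 1.2999])
```

In practice the tolerance would depend on dtype (e.g. looser for half precision), which is exactly the kind of knob the configuration registry can declare per model.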
2.2 Key Components

  • Common Test Suites: Migrate and adapt test_common.py to establish standard entry points for end-to-end accuracy comparison and cache consistency checks.
  • Multimodal Execution Engines (Runners): Refactor VllmRunner and HfRunner to focus on extracting Hidden States (latent representations) for comparison. By verifying numerical consistency of core representations, model alignment can be efficiently validated without full decoding into video/audio files.
  • Parametrization Engine: Migrate case_filtering.py logic to automatically build comprehensive test matrices based on parameter spaces.
  • Configuration Registry: Migrate the VLM_TEST_SETTINGS pattern to establish a standardized data input framework for models.
  • Process-level Resource Isolation: Introduce the @create_new_process_for_each_test decorator to enforce GPU resource reclamation via sub-process lifecycle management, resolving memory fragmentation issues caused by running large models sequentially.
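Taken together, the registry and parametrization engine let a model gain coverage declaratively: register a settings entry, and the engine expands it into a test matrix. The sketch below assumes hypothetical field names (`prompt_formatter`, `dtypes`, `max_tokens`) and an illustrative model key; vllm-omni's real `VLM_TEST_SETTINGS` schema may differ.

```python
# Hypothetical sketch of the VLM_TEST_SETTINGS pattern plus the
# case_filtering-style matrix expansion. Field names are illustrative.
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class VLMTestInfo:
    """Declarative per-model test configuration."""
    prompt_formatter: str
    max_tokens: tuple[int, ...] = (32,)
    dtypes: tuple[str, ...] = ("half",)


VLM_TEST_SETTINGS = {
    # Registering a model here is all that is needed for coverage.
    "qwen2_5_omni": VLMTestInfo(
        prompt_formatter="<|im_start|>{prompt}<|im_end|>",
        dtypes=("half", "bfloat16"),
    ),
}


def build_test_matrix(settings: dict[str, VLMTestInfo]):
    """Expand each registered model into (model, dtype, max_tokens) cases,
    as the parametrization engine would feed into pytest.mark.parametrize."""
    return [
        (name, dtype, mt)
        for name, info in settings.items()
        for dtype, mt in product(info.dtypes, info.max_tokens)
    ]


cases = build_test_matrix(VLM_TEST_SETTINGS)
# Two dtypes x one max_tokens value -> two parametrized cases.
```

A frozen dataclass keeps each entry hashable and immutable, so the same settings object can be safely shared between the generation and processing test suites.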

Feedback Period.

No response

CC List.

@hsliuustc0106 @princepride @congw729

Any Other Things.

No response

