[RFC]: Generalized Multimodal Testing Framework for vllm-omni

### Motivation.

Currently, the `vllm-omni` test suite consists primarily of model-specific end-to-end (E2E) offline inference tests. As Omni supports a growing number of model architectures, this approach has revealed several limitations:

* **Lack of Automated Accuracy Comparison**: Existing tests only verify that inference runs to completion. They do not automatically check numerical consistency against HuggingFace (HF) reference implementations.
* **Insufficient Cache Logic Coverage**: There are no standardized "Cache Consistency Tests." In multimodal scenarios, placeholders (e.g., `<image_pad>`) are expanded into a large number of vision/audio tokens. If the expansion logic differs between the "Cache ON" and "Cache OFF" paths, the model receives different inputs and accuracy drifts unpredictably.
* **Poor Extensibility and High Maintenance Burden**: Adding a new model requires writing redundant boilerplate code, and there is no declarative configuration through which a newly registered model automatically gains test coverage.

Therefore, we propose introducing a generalized multimodal comparison testing framework, similar to the one in the main `vllm` repository, to improve the engineering quality and development efficiency of the Omni-mode inference engine.

This framework defines the testing strategy required to implement the CI system described in #400.

### Proposed Change.

This proposal ports `vllm`'s common testing logic to `vllm-omni`, with substantial adaptations for Omni mode.

### 2.1 Directory Structure

A three-layer architecture is recommended under `tests/models/multimodal/`:

```text
tests/models/multimodal/
├── vlm_utils/              # Core Utilities: Type definitions and automated parameter allocation
├── generation/             # Generation Alignment: Extracting Hidden States and comparing with HF
│   ├── test_common.py      # Common test entry point
│   └── runners/            # Execution engines supporting multi-stage outputs (VllmRunner, HfRunner)
└── processing/             # Preprocessing Tests: Validating input pipeline and cache consistency
    └── test_common.py      # Cache consistency validation logic
```
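To make the intent of `processing/test_common.py` concrete, here is a minimal, self-contained sketch of a cache consistency check. It does not use the real `vllm` multimodal processor API: `ToyProcessor`, `IMAGE_TOKEN_ID`, `TEXT_VOCAB`, and `check_cache_consistency` are hypothetical names for illustration. The toy processor expands each `<image_pad>` into one image token per patch and can optionally memoize that expansion per item. The check asserts that the cache-enabled path matches the cache-disabled baseline on every pass, including later passes that are served from the cache.

```python
from typing import Dict, List, Tuple

# Hypothetical token IDs for this sketch; real values come from the model's tokenizer.
IMAGE_PAD = "<image_pad>"
IMAGE_TOKEN_ID = 9000
TEXT_VOCAB = {"Describe": 1, "the": 2, "image": 3, ":": 4}


class ToyProcessor:
    """Hypothetical processor: expands each <image_pad> into one token per image patch."""

    def __init__(self, use_cache: bool) -> None:
        self.use_cache = use_cache
        # Maps an item's hash to its already-expanded token list.
        self._cache: Dict[str, List[int]] = {}

    def _expand_item(self, item_hash: str, num_patches: int) -> List[int]:
        if self.use_cache and item_hash in self._cache:
            return self._cache[item_hash]  # cache hit: reuse the earlier expansion
        expanded = [IMAGE_TOKEN_ID] * num_patches
        if self.use_cache:
            self._cache[item_hash] = expanded
        return expanded

    def __call__(self, prompt: List[str], images: List[Tuple[str, int]]) -> List[int]:
        token_ids: List[int] = []
        image_iter = iter(images)  # images are consumed in placeholder order
        for piece in prompt:
            if piece == IMAGE_PAD:
                item_hash, num_patches = next(image_iter)
                token_ids.extend(self._expand_item(item_hash, num_patches))
            else:
                token_ids.append(TEXT_VOCAB[piece])
        return token_ids


def check_cache_consistency(
    prompt: List[str], images: List[Tuple[str, int]], num_passes: int = 2
) -> List[int]:
    """Assert that cache-ON output equals the cache-OFF baseline on every pass."""
    baseline = ToyProcessor(use_cache=False)(prompt, images)
    cached = ToyProcessor(use_cache=True)
    for _ in range(num_passes):  # the first pass fills the cache, later passes hit it
        out = cached(prompt, images)
        assert out == baseline, "placeholder expansion differs between cache ON and OFF"
    return baseline


ids = check_cache_consistency(["Describe", "the", IMAGE_PAD, ":"], [("img-a", 4)])
```

Running the same inputs at least twice through the cached path matters: a bug that only appears on a cache hit (for example, an expansion stored under the wrong key) would go unnoticed after a single pass.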
### 2.2 Key Components

* **Common Test Suites**: Migrate and adapt `test_common.py` to provide standard entry points for end-to-end accuracy comparison and cache consistency checks.
* **Multimodal Execution Engines (Runners)**: Refactor `VllmRunner` and `HfRunner` to extract **Hidden States** (latent representations) for comparison. Checking the numerical consistency of these core representations validates model alignment efficiently, without fully decoding outputs into video or audio files.
* **Parametrization Engine**: Migrate the `case_filtering.py` logic to automatically build comprehensive test matrices from each model's declared parameter space.
* **Configuration Registry**: Migrate the `VLM_TEST_SETTINGS` pattern to give each model a standardized, declarative test-input definition.
* **Process-level Resource Isolation**: Introduce the `@create_new_process_for_each_test` decorator, which runs each test in its own subprocess so that GPU memory is reclaimed when the subprocess exits. This addresses the memory fragmentation that accumulates when large models are run sequentially in a single process.
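The registry, parametrization, and hidden-state comparison ideas fit together roughly as follows. This is a sketch under stated assumptions, not the `vllm` implementation: `OmniTestSettings`, `OmniCase`, `OMNI_TEST_SETTINGS`, `build_test_matrix`, and `hidden_states_close` are hypothetical names, the model identifiers are placeholders, and the field names only loosely mirror a `VLM_TEST_SETTINGS`-style entry. Each registry entry declares a parameter space; the matrix builder takes its Cartesian product (optionally filtered, in the spirit of `case_filtering.py`), and each resulting case carries the tolerance used to compare vLLM and HF hidden states.

```python
import itertools
import math
from dataclasses import dataclass
from typing import Dict, List, NamedTuple, Optional, Sequence, Tuple


@dataclass(frozen=True)
class OmniTestSettings:
    """Hypothetical per-model test declaration; field names are illustrative."""

    model: str                               # model identifier (placeholders below)
    modalities: Tuple[str, ...]              # input modalities to exercise
    dtypes: Tuple[str, ...] = ("bfloat16",)
    max_tokens: Tuple[int, ...] = (16,)
    atol: float = 1e-2                       # absolute tolerance for hidden states


class OmniCase(NamedTuple):
    """One concrete test case produced by expanding a registry entry."""

    test_name: str
    model: str
    modality: str
    dtype: str
    max_tokens: int
    atol: float


# Hypothetical registry: registering a model means adding one entry here.
OMNI_TEST_SETTINGS: Dict[str, OmniTestSettings] = {
    "toy_vl": OmniTestSettings(model="example-org/toy-vl", modalities=("image",)),
    "toy_omni": OmniTestSettings(
        model="example-org/toy-omni",
        modalities=("image", "audio"),
        dtypes=("bfloat16", "float16"),
    ),
}


def build_test_matrix(
    settings: Dict[str, OmniTestSettings], modality: Optional[str] = None
) -> List[OmniCase]:
    """Expand every entry's parameter space; optionally keep only one modality."""
    cases: List[OmniCase] = []
    for test_name, s in settings.items():
        for mod, dtype, max_tok in itertools.product(s.modalities, s.dtypes, s.max_tokens):
            if modality is None or mod == modality:
                cases.append(OmniCase(test_name, s.model, mod, dtype, max_tok, s.atol))
    return cases


def hidden_states_close(
    vllm_hs: Sequence[float], hf_hs: Sequence[float], atol: float
) -> bool:
    """Element-wise absolute-tolerance check on two flattened hidden-state vectors."""
    if len(vllm_hs) != len(hf_hs):
        return False
    return all(math.isclose(a, b, rel_tol=0.0, abs_tol=atol) for a, b in zip(vllm_hs, hf_hs))


cases = build_test_matrix(OMNI_TEST_SETTINGS)
audio_cases = build_test_matrix(OMNI_TEST_SETTINGS, modality="audio")
```

In a real suite, a list like `cases` could be passed to `pytest.mark.parametrize`, with each case driving one `VllmRunner`/`HfRunner` comparison. A plain absolute tolerance is only one option; some models or stages may call for relative tolerances or a similarity metric such as cosine similarity.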

### Feedback Period.

_No response_

### CC List.

@hsliuustc0106 @princepride @congw729 

### Any Other Things.

_No response_

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://vllm-omni.readthedocs.io), which can answer lots of frequently asked questions.
