[RFC]: Roadmap to support the qwen-omni model in vllm-omni #10

@hsliuustc0106

Description

Motivation.

The Qwen-omni model represents a significant advancement in multimodal AI capabilities, combining text, image, and other modalities in a unified architecture. Currently, vLLM-omni supports basic AR (Autoregressive) and DiT (Diffusion Transformer) stages, but lacks support for the sophisticated multimodal capabilities that Qwen-omni offers. Supporting Qwen-omni would:

  • Enable advanced multimodal reasoning and generation
  • Provide a bridge between traditional text-based LLMs and vision-language models
  • Demonstrate vLLM-omni's capability to handle complex multimodal architectures
  • Attract users working on cutting-edge multimodal applications
  • Establish vLLM-omni as a leading platform for multimodal model serving

Proposed Change.

from vllm_omni.entrypoints.omni_llm import OmniLLM

# Initialize the Qwen-omni pipeline from the Hugging Face transformers format
qwen_omni = OmniLLM(model="Qwen/Qwen2.5-Omni-7B")

# Prepare sampling parameters for each stage (thinker, talker, code2wav)
sampling_params_list = [thinker_sampling_params,
                        talker_sampling_params,
                        code2wav_sampling_params]

# Prepare prompts as inputs
prompts = [make_omni_prompt(args, prompt) for prompt in args.prompts]

# Generate, consistent with standard vLLM usage
omni_outputs = qwen_omni.generate(prompts, sampling_params_list)

Phase 1: Entrypoint classes and Model Stage management

  • Basic OmniLLM class for initializing model stages
  • Stage initialization and configuration mechanism
  • Omni EngineArgs and model registration system
  • Offline model inference pipeline
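To make the Phase 1 scope concrete, here is a minimal, self-contained sketch of what stage management could look like. All names here (StageConfig, stage_type values, the toy OmniLLM shape) are illustrative assumptions, not the actual vllm-omni API.

```python
from dataclasses import dataclass

@dataclass
class StageConfig:
    """Configuration for one model stage (illustrative field names)."""
    name: str        # e.g. "thinker", "talker", "code2wav"
    stage_type: str  # "ar" for autoregressive, "dit" for diffusion
    model_path: str  # source checkpoint for this stage

class OmniPipelineSketch:
    """Toy stand-in for the proposed OmniLLM: holds an ordered stage list."""
    def __init__(self, stages):
        self.stages = list(stages)

    def stage_names(self):
        return [s.name for s in self.stages]

# Qwen2.5-Omni's thinker/talker/code2wav layout, expressed as three stages
stages = [
    StageConfig("thinker", "ar", "Qwen/Qwen2.5-Omni-7B"),
    StageConfig("talker", "ar", "Qwen/Qwen2.5-Omni-7B"),
    StageConfig("code2wav", "dit", "Qwen/Qwen2.5-Omni-7B"),
]
pipeline = OmniPipelineSketch(stages)
print(pipeline.stage_names())  # ['thinker', 'talker', 'code2wav']
```

The key design point is that the entrypoint owns an ordered list of heterogeneous stages (AR and DiT), which the later phases then schedule and run.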

Phase 2: Core Processing Components

  • Basic input/output data structures
  • Basic request, input/output processors
  • Omni schedulers for autoregressive models and DiT models respectively
  • Omni worker and model runner for autoregressive models
  • Omni worker and model runner for DiT models
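The essence of the Phase 2 processing path is that each stage's output becomes the next stage's input, with one set of sampling parameters per stage. A hedged sketch, using toy stand-in functions rather than real workers or model runners:

```python
def run_pipeline(prompt, stage_fns, sampling_params_list):
    """Feed each stage's output into the next, pairing stages with
    their per-stage sampling parameters (illustrative helper)."""
    data = prompt
    for stage_fn, params in zip(stage_fns, sampling_params_list):
        data = stage_fn(data, params)
    return data

# Toy stages standing in for thinker / talker / code2wav
thinker = lambda x, p: x + " -> text"
talker = lambda x, p: x + " -> codec-tokens"
code2wav = lambda x, p: x + " -> waveform"

out = run_pipeline("hello", [thinker, talker, code2wav], [{}, {}, {}])
print(out)  # hello -> text -> codec-tokens -> waveform
```

In the real system each stage function would be an AR or DiT worker with its own scheduler; the point here is only the per-stage pairing of inputs, outputs, and sampling parameters.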

Phase 3: Qwen2.5-omni model integration

  • Adaptation from the transformers implementation of Qwen2.5-omni
  • Stage config and model registration
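Phase 3's registration step could be as simple as a registry mapping a model ID to its stage layout; the registry name and tuple shape below are assumptions for illustration only.

```python
# Hypothetical model registry: model ID -> ordered (stage name, stage type) specs
OMNI_MODEL_REGISTRY = {}

def register_omni_model(model_id, stage_specs):
    """Record which stages a model decomposes into (illustrative helper)."""
    OMNI_MODEL_REGISTRY[model_id] = stage_specs

register_omni_model(
    "Qwen/Qwen2.5-Omni-7B",
    [("thinker", "ar"), ("talker", "ar"), ("code2wav", "dit")],
)

print(OMNI_MODEL_REGISTRY["Qwen/Qwen2.5-Omni-7B"])
```

At load time, the entrypoint would look up the model ID here to decide which stage configs to instantiate.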

Phase 4: Examples and Documentation

  • Examples of end-to-end offline inference
  • Documentation for offline inference of Qwen2.5-omni
  • Documentation for environment setup and contribution guide

Feedback Period.

No response

CC List.

@Gaohan123 @tzhouam @congw729

Any Other Things.

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

Metadata

Labels

enhancement (New feature or request)
