[RFC]: Roadmap to support the qwen-omni model in vllm-omni #10

@hsliuustc0106

Description

Motivation.

The Qwen-omni model represents a significant advancement in multimodal AI capabilities, combining text, image, and other modalities in a unified architecture. Currently, vLLM-omni supports basic AR (Autoregressive) and DiT (Diffusion Transformer) stages, but lacks support for the sophisticated multimodal capabilities that Qwen-omni offers. Supporting Qwen-omni would:

  • Enable advanced multimodal reasoning and generation
  • Provide a bridge between traditional text-based LLMs and vision-language models
  • Demonstrate vLLM-omni's capability to handle complex multimodal architectures
  • Attract users working on cutting-edge multimodal applications
  • Establish vLLM-omni as a leading platform for multimodal model serving

Proposed Change.

from vllm_omni.entrypoints.omni_llm import OmniLLM

# Initialize the Qwen-omni pipeline from the Hugging Face transformers format
qwen_omni = OmniLLM(model="Qwen/Qwen2.5-Omni-7B")

# Prepare sampling parameters for each stage (thinker, talker, code2wav)
sampling_params_list = [thinker_sampling_params,
                        talker_sampling_params,
                        code2wav_sampling_params]

# Prepare prompts as inputs
prompts = [make_omni_prompt(args, prompt) for prompt in args.prompts]

# Generate, consistent with standard vLLM usage
omni_outputs = qwen_omni.generate(prompts, sampling_params_list)

Phase 1: Entrypoint classes and Model Stage management

  • Basic OmniLLM class for initializing model stages
  • Stage initialization and configuration mechanism
  • Omni EngineArgs and model registration system
  • Offline model inference pipeline
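To make the Phase 1 scope concrete, here is a minimal, self-contained sketch of what stage management could look like. All names here (StageConfig, stage_type values, the toy OmniLLM shape) are illustrative assumptions, not the actual vllm-omni API.

```python
from dataclasses import dataclass

@dataclass
class StageConfig:
    """Configuration for one model stage (illustrative field names)."""
    name: str        # e.g. "thinker", "talker", "code2wav"
    stage_type: str  # "ar" for autoregressive, "dit" for diffusion
    model_path: str  # source checkpoint for this stage

class OmniPipelineSketch:
    """Toy stand-in for the proposed OmniLLM: holds an ordered stage list."""
    def __init__(self, stages):
        self.stages = list(stages)

    def stage_names(self):
        return [s.name for s in self.stages]

# Qwen2.5-Omni's thinker/talker/code2wav layout, expressed as three stages
stages = [
    StageConfig("thinker", "ar", "Qwen/Qwen2.5-Omni-7B"),
    StageConfig("talker", "ar", "Qwen/Qwen2.5-Omni-7B"),
    StageConfig("code2wav", "dit", "Qwen/Qwen2.5-Omni-7B"),
]
pipeline = OmniPipelineSketch(stages)
print(pipeline.stage_names())  # ['thinker', 'talker', 'code2wav']
```

The key design point is that the entrypoint owns an ordered list of heterogeneous stages (AR and DiT), which the later phases then schedule and run.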

Phase 2: Core Processing Components

  • Basic input/output data structures
  • Basic request, input/output processors
  • Omni schedulers for autoregressive models and DiT models respectively
  • Omni worker and model runner for autoregressive models
  • Omni worker and model runner for DiT models
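The essence of the Phase 2 processing path is that each stage's output becomes the next stage's input, with one set of sampling parameters per stage. A hedged sketch, using toy stand-in functions rather than real workers or model runners:

```python
def run_pipeline(prompt, stage_fns, sampling_params_list):
    """Feed each stage's output into the next, pairing stages with
    their per-stage sampling parameters (illustrative helper)."""
    data = prompt
    for stage_fn, params in zip(stage_fns, sampling_params_list):
        data = stage_fn(data, params)
    return data

# Toy stages standing in for thinker / talker / code2wav
thinker = lambda x, p: x + " -> text"
talker = lambda x, p: x + " -> codec-tokens"
code2wav = lambda x, p: x + " -> waveform"

out = run_pipeline("hello", [thinker, talker, code2wav], [{}, {}, {}])
print(out)  # hello -> text -> codec-tokens -> waveform
```

In the real system each stage function would be an AR or DiT worker with its own scheduler; the point here is only the per-stage pairing of inputs, outputs, and sampling parameters.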

Phase 3: Qwen2.5-omni model integration

  • Adaptation from the transformers implementation of Qwen2.5-omni
  • Stage config and model registration
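Phase 3's registration step could be as simple as a registry mapping a model ID to its stage layout; the registry name and tuple shape below are assumptions for illustration only.

```python
# Hypothetical model registry: model ID -> ordered (stage name, stage type) specs
OMNI_MODEL_REGISTRY = {}

def register_omni_model(model_id, stage_specs):
    """Record which stages a model decomposes into (illustrative helper)."""
    OMNI_MODEL_REGISTRY[model_id] = stage_specs

register_omni_model(
    "Qwen/Qwen2.5-Omni-7B",
    [("thinker", "ar"), ("talker", "ar"), ("code2wav", "dit")],
)

print(OMNI_MODEL_REGISTRY["Qwen/Qwen2.5-Omni-7B"])
```

At load time, the entrypoint would look up the model ID here to decide which stage configs to instantiate.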

Phase 4: Examples and Documentation

  • Examples of end-to-end offline inference
  • Documentation for offline inference of Qwen2.5-omni
  • Documentation for environment setup and contribution guide

Feedback Period.

No response

CC List.

@Gaohan123 @tzhouam @congw729

Any Other Things.

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

Metadata

Labels

enhancement (New feature or request)
