Closed
Labels
enhancement: New feature or request
Description
Motivation.
The Qwen-omni model combines text, image, and other modalities in a unified architecture. vLLM-omni currently supports basic AR (autoregressive) and DiT (Diffusion Transformer) stages, but it does not yet support the multimodal capabilities that Qwen-omni offers. Supporting Qwen-omni would:
- Enable advanced multimodal reasoning and generation
- Provide a bridge between traditional text-based LLMs and vision-language models
- Demonstrate vLLM-omni's capability to handle complex multimodal architectures
- Attract users working on cutting-edge multimodal applications
- Establish vLLM-omni as a leading platform for multimodal model serving
Proposed Change.
from vllm_omni.entrypoints.omni_llm import OmniLLM

# Initialize the Qwen-omni pipeline from a Hugging Face Transformers-format checkpoint
omni_llm = OmniLLM(model="Qwen/Qwen2.5-Omni-7B")

# Prepare sampling parameters, one entry per stage
sampling_params_list = [
    thinker_sampling_params,
    talker_sampling_params,
    code2wav_sampling_params,
]

# Prepare prompts as inputs
prompts = [make_omni_prompt(args, prompt) for prompt in args.prompts]

# Generate, consistent with standard vLLM usage
omni_outputs = omni_llm.generate(prompts, sampling_params_list)
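The call above passes one sampling-parameter entry per stage. As a rough illustration of that pairing, the following toy sketch chains three stand-in stages and hands each one its own parameters. `Stage`, `run_pipeline`, and the lambda bodies are hypothetical and are not part of vLLM-omni; real stages would wrap model execution rather than string manipulation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Stage:
    """Hypothetical stage: a name plus a function of (input, params)."""
    name: str
    run: Callable[[Any, Dict[str, Any]], Any]


def run_pipeline(stages: List[Stage], prompt: Any,
                 sampling_params_list: List[Dict[str, Any]]) -> Any:
    """Feed each stage's output into the next, pairing stage i with params i."""
    if len(stages) != len(sampling_params_list):
        raise ValueError("expected one sampling-params entry per stage")
    data = prompt
    for stage, params in zip(stages, sampling_params_list):
        data = stage.run(data, params)
    return data


# Toy stand-ins for the thinker, talker, and code2wav stages (assumptions).
stages = [
    Stage("thinker", lambda text, p: text.split()[: p["max_tokens"]]),
    Stage("talker", lambda tokens, p: [len(t) for t in tokens]),
    Stage("code2wav", lambda codes, p: [c * p["gain"] for c in codes]),
]
params = [{"max_tokens": 2}, {}, {"gain": 0.5}]
result = run_pipeline(stages, "hello omni world", params)
```

Validating the list length up front is one possible design choice: a mismatch between stages and parameter entries then fails before any stage runs.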
Phase 1: Entrypoint classes and Model Stage management
- Basic OmniLLM class for initializing model stages
- Stage initialization and configuration mechanism
- Omni EngineArgs and model registration system
- Offline model inference pipeline
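One way the stage configuration and model registration items in Phase 1 could fit together is sketched below. This is only an assumption about the shape of such a mechanism: `StageConfig`, `register_model`, `resolve_stages`, the `"qwen2_5_omni"` key, and the `kind` values assigned to each stage are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class StageConfig:
    """Hypothetical per-stage configuration."""
    name: str
    kind: str  # "ar" or "dit" (illustrative labels)
    engine_args: Dict[str, object] = field(default_factory=dict)


# Maps a model key to a factory that returns its ordered stage list.
_MODEL_REGISTRY: Dict[str, Callable[[], List[StageConfig]]] = {}


def register_model(key: str):
    """Decorator that registers a stage-list factory under `key`."""
    def decorator(factory: Callable[[], List[StageConfig]]):
        if key in _MODEL_REGISTRY:
            raise KeyError(f"model {key!r} is already registered")
        _MODEL_REGISTRY[key] = factory
        return factory
    return decorator


def resolve_stages(key: str) -> List[StageConfig]:
    """Look up a registered model and build its stage configs."""
    try:
        factory = _MODEL_REGISTRY[key]
    except KeyError:
        raise ValueError(f"no stage configuration registered for {key!r}") from None
    return factory()


@register_model("qwen2_5_omni")  # hypothetical registry key
def qwen_omni_stages() -> List[StageConfig]:
    # The AR/DiT assignment here is an assumption made for the example.
    return [
        StageConfig("thinker", "ar"),
        StageConfig("talker", "ar"),
        StageConfig("code2wav", "dit"),
    ]
```

An entrypoint class could then call `resolve_stages(...)` once at construction time and initialize one engine per returned entry.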
Phase 2: Core Processing Components
- Basic input/output data structures
- Basic request, input/output processors
- Omni schedulers for autoregressive and DiT models, respectively
- Omni worker and model runner for autoregressive models
- Omni worker and model runner for DiT models
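To illustrate why autoregressive and DiT stages might want separate schedulers, here is a toy sketch under simplifying assumptions. The AR scheduler advances every running request by one token per step and admits waiting requests as slots free up, while the DiT scheduler runs one request through all of its denoising steps in a single call. `Request`, `ARScheduler`, and `DiTScheduler` are hypothetical and do not reflect the actual vLLM-omni design.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class Request:
    request_id: str
    steps_total: int  # max new tokens (AR) or denoising steps (DiT)
    steps_done: int = 0

    @property
    def finished(self) -> bool:
        return self.steps_done >= self.steps_total


class ARScheduler:
    """Toy continuous batching: one token per running request per step."""

    def __init__(self, max_batch: int) -> None:
        self.max_batch = max_batch
        self.waiting: Deque[Request] = deque()
        self.running: List[Request] = []

    def add(self, req: Request) -> None:
        self.waiting.append(req)

    def step(self) -> List[str]:
        # Admit waiting requests while there is room in the batch.
        while self.waiting and len(self.running) < self.max_batch:
            self.running.append(self.waiting.popleft())
        batch = [r.request_id for r in self.running]
        for r in self.running:
            r.steps_done += 1
        # Drop finished requests so their slots free up next step.
        self.running = [r for r in self.running if not r.finished]
        return batch


class DiTScheduler:
    """Toy scheduler: each call runs one request through all its steps."""

    def __init__(self) -> None:
        self.waiting: Deque[Request] = deque()

    def add(self, req: Request) -> None:
        self.waiting.append(req)

    def step(self) -> List[str]:
        if not self.waiting:
            return []
        req = self.waiting.popleft()
        req.steps_done = req.steps_total
        return [req.request_id]
```

For example, with `max_batch=2` and requests `a` (2 tokens), `b` (1 token), and `c` (1 token), the AR scheduler would batch `a` and `b` first, then `a` and `c` once `b` finishes.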
Phase 3: Qwen2.5-omni model integration
- Adaptation of the Transformers implementation of Qwen2.5-omni
- Stage config and model registration
Phase 4: Examples and Documentation
- Examples of end-to-end offline inference
- Documentation for offline inference of Qwen2.5-omni
- Documentation for environment setup and contribution guide
Feedback Period.
No response
CC List.
Any Other Things.
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.