[RFC]: vLLM-Omni NPU 2026 Q1 Roadmap #886

@gcanlin

Background

We have completed the initial Ascend NPU enablement in vLLM-Omni v0.11.0rc1 and v0.12.0rc1, with support for most mainstream models, such as Qwen3-Omni and the Qwen-Image series.

Building on this foundation, the next phase focuses on two things: systematically expanding model coverage and optimizing performance, with the goal of improving scalability, stability, and overall serving efficiency on Ascend NPU.

Version match

Currently, vLLM-Omni’s NPU support depends on vLLM-Ascend, the Ascend hardware plugin for vLLM. The AR (autoregressive) path is jointly supported by vLLM and vLLM-Ascend.

Meanwhile, MindIE-SD is a standalone operator library optimized for diffusion models on Ascend. It is currently integrated through the FlashAttentionBackend and a set of CustomOp implementations, which supply Ascend-native operators to improve the performance of diffusion models.

We are also building a separate platform plugin mechanism in vLLM-Omni so that additional hardware backends can be supported more easily in the future.

| vLLM | vLLM-Ascend | vLLM-Omni | MindIE-SD (optional) | Status |
|---|---|---|---|---|
| v0.11.0 | v0.11.0rc2 | v0.11.0rc1 | NA | released |
| v0.12.0 | v0.12.0rc1 | v0.12.0rc1 | main | released |
| v0.14.0 | v0.14.0rc1 | v0.14.0 | main | released |
| v0.15.0 | v0.15.0rc1 | v0.15.0rc1 | main | skipped |
| v0.16.0 | e2175d9 | v0.16.0 | main | released |
| v0.16.0 | 0.16.0rc1 | v0.16.0 | main | pending |
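
To make the matrix easier to consult from a script, the rows can be encoded as plain data. The following is a minimal sketch, not part of vLLM-Omni: the `Row` type and the `released_rows` helper are hypothetical names introduced here, and the values are copied verbatim from the table above rather than queried from any package index.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    vllm: str
    vllm_ascend: str           # a tag, or a commit hash such as "e2175d9"
    vllm_omni: str
    mindie_sd: Optional[str]   # None where the table says "NA"
    status: str


# Rows copied from the version table; keep this list in sync by hand.
MATRIX = [
    Row("v0.11.0", "v0.11.0rc2", "v0.11.0rc1", None, "released"),
    Row("v0.12.0", "v0.12.0rc1", "v0.12.0rc1", "main", "released"),
    Row("v0.14.0", "v0.14.0rc1", "v0.14.0", "main", "released"),
    Row("v0.15.0", "v0.15.0rc1", "v0.15.0rc1", "main", "skipped"),
    Row("v0.16.0", "e2175d9", "v0.16.0", "main", "released"),
    Row("v0.16.0", "0.16.0rc1", "v0.16.0", "main", "pending"),
]


def released_rows(vllm_version: str) -> list[Row]:
    """Return the released combinations for a vLLM version ("0.16.0" or "v0.16.0")."""
    wanted = vllm_version if vllm_version.startswith("v") else "v" + vllm_version
    return [r for r in MATRIX if r.vllm == wanted and r.status == "released"]


if __name__ == "__main__":
    for row in released_rows("0.16.0"):
        print(row)
```

Filtering on `status == "released"` means that skipped and pending rows never show up as install targets, which is why `v0.15.0` yields an empty list.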

How to install MindIE-SD

Official Link: MindIE-SD

We are actively working to simplify the installation of MindIE-SD. Eventually, it will be installable via `pip install mindie-sd`. For now, it has to be built from source, with one manual change to the build script:

```bash
git clone https://gitcode.com/Ascend/MindIE-SD.git && cd MindIE-SD
# Comment out the line `source ${current_script_dir}/build_tik_ops.sh` in build/build_ops.sh.
# The single quotes keep the shell from expanding ${current_script_dir}; \s requires GNU sed.
sed -i 's|^\(\s*\)source ${current_script_dir}/build_tik_ops.sh|\1# source ${current_script_dir}/build_tik_ops.sh|' build/build_ops.sh
# Build the wheel, then install it from dist/.
python setup.py bdist_wheel
cd dist
pip install mindiesd-*.whl
```
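
After the build, it can be useful to confirm that the wheel was actually installed into the active environment. The sketch below uses only the Python standard library. The distribution name `mindiesd` is an assumption inferred from the wheel filename pattern `mindiesd-*.whl`, and `installed_version` is a hypothetical helper, not a MindIE-SD or vLLM-Omni API.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "mindiesd" is assumed from the wheel filename; adjust if the
    # distribution is published under a different name.
    version = installed_version("mindiesd")
    if version is None:
        print("mindiesd is not installed in this environment")
    else:
        print(f"mindiesd {version} is installed")
```

Because MindIE-SD is listed as optional in the version table, a `None` result only indicates that the package is missing from the current environment.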

Feature Support

Omni (AR + Generator) Pipeline

Diffusion Pipeline

Others (UX & Hardware Scalability)

Docs

Known Issues

Model Support List

| Architecture | Models | Example HF Models | NPU support |
|---|---|---|---|
| Qwen3OmniMoeForConditionalGeneration | Qwen3-Omni | Qwen/Qwen3-Omni-30B-A3B-Instruct | |
| Qwen2_5OmniForConditionalGeneration | Qwen2.5-Omni | Qwen/Qwen2.5-Omni-7B, Qwen/Qwen2.5-Omni-3B | |
| BagelForConditionalGeneration | BAGEL (DiT-only) | ByteDance-Seed/BAGEL-7B-MoT | |
| QwenImagePipeline | Qwen-Image | Qwen/Qwen-Image | |
| QwenImagePipeline | Qwen-Image-2512 | Qwen/Qwen-Image-2512 | |
| QwenImageEditPipeline | Qwen-Image-Edit | Qwen/Qwen-Image-Edit | |
| QwenImageEditPlusPipeline | Qwen-Image-Edit-2509 | Qwen/Qwen-Image-Edit-2509 | |
| QwenImageLayeredPipeline | Qwen-Image-Layered | Qwen/Qwen-Image-Layered | |
| ZImagePipeline | Z-Image | Tongyi-MAI/Z-Image-Turbo | |
| WanPipeline | Wan2.2-T2V, Wan2.2-TI2V | Wan-AI/Wan2.2-T2V-A14B-Diffusers, Wan-AI/Wan2.2-TI2V-5B-Diffusers | |
| WanImageToVideoPipeline | Wan2.2-I2V | Wan-AI/Wan2.2-I2V-A14B-Diffusers | |
| OvisImagePipeline | Ovis-Image | OvisAI/Ovis-Image | |
| LongcatImagePipeline | LongCat-Image | meituan-longcat/LongCat-Image | |
| LongCatImageEditPipeline | LongCat-Image-Edit | meituan-longcat/LongCat-Image-Edit | |
| StableDiffusion3Pipeline | Stable-Diffusion-3 | stabilityai/stable-diffusion-3.5-medium | |
| Flux2KleinPipeline | FLUX.2-klein | black-forest-labs/FLUX.2-klein-4B, black-forest-labs/FLUX.2-klein-9B | |
| StableAudioPipeline | Stable-Audio-Open | stabilityai/stable-audio-open-1.0 | |
| Qwen3TTSForConditionalGeneration | Qwen3-TTS-12Hz-1.7B-CustomVoice | Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice | |
| Qwen3TTSForConditionalGeneration | Qwen3-TTS-12Hz-1.7B-VoiceDesign | Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign | |
| Qwen3TTSForConditionalGeneration | Qwen3-TTS-12Hz-1.7B-Base | Qwen/Qwen3-TTS-12Hz-0.6B-Base | |
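
One way to check a locally downloaded checkpoint against the list above is to read the architecture name recorded in the checkpoint itself. The sketch below is an illustration only and is not necessarily how vLLM-Omni resolves models. It assumes that Diffusers-style checkpoints record the pipeline class under `_class_name` in `model_index.json`, that Transformers-style checkpoints list it under `architectures` in `config.json`, and that those names match the "Architecture" column. Some repositories may not follow either layout. `LISTED_ARCHITECTURES`, `detect_architecture`, and `is_listed` are hypothetical names.

```python
import json
from pathlib import Path
from typing import Optional

# Architecture names copied from the model support list.
LISTED_ARCHITECTURES = {
    "Qwen3OmniMoeForConditionalGeneration",
    "Qwen2_5OmniForConditionalGeneration",
    "BagelForConditionalGeneration",
    "QwenImagePipeline",
    "QwenImageEditPipeline",
    "QwenImageEditPlusPipeline",
    "QwenImageLayeredPipeline",
    "ZImagePipeline",
    "WanPipeline",
    "WanImageToVideoPipeline",
    "OvisImagePipeline",
    "LongcatImagePipeline",
    "LongCatImageEditPipeline",
    "StableDiffusion3Pipeline",
    "Flux2KleinPipeline",
    "StableAudioPipeline",
    "Qwen3TTSForConditionalGeneration",
}


def detect_architecture(model_dir: Path) -> Optional[str]:
    """Read the architecture name from a local checkpoint directory, if present."""
    model_index = model_dir / "model_index.json"  # Diffusers-style layout
    if model_index.is_file():
        return json.loads(model_index.read_text()).get("_class_name")
    config = model_dir / "config.json"  # Transformers-style layout
    if config.is_file():
        archs = json.loads(config.read_text()).get("architectures") or []
        return archs[0] if archs else None
    return None


def is_listed(model_dir: str) -> bool:
    """True if the checkpoint's architecture appears in the support list."""
    return detect_architecture(Path(model_dir)) in LISTED_ARCHITECTURES
```

A `False` result means only that the architecture name was not found in the list; it says nothing about whether the model would run.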
