[RFC]: Support Ming-Flash-Omni in vLLM-Omni #692

@raindaywhu

Description
Motivation.

Ming-flash-omni Preview is an upgraded version of Ming-Omni, built upon a sparser Mixture-of-Experts (MoE) variant of Ling-Flash-2.0. According to its technical report, Ming-flash-omni Preview shows competitive performance in vision-text understanding, image generation, audio understanding, and text-to-speech.
The primary objective of this proposal is to adapt this model to the vllm-omni framework.

Proposed Change.

The implementation follows a three-phase roadmap:
Phase 1: Adapt Ming-flash-omni to run on Ascend with the current vLLM adaptor.
Phase 2: Integrate Ming-flash-omni into the full multi-stage pipeline: the Thinker as the multi-modal encoder, the Talker as the LLM, and Show as the VisionDecoder or AudioDecoder.
Phase 3: Performance tuning to maximize NPU throughput.
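The Thinker → Talker → Show staging in Phase 2 can be sketched as a simple sequential pipeline. This is a minimal illustrative sketch only; the class and method names (`Thinker`, `Talker`, `Show`, `encode`, `generate`, `decode`) come from the stage descriptions above, and all signatures and internals are hypothetical placeholders, not the actual vLLM-Omni or Ming-flash-omni API.

```python
from dataclasses import dataclass
from typing import Any


class Thinker:
    """Stage 1: encodes multi-modal inputs (text, image, audio) into hidden states."""

    def encode(self, inputs: dict[str, Any]) -> list[float]:
        # Placeholder: a real implementation would run the MoE encoder on NPU.
        return [float(len(str(v))) for v in inputs.values()]


class Talker:
    """Stage 2: the LLM, generating tokens conditioned on the Thinker's output."""

    def generate(self, hidden: list[float]) -> list[int]:
        # Placeholder: a real implementation would run autoregressive decoding.
        return [int(h) % 128 for h in hidden]


class Show:
    """Stage 3: decodes tokens via a VisionDecoder or AudioDecoder."""

    def decode(self, tokens: list[int], modality: str) -> str:
        # Placeholder: a real implementation would emit an image or waveform.
        return f"{modality} output from {len(tokens)} tokens"


@dataclass
class MingFlashOmniPipeline:
    """Chains the three stages in the order described in Phase 2."""

    thinker: Thinker
    talker: Talker
    show: Show

    def run(self, inputs: dict[str, Any], modality: str = "audio") -> str:
        hidden = self.thinker.encode(inputs)
        tokens = self.talker.generate(hidden)
        return self.show.decode(tokens, modality)


pipeline = MingFlashOmniPipeline(Thinker(), Talker(), Show())
print(pipeline.run({"text": "hello", "image": b"..."}, modality="audio"))
```

The key point the sketch captures is that the stages are independently swappable, which is what lets Phase 1 adapt only the model on Ascend before Phase 2 wires up the full multi-stage flow.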

Feedback Period.

No response

CC List.

No response

Any Other Things.

No response


Metadata
Labels

NPU (PR related to Ascend NPU), new model (add new model)
