Motivation.
Ming-flash-omni Preview is an upgraded version of Ming-Omni, built upon a sparser Mixture-of-Experts (MoE) variant of Ling-Flash-2.0. According to its technical report, Ming-flash-omni Preview shows competitive performance in vision-text understanding, image generation, audio understanding, and text-to-speech.
The primary objective of this proposal is to adapt this model to the vllm-omni framework.
Proposed Change.
The implementation follows a three-phase roadmap:
Phase 1: This phase focuses on adapting Ming-flash-omni to run on Ascend with the current vllm adapter.
Phase 2: This phase integrates Ming-flash-omni into the full multi-stage pipeline: the Thinker serves as the multi-modal encoder, the Talker as the LLM, and Show as the VisionDecoder or AudioDecoder.
Phase 3: This phase focuses on performance tuning to maximize NPU throughput.
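For illustration, the Phase 2 multi-stage flow can be sketched as a simple chain of stages. This is a minimal sketch only; the function names (`thinker`, `talker`, `show`, `pipeline`) and decoder names are placeholders, not the actual vllm-omni or Ming-flash-omni APIs.

```python
# Hypothetical sketch of the Phase 2 pipeline: Thinker (multi-modal
# encoder) -> Talker (LLM) -> Show (VisionDecoder or AudioDecoder).
# All names here are illustrative assumptions, not real APIs.
from typing import Any, Dict


def thinker(inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Encode multi-modal inputs (text / image / audio) into hidden states."""
    return {"hidden": f"encoded({sorted(inputs)})"}


def talker(hidden: Dict[str, Any]) -> Dict[str, Any]:
    """LLM stage: generate tokens conditioned on the encoded hidden states."""
    return {"tokens": f"generated({hidden['hidden']})"}


def show(tokens: Dict[str, Any], modality: str) -> str:
    """Decode generated tokens into the target modality."""
    decoder = {"image": "VisionDecoder", "audio": "AudioDecoder"}[modality]
    return f"{decoder}({tokens['tokens']})"


def pipeline(inputs: Dict[str, Any], modality: str = "audio") -> str:
    """Run the full Thinker -> Talker -> Show chain."""
    return show(talker(thinker(inputs)), modality)
```

The point of the sketch is the stage boundary: each stage consumes only the previous stage's output, which is what lets the pipeline map onto vllm-omni's multi-stage execution.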
Feedback Period.
No response
CC List.
No response
Any Other Things.
No response
Before submitting a new issue...