model: add Qwen3-Omni Thinker support (qwen3omnimoe)#18420
TrevorS wants to merge 2 commits into ggml-org:master
Conversation
Nice job. I think deepcopy might not be needed, since you're not modifying anything nested.
Add support for Qwen3-Omni Thinker, a 48-layer MoE model with 128 experts (8 active per token) and an optional shared expert. This enables text-only inference as the foundation for full multimodal support.

Key changes:
- New architecture: LLM_ARCH_QWEN3OMNIMOE
- GGUF conversion with nested thinker_config handling
- IMRoPE (Interleaved M-RoPE) with sections [24, 20, 20, 0]
- Shared expert support in the qwen3vl-moe graph builder
- Reuses llm_build_qwen3vlmoe for graph construction
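As a rough illustration of what the mrope sections `[24, 20, 20, 0]` mean (this is a sketch, not the llama.cpp implementation): the rotary dimensions are partitioned among position channels, so some dims rotate with the temporal position, some with height, and some with width.

```python
# Sketch: map each rotary dim to the position channel it reads from.
# sections = [24, 20, 20, 0]: 24 dims use the temporal position,
# 20 the height, 20 the width, and 0 are unused here.
sections = [24, 20, 20, 0]

def mrope_channel_per_dim(sections):
    """Return, for each rotary dim, which position channel (t/h/w/extra) it uses."""
    chan = []
    for ch, n in enumerate(sections):
        chan.extend([ch] * n)
    return chan

chan = mrope_channel_per_dim(sections)
# The "interleaved" variant (IMRoPE) cycles through the channels per dim
# instead of laying them out in contiguous blocks, but the per-channel
# dim counts stay the same.
```

For text-only inference all three channels carry the same token position, so the result degenerates to ordinary RoPE.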
Address review feedback:
- Rename class to Qwen3OmniMoeModel, inherit from Qwen2MoeModel
- Remove __init__ override (thinker_config handled at L720-722)
- Remove set_gguf_parameters (mrope_section via rope_scaling)

Keep set_vocab for EOS/PAD: Qwen3-Omni lacks tokenizer.json (it uses vocab.json + merges.txt), so SpecialVocab can't discover the token IDs automatically.
5969085 to d4ee36e
```python
# Qwen3-Omni lacks tokenizer.json, so token IDs must be set explicitly
self.gguf_writer.add_eos_token_id(151645)  # <|im_end|> - required for generation
self.gguf_writer.add_pad_token_id(151643)  # <|endoftext|> - required for batching
```
The comment is incorrect; it's because they are, for some reason, explicitly set to null in config.json.
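To illustrate the reviewer's point with a hypothetical sketch (the key names match config.json, but the fallback values here are the ones hard-coded in this PR): a null in config.json loads as `None`, so a converter reading it has to supply the IDs itself.

```python
import json

# Hypothetical sketch: Qwen3-Omni's config.json sets these IDs to null,
# which json parses as None, so the converter must hard-code them.
cfg = json.loads('{"eos_token_id": null, "pad_token_id": null}')

eos_id = cfg["eos_token_id"] if cfg["eos_token_id"] is not None else 151645  # <|im_end|>
pad_id = cfg["pad_token_id"] if cfg["pad_token_id"] is not None else 151643  # <|endoftext|>
```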
```cpp
        layer.ffn_up_exps = create_tensor(tn(LLM_TENSOR_FFN_UP_EXPS, "weight", i), { n_embd, n_ff_exp, n_expert }, 0);
    }
} break;
case LLM_ARCH_QWEN3OMNIMOE:
```
Since this is only Qwen3VLMoe with shared experts added, and you are adding shared-expert support to qwen3vl-moe.cpp, I suggest you do the same here instead of duplicating code.
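For intuition on what the shared expert adds to the routed MoE layer, here is a toy NumPy sketch. The shapes and scale (4 experts, top-2) are illustrative stand-ins for the real 128-expert / top-8 layer, a ReLU FFN stands in for the actual gated FFN, and the real model may also gate the shared expert; none of this mirrors the llama.cpp graph code.

```python
import numpy as np

# Toy MoE layer with an always-on shared expert (assumed shapes, ReLU FFN).
rng = np.random.default_rng(0)
n_embd, n_ff, n_expert, n_active = 8, 16, 4, 2

w_up     = rng.standard_normal((n_expert, n_embd, n_ff)) * 0.1
w_down   = rng.standard_normal((n_expert, n_ff, n_embd)) * 0.1
w_router = rng.standard_normal((n_embd, n_expert)) * 0.1
w_up_shexp   = rng.standard_normal((n_embd, n_ff)) * 0.1
w_down_shexp = rng.standard_normal((n_ff, n_embd)) * 0.1

def moe_shared(x):
    logits = x @ w_router
    top = np.argsort(logits)[-n_active:]                 # top-k routed experts
    probs = np.exp(logits[top]) / np.exp(logits[top]).sum()
    out = np.zeros(n_embd)
    for p, e in zip(probs, top):
        out += p * (np.maximum(x @ w_up[e], 0.0) @ w_down[e])
    # shared expert runs for every token and is summed into the routed output
    out += np.maximum(x @ w_up_shexp, 0.0) @ w_down_shexp
    return out

y = moe_shared(rng.standard_normal(n_embd))
```

Because the shared-expert path is just an extra FFN summed into the routed output, it slots into the existing qwen3vl-moe graph builder with a conditional branch rather than a new architecture-specific build function.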
If I understand correctly, Qwen3-Omni is just Qwen3-VL with a Whisper encoder for audio. There is no need to introduce this many changes; the conversion script can simply mark this info. Besides, I don't feel comfortable using AI for anything related to mtmd, it generates too much redundant and overkill code. I will replace this PR with another approach which is much simpler.
Appreciate the feedback, thanks!
Hello @ngxson, I'm back! How does this look for the first PR? I'm open to any feedback.
Original Model: https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Instruct
GGUFs: https://huggingface.co/TrevorJS/Qwen3-Omni-30B-A3B-GGUF
This PR implements the thinker model only, providing just text -> text.

thinker-f16 on dgx-spark:

AI Disclosure
AI was used to write this code, but it was then reviewed, tested, and benchmarked by a human!