1 change: 1 addition & 0 deletions docs/models/supported_models.md
@@ -401,6 +401,7 @@ th {
| `OLMoEForCausalLM` | OLMoE | `allenai/OLMoE-1B-7B-0924`, `allenai/OLMoE-1B-7B-0924-Instruct`, etc. | | ✅︎ |
| `OPTForCausalLM` | OPT, OPT-IML | `facebook/opt-66b`, `facebook/opt-iml-max-30b`, etc. | ✅︎ | ✅︎ |
| `OrionForCausalLM` | Orion | `OrionStarAI/Orion-14B-Base`, `OrionStarAI/Orion-14B-Chat`, etc. | | ✅︎ |
| `PanguUltraMoEForCausalLM` | openPangu-Ultra-MoE-718B | | ✅︎ | ✅︎ |
**Collaborator:**
Does this model have a publicly accessible link?

**Contributor Author:**
There is a publicly accessible version at https://ai.gitcode.com/ascend-tribe/openPangu-Ultra-MoE-718B-V1.1, but it has not been uploaded to Hugging Face yet. The config file in that repo needs to be modified to align with common vLLM practice, so I have mainly tested the model in my local environment, where it works well.


**Contributor Author:**
The model will be uploaded to Hugging Face soon :)

| `PhiForCausalLM` | Phi | `microsoft/phi-1_5`, `microsoft/phi-2`, etc. | ✅︎ | ✅︎ |
| `Phi3ForCausalLM` | Phi-4, Phi-3 | `microsoft/Phi-4-mini-instruct`, `microsoft/Phi-4`, `microsoft/Phi-3-mini-4k-instruct`, `microsoft/Phi-3-mini-128k-instruct`, `microsoft/Phi-3-medium-128k-instruct`, etc. | ✅︎ | ✅︎ |
| `PhiMoEForCausalLM` | Phi-3.5-MoE | `microsoft/Phi-3.5-MoE-instruct`, etc. | ✅︎ | ✅︎ |
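Once the checkpoint is published, loading the newly registered architecture should follow the usual vLLM flow. A minimal usage sketch; the model path below is a placeholder, since per the review thread above no confirmed Hugging Face repo id exists yet:

```python
# Minimal usage sketch. The model path is a placeholder: the checkpoint is
# not on Hugging Face yet, so point this at a local copy of the weights.
from vllm import LLM, SamplingParams

llm = LLM(model="/path/to/openPangu-Ultra-MoE-718B", trust_remote_code=True)
params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```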
3 changes: 3 additions & 0 deletions vllm/config/model.py
@@ -1257,6 +1257,8 @@ def is_deepseek_mla(self) -> bool:
"deepseek_mtp",
"kimi_k2",
"longcat_flash",
"pangu_ultra_moe",
"pangu_ultra_moe_mtp",
):
return self.hf_text_config.kv_lora_rank is not None
elif self.hf_text_config.model_type == "eagle":
@@ -1405,6 +1407,7 @@ def get_layers_start_end_indices(
or self.hf_config.model_type == "glm4_moe_mtp"
or self.hf_config.model_type == "ernie_mtp"
or self.hf_config.model_type == "qwen3_next_mtp"
or self.hf_config.model_type == "pangu_ultra_moe_mtp"
):
total_num_hidden_layers = getattr(
self.hf_text_config, "num_nextn_predict_layers", 0
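For context, a condensed sketch of the detection pattern the first hunk extends (not vLLM's exact code; the tuple members before `deepseek_mtp` are assumed, since the diff shows only the tail of the tuple). Models whose text config declares a `kv_lora_rank` use DeepSeek-style Multi-head Latent Attention, and the two pangu_ultra_moe types now join that allow-list:

```python
# Condensed sketch of the MLA check; members before "deepseek_mtp" are
# assumed, as the diff only shows the tail of the tuple.
MLA_MODEL_TYPES = (
    "deepseek_v2",
    "deepseek_mtp",
    "kimi_k2",
    "longcat_flash",
    "pangu_ultra_moe",      # added by this PR
    "pangu_ultra_moe_mtp",  # added by this PR
)

def is_deepseek_mla(hf_text_config) -> bool:
    # MLA applies only when the config actually carries the latent-KV rank.
    if getattr(hf_text_config, "model_type", None) in MLA_MODEL_TYPES:
        return getattr(hf_text_config, "kv_lora_rank", None) is not None
    return False
```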
9 changes: 9 additions & 0 deletions vllm/config/speculative.py
@@ -42,6 +42,7 @@
"qwen3_next_mtp",
"mimo_mtp",
"longcat_flash_mtp",
"pangu_ultra_moe_mtp",
"mtp",
]
MTP_MODEL_TYPES = (
@@ -51,6 +52,7 @@
"ernie_mtp",
"qwen3_next_mtp",
"longcat_flash_mtp",
"pangu_ultra_moe_mtp",
)


@@ -168,6 +170,13 @@ def hf_config_override(hf_config: PretrainedConfig) -> PretrainedConfig:
hf_config.update(
{"n_predict": n_predict, "architectures": ["DeepSeekMTPModel"]}
)
if hf_config.model_type == "pangu_ultra_moe":
hf_config.model_type = "pangu_ultra_moe_mtp"
if hf_config.model_type == "pangu_ultra_moe_mtp":
n_predict = getattr(hf_config, "num_nextn_predict_layers", None)
hf_config.update(
{"n_predict": n_predict, "architectures": ["OpenPanguMTPModel"]}
)
Comment on lines +184 to +190
**Member (@hmellor), Oct 29, 2025:**

Could this override be done in vllm/model_executor/models/openpangu.py and vllm/model_executor/models/openpangu_mtp.py instead of putting model-specific config overrides in the global configs?

**Member:**

Hmm, I see this is currently done for quite a few models... We should do this in a follow-up.

**Contributor Author:**

I followed the common practice (like qwen3_next_mtp and longcat_flash_mtp) and placed the MTP override in vllm/config/speculative.py. I also considered moving the override into the model definition file, but the initialization workflow for MTP seems to make that infeasible. Could you please suggest an alternative implementation?

**Member:**

For this PR, please follow the existing pattern as you have already done. Refactoring MTP config is a separate task.

**Contributor Author:**

Cool! Looking forward to it.


if hf_config.architectures[0] == "MiMoForCausalLM":
hf_config.model_type = "mimo_mtp"
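To make the override's effect concrete, here is a standalone sketch of what the new branch does to a base openPangu config during speculative-decoding setup. The values are hypothetical; `PretrainedConfig.update` is the real transformers API, and the two `if` statements mirror the ones added in this hunk:

```python
# Standalone demo of the new override branch. Values are hypothetical.
from transformers import PretrainedConfig

cfg = PretrainedConfig()
cfg.model_type = "pangu_ultra_moe"   # as loaded from the checkpoint
cfg.num_nextn_predict_layers = 1     # hypothetical MTP depth

if cfg.model_type == "pangu_ultra_moe":
    cfg.model_type = "pangu_ultra_moe_mtp"
if cfg.model_type == "pangu_ultra_moe_mtp":
    n_predict = getattr(cfg, "num_nextn_predict_layers", None)
    cfg.update({"n_predict": n_predict, "architectures": ["OpenPanguMTPModel"]})

print(cfg.model_type)     # pangu_ultra_moe_mtp
print(cfg.n_predict)      # 1
print(cfg.architectures)  # ['OpenPanguMTPModel']
```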