[model] Add support for openPangu_Ultra_MoE #27521
```diff
@@ -42,6 +42,7 @@
     "qwen3_next_mtp",
     "mimo_mtp",
     "longcat_flash_mtp",
+    "pangu_ultra_moe_mtp",
     "mtp",
 ]
 MTP_MODEL_TYPES = (
@@ -51,6 +52,7 @@
     "ernie_mtp",
     "qwen3_next_mtp",
     "longcat_flash_mtp",
+    "pangu_ultra_moe_mtp",
 )
```
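For orientation, here is a minimal sketch of how these registries gate MTP (multi-token-prediction) handling. Only the tuple contents come from the diff; the helper function is a hypothetical illustration, not vLLM's actual code.

```python
# Model types treated as MTP draft models for speculative decoding.
# "pangu_ultra_moe_mtp" is the entry added by this PR.
MTP_MODEL_TYPES = (
    "ernie_mtp",
    "qwen3_next_mtp",
    "longcat_flash_mtp",
    "pangu_ultra_moe_mtp",
)

def is_mtp_model(model_type: str) -> bool:
    # Hypothetical helper: membership in the registry decides whether a
    # draft model is dispatched down the MTP code path.
    return model_type in MTP_MODEL_TYPES

assert is_mtp_model("pangu_ultra_moe_mtp")
```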
```diff
@@ -168,6 +170,13 @@ def hf_config_override(hf_config: PretrainedConfig) -> PretrainedConfig:
         hf_config.update(
             {"n_predict": n_predict, "architectures": ["DeepSeekMTPModel"]}
         )
+    if hf_config.model_type == "pangu_ultra_moe":
+        hf_config.model_type = "pangu_ultra_moe_mtp"
+    if hf_config.model_type == "pangu_ultra_moe_mtp":
+        n_predict = getattr(hf_config, "num_nextn_predict_layers", None)
+        hf_config.update(
+            {"n_predict": n_predict, "architectures": ["OpenPanguMTPModel"]}
+        )
```
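To make the control flow concrete, here is a small sketch of what this override does to an openPangu_Ultra_MoE config when it is used as an MTP draft model. The attribute values are assumptions for illustration only; the field and architecture names come from the diff.

```python
# Sketch: effect of hf_config_override on a pangu_ultra_moe config.
from transformers import PretrainedConfig

cfg = PretrainedConfig()
cfg.model_type = "pangu_ultra_moe"
cfg.num_nextn_predict_layers = 1  # value assumed for illustration

# The first branch retags the model type so the draft model is
# dispatched as an MTP variant...
if cfg.model_type == "pangu_ultra_moe":
    cfg.model_type = "pangu_ultra_moe_mtp"
# ...which makes the second branch fire and rewrite n_predict and
# the architecture to the MTP model class.
if cfg.model_type == "pangu_ultra_moe_mtp":
    n_predict = getattr(cfg, "num_nextn_predict_layers", None)
    cfg.update({"n_predict": n_predict, "architectures": ["OpenPanguMTPModel"]})

assert cfg.n_predict == 1
assert cfg.architectures == ["OpenPanguMTPModel"]
```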
|
Comment on lines +184 to +190

Member: Could this override be done in

Member: Hmm, I see this is currently done for quite a few models... We should do this in a follow-up.

Contributor (Author): I followed the common practice (like qwen3_next_mtp, longcat_flash_mtp) and placed the override for MTP in

Member: For this PR, please follow the existing pattern as you have already done. Refactoring the MTP config is a separate task.

Contributor (Author): Cool! Looking forward to it.
Context from the same function:

```diff
     if hf_config.architectures[0] == "MiMoForCausalLM":
         hf_config.model_type = "mimo_mtp"
```
Member: Does this model have a publicly accessible link?

Contributor (Author): There is a publicly accessible version at https://ai.gitcode.com/ascend-tribe/openPangu-Ultra-MoE-718B-V1.1; however, it has not been uploaded to Hugging Face yet. The config file in that repo needs to be modified to align with common practice in vLLM, so I have mainly tested the model in my local environment, and it works well.
Contributor (Author): The model will be uploaded to Hugging Face soon :)
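Once the weights are published, loading the model with MTP speculative decoding might look roughly like the sketch below. The repo id and the exact speculative_config keys are assumptions, not part of this PR.

```python
# Hypothetical usage after the Hugging Face upload; repo id and
# speculative_config contents are assumed, not verified against this PR.
from vllm import LLM

llm = LLM(
    model="ascend-tribe/openPangu-Ultra-MoE-718B-V1.1",  # assumed repo id
    trust_remote_code=True,
    speculative_config={
        "method": "pangu_ultra_moe_mtp",  # assumed to match the new model type
        "num_speculative_tokens": 1,
    },
)
out = llm.generate("Hello, openPangu!")
print(out[0].outputs[0].text)
```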