@nazneenn nazneenn commented Sep 16, 2025

This PR fixes two issues: one affecting DeepSeek-distill models and another affecting the Llama-4 model when using calibrate_model.sh.

[HS-6944] In the current v1.22.0 release and on the future branch, DeepSeek-distill models fail because they incorrectly fall into the DeepSeek `if` condition and trigger expert parallelism, which results in the error: "Value error, Number of experts in the model must be greater than 0 when expert parallelism is enabled."
The proposed fix removes this dependency for DeepSeek-distill models and ensures that expert parallelism is enabled only for DeepSeek models.
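The intent of the fix can be sketched as follows. This is an illustrative snippet only: the function and argument names are assumptions, not the actual repository code.

```python
# Sketch of the gating logic (names are illustrative, not the repo's code):
# expert parallelism should be enabled only for genuine DeepSeek MoE models,
# never for DeepSeek-distill checkpoints, which are dense (zero experts).

def should_enable_expert_parallel(model_name: str, num_experts: int) -> bool:
    """Gate expert parallelism on model family and expert count."""
    name = model_name.lower()
    is_deepseek = "deepseek" in name
    is_distill = "distill" in name
    # A distilled model has num_experts == 0; enabling expert parallelism
    # for it triggers: "Value error, Number of experts in the model must
    # be greater than 0 when expert parallelism is enabled."
    return is_deepseek and not is_distill and num_experts > 0
```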

[HS-6933] Details of the Llama-4 FP8 work are captured in this JIRA, with two significant changes:

1. quant.json adds "experts" to the whitelist names:

```json
"whitelist": {
    "types": [],
    "names": ["experts"]
},
```
2. calibrate_model.sh is updated to include:

```shell
EXTRA_FLAGS_STEP_2+="--expert-parallel "
EXTRA_FLAGS_STEP_4+="--expert-parallel "
```

Tested with:

```shell
./calibrate_model.sh -m meta-llama/Llama-4-Scout-17B-16E-Instruct -d NeelNanda/pile-10k -o inc -b 32 -t 4 -l 512
```
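The quant.json change can be sanity-checked with a short snippet. The config fragment is embedded inline here for self-containment; the real file contains additional keys not shown.

```python
import json

# Minimal stand-in for the relevant fragment of quant.json (the actual
# file has more keys; only the whitelist section matters for this check).
quant_cfg = json.loads("""
{
  "whitelist": {
    "types": [],
    "names": ["experts"]
  }
}
""")

# The Llama-4 FP8 calibration path selects the expert modules by name,
# so "experts" must appear in the whitelist names.
assert "experts" in quant_cfg["whitelist"]["names"]
print("quant.json whitelist includes 'experts'")
```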
