@nazneenn nazneenn commented Sep 16, 2025

This PR fixes two issues: one affecting DeepSeek-distill models and another affecting the Llama-4 model when using calibrate_model.sh.

[HS-6944] In the current v1.22.0 release and on the future branch, DeepSeek-distill models fail because they incorrectly fall into the DeepSeek `if` condition and trigger expert parallelism, which results in the error: "Value error, Number of experts in the model must be greater than 0 when expert parallelism is enabled."
The proposed fix removes this dependency for DeepSeek-distill models and ensures that expert parallelism is enabled only for DeepSeek models.
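The intent of the fix can be sketched as follows. This is an illustrative snippet only: the function and argument names are assumptions, not the actual repository code.

```python
# Sketch of the gating logic (names are illustrative, not the repo's code):
# expert parallelism should be enabled only for genuine DeepSeek MoE models,
# never for DeepSeek-distill checkpoints, which are dense (zero experts).

def should_enable_expert_parallel(model_name: str, num_experts: int) -> bool:
    """Gate expert parallelism on model family and expert count."""
    name = model_name.lower()
    is_deepseek = "deepseek" in name
    is_distill = "distill" in name
    # A distilled model has num_experts == 0; enabling expert parallelism
    # for it triggers: "Value error, Number of experts in the model must
    # be greater than 0 when expert parallelism is enabled."
    return is_deepseek and not is_distill and num_experts > 0
```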

[HS-6933] Details of the Llama-4 FP8 work are captured in this JIRA, with two significant changes:

1. quant.json adds "experts" to the whitelist names:

```json
"whitelist": {
    "types": [],
    "names": ["experts"]
},
```
2. calibrate_model.sh is updated to include:

```shell
EXTRA_FLAGS_STEP_2+="--expert-parallel "
EXTRA_FLAGS_STEP_4+="--expert-parallel "
```

Tested with:

```shell
./calibrate_model.sh -m meta-llama/Llama-4-Scout-17B-16E-Instruct -d NeelNanda/pile-10k -o inc -b 32 -t 4 -l 512
```
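The quant.json change can be sanity-checked with a short snippet. The config fragment is embedded inline here for self-containment; the real file contains additional keys not shown.

```python
import json

# Minimal stand-in for the relevant fragment of quant.json (the actual
# file has more keys; only the whitelist section matters for this check).
quant_cfg = json.loads("""
{
  "whitelist": {
    "types": [],
    "names": ["experts"]
  }
}
""")

# The Llama-4 FP8 calibration path selects the expert modules by name,
# so "experts" must appear in the whitelist names.
assert "experts" in quant_cfg["whitelist"]["names"]
print("quant.json whitelist includes 'experts'")
```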
