@wenbinc-Bin wenbinc-Bin commented Oct 29, 2025

This script dynamically quantizes Qwen3 dense and Qwen3 MoE models.
Example command:

python dynamic_quant_multimodal_for_gaudi2.py -i /data/Qwen3-VL-30B-A3B-Instruct \
       -o /data/Qwen3-VL-30B-A3B-Instruct-FP8-G2-Dynamic

Example command (original script name, before the rename):
python dynamic_quant_for_gaudi2.py -i /data/Qwen3-VL-30B-A3B-Instruct \
       -o /data/Qwen3-VL-30B-A3B-Instruct-FP8-G2-Dynamic

Signed-off-by: Chen, Wenbin <[email protected]>
@wenbinc-Bin
Author

@czhu15 @Wei-Lin-Intel Please help review.

@Wei-Lin-Intel Wei-Lin-Intel left a comment

Suggest renaming the script to include MLLM or multimodal to indicate its usage, because we have visual exclusion here.

Signed-off-by: Chen, Wenbin <[email protected]>
@wenbinc-Bin
Author

> Suggest renaming the script to include MLLM or multimodal to indicate its usage, because we have visual exclusion here.

I changed the script name to dynamic_quant_multimodal_for_gaudi2.py.

Contributor

@ranzhejiang ranzhejiang left a comment

Have changed to "quant_scheme": "channel"

@wenbinc-Bin
Author

> Have changed to "quant_scheme": "channel"

The PR is updated. Thanks for the information.

Signed-off-by: Chen, Wenbin <[email protected]>
@Wei-Lin-Intel Wei-Lin-Intel left a comment

LGTM

@czhu15 czhu15 left a comment

LGTM except for some comments to improve the description of this tool.


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Convert tensors to float8 format."

It would be good to add a more detailed description of this tool, e.g. highlight that it converts a normal BF16 checkpoint to an FP8 format that can run on Gaudi2.

Author

PR is updated, thanks.
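
For readers following this thread, here is a minimal sketch of what a more descriptive parser could look like. It is not the code merged in this PR: the `-i` and `-o` flags come from the example command in the PR description, while the `dest` names, help strings, and exact description wording are assumptions made for illustration.

```python
import argparse


def build_parser():
    # Hypothetical description text; the wording in the actual script may differ.
    parser = argparse.ArgumentParser(
        description=(
            "Convert a BF16 checkpoint to an FP8 (e4m3) checkpoint that can run "
            "on Gaudi2. Weights are quantized channel-wise and activations are "
            "quantized dynamically."
        )
    )
    # -i / -o match the example command; dest names are assumptions.
    parser.add_argument("-i", dest="input_path", required=True,
                        help="Path to the source BF16 checkpoint directory.")
    parser.add_argument("-o", dest="output_path", required=True,
                        help="Path where the FP8 checkpoint is written.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.input_path, args.output_path)
```

Putting the supported format and quantization scheme into the description means `--help` answers the questions raised in this review without reading the source.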

quantization_config["activation_scheme"] = "dynamic"
quantization_config["fmt"] = "e4m3"
quantization_config["quant_method"] = "fp8"
quantization_config["quant_scheme"] = "channel"

Does this only support channel-wise quantization?

If yes, please also add that to this tool's description.

Author

PR is updated, thanks.
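
To illustrate what "quant_scheme": "channel" implies, the sketch below computes one scale per output channel (row) of a weight matrix and rescales each row into the FP8 range. It is an illustration under stated assumptions, not the script's implementation: the constant 240.0 is assumed here as the e4m3 maximum commonly cited for Gaudi2, the helper names are hypothetical, and the final cast to an 8-bit float type is left out so the example stays dependency-free.

```python
# Assumption: maximum magnitude of the e4m3 format as used on Gaudi2.
# The script in this PR may use a different constant.
FP8_E4M3_MAX = 240.0


def per_channel_scales(weight, fp8_max=FP8_E4M3_MAX):
    """Return one scale per row (output channel) of a 2-D weight (nested lists)."""
    scales = []
    for row in weight:
        amax = max(abs(v) for v in row)
        # Use a neutral scale for all-zero rows to avoid dividing by zero later.
        scales.append(amax / fp8_max if amax > 0 else 1.0)
    return scales


def scale_weight(weight, scales, fp8_max=FP8_E4M3_MAX):
    """Divide each row by its scale and clamp to the representable range.

    The cast to an actual FP8 dtype is intentionally omitted.
    """
    return [
        [max(-fp8_max, min(fp8_max, v / s)) for v in row]
        for row, s in zip(weight, scales)
    ]


weight = [[0.5, -2.0, 1.0], [0.0, 0.0, 0.0], [-120.0, 60.0, 480.0]]
scales = per_channel_scales(weight)
scaled = scale_weight(weight, scales)
```

With per-channel scales, a row with small values and a row with large values each use the full FP8 range, which a single per-tensor scale would not allow. The "activation_scheme": "dynamic" setting means activation scales are computed at run time rather than stored in the checkpoint.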

Signed-off-by: Chen, Wenbin <[email protected]>

4 participants