[Feature][Quantization] auto_round format add support for regex #24024
Conversation
Code Review
This pull request introduces support for regular expressions in AutoRound's extra_config, which is a valuable feature for defining quantization settings for groups of layers. However, the current implementation has a critical correctness issue where literal layer names can be misinterpreted as regex patterns, potentially leading to incorrect quantization. My review provides a comment with a suggested code change to address this by using a heuristic to differentiate between literal names and regex patterns, which also improves performance.
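For illustration, a heuristic of the kind the review describes might look like the sketch below (the helper names are assumptions, not the PR's actual code): treat a key as a regex only when it contains metacharacters beyond the dots present in every module path, so literal layer names take a fast exact-match path.

```python
import re

# Metacharacters that signal an intended regex. "." is deliberately
# excluded because it appears in every dotted module path.
_REGEX_METACHARS = set("^$*+?[](){}|\\")

def _is_regex_key(key: str) -> bool:
    # Hypothetical helper: literal layer names contain none of these.
    return any(ch in _REGEX_METACHARS for ch in key)

def _key_matches(key: str, layer_name: str) -> bool:
    if _is_regex_key(key):
        return re.search(key, layer_name) is not None
    return key == layer_name  # cheap exact match for literal names
```

Under this heuristic, `model.layers.0.self_attn.k_proj` is treated as a literal name, while `self_attn.[koqv]_proj$` is treated as a pattern.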
@mgoin @robertgshaw2-redhat @tlrmchlsmth @yewentao256 could you please help review this PR?
yewentao256 left a comment
Could you provide more context for this PR? E.g., which models use it, and what the behavior is without this PR versus with it. Also, lm_eval results for accuracy and vllm bench numbers for performance would be helpful.
@yewentao256 Currently, auto_round supports mixed-precision quantized models by saving the full name of every layer in the config. With this PR, we add support for regular expressions; the PR primarily handles reading regex-based configurations. All models quantized by auto_round will use this path in the future. For example, this script generates a quantized Qwen model:

```python
from auto_round import AutoRound

model_path = "Qwen/Qwen3-15B-A2B-Base/"
layer_config = {
    "self_attn.[koqv]_proj$": {"bits": 8},
}
ar = AutoRound(model=model_path, scheme="W4A16", layer_config=layer_config, iters=1)
ar.quantize_and_save("Qwen3-15B-A2B-Base-vllm-regex-test")
```

The resulting config.json records that all non-expert linear layers (the attention projections) are overridden to 8 bits. With the old format, it looks like this:

```json
"quantization_config": {
    "autoround_version": "0.8.0.dev",
    "bits": 4,
    "data_type": "int",
    "extra_config": {
        "model.layers.0.self_attn.k_proj": {"bits": 8},
        "model.layers.0.self_attn.o_proj": {"bits": 8},
        "model.layers.0.self_attn.q_proj": {"bits": 8},
        "model.layers.0.self_attn.v_proj": {"bits": 8},
        "model.layers.1.self_attn.k_proj": {"bits": 8},
        "model.layers.1.self_attn.o_proj": {"bits": 8},
        "model.layers.1.self_attn.q_proj": {"bits": 8},
        ...
```

With the support of this PR, it can be simplified to:

```json
"quantization_config": {
    "autoround_version": "0.8.0.dev",
    "bits": 4,
    "data_type": "int",
    "extra_config": {
        "self_attn.[koqv]_proj$": {"bits": 8}
    }
}
```
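To make the loading behavior concrete, here is a rough sketch of how a loader could resolve one layer's settings against such an extra_config: an exact full-name entry wins, and regex keys serve as a fallback. The function name and lookup order are assumptions for illustration, not the PR's exact implementation.

```python
import re
from typing import Any, Optional

def resolve_layer_overrides(
    layer_name: str,
    extra_config: dict,
) -> Optional[dict]:
    """Return per-layer overrides: an exact full-name entry wins,
    otherwise fall back to the first key that matches as a regex."""
    if layer_name in extra_config:
        return extra_config[layer_name]
    for pattern, overrides in extra_config.items():
        try:
            if re.search(pattern, layer_name):
                return overrides
        except re.error:
            continue  # keys that are not valid regexes are skipped
    return None

extra_config = {"self_attn.[koqv]_proj$": {"bits": 8}}
print(resolve_layer_overrides("model.layers.0.self_attn.k_proj", extra_config))
# -> {'bits': 8}
print(resolve_layer_overrides("model.layers.0.mlp.gate_proj", extra_config))
# -> None (the layer keeps the global 4-bit default)
```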
This PR does not affect the model's accuracy; these are our test results:
mgoin left a comment
LGTM, just one issue
Purpose
Add support for regular expressions in the auto_round format.
Test Plan
Load an auto_round-quantized model whose extra_config contains both regular expressions and full layer names.
Test Result
With this change, every linear layer that matches a regex in extra_config (for example, `".*mlp.down_proj": {"bits": 16}`) obtains the correct bit width.
Mixed-bit quantized models load successfully with the auto-round quant_method.
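As an illustrative sanity check (not part of the PR's test suite), one can verify which layer names a pattern captures:

```python
import re

pattern = ".*mlp.down_proj"  # example pattern from the test above
for name in [
    "model.layers.0.mlp.down_proj",  # matches -> 16 bits
    "model.layers.0.mlp.gate_proj",  # no match -> default bits
]:
    print(name, bool(re.search(pattern, name)))
```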