
Conversation

@n1ck-guo (Contributor) commented Sep 1, 2025

Purpose

Add support for regular expressions in the auto_round quantization format.

Test Plan

Load an auto_round-quantized model whose extra_config contains both regular expressions and full layer names.

Test Result

With this change, each linear layer that matches a regex in extra_config (for example, ".*mlp.down_proj": {"bits": 16}) obtains the correct bit width.
Successfully loaded a mixed-bits quantized model with the auto-round quant_method.
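
For illustration, here is a minimal sketch of the kind of lookup this enables (hypothetical helper names, not the actual vLLM code): exact full-name keys take priority, and remaining keys are tried as regular expressions against the layer name.

```python
import re

def lookup_layer_config(layer_name: str, extra_config: dict) -> dict | None:
    """Return the per-layer override for layer_name, if any."""
    # Exact full-name match, as written by older auto_round versions.
    if layer_name in extra_config:
        return extra_config[layer_name]
    # Otherwise try each key as a regex against the layer name.
    for pattern, cfg in extra_config.items():
        if re.search(pattern, layer_name):
            return cfg
    return None

# Example: a regex key overrides all down_proj layers to 16 bits.
extra_config = {".*mlp.down_proj": {"bits": 16}}
print(lookup_layer_config("model.layers.0.mlp.down_proj", extra_config))
# -> {'bits': 16}
```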

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: n1ck-guo <[email protected]>
@gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request introduces support for regular expressions in AutoRound's extra_config, which is a valuable feature for defining quantization settings for groups of layers. However, the current implementation has a critical correctness issue where literal layer names can be misinterpreted as regex patterns, potentially leading to incorrect quantization. My review provides a comment with a suggested code change to address this by using a heuristic to differentiate between literal names and regex patterns, which also improves performance.
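
One way to realize the suggested heuristic (a sketch of the idea, not the code that was merged): treat a key as a regex only when it contains characters that never appear in module names, and compare literally otherwise.

```python
import re

# Characters common in regex patterns but absent from PyTorch
# module names; their presence marks a key as a pattern.
_REGEX_CHARS = set("^$*+?()[]{}|\\")

def key_matches(key: str, layer_name: str) -> bool:
    if any(ch in _REGEX_CHARS for ch in key):
        return re.search(key, layer_name) is not None
    # Literal key: exact comparison, so "model.layers.1.self_attn.q_proj"
    # cannot accidentally match layer 11 via the "." regex wildcard.
    return key == layer_name
```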

n1ck-guo and others added 5 commits September 1, 2025 14:25
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Heng Guo <[email protected]>
Signed-off-by: n1ck-guo <[email protected]>
Signed-off-by: n1ck-guo <[email protected]>
Signed-off-by: n1ck-guo <[email protected]>
@n1ck-guo n1ck-guo changed the title auto_round format add support for regex [Feature][Quantization] auto_round format add support for regex Sep 2, 2025
@n1ck-guo (Contributor, Author) commented

@mgoin @robertgshaw2-redhat @tlrmchlsmth @yewentao256 could you please help review this PR?

@yewentao256 (Member) left a comment

Could you introduce more context about this PR?
E.g., which model uses it, and what the behavior is without this PR versus with it.
Also, showing lm_eval results for accuracy and vllm bench results for performance would be helpful.

@n1ck-guo (Contributor, Author) commented

@yewentao256 Currently, auto_round supports mixed-precision quantized models by saving the full name of every layer in extra_config. With this PR we hope to support regular expressions instead: the PR primarily adds reading of regex-based entries from that configuration. All models quantized by auto_round will use this path in the future.

For example, if I use this script to generate a quantized Qwen model:

```python
from auto_round import AutoRound

model_path = "Qwen/Qwen3-15B-A2B-Base/"
layer_config = {
    "self_attn.[koqv]_proj$": {"bits": 8},
}
ar = AutoRound(model=model_path, scheme="W4A16", layer_config=layer_config, iters=1)
ar.quantize_and_save("Qwen3-15B-A2B-Base-vllm-regex-test")
```

The resulting config.json includes an extra_config recording the per-layer overrides (here, the attention projection layers at 8 bits). In the old format, it looks like this:

"quantization_config": {
    "autoround_version": "0.8.0.dev",
    "bits": 4,
    "data_type": "int",
    "extra_config": {
      "model.layers.0.self_attn.k_proj": {
        "bits": 8
      },
      "model.layers.0.self_attn.o_proj": {
        "bits": 8
      },
      "model.layers.0.self_attn.q_proj": {
        "bits": 8
      },
      "model.layers.0.self_attn.v_proj": {
        "bits": 8
      },
      "model.layers.1.self_attn.k_proj": {
        "bits": 8
      },
      "model.layers.1.self_attn.o_proj": {
        "bits": 8
      },
      "model.layers.1.self_attn.q_proj": {
        "bits": 8
      },

With this PR, it can be simplified to:

"quantization_config": {
    "autoround_version": "0.8.0.dev",
    "bits": 4,
    "data_type": "int",
    "extra_config": {
        "self_attn.[koqv]_proj$": {"bits": 8},
}
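
To illustrate (assuming re.search-style matching of the pattern against each layer's full name), the single regex key covers the same layers as the enumerated entries above:

```python
import re

pattern = r"self_attn.[koqv]_proj$"
for name in [
    "model.layers.0.self_attn.k_proj",   # matches
    "model.layers.1.self_attn.q_proj",   # matches
    "model.layers.0.mlp.down_proj",      # does not match
]:
    print(name, bool(re.search(pattern, name)))
```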

@n1ck-guo (Contributor, Author) commented

This PR does not affect model accuracy. Here are our test results; the model is built with the script above.

| task | bf16 | this PR | main branch |
| --- | --- | --- | --- |
| average | 0.6324 | 0.6305 | 0.6299 |
| arc_challenge | 0.4838 | 0.5026 | 0.5077 |
| arc_easy | 0.7866 | 0.7942 | 0.7963 |
| boolq | 0.8361 | 0.8287 | 0.8269 |
| hellaswag | 0.5888 | 0.5765 | 0.5772 |
| lambada_openai | 0.7402 | 0.7339 | 0.7320 |
| mmlu | 0.7347 | 0.7213 | 0.7200 |
| openbookqa | 0.3180 | 0.3140 | 0.3060 |
| piqa | 0.7894 | 0.7862 | 0.7840 |
| truthfulqa_mc1 | 0.3745 | 0.3758 | 0.3733 |
| wikitext | 0.5871 | 0.6022 | 0.6023 |
| winogrande | 0.7174 | 0.7001 | 0.7032 |

Signed-off-by: n1ck-guo <[email protected]>
@mgoin (Member) left a comment

LGTM, just one issue

Signed-off-by: n1ck-guo <[email protected]>
Signed-off-by: n1ck-guo <[email protected]>
@mgoin mgoin added the quantization and ready labels Oct 14, 2025
@mgoin mgoin enabled auto-merge (squash) October 14, 2025 00:53
@mgoin mgoin merged commit 2935092 into vllm-project:main Oct 14, 2025
55 checks passed
This pull request (vllm-project#24024) was later referenced by commits pushed to forks by 1994, Dhruvilbhatt, bbartels, lywa1998, alhridoy, xuebwang-amd, 0xrushi, rtourgeman, Zhathw, and devpatelio.