Update CT WNA16MarlinMoE integration by mgoin · Pull Request #16666 · vllm-project/vllm

mgoin · 2025-04-15T14:59:16Z

Updates the compressed-tensors (CT) integration to enable the kernel changes from #14447 and #16850.

vllm serve RedHatAI/Qwen3-30B-A3B-quantized.w4a16 --port 9000
lm_eval --model local-completions --model_args model=RedHatAI/Qwen3-30B-A3B-quantized.w4a16,base_url=http://0.0.0.0:9000/v1/completions,num_concurrent=500,tokenized_requests=False --tasks gsm8k --num_fewshot 5
local-completions (model=RedHatAI/Qwen3-30B-A3B-quantized.w4a16,base_url=http://0.0.0.0:9000/v1/completions,num_concurrent=500,tokenized_requests=False), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8779|±  |0.0090|
|     |       |strict-match    |     5|exact_match|↑  |0.8802|±  |0.0089|
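
As a rough sanity check on the table above, the reported standard errors can be reproduced from the accuracies with the usual binomial formula. This is only a sketch: the sample count of 1319 is an assumption (the size of the GSM8K test split, which the PR does not state), and the formula is not taken from the lm_eval source. It also shows that the gap between the two filters is smaller than one standard error.

```python
import math

# Assumption: GSM8K test split size; not stated in the PR itself.
N_GSM8K_TEST = 1319


def binomial_stderr(accuracy: float, n: int) -> float:
    """Sample standard error of the mean of n 0/1 (exact-match) scores."""
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))


# Values copied from the results table.
flexible = 0.8779
strict = 0.8802

se_flexible = binomial_stderr(flexible, N_GSM8K_TEST)
se_strict = binomial_stderr(strict, N_GSM8K_TEST)
gap = abs(strict - flexible)

print(f"stderr (flexible-extract): {se_flexible:.4f}")
print(f"stderr (strict-match):     {se_strict:.4f}")
print(f"gap between filters:       {gap:.4f}")
```

Dividing by n instead of n - 1 does not change the values at four decimal places.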

Signed-off-by: mgoin <mgoin64@gmail.com>

github-actions · 2025-04-15T14:59:27Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

tlrmchlsmth · 2025-05-08T20:32:59Z

vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py

+                    raise ValueError(
+                        "WNA16MoE is not supported with actorder=group/dynamic."
+                    )


Are there cases where we raise a ValueError now where we didn't before?

mgoin · May 8, 2025

Not really. The limitations of the Marlin and Triton kernels have not changed under the hood; however, we are now checking them at a higher level, so this just fails faster.
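
The "fail faster" idea in this reply can be sketched as follows. This is a minimal illustration, not vLLM's code: `WeightQuantArgs`, `validate_wna16_moe`, and the set of rejected `actorder` values are hypothetical names chosen for the example. Only the error message is taken from the diff above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical: the actorder settings the diff's error message names.
UNSUPPORTED_ACTORDER = {"group", "dynamic"}


@dataclass
class WeightQuantArgs:
    """Hypothetical stand-in for a layer's weight quantization settings."""
    num_bits: int
    actorder: Optional[str] = None


def validate_wna16_moe(args: WeightQuantArgs) -> None:
    """Reject unsupported configs when the method is chosen,
    instead of letting them fail later inside a kernel call."""
    if args.actorder in UNSUPPORTED_ACTORDER:
        raise ValueError(
            "WNA16MoE is not supported with actorder=group/dynamic."
        )


# A supported config passes silently.
validate_wna16_moe(WeightQuantArgs(num_bits=4, actorder=None))

# An unsupported config is rejected up front.
try:
    validate_wna16_moe(WeightQuantArgs(num_bits=4, actorder="group"))
except ValueError as err:
    print(f"rejected early: {err}")
```

The kernel's limits are the same either way; the check only moves the failure to configuration time, where the message can name the offending setting.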

tlrmchlsmth

Just one question on the code that chooses between CompressedTensorsWNA16MoEMethod and CompressedTensorsWNA16MarlinMoEMethod. Otherwise LGTM.

Update CT WNA16MarlinMoE integration

51d631c

Signed-off-by: mgoin <mgoin64@gmail.com>

mgoin requested review from robertgshaw2-redhat and tlrmchlsmth as code owners April 15, 2025 14:59

mgoin added 3 commits May 7, 2025 02:18

Merge branch 'main' into update-ct-marlin-moe

b26b2b0

Update with latest main

68f5124

Signed-off-by: mgoin <mgoin64@gmail.com>

Add log

931273c

Signed-off-by: mgoin <mgoin64@gmail.com>

mgoin added the quantization and ready labels May 7, 2025

tlrmchlsmth reviewed May 8, 2025

tlrmchlsmth approved these changes May 9, 2025

tlrmchlsmth merged commit 22481fb into vllm-project:main May 9, 2025
77 checks passed

princepride pushed a commit to princepride/vllm that referenced this pull request May 10, 2025

Update CT WNA16MarlinMoE integration (vllm-project#16666)

734bac4

Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>

RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025

Update CT WNA16MarlinMoE integration (vllm-project#16666)

8f9bc1c

Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>

mawong-amd pushed a commit to ROCm/vllm that referenced this pull request May 14, 2025

Update CT WNA16MarlinMoE integration (vllm-project#16666)

57651c7

Signed-off-by: mgoin <mgoin64@gmail.com>

zzzyq pushed a commit to zzzyq/vllm that referenced this pull request May 24, 2025

Update CT WNA16MarlinMoE integration (vllm-project#16666)

f74ab2c

Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Yuqi Zhang <yuqizhang@google.com>

tlrmchlsmth merged 4 commits into vllm-project:main from neuralmagic:update-ct-marlin-moe