[Performance] Support FP8 flashinfer TRTLLM MOE on Qwen3 and Qwen-3next #27492
@mgoin @pavanimajety could you help review this PR?
If this PR is merged, can vLLM still run with an older flashinfer? Internally we are just now upgrading to flashinfer nightly-v0.4.1-20251027, and this PR seems to bump the flashinfer version again. Would it be possible to keep some backward compatibility with older flashinfer versions?
Hi @mxz297,
@alexm-redhat / @mgoin Could you please review? Thanks!
pavanimajety
left a comment
LGTM, thanks for the PR
There is already a PR that updates the FI version: #27952
Rebased. Ready to merge 😄
Signed-off-by: jiahanc <[email protected]>
mgoin
left a comment
LGTM to get in now. We should make an issue to use RoutingMethod more broadly

Purpose
Test Plan
Qwen3-Next-80B-A3B-Instruct-FP8 on 2xB200 TP2
Qwen3-30B-A3B-Instruct-2507-FP8 on 2xB200 TP2
Test Result
Qwen3-Next-80B-A3B-Instruct-FP8
Qwen/Qwen3-30B-A3B-Instruct-2507-FP8