[Hopper] optimize decoding performance for headdim 128 fp8 #96
Similar to #91, this PR optimizes the FP8 tile sizes for headdim 128. This applies, for example, to Qwen3-Coder-30B-A3B-Instruct, where it reduces the context-length-dependent inter-token latency by ~1.39x compared to BF16. With the current mainline config, FP8 decoding is no faster than BF16 for Qwen3-Coder-30B-A3B-Instruct.
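For readers unfamiliar with what "optimizing tile sizes" means here, the sketch below illustrates the general shape of such a change: a compile-time tile-shape dispatch keyed on head dimension and element type. Everything in it is an assumption for illustration — the names (`TileShape`, `select_tile_shape`) and the specific M/N values are hypothetical and are not the actual values or code from this PR.

```cpp
// Hypothetical sketch of a compile-time tile-shape dispatch for a Hopper
// decoding kernel. All names and tile sizes are illustrative placeholders;
// the real values chosen by this PR live in the kernel's dispatch code.
#include <cstdio>

struct TileShape {
    int m;  // query-tile rows per CTA
    int n;  // key/value-tile columns per CTA
};

// Pick a tile shape per (head dim, element type). A narrower FP8 element
// halves the per-element shared-memory footprint, so a wider KV tile can
// fit in the same budget -- the kind of knob a change like this tunes for
// headdim 128 decoding.
constexpr TileShape select_tile_shape(int head_dim, bool is_fp8) {
    if (head_dim == 128) {
        // Placeholder numbers, not the PR's actual configuration.
        return is_fp8 ? TileShape{64, 256} : TileShape{64, 128};
    }
    return TileShape{64, 128};  // fallback for other head dims
}

int main() {
    constexpr TileShape fp8  = select_tile_shape(128, /*is_fp8=*/true);
    constexpr TileShape bf16 = select_tile_shape(128, /*is_fp8=*/false);
    std::printf("headdim 128 fp8 : %d x %d\n", fp8.m, fp8.n);
    std::printf("headdim 128 bf16: %d x %d\n", bf16.m, bf16.n);
    return 0;
}
```

The intuition this sketch encodes is why an untuned config can leave FP8 no faster than BF16: if the FP8 path reuses BF16's tile shapes, it gives up the extra shared-memory headroom that the narrower element type provides.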