[CI Failure] Fix fp8 kv cache on <SM90 #25396
Conversation
Signed-off-by: mgoin <[email protected]>
Code Review
This pull request fixes an issue where FlashAttention was incorrectly chosen for the fp8 KV cache on devices with compute capability below 9.0, leading to a NotImplementedError. The change correctly modifies the attention backend selection logic to fall back to the Triton backend in this scenario. My main concern is that a pre-check in is_kv_cache_dtype_supported might still fail, because it uses simplified logic that doesn't account for the new fallback, potentially preventing the fix from taking effect. I've left a specific comment detailing this critical issue; it should be addressed to ensure the fix is complete.
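For context, below is a minimal, illustrative sketch of the fallback behavior described above. It is not the actual vLLM selection code; the function name select_attention_backend, the backend strings, and the capability tuple are assumptions made purely for illustration.

# Minimal sketch of the described fallback, not vLLM's actual implementation.
# Assumed/illustrative names: select_attention_backend, "FLASH_ATTN", "TRITON_ATTN".
def select_attention_backend(kv_cache_dtype: str,
                             device_capability: tuple[int, int]) -> str:
    """Fall back to the Triton backend when an fp8 KV cache is requested on a
    device below SM90 (compute capability 9.0), where FlashAttention would
    otherwise raise NotImplementedError."""
    uses_fp8_kv_cache = kv_cache_dtype.startswith("fp8")
    if uses_fp8_kv_cache and device_capability < (9, 0):
        return "TRITON_ATTN"
    return "FLASH_ATTN"

# Example: SM89 (e.g. an L40S) with an fp8 KV cache should pick Triton,
# while a non-fp8 cache on SM90 can keep FlashAttention.
assert select_attention_backend("fp8_e4m3", (8, 9)) == "TRITON_ATTN"
assert select_attention_backend("auto", (9, 0)) == "FLASH_ATTN"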
Codex Review: Here are some suggestions.
Purpose
The "Quantization Test" has been broken for a few days due to FA2 being chosen for fp8 kv cache on SM80 and SM89 after this PR removed V0 fallback #25033
https://buildkite.com/vllm/ci/builds/31806/steps/canvas?jid=0199716c-38f4-4409-affd-a1c35d55bc0e
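As a quick sanity check (a sketch using plain PyTorch, not part of this PR), you can see whether a given GPU falls below SM90 and is therefore affected:

# Assumes a CUDA-enabled PyTorch install; reports whether the local GPU is
# below SM90 (e.g. SM80 on A100 or SM89 on L40S) and thus hit this issue.
import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability()
    print(f"Compute capability {major}.{minor}; below SM90: {(major, minor) < (9, 0)}")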
Test Plan
pytest -s -v "tests/quantization/test_fp8.py::test_kv_cache_model_load_and_run[False-neuralmagic/Meta-Llama-3-8B-Instruct-FP8-KV]"
Test Result
Works on L40S GPUs.