[FLA] Introduce Kimi Delta Attention (KDA) to vLLM #27654
Conversation
Signed-off-by: lizhiyuan <[email protected]>
Code Review
This pull request introduces Kimi Delta Attention (KDA) into vLLM by adding new Triton kernels and modifying existing ones. The change set is extensive and adds a significant new feature. I have identified two critical bugs in the Triton kernels, involving incorrect tensor shapes and incorrect memory strides, either of which could lead to incorrect outputs, as well as a performance issue in the autotuning configuration of one kernel. Addressing these points is crucial for the correctness and efficiency of the new implementation.
```python
num_stages = 3
num_warps = 1
```
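For reference, a fixed `num_stages`/`num_warps` pair like the one above is the kind of setting the autotuning remark in the review refers to. Below is a minimal, hedged sketch of letting Triton benchmark several candidate configs instead; the config values and the toy kernel are illustrative, not code from the PR:

```python
import torch
import triton
import triton.language as tl

# Illustrative only: declare several (num_stages, num_warps) candidates
# and let Triton's autotuner pick the fastest, instead of hard-coding one pair.
@triton.autotune(
    configs=[
        triton.Config({}, num_stages=s, num_warps=w)
        for s in (2, 3, 4)
        for w in (1, 2, 4)
    ],
    key=["T"],  # re-tune whenever the sequence length T changes
)
@triton.jit
def copy_kernel(x_ptr, y_ptr, T, BLOCK: tl.constexpr):
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < T
    tl.store(y_ptr + offs, tl.load(x_ptr + offs, mask=mask), mask=mask)

x = torch.randn(1024, device="cuda")
y = torch.empty_like(x)
copy_kernel[(triton.cdiv(1024, 128),)](x, y, 1024, BLOCK=128)
```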
```python
o = torch.empty_like(k)
```
The output tensor `o` is allocated with the shape of the key tensor `k` (`torch.empty_like(k)`), but the output of an attention operation should have the shape of the value tensor `v`. `k` has shape `[B, T, H, K]` while `v` has shape `[B, T, HV, V]`, and these can differ, which would cause a shape mismatch and incorrect output. Please allocate `o` with the shape of `v`.
```diff
- o = torch.empty_like(k)
+ o = torch.empty_like(v)
```
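To make the concern concrete, here is a minimal sketch using the layout described above, where `k` is `[B, T, H, K]` and `v` is `[B, T, HV, V]`; the concrete sizes are illustrative:

```python
import torch

B, T = 2, 16
H, K = 4, 64      # key heads / key head dim
HV, V = 8, 128    # value heads / value head dim, may differ from k's

k = torch.randn(B, T, H, K)
v = torch.randn(B, T, HV, V)

# The attention output lives in value space, so it must match v's shape.
o_wrong = torch.empty_like(k)   # [2, 16, 4, 64]  -- wrong when HV != H or V != K
o_right = torch.empty_like(v)   # [2, 16, 8, 128] -- matches the value layout

assert o_right.shape == v.shape
```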
In KDA models, `q`, `k`, `v`, and `o` share the same shape, so it's safe to use `empty_like(k)`.
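Given that invariant, `empty_like(k)` and `empty_like(v)` coincide for KDA. As a hedged suggestion (not code from the PR), an explicit guard would keep the allocation safe if the shapes ever diverged:

```python
import torch

def alloc_output(k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # KDA-specific invariant per the author: q, k, v, and o share one shape.
    # Asserting it makes empty_like(k) safe against future shape changes.
    assert k.shape == v.shape, "KDA expects matching k/v shapes"
    return torch.empty_like(k)
```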
💡 Codex Review
Here are some automated review suggestions for this pull request.
Signed-off-by: lizhiyuan <[email protected]>
youkaichao left a comment
looking forward to the new model 👍
Signed-off-by: lizhiyuan <[email protected]>
Signed-off-by: Bhagyashri <[email protected]>
Signed-off-by: lizhiyuan <[email protected]>
Signed-off-by: lizhiyuan <[email protected]>
Signed-off-by: lizhiyuan <[email protected]>
### What this PR does / why we need it?
Adapt the vllm-ascend main branch to vllm releases/v0.11.1:
- fix `forward context not set` in test_vlm.py, caused by vllm-project/vllm#23207
- fix failing import of `cdiv round`, caused by vllm-project/vllm#27188
- fix failing import of `init_cached_hf_modules`, caused by vllm-project/vllm#27567
- adapt the triton kernel `fused_recurrent_gated_delta_rule_fwd_kernel`, caused by vllm-project/vllm#27654
- remove unused code in sigmoid_gating.py: `class FusedRecurrentFunction`, `fused_recurrent_gated_delta_rule`, `fused_recurrent_gated_delta_rule_fwd`

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI

- vLLM version: v0.11.0
- vLLM main: vllm-project/vllm@83f478b

Signed-off-by: 22dimensions <[email protected]>
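As a hedged illustration of how downstream plugins typically absorb this kind of upstream refactor, a try/except import shim; the fallback module path below is an assumption for illustration, not taken from either PR:

```python
# Hypothetical compatibility shim for a vLLM utils refactor such as
# vllm-project/vllm#27188. The fallback path is an assumption; adjust
# it to wherever the helper actually lives after the refactor.
try:
    from vllm.utils import cdiv  # pre-refactor location
except ImportError:
    from vllm.utils.math_utils import cdiv  # assumed post-refactor location

print(cdiv(10, 3))  # ceil division: prints 4
```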
Signed-off-by: lizhiyuan <[email protected]>
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
- Update `supported_models.md` and `examples` for a new model.