[Misc] Refactor Attention kv transfer methods into decorator #27816
Conversation
Code Review
This pull request refactors the KV cache transfer logic into a decorator, which is a great improvement for code clarity and maintainability in layer.py. The implementation is clean and correctly captures the on-entry/on-exit pattern.
However, I've found a critical issue in the new decorator: it attempts to read the layer_name parameter from keyword arguments, but this parameter is passed positionally at all call sites, which will cause a KeyError at runtime. I've provided a suggestion to fix this by using the inspect module to robustly retrieve the argument.
Once this is addressed, the PR will be in excellent shape.
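For illustration, here is a minimal sketch of the kind of fix the review suggests, assuming the decorated forward declares a `layer_name` parameter (the decorator and function names here are hypothetical, not the actual suggestion): `inspect.signature().bind()` maps positional arguments onto parameter names, so the decorator finds `layer_name` however callers pass it.

```python
import inspect
from functools import wraps

def resolves_layer_name(fn):
    # Hypothetical sketch: kwargs["layer_name"] alone raises KeyError when
    # callers pass the argument positionally; binding the signature first
    # resolves it either way.
    sig = inspect.signature(fn)

    @wraps(fn)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        layer_name = bound.arguments["layer_name"]
        # ... on-entry/on-exit KV transfer logic would use layer_name here ...
        return fn(*args, **kwargs)

    return wrapper

@resolves_layer_name
def forward(query, key, value, layer_name=""):
    return query

forward(1, 2, 3, "model.layers.0.attn")             # positional: resolved
forward(1, 2, 3, layer_name="model.layers.0.attn")  # keyword: resolved
```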
@codex review
markmc
left a comment
Love the idea in general, suggestion inline
Force-pushed from 8abc2cd to e3914fa
Thanks for reviewing @markmc!
ProExpertProg
left a comment
Just a few nits and Qs
Force-pushed from e3914fa to 5546494
Force-pushed from 5f24f38 to f3b6244
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 9a18fa5 to 900be1f
@hmellor can you tell why the docs build really dislikes this PR? :)
ProExpertProg
left a comment
Nice cleanup
vllm/attention/layer.py (outdated)
```
Returns:
    A tuple containing:
    - attn_metadata: Attention metadata for this specific layer, or None if
      no metadata available
```
@NickLucche https://app.readthedocs.org/projects/vllm/builds/30282990/#293811637--1144
[two alternative suggested edits to the "no metadata available" continuation line]
(not sure which will render better)
@NickLucche any reason for not checking […]
Thanks @ptovam for spotting this! Rebase cruft on me
Signed-off-by: NickLucche <[email protected]>
Signed-off-by: NickLucche <[email protected]>
Signed-off-by: NickLucche <[email protected]>
Reduces code duplication between the maybe_transfer_kv_layer decorator and the functions it decorates. Signed-off-by: Mark McLoughlin <[email protected]>
Signed-off-by: NickLucche <[email protected]>
Signed-off-by: NickLucche <[email protected]>
Force-pushed from 9d06f87 to 7ba7765
Signed-off-by: NickLucche <[email protected]>
[Misc] Refactor Attention kv transfer methods into decorator (vllm-project#27816) Signed-off-by: NickLucche <[email protected]> Signed-off-by: Mark McLoughlin <[email protected]> Co-authored-by: Mark McLoughlin <[email protected]> Signed-off-by: George D. Torres <[email protected]>
[Misc] Refactor Attention kv transfer methods into decorator (vllm-project#27816) Signed-off-by: NickLucche <[email protected]> Signed-off-by: Mark McLoughlin <[email protected]> Co-authored-by: Mark McLoughlin <[email protected]> Signed-off-by: Bram Wasti <[email protected]>
[Misc] Refactor Attention kv transfer methods into decorator (vllm-project#27816) Signed-off-by: NickLucche <[email protected]> Signed-off-by: Mark McLoughlin <[email protected]> Co-authored-by: Mark McLoughlin <[email protected]>
Small quality-of-life improvement: remove some of the KV transfer-specific code from layer.py and refactor it into a decorator. I believe the on-entry/on-exit pattern (wait_read / wait_write) here is very well suited to that. The result is simply that there is less non-attention-related code in the file. Behavior should be unchanged.

Also, after grouping the common boilerplate shared by maybe_save_kv_layer_to_connector and wait_for_kv_layer_from_connector, I found there was too little left to justify a separate function for each, so I ended up inlining both connector method calls.

cc @ApostaC who wrote the initial connector code
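To make the pattern concrete, here is a minimal sketch of what such an on-entry/on-exit decorator could look like. This is illustrative only, not the actual vLLM implementation: `maybe_get_kv_connector` and the connector methods below are hypothetical placeholders, and the sketch assumes the layer name is stored on the attention module.

```python
from functools import wraps

class _StubConnector:
    # Hypothetical stand-in for a KV transfer connector.
    def wait_for_layer_load(self, layer_name: str) -> None:
        pass

    def save_kv_layer(self, layer_name: str) -> None:
        pass

def maybe_get_kv_connector():
    # Placeholder: real code would look up the active connector, if any.
    return _StubConnector()

def maybe_transfer_kv_layer(fn):
    # On entry, block until this layer's KV has arrived from the connector
    # (the wait_read side); on exit, hand the freshly computed KV back to
    # it (the wait_write side).
    @wraps(fn)
    def wrapper(self, *args, **kwargs):
        connector = maybe_get_kv_connector()
        if connector is None:
            return fn(self, *args, **kwargs)
        connector.wait_for_layer_load(self.layer_name)  # on entry
        output = fn(self, *args, **kwargs)
        connector.save_kv_layer(self.layer_name)        # on exit
        return output

    return wrapper

class Attention:
    def __init__(self, layer_name: str):
        self.layer_name = layer_name

    @maybe_transfer_kv_layer
    def forward(self, query, key, value):
        return query  # stand-in for the real attention computation

Attention("model.layers.0.attn").forward(1, 2, 3)
```

The decorated forward stays free of transfer logic, which is the cleanup this PR is after.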