[Misc] Refactor Attention kv transfer methods into decorator #27816
Conversation
Code Review
This pull request refactors the KV cache transfer logic into a decorator, which is a great improvement for code clarity and maintainability in layer.py. The implementation is clean and correctly captures the on-entry/on-exit pattern.
However, I've found a critical issue in the new decorator: it attempts to read the layer_name parameter from keyword arguments, but this parameter is passed positionally at all call sites, which will cause a KeyError at runtime. I've provided a suggestion to fix this by using the inspect module to robustly retrieve the argument.
Once this is addressed, the PR will be in excellent shape.
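For illustration, here is a minimal sketch of the kind of fix the review suggests, assuming the decorated forward declares a `layer_name` parameter (the decorator and function names here are hypothetical, not the actual suggestion): `inspect.signature().bind()` maps positional arguments onto parameter names, so the decorator finds `layer_name` however callers pass it.

```python
import inspect
from functools import wraps

def resolves_layer_name(fn):
    # Hypothetical sketch: kwargs["layer_name"] alone raises KeyError when
    # callers pass the argument positionally; binding the signature first
    # resolves it either way.
    sig = inspect.signature(fn)

    @wraps(fn)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        layer_name = bound.arguments["layer_name"]
        # ... on-entry/on-exit KV transfer logic would use layer_name here ...
        return fn(*args, **kwargs)

    return wrapper

@resolves_layer_name
def forward(query, key, value, layer_name=""):
    return query

forward(1, 2, 3, "model.layers.0.attn")             # positional: resolved
forward(1, 2, 3, layer_name="model.layers.0.attn")  # keyword: resolved
```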
@codex review
markmc
left a comment
Love the idea in general, suggestion inline
Force-pushed from 8abc2cd to e3914fa
Thanks for reviewing @markmc!
ProExpertProg
left a comment
Just a few nits and Qs
Force-pushed from e3914fa to 5546494
Force-pushed from 5f24f38 to f3b6244
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 9a18fa5 to 900be1f
@hmellor can you tell why the docs build really dislikes this PR? :)
ProExpertProg
left a comment
Nice cleanup
vllm/attention/layer.py (outdated)
```
Returns:
    A tuple containing:
    - attn_metadata: Attention metadata for this specific layer, or None if
      no metadata available
```
@NickLucche https://app.readthedocs.org/projects/vllm/builds/30282990/#293811637--1144
[two alternative suggested edits to the "no metadata available" continuation line]
(not sure which will render better)
@NickLucche any reason for not checking […]
Thanks @ptovam for spotting this! Rebase cruft on me
Signed-off-by: NickLucche <[email protected]>
Signed-off-by: NickLucche <[email protected]>
Signed-off-by: NickLucche <[email protected]>
Reduces code duplication between the maybe_transfer_kv_layer decorator and the functions it decorates. Signed-off-by: Mark McLoughlin <[email protected]>
Signed-off-by: NickLucche <[email protected]>
Signed-off-by: NickLucche <[email protected]>
Force-pushed from 9d06f87 to 7ba7765
Signed-off-by: NickLucche <[email protected]>
[Misc] Refactor Attention kv transfer methods into decorator (vllm-project#27816) Signed-off-by: NickLucche <[email protected]> Signed-off-by: Mark McLoughlin <[email protected]> Co-authored-by: Mark McLoughlin <[email protected]> Signed-off-by: George D. Torres <[email protected]>
[Misc] Refactor Attention kv transfer methods into decorator (vllm-project#27816) Signed-off-by: NickLucche <[email protected]> Signed-off-by: Mark McLoughlin <[email protected]> Co-authored-by: Mark McLoughlin <[email protected]> Signed-off-by: Bram Wasti <[email protected]>
[Misc] Refactor Attention kv transfer methods into decorator (vllm-project#27816) Signed-off-by: NickLucche <[email protected]> Signed-off-by: Mark McLoughlin <[email protected]> Co-authored-by: Mark McLoughlin <[email protected]>
Small quality-of-life improvement: remove some of the KV transfer-specific code from layer.py and refactor it into a decorator. I believe the on-entry/on-exit pattern (wait_read / wait_write) here is very well suited to that. The result is simply that there is less non-attention-related code in the file. Behavior should be unchanged.

Also, after grouping the common boilerplate shared by maybe_save_kv_layer_to_connector and wait_for_kv_layer_from_connector, I found there was too little left to justify a separate function for each, so I ended up inlining both connector method calls.

cc @ApostaC who wrote the initial connector code
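To make the pattern concrete, here is a minimal sketch of what such an on-entry/on-exit decorator could look like. This is illustrative only, not the actual vLLM implementation: `maybe_get_kv_connector` and the connector methods below are hypothetical placeholders, and the sketch assumes the layer name is stored on the attention module.

```python
from functools import wraps

class _StubConnector:
    # Hypothetical stand-in for a KV transfer connector.
    def wait_for_layer_load(self, layer_name: str) -> None:
        pass

    def save_kv_layer(self, layer_name: str) -> None:
        pass

def maybe_get_kv_connector():
    # Placeholder: real code would look up the active connector, if any.
    return _StubConnector()

def maybe_transfer_kv_layer(fn):
    # On entry, block until this layer's KV has arrived from the connector
    # (the wait_read side); on exit, hand the freshly computed KV back to
    # it (the wait_write side).
    @wraps(fn)
    def wrapper(self, *args, **kwargs):
        connector = maybe_get_kv_connector()
        if connector is None:
            return fn(self, *args, **kwargs)
        connector.wait_for_layer_load(self.layer_name)  # on entry
        output = fn(self, *args, **kwargs)
        connector.save_kv_layer(self.layer_name)        # on exit
        return output

    return wrapper

class Attention:
    def __init__(self, layer_name: str):
        self.layer_name = layer_name

    @maybe_transfer_kv_layer
    def forward(self, query, key, value):
        return query  # stand-in for the real attention computation

Attention("model.layers.0.attn").forward(1, 2, 3)
```

The decorated forward stays free of transfer logic, which is the cleanup this PR is after.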