
Conversation

@liangfu
Contributor
@liangfu liangfu commented Mar 7, 2025

Add reshape_and_cache function for Neuron KV cache updates

Implements a helper function to write key-value pairs into block-based KV cache tensors. Handles the layout mismatch between:

  • Input tensors: (num_tokens, n_kv_head, d_head)
  • Cache tensors: (num_blocks, n_kv_head, block_size, d_head)

Uses block-index calculations and torch.index_put_ to efficiently map and write the inputs into the correct cache positions, matching Neuron's memory-layout requirements.
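The mapping described above can be sketched in plain PyTorch (a minimal illustration of the approach, not the PR's exact code; the key steps are splitting each flat slot index into a block index plus an in-block offset, then writing via advanced indexing, which dispatches to index_put_ under the hood):

```python
import torch

def reshape_and_cache(
    key: torch.Tensor,           # (num_tokens, n_kv_head, d_head)
    value: torch.Tensor,         # (num_tokens, n_kv_head, d_head)
    key_cache: torch.Tensor,     # (num_blocks, n_kv_head, block_size, d_head)
    value_cache: torch.Tensor,   # (num_blocks, n_kv_head, block_size, d_head)
    slot_mapping: torch.Tensor,  # (num_tokens,) flat slot index per token
) -> None:
    block_size = key_cache.size(2)
    # Split each flat slot index into a block index and an offset in the block.
    block_idx = torch.div(slot_mapping, block_size, rounding_mode="floor")
    block_off = slot_mapping % block_size
    # Advanced indexing with a slice over the head dim writes every head of
    # each token into its (block, offset) position in one in-place scatter.
    key_cache[block_idx, :, block_off] = key
    value_cache[block_idx, :, block_off] = value
```

Because the two integer index tensors are separated by a slice, PyTorch moves the token dimension to the front, so the left-hand side has shape (num_tokens, n_kv_head, d_head) and lines up with the inputs directly.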

@github-actions
github-actions bot commented Mar 7, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs will not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

Contributor

@lingfanyu left a comment

LGTM! Thanks!

Returns:
None: Updates the key_cache and value_cache tensors in-place
"""

I would add an out-of-bounds check for slot_mapping, so that its values are < num_blocks * block_size.

Contributor Author

Not sure which of the two cases you mean:
1/ a bound check on the slot_mapping shape
2/ a bound check on the slot_mapping values

For 1, I think the slot_mapping shape (aka num_batched_tokens) could be greater than num_blocks * block_size (e.g. with DMA skipping), although that would be a rare case.
For 2, the values are checked at execution time, not controlled by the kernel/compilation.


Yeah, I meant the values. What will happen if some value of slot_mapping goes beyond num_blocks * block_size (how do we protect against it), or could that happen at all?

Contributor Author

@liangfu Mar 10, 2025

If that were the case, there would be a bug in the scheduler, which allocates the slots. If a value exceeds the boundary, the Neuron runtime will raise an out-of-bounds (OOB) error.
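For reference, a host-side guard of the kind discussed here could look like the following (an illustrative sketch; check_slot_mapping is a hypothetical helper, not part of this PR, and the Neuron runtime's own OOB error remains the backstop):

```python
import torch

def check_slot_mapping(slot_mapping: torch.Tensor,
                       num_blocks: int, block_size: int) -> None:
    # Hypothetical guard: validate slot values before the kernel writes.
    num_slots = num_blocks * block_size
    lo, hi = int(slot_mapping.min()), int(slot_mapping.max())
    if lo < 0 or hi >= num_slots:
        raise IndexError(
            f"slot_mapping values must lie in [0, {num_slots}), "
            f"got range [{lo}, {hi}]")
```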

Comment on lines +875 to +879
key: torch.Tensor,
value: torch.Tensor,
key_cache: torch.Tensor,
value_cache: torch.Tensor,
slot_mapping: torch.Tensor,

I would add a check to make sure they are all on the same device.
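Such a check could be a one-liner along these lines (an illustrative sketch; check_same_device is a hypothetical helper, not part of this PR):

```python
import torch

def check_same_device(*tensors: torch.Tensor) -> None:
    # Hypothetical helper: reject mixed-device inputs early, before the
    # index_put_ write fails with a less readable runtime error.
    devices = {t.device for t in tensors}
    if len(devices) > 1:
        raise ValueError(f"expected all tensors on one device, got {devices}")
```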

@aarondou left a comment

Approved, with some non-blocking questions.

value (torch.Tensor): Value tensor with shape
(num_tokens, n_kv_head, d_head)
key_cache (torch.Tensor): Key cache tensor with shape
(num_blocks, n_kv_head, block_size, d_head)


Non-blocking question: why this layout rather than (num_blocks, block_size, n_kv_head, d_head)?

Contributor Author

The KV cache layout was changed from (num_blocks, block_size, n_kv_head, d_head) to (num_blocks, n_kv_head, block_size, d_head) in #13245, in order to speed up KV cache loading and eliminate an unnecessary transpose.
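The benefit can be seen in a small sketch (illustrative shapes, not from the PR): with the head-major block layout, slicing one block out of the cache already has the (n_kv_head, block_size, d_head) shape that attention consumes, and stays contiguous; the old layout needs a transpose at load time.

```python
import torch

num_blocks, n_kv_head, block_size, d_head = 4, 2, 8, 16

# New layout (after #13245): head-major within each block.
cache = torch.zeros(num_blocks, n_kv_head, block_size, d_head)
block = cache[0]                 # -> (n_kv_head, block_size, d_head)
assert block.shape == (n_kv_head, block_size, d_head)
assert block.is_contiguous()     # per-head rows are contiguous as stored

# Old layout: the same slice needs a transpose to reach that shape,
# and the result is no longer contiguous (a copy would be required).
old_cache = torch.zeros(num_blocks, block_size, n_kv_head, d_head)
old_block = old_cache[0].transpose(0, 1)
assert old_block.shape == (n_kv_head, block_size, d_head)
assert not old_block.is_contiguous()
```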

Returns:
None: Updates the key_cache and value_cache tensors in-place
"""


@simon-mo simon-mo merged commit c91b64f into vllm-project:main Mar 11, 2025
21 checks passed
lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025