stepinto (Contributor) commented Aug 5, 2025

When the parameter `cache_seqlen` is a scalar, it should be expanded to a vector of shape (batch_size). In the original code, whenever `block_table` is used, the shape of `k_cache` is (num_blocks, page_size, ...), so `cache_seqlen` is expanded to shape (num_blocks) instead of (batch_size), which is wrong. This fix takes the size from `q`, whose first dimension is always `batch_size`.
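
For illustration, a minimal sketch of the scalar-to-vector expansion the fix describes (`expand_cache_seqlens` is a hypothetical helper for this example, not the library's actual code path):

```python
import torch

def expand_cache_seqlens(cache_seqlens, q, k_cache):
    # Hypothetical helper illustrating the fix described above.
    # When block_table is used, k_cache has shape (num_blocks, page_size, ...),
    # so k_cache.shape[0] is num_blocks, not batch_size. q always has shape
    # (batch_size, seqlen_q, nheads, headdim), so its first dimension is the
    # correct length for the per-batch cache_seqlens vector.
    if isinstance(cache_seqlens, int):
        cache_seqlens = torch.full(
            (q.shape[0],),            # batch_size taken from q, not k_cache
            cache_seqlens,
            dtype=torch.int32,
            device=k_cache.device,
        )
    return cache_seqlens
```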

stepinto force-pushed the cache_seqlens_0805 branch from 9c1843f to 0f5288c on August 5, 2025 16:07
stepinto changed the title from "[[BugFix]] Fix flash_attn_with_kvcache with scalar cache_seqlen" to "[BugFix] Fix flash_attn_with_kvcache with scalar cache_seqlen" on Aug 5, 2025
stepinto force-pushed the cache_seqlens_0805 branch from 0f5288c to 9057ef4 on August 5, 2025 16:09
tridao merged commit cd9383f into Dao-AILab:main on Aug 15, 2025
tridao (Member) commented Aug 15, 2025

Thank you!

stepinto deleted the cache_seqlens_0805 branch on October 10, 2025 00:53