Conversation

@tomeras91
Contributor
The current Jamba implementation doesn't allow running with CUDA Graphs at batch sizes that weren't captured. Attempting to do so raises a RuntimeError due to incompatible tensor shapes, because the Mamba cache isn't padded to the captured CUDA Graph batch size.

This PR fixes the issue and adds a test to verify it.
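For illustration, here is a minimal, hypothetical sketch of the idea (not the actual vLLM code; the helper name and tensor layout are assumptions): the Mamba cache tensors are zero-padded along the batch dimension up to the batch size the CUDA Graph was captured with, so their shapes match the shapes baked into the graph.

```python
import torch


def pad_mamba_cache_for_cuda_graph(conv_state: torch.Tensor,
                                   ssm_state: torch.Tensor,
                                   captured_batch_size: int):
    """Pad batch-first Mamba cache tensors up to the captured CUDA Graph batch size.

    Hypothetical helper for illustration only; the real fix lives in vLLM's
    Jamba/Mamba cache management code.
    """
    current_batch_size = conv_state.shape[0]
    if current_batch_size >= captured_batch_size:
        # Nothing to do: the cache already matches (or exceeds) the graph's batch size.
        return conv_state, ssm_state
    pad = captured_batch_size - current_batch_size
    # Zero-pad the batch dimension so the cache shapes agree with the shapes
    # recorded when the CUDA Graph was captured.
    conv_state = torch.cat(
        [conv_state, conv_state.new_zeros((pad, *conv_state.shape[1:]))], dim=0)
    ssm_state = torch.cat(
        [ssm_state, ssm_state.new_zeros((pad, *ssm_state.shape[1:]))], dim=0)
    return conv_state, ssm_state
```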

tomeras91 added 3 commits July 8, 2024 15:50
…ape and not by number of sequences. This is so we pad the mamba cache to the captured CG batch sizes
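As a hedged illustration of that commit (names are assumptions, not the actual vLLM code): the batch size to allocate the cache for is derived from the already-padded input tensor's shape rather than from the number of live sequences, since during CUDA Graph replay the inputs are padded to the captured batch size while the sequence count may be smaller.

```python
import torch


def cache_batch_size(input_ids: torch.Tensor, num_seqs: int) -> int:
    """Hypothetical sketch: size the Mamba cache by the (CUDA Graph padded)
    input tensor shape rather than by the number of live sequences."""
    # During CUDA Graph replay, input_ids is padded to the captured batch size,
    # so its leading dimension can be larger than num_seqs.
    assert input_ids.shape[0] >= num_seqs
    return input_ids.shape[0]
```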
@simon-mo simon-mo merged commit ddc369f into vllm-project:main Jul 8, 2024
tlrmchlsmth added a commit to neuralmagic/nm-vllm that referenced this pull request Jul 16, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 24, 2024
@tomeras91 tomeras91 deleted the mamba-cg-cache-padding branch August 12, 2024 15:00
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
LeiWang1999 pushed a commit to LeiWang1999/vllm-bitblas that referenced this pull request Mar 26, 2025