Commit c44b9a2

njhill authored and juliendenize committed
[BugFix] Fix incorrect preallocated sampled_token_ids tensor size (vllm-project#28025)
Signed-off-by: Nick Hill <[email protected]>
1 parent 1f5adef commit c44b9a2

1 file changed: +1 -1 lines changed

vllm/v1/worker/gpu_model_runner.py

Lines changed: 1 addition & 1 deletion
@@ -524,7 +524,7 @@ def __init__(
         self._draft_token_ids: list[list[int]] | torch.Tensor | None = None
         self.transfer_event = torch.cuda.Event()
         self.sampled_token_ids_pinned_cpu = torch.empty(
-            (self.max_model_len, 1),
+            (self.max_num_reqs, 1),
             dtype=torch.int64,
             device="cpu",
             pin_memory=self.pin_memory,
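
The commit itself does not explain how the buffer is consumed. The sketch below assumes, based only on the name `sampled_token_ids_pinned_cpu` and the `(N, 1)` shape, that it stages one sampled token id per request per step, so its row count should track the maximum number of requests rather than the maximum sequence length. Plain Python lists stand in for the pinned tensor so the example runs without torch or a GPU; `make_buffer`, `stage_sampled_tokens`, and all numeric values are hypothetical and not taken from vLLM.

```python
# Hypothetical stand-in for a preallocated (rows, 1) staging buffer.
# Plain lists are used so the sketch runs without torch or a GPU.

def make_buffer(rows: int) -> list[list[int]]:
    """Preallocate a (rows, 1) buffer of zeros."""
    return [[0] for _ in range(rows)]


def stage_sampled_tokens(
    buffer: list[list[int]], sampled: list[int]
) -> list[list[int]]:
    """Copy one sampled token per request into the first len(sampled) rows."""
    if len(sampled) > len(buffer):
        raise ValueError(
            f"buffer has {len(buffer)} rows but {len(sampled)} requests were sampled"
        )
    for row, token in zip(buffer, sampled):
        row[0] = token
    return buffer[: len(sampled)]


# Illustrative values only (assumptions, not from the commit).
max_model_len = 4   # longest sequence, in tokens
max_num_reqs = 8    # most requests in one batch

sampled = [11, 22, 33, 44, 55, 66]  # one token for each of 6 requests

# Sized by request count: any batch of up to max_num_reqs requests fits.
staged = stage_sampled_tokens(make_buffer(max_num_reqs), sampled)

# Sized by sequence length: too small whenever the batch has more
# requests than max_model_len.
try:
    stage_sampled_tokens(make_buffer(max_model_len), sampled)
    undersized_ok = True
except ValueError:
    undersized_ok = False
```

Under the same assumption, when `max_model_len` is larger than `max_num_reqs` the old shape would not overflow but would reserve more pinned host memory than the per-request copy needs.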
