
Commit f279cae

2015aroras authored and lulmer committed

[Bugfix][Model] OLMo 2: split qkv correctly for GQA and MQA (vllm-project#13687)
Signed-off-by: Louis Ulmer <[email protected]>

1 parent 5f969ef commit f279cae

File tree

1 file changed: +1 −1 lines changed

vllm/model_executor/models/olmo2.py (1 addition, 1 deletion)

@@ -157,7 +157,7 @@ def forward(
         attn_metadata: AttentionMetadata,
     ) -> torch.Tensor:
         qkv, _ = self.qkv_proj(hidden_states)
-        q, k, v = qkv.chunk(chunks=3, dim=-1)
+        q, k, v = qkv.split([self.q_size, self.kv_size, self.kv_size], dim=-1)
         q, k = self._apply_qk_norm(q, k)
         q, k = self.rotary_emb(positions, q, k)
         attn_output = self.attn(q, k, v, kv_cache, attn_metadata)
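The one-line fix matters because the fused QKV projection output is only split into three equal parts when the number of query heads equals the number of KV heads. Under grouped-query or multi-query attention, K and V have fewer heads than Q, so the Q slice is wider than the K and V slices and `chunk(chunks=3)` cuts at the wrong boundaries. A minimal pure-Python sketch of the size arithmetic (hypothetical head counts, not OLMo 2's actual config; plain list slicing stands in for `torch.Tensor.split`):

```python
# Hypothetical GQA shapes: 8 query heads, 2 KV heads, head_dim 4.
num_heads, num_kv_heads, head_dim = 8, 2, 4
q_size = num_heads * head_dim        # 32 columns for Q
kv_size = num_kv_heads * head_dim    # 8 columns each for K and V

# One row of the fused qkv_proj output: width q_size + 2 * kv_size = 48.
qkv = list(range(q_size + 2 * kv_size))

# Buggy behavior: chunk into 3 equal parts of 16, which is only correct
# when q_size == kv_size (plain multi-head attention).
third = len(qkv) // 3
q_bad, k_bad, v_bad = qkv[:third], qkv[third:2 * third], qkv[2 * third:]

# Fixed behavior: split at the real boundaries, mirroring
# qkv.split([q_size, kv_size, kv_size], dim=-1) in the patch.
q = qkv[:q_size]
k = qkv[q_size:q_size + kv_size]
v = qkv[q_size + kv_size:]
```

With these shapes the buggy path hands the attention kernel a 16-wide Q and 16-wide K/V, while the fix recovers the intended 32/8/8 split.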

0 commit comments