
Commit 05ae8a6

Browse files
mikeiovine authored and dominicshanshan committed
[https://nvbugs/5455836][fix] Fix llama 4 FP4 (NVIDIA#6911)
Signed-off-by: Mike Iovine <[email protected]>
Signed-off-by: Wangshanshan <[email protected]>
1 parent ba2b8ae commit 05ae8a6

File tree

1 file changed: +3 additions, -0 deletions


tensorrt_llm/_torch/models/modeling_llama.py

Lines changed: 3 additions & 0 deletions
@@ -186,6 +186,9 @@ def _forward_nope(
                                 mrope_config,
                                 attention_sinks=None)
 
+        if isinstance(attn_output, tuple):
+            attn_output = Fp4QuantizedTensor(attn_output[0], attn_output[1])
+
         attn_output = self.o_proj(attn_output,
                                   all_reduce_params=all_reduce_params)
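The added guard handles the case where the attention call returns a (data, scales) tuple rather than a plain tensor, which happens when FP4 quantization is fused into the attention epilogue; `o_proj` then receives a single wrapped object instead of a bare tuple. A minimal sketch of the pattern, using a simplified stand-in for TensorRT-LLM's `Fp4QuantizedTensor` (the field names here are assumptions for illustration, not the library's actual definition):

```python
from dataclasses import dataclass
from typing import Any, Tuple, Union


@dataclass
class Fp4QuantizedTensor:
    """Simplified stand-in: bundles packed FP4 data with its scale factors."""
    fp4_tensor: Any
    scaling_factor: Any


def normalize_attn_output(
        attn_output: Union[Any, Tuple[Any, Any]]) -> Any:
    """If the attention kernel fused quantization and returned a
    (quantized data, scale factors) tuple, wrap it so downstream
    projections see one quantized-tensor object; pass plain tensors
    through unchanged."""
    if isinstance(attn_output, tuple):
        return Fp4QuantizedTensor(attn_output[0], attn_output[1])
    return attn_output
```

This mirrors the three-line fix above: the branch only triggers on the tuple-returning path, so the unquantized code path is unaffected.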
