Commit 54e44ad

mikeiovine authored and dominicshanshan committed
[https://nvbugs/5455836][fix] Fix llama 4 FP4 (NVIDIA#6911)
Signed-off-by: Mike Iovine <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
1 parent 9df91e5 commit 54e44ad

File tree

1 file changed: +3 −0 lines changed


tensorrt_llm/_torch/models/modeling_llama.py

Lines changed: 3 additions & 0 deletions
```diff
@@ -183,6 +183,9 @@ def _forward_nope(
                                   mrope_config,
                                   attention_sinks=None)

+        if isinstance(attn_output, tuple):
+            attn_output = Fp4QuantizedTensor(attn_output[0], attn_output[1])
+
         attn_output = self.o_proj(attn_output,
                                   all_reduce_params=all_reduce_params)
```
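The fix normalizes the attention output before it reaches `o_proj`: when the attention backend emits FP4-quantized output, it returns a `(data, scale)` tuple rather than a plain tensor, and the tuple must be wrapped so the projection layer receives a single typed object. A minimal sketch of that pattern, using a simplified stand-in for TensorRT-LLM's `Fp4QuantizedTensor` (the real class lives inside `tensorrt_llm._torch`; the field names and the helper below are illustrative assumptions, not the library's API):

```python
from dataclasses import dataclass


@dataclass
class Fp4QuantizedTensor:
    # Stand-in for the real TensorRT-LLM class: packed FP4 payload
    # plus its per-block scaling factors.
    fp4_tensor: object
    scaling_factor: object


def normalize_attn_output(attn_output):
    """Wrap a (fp4_data, scales) pair into one object; pass plain tensors through."""
    if isinstance(attn_output, tuple):
        attn_output = Fp4QuantizedTensor(attn_output[0], attn_output[1])
    return attn_output


# A plain tensor is returned unchanged; a tuple is wrapped.
plain = normalize_attn_output("tensor")
wrapped = normalize_attn_output(("fp4_data", "scales"))
```

This keeps the downstream `o_proj` call shape-agnostic: it always receives one argument, whether or not FP4 quantization was active in the attention path.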
