
Commit 1702ff9

mikeiovine authored and dominicshanshan committed
[https://nvbugs/5455836][fix] Fix llama 4 FP4 (NVIDIA#6911)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
1 parent 8bf7562 commit 1702ff9

1 file changed

Lines changed: 3 additions & 0 deletions


tensorrt_llm/_torch/models/modeling_llama.py

@@ -183,6 +183,9 @@ def _forward_nope(
             mrope_config,
             attention_sinks=None)
 
+        if isinstance(attn_output, tuple):
+            attn_output = Fp4QuantizedTensor(attn_output[0], attn_output[1])
+
         attn_output = self.o_proj(attn_output,
                                   all_reduce_params=all_reduce_params)
 
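
The added lines wrap a tuple-valued attention output before it reaches o_proj. The pattern can be sketched in isolation as below. This is only an illustration under stated assumptions: QuantizedPair and wrap_attn_output are hypothetical stand-ins (not TensorRT-LLM names), the field names values and scales are invented for the example, and plain strings take the place of real tensors.

```python
from dataclasses import dataclass
from typing import Any, Tuple, Union


@dataclass
class QuantizedPair:
    """Hypothetical stand-in for Fp4QuantizedTensor: two positional fields."""
    values: Any  # assumed to correspond to attn_output[0]
    scales: Any  # assumed to correspond to attn_output[1]


def wrap_attn_output(attn_output: Union[Any, Tuple[Any, Any]]) -> Any:
    """Mirror the check in the diff: wrap a 2-tuple, pass anything else through."""
    if isinstance(attn_output, tuple):
        return QuantizedPair(attn_output[0], attn_output[1])
    return attn_output


# A tuple result is wrapped; a non-tuple result is returned unchanged.
wrapped = wrap_attn_output(("packed", "scales"))
passthrough = wrap_attn_output("dense")
```

With this shape, the downstream projection receives a single object in both cases rather than a bare tuple, which appears to be what the change is after.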
