
Commit 8e72f08

apply fix to fp8 pt

Signed-off-by: jiahanc <[email protected]>

Parent: ed83138

File tree

1 file changed: 2 additions, 1 deletion

csrc/trtllm_fused_moe_kernel_launcher.cu (2 additions, 1 deletion)

```diff
@@ -161,8 +161,9 @@ void trtllm_fp8_per_tensor_scale_moe_launcher(
       alloc_tensor({args.num_tokens, args.top_k}, routing_bias_dtype, routing_logits->device);
   Tensor expert_indexes =
       alloc_tensor({args.num_tokens, args.top_k}, dl_int32, routing_logits->device);
+  int64_t const size_of_expert_count_histogram = std::max(num_experts * 2, int64_t(256 * 2));
   Tensor expert_count_histogram = alloc_tensor(
-      {2 * 256},
+      {size_of_expert_count_histogram},
       dl_int32,  // 256 is the max number of threads per block and max number of experts
       routing_logits->device);
```