Harden GEMM rewriter (beta=0 for FP8, safer dim, ROCm FP8) #407

phambinhfin · 2025-10-17T12:30:33Z

Hypothesis: a cuBLAS[Lt] GEMM computes
D := α * (A @ B) + β * C.
With matrix-bias fusion we set β = 1, so the GEMM reads C. I've seen cases where C held garbage/stale values; when A and B are small, β*C dominates and D ends up with huge values / NaNs. For example:

buffer=A elems=16777216 sample=1024 NaN=0 Inf=0 min=-2.75 max=3.5
buffer=B elems=524288 sample=1024 NaN=0 Inf=0 min=-0.125 max=0.109375
buffer=C elems=33554432 sample=1024 NaN=0 Inf=0 min=-2.37932e+38 max=-2.37932e+38
buffer=D(out) elems=33554432 sample=1024 NaN=0 Inf=0 min=-2.37932e+38 max=-2.37932e+38
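
The hypothesis above can be illustrated with a small pure-Python sketch. This is not the cuBLASLt API: `gemm` is a hypothetical reference helper, the A, B and bias values are made up, and only the stale value is taken from the log. It assumes the usual BLAS convention that C is not read when β = 0.

```python
def gemm(a, b, c, alpha=1.0, beta=0.0):
    """Reference D = alpha * (A @ B) + beta * C on nested lists.

    C is only read when beta is non-zero (assumed BLAS convention).
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = sum(a[i][k] * b[k][j] for k in range(inner))
            d[i][j] = alpha * acc
            if beta != 0.0:
                d[i][j] += beta * c[i][j]
    return d


STALE = -2.37932e38  # value reported for buffer C in the log above

# Illustrative small inputs (hypothetical values, similar in scale to the log).
a = [[0.5, -0.25], [1.0, 0.125]]
b = [[0.125, 0.0], [0.0, -0.125]]
bias = [[0.1, 0.2], [0.3, 0.4]]
stale_c = [[STALE, STALE], [STALE, STALE]]

# Fused path: beta = 1 reads whatever C currently holds.
fused_bad = gemm(a, b, stale_c, beta=1.0)

# Unfused path: beta = 0 ignores C, then the bias is added separately.
product = gemm(a, b, stale_c, beta=0.0)
unfused = [[p + q for p, q in zip(prow, brow)]
           for prow, brow in zip(product, bias)]

print(fused_bad[0][0])  # dominated by the stale C value
print(unfused[0][0])    # small, as expected from A @ B + bias
```

With these inputs the A @ B terms are around 1e-1, so adding them to a value near 1e38 leaves the stale value unchanged, which matches the identical min/max of C and D in the log.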

FP8 GEMM: skip matrix-bias→C fusion
- For FP8 cublasLt matmuls, we no longer fuse the matrix bias as the
  GEMM's "C" operand (β=1). We keep β=0 so the GEMM does not read C at
  all, and apply the bias as a separate Add right after the GEMM.
- Effect: eliminates the "garbage-in-C → garbage-in-D" failure mode
  without otherwise changing numerics.

Safer non-contracting dim selection
- Initialize non_contracting_dim = -1, select it explicitly, and
  CHECK that it was found before use. This prevents accidental use of
  an uninitialized variable if a future refactor ever violates the
  single-(non)contracting-dim invariant.
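
The sentinel-and-check pattern described for the non-contracting dim can be sketched as follows. This is a Python illustration only, not the rewriter's C++ code: `find_non_contracting_dim` and its parameters are hypothetical names, `ValueError` stands in for a failed CHECK, and the guard against a second candidate is an added assumption about how the single-dim invariant might be enforced.

```python
def find_non_contracting_dim(rank, batch_dims, contracting_dims):
    """Return the single dim that is neither batch nor contracting."""
    non_contracting_dim = -1  # sentinel: nothing selected yet
    for dim in range(rank):
        if dim in batch_dims or dim in contracting_dims:
            continue
        if non_contracting_dim != -1:
            # Assumed extra guard: more than one candidate breaks the invariant.
            raise ValueError("expected exactly one non-contracting dim")
        non_contracting_dim = dim
    if non_contracting_dim == -1:
        # Stands in for CHECK: fail loudly instead of using a bogus index.
        raise ValueError("no non-contracting dim found")
    return non_contracting_dim


# Rank-3 operand with batch dim 0 and contracting dim 2.
print(find_non_contracting_dim(3, [0], [2]))
```

The point of the sentinel is that a violated invariant surfaces as an explicit failure at the selection site rather than as an arbitrary index used later.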

ScXfjiang · 2025-10-17T13:20:59Z

Instead of disabling this fusion, we need to figure out why it fails to work for gfx950.

phambinhfin · 2025-10-17T15:17:50Z

> Instead of disabling this fusion, we need to figure out why it fails to work for gfx950.

I think if we can find other GPUs that support FP8, we could also confirm that this issue may happen there as well, not only on gfx950.

ScXfjiang · 2025-10-17T15:21:43Z

> Instead of disabling this fusion, we need to figure out why it fails to work for gfx950.

> I think if we can find other GPUs that support FP8, we could also confirm that this issue may happen there as well, not only on gfx950.

It's the OCP FP8 that we currently care about, and OCP FP8 is only supported in gfx950 and gfx1201.

phambinhfin · 2025-10-23T13:46:47Z

Can you test again? I just covered more cases to prevent fusion.

i-chaochen · 2025-11-06T12:55:18Z

Hi @phambinhfin I think we can close this PR since #416 is merged?

Harden GEMM rewriter (beta=0 for FP8, safer dim, ROCm FP8)

23898e4

phambinhfin self-assigned this Oct 17, 2025

i-chaochen requested a review from ScXfjiang October 17, 2025 13:07

Guard all matrix‑bias fusions for ROCm FP8

c6f0b50
