Harden GEMM rewriter (beta=0 for FP8, safer dim, ROCm FP8) #407
+29
−6
Hypothesis: cuBLAS[Lt] GEMM uses
D := α * (A @ B) + β * C.
With matrix-bias fusion we set β = 1, which makes the GEMM read C. I’ve seen cases where C held garbage/stale values, and when A and B are small, β*C dominates → huge values / NaNs. For example, this one
FP8 GEMM: skip matrix-bias→C fusion
For FP8 GEMMs, do not fuse the matrix bias into “C” (β=1). Keep β=0 so the GEMM does not read C at all, and apply
the bias as a separate Add right after the GEMM,
without changing numerics otherwise.
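To make the hypothesis concrete, here is a minimal pure-Python sketch of D := α * (A @ B) + β * C on nested lists. It is not the rewriter's code: the helper names `gemm` and `add` and the stale values in `stale_c` are made up for illustration, and everything runs in full precision rather than FP8. Following the BLAS convention, the reference `gemm` skips C entirely when β = 0.

```python
import math


def gemm(a, b, c=None, alpha=1.0, beta=0.0):
    """Reference D = alpha * (A @ B) + beta * C (hypothetical helper).

    C is only read when beta != 0, mirroring the BLAS convention.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = sum(a[i][k] * b[k][j] for k in range(inner))
            d[i][j] = alpha * acc
            if beta != 0.0:
                d[i][j] += beta * c[i][j]
    return d


def add(x, y):
    """Elementwise add, standing in for the separate Add after the GEMM."""
    return [[xv + yv for xv, yv in zip(xr, yr)] for xr, yr in zip(x, y)]


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
bias = [[0.5, 0.5], [0.5, 0.5]]
stale_c = [[1e30, math.nan], [0.0, 0.0]]  # made-up stale buffer contents

# Fused path reading a stale C: beta * C swamps or poisons the product.
fused_stale = gemm(a, b, stale_c, beta=1.0)

# Unfused path: beta = 0 never touches C, then the bias is added separately.
unfused = add(gemm(a, b, beta=0.0), bias)

# Fused path with a valid bias in C, for comparison.
fused_ok = gemm(a, b, bias, beta=1.0)
```

In this sketch `unfused` and `fused_ok` agree element for element, while `fused_stale` picks up the 1e30 and NaN entries from the stale buffer.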
Safer non-contracting dim selection
Initialize non_contracting_dim = -1, select it explicitly, and CHECK it was found before use. This prevents any accidental use of
an uninitialized variable if future refactors ever violate the
single-(non)contracting-dim invariant.
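The sentinel-then-check pattern can be sketched as follows. This is an illustration in Python, not the actual change: the function name `find_non_contracting_dim` and its parameters are hypothetical, and a raised `RuntimeError` stands in for a failing CHECK.

```python
def find_non_contracting_dim(rank, batch_dims, contracting_dims):
    """Return the first dim that is neither batch nor contracting.

    Hypothetical helper: the name and signature are assumptions.
    """
    non_contracting_dim = -1  # sentinel: "not found yet"
    for dim in range(rank):
        if dim not in batch_dims and dim not in contracting_dims:
            non_contracting_dim = dim
            break
    # Stand-in for CHECK: fail loudly instead of using a bogus index.
    if non_contracting_dim == -1:
        raise RuntimeError("no non-contracting dimension found")
    return non_contracting_dim


# Rank-3 operand with batch dim 0 and contracting dim 2.
dim = find_non_contracting_dim(3, batch_dims=[0], contracting_dims=[2])
```

If a future refactor produced an operand with no free dimension, the call would raise instead of silently indexing with whatever value the variable happened to hold.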