Enhance w4afp8 performance: implement per-token w4afp8 CUTLASS MoE GEMM for FP8 dispatch #18144
Wangzheee:w4afp8_per-token-kernel was force-pushed and no longer has any new commits.
Pushing new commits will allow the pull request to be re-opened.
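The title names the technique but gives no detail. The pure-Python sketch below is a rough numerical reference for what a per-token w4afp8 GEMM computes. It rests on assumptions not taken from this PR: activations are quantized to FP8 E4M3 (largest finite value 448) with one scale per token (row), and weights are quantized to symmetric signed INT4 with one scale per output row and K-group. The helper names and `group_size` are hypothetical. This is not the CUTLASS kernel, and it ignores MoE expert routing and the FP8 dispatch path.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value in the FP8 E4M3 format (assumed target format)


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3 value (round-half-to-even), saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = abs(x)
    _, e = math.frexp(mag)        # mag = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, -6)          # unbiased exponent; -6 is the smallest normal exponent
    step = 2.0 ** (exp - 3)       # 3 mantissa bits give this spacing (also covers subnormals)
    q = round(mag / step) * step  # Python's round() is round-half-to-even
    return sign * min(q, FP8_E4M3_MAX)


def quantize_activations_per_token(a):
    """One scale per token (row) so that the row's max |value| maps to 448."""
    a_q, scales = [], []
    for row in a:
        amax = max(abs(v) for v in row)
        s = amax / FP8_E4M3_MAX if amax > 0 else 1.0
        scales.append(s)
        a_q.append([round_to_e4m3(v / s) for v in row])
    return a_q, scales


def quantize_weights_int4(w, group_size):
    """Symmetric signed INT4 in [-8, 7], one scale per (output row, K-group)."""
    w_q, w_scales = [], []
    for row in w:
        q_row, s_row = [], []
        for g in range(0, len(row), group_size):
            group = row[g:g + group_size]
            amax = max(abs(v) for v in group)
            s = amax / 7.0 if amax > 0 else 1.0
            s_row.append(s)
            q_row.extend(max(-8, min(7, round(v / s))) for v in group)
        w_q.append(q_row)
        w_scales.append(s_row)
    return w_q, w_scales


def w4afp8_gemm_ref(a_q, a_scales, w_q, w_scales, group_size):
    """out[t][n] = a_scale[t] * sum_g w_scale[n][g] * sum_{k in g} a_q[t][k] * w_q[n][k]."""
    out = []
    for t, a_row in enumerate(a_q):
        out_row = []
        for n, w_row in enumerate(w_q):
            acc = 0.0
            for g, s_w in enumerate(w_scales[n]):
                k0 = g * group_size
                k1 = min(k0 + group_size, len(a_row))
                # Integer-valued partial dot product over one K-group, then apply the group scale.
                acc += s_w * sum(a_row[k] * w_row[k] for k in range(k0, k1))
            # The per-token activation scale is applied once, after the K reduction.
            out_row.append(a_scales[t] * acc)
        out.append(out_row)
    return out


# Usage: 2 tokens, K = 4, N = 3 output features, hypothetical group_size = 2.
a = [[0.5, -1.25, 2.0, 0.1], [10.0, 3.0, -7.5, 0.0]]
w = [[0.2, -0.4, 0.1, 0.3], [-0.05, 0.6, 0.25, -0.1], [0.7, 0.0, -0.3, 0.15]]
a_q, a_s = quantize_activations_per_token(a)
w_q, w_s = quantize_weights_int4(w, group_size=2)
out = w4afp8_gemm_ref(a_q, a_s, w_q, w_s, group_size=2)
```

Because the per-token scale does not depend on K, it can be factored out of the reduction and applied in the epilogue, while the per-group weight scales must be applied inside the K loop. Where a real kernel places each scale is a design choice this sketch does not claim to reflect.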