Commit dd25c4d
Implement per-token w4afp8 CUTLASS MoE GEMM for FP8 dispatch
Co-authored-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com>1 parent 6dfa8a4 commit dd25c4d
File tree
15 files changed
+1302
-409
lines changed- python/sglang/srt/layers/quantization
- sgl-kernel
- csrc
- cutlass_extensions
- detail/collective
- gemm/collective
- builders
- moe/cutlass_moe/w4a8
- include
- python/sgl_kernel
- tests
15 files changed
+1302
-409
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
2055 | 2055 | | |
2056 | 2056 | | |
2057 | 2057 | | |
| 2058 | + | |
| 2059 | + | |
| 2060 | + | |
| 2061 | + | |
| 2062 | + | |
| 2063 | + | |
| 2064 | + | |
| 2065 | + | |
| 2066 | + | |
| 2067 | + | |
| 2068 | + | |
| 2069 | + | |
| 2070 | + | |
| 2071 | + | |
| 2072 | + | |
| 2073 | + | |
| 2074 | + | |
| 2075 | + | |
| 2076 | + | |
| 2077 | + | |
| 2078 | + | |
| 2079 | + | |
| 2080 | + | |
| 2081 | + | |
| 2082 | + | |
| 2083 | + | |
| 2084 | + | |
| 2085 | + | |
| 2086 | + | |
| 2087 | + | |
| 2088 | + | |
| 2089 | + | |
| 2090 | + | |
| 2091 | + | |
| 2092 | + | |
| 2093 | + | |
| 2094 | + | |
| 2095 | + | |
| 2096 | + | |
| 2097 | + | |
| 2098 | + | |
| 2099 | + | |
| 2100 | + | |
| 2101 | + | |
| 2102 | + | |
| 2103 | + | |
| 2104 | + | |
| 2105 | + | |
| 2106 | + | |
| 2107 | + | |
| 2108 | + | |
| 2109 | + | |
| 2110 | + | |
| 2111 | + | |
| 2112 | + | |
| 2113 | + | |
| 2114 | + | |
| 2115 | + | |
| 2116 | + | |
| 2117 | + | |
| 2118 | + | |
| 2119 | + | |
| 2120 | + | |
| 2121 | + | |
| 2122 | + | |
| 2123 | + | |
| 2124 | + | |
| 2125 | + | |
| 2126 | + | |
| 2127 | + | |
| 2128 | + | |
| 2129 | + | |
| 2130 | + | |
| 2131 | + | |
| 2132 | + | |
| 2133 | + | |
2058 | 2134 | | |
2059 | 2135 | | |
2060 | 2136 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
244 | 244 | | |
245 | 245 | | |
246 | 246 | | |
247 | | - | |
| 247 | + | |
248 | 248 | | |
249 | 249 | | |
250 | 250 | | |
| |||
0 commit comments