Commit 91d3af2
merge updates for Mi300x ops (#267)
* add initial r1 ops for testing
* add deepseek_r1_sigmoid_top_k_f32
* add glu_expert_bf16xf8_block_scal
* fix glu_expert_bf16xf8_block_scal for cat([gate, up], dim=0)
* update cached bf16xf8_block_scal
* add gemm_nt_bf16xfp8_block_scal
* add modules
* enhance glu perf for bs=32
* fuse routed_scaled in topk_gating
* using rotary_lookup_bf16 instead of rotary_emb_bf16
* moving head/tail ops to C++
* add partial absorb fusion
* using gate_gemm_out_bf16
* add gemm_gate_up_silu_bf16xf8_s_16x16
* use torch::matmul for some gemm
* handle scaling at non-ending dim
* add glu_expert_bf16xf8_block_scal_16x16 back
* fine tune bf16 deviation
* add test_allreduce_bf16
* update deepseek_sigmoid_top_8_static_v2 with scaling 2.5
* add glu_expert_bf16xf8_block_scal_16x16_fnuz
* restore previous interface to avoid compatibility break
* add multi_head_latent_rope_bf16
* add system.from_url()
* force deepseek_sigmoid_top_8_static_v2's dtype compatible with sglang
* isolate NCCL-dependent APIs with macros
1 parent 490917d commit 91d3af2
30 files changed
Lines changed: 629 additions & 86 deletions
File tree
- tutel
- custom
- examples
- ops
Binary file not shown.