Commit 214ca64

Authored by shuangpe, saagarjha, hwu36, sklevtsov-nvidia, and tridao
Upgrade cutlass to v3.8.0 with commit 833f699 (NVIDIA#131)
* Handle MNK Sm90{Row, Col}Reduction problem shapes (NVIDIA#1803)
* add is_last_tile
* Improve sm90 mixed dtype kernel (NVIDIA#1883)
* Add GMMA shape m64n40k16 (NVIDIA#1864)
* Add all supported GMMA shapes (NVIDIA#1890)
* add maximum support (NVIDIA#1833)
* fix typo (NVIDIA#1853)
* fix by adding public (NVIDIA#1753)
* added mapping for bf16 to torch::kBFloat16 (NVIDIA#1843)
Co-authored-by: Haicheng Wu <[email protected]>
* Fix README (NVIDIA#1658)
* Fix README
* Improve README
---------
Co-authored-by: Haicheng Wu <[email protected]>
* Adjusting code indentation (NVIDIA#1639)
* Include of regular_tile_iterator.h fixed for NVRTC (NVIDIA#1765)
* Include of regular_tile_iterator.h fixed for NVRTC
* More include fixed for NVRTC
* Update gemm_f16n_f16t_f32t_tensor_op_f32_sm80.cu with include "cutlass/gemm/device/gemm_universal.h" (NVIDIA#1569)
fix compile with `cmake .. -DCUTLASS_ENABLE_TESTS=ON -DCUTLASS_TEST_LEVEL=2`
* remove redundant hardcoded packing configs in mixed dtype gemm (NVIDIA#1894)
Co-authored-by: Siyuan Fu <[email protected]>
* fix wrong A/BLayout in MMA_Traits for binary mma and append other MMA_Traits support (NVIDIA#1856)
* fix wrong A/BLayout in MMA_Traits<SM80_16x8x256_S32U1U1S32_TN_XORPOPC> and append support for m8n8k128, m16n8k128 mma.and.popc in MMA_Traits instantiation
* add "print" template for subbyte_reference<T>
* Add a print for the uint{x}b_t type. (NVIDIA#1871)
* Refactor some GroupedGEMM logic (NVIDIA#1899)
* feat: support kFactor 8 used in mma tensor op tile iterator (NVIDIA#1512)
* Update publications (NVIDIA#1912)
* remove restriction of stride == kernel in nhwc_pooling (NVIDIA#1896)
* fix undefined in device code error (NVIDIA#1880)
* Fix the racing condition of mixed-input gemm when writing the registers (NVIDIA#1931)
* move two warpgroup_wait
* merge main
---------
Co-authored-by: Siyuan Fu <[email protected]>
* Fix `cutlass` python library with cuda `12.6.2.post1` (NVIDIA#1942)
* Fix `cutlass` python library with cuda `12.6.2.post1`
Previously we had this error:
```
  File "/storage/home/cutlass/python/cutlass/backend/operation.py", line 39, in <listcomp>
    _version_splits = [int(x) for x in __version__.split("rc")[0].split(".")]
                       ^^^^^^
ValueError: invalid literal for int() with base 10: 'post1'
```
* Update sm90_utils.py
* Update generator.py
* Update python/cutlass_library/generator.py
Co-authored-by: Jack Kosaian <[email protected]>
* Update python/cutlass_library/sm90_utils.py
Co-authored-by: Jack Kosaian <[email protected]>
---------
Co-authored-by: Jack Kosaian <[email protected]>
* add {uint4, uint2, int2} => {fp16, bf16} conversion (NVIDIA#1966)
* Improve mixed dtype GEMM (NVIDIA#1972)
* update
* fix a typo
* fix a typo that fails the compiling when ElementScale is not the same as MmaType (NVIDIA#1977)
* Fix CuTe README Typo (NVIDIA#1951)
* Fix Typo (NVIDIA#1962)
* 3.6.0 update (NVIDIA#2005)
* 3.6.0 update
* doc and swap stuff
---------
Co-authored-by: yuzhai <[email protected]>
Co-authored-by: Haicheng Wu <[email protected]>
* Update CHANGELOG.md
* Update 0x_gemm_tutorial.md (NVIDIA#1982)
Shouldn't this be BLK_M, BLK_**K**, k
* fix bug: arch/mma_sm60.h Mma<2,2,1> calculate wrong (NVIDIA#1989)
* fix mem fence (NVIDIA#2030)
Co-authored-by: yuzhai <[email protected]>
* Add half->int8 saturate conversion to promise valid range (NVIDIA#1983)
* Add half->int8 saturate conversion to promise valid range
* add gpu only macro
---------
Co-authored-by: Haicheng Wu <[email protected]>
* Add vector-types back to platform.h (NVIDIA#2026)
* Fix typo in library_defaults.py (NVIDIA#2024)
* Fix Typos (NVIDIA#2021)
* Fix Typo
* Fix Typo
* Add Line Break (NVIDIA#2020)
* Blockwise Scaling for FP8 (NVIDIA#1932)
* F8 Blockwise Scaling
* two more NumProducerThreadEvents
---------
Co-authored-by: Haicheng Wu <[email protected]>
* fix assertion in integer_subbytes.h (NVIDIA#1961)
* CUTLASS 3.7 (NVIDIA#2045)
* CUTLASS 3.7
* clean up changelog
---------
Co-authored-by: yuzhai <[email protected]>
Co-authored-by: Haicheng Wu <[email protected]>
* update 3.7 docs (NVIDIA#2051)
* update docs
* update docs
* update docs
* update docs
* update docs
---------
Co-authored-by: yuzhai <[email protected]>
* CUTLASS 3.8 Release (NVIDIA#2059)
* CUTLASS 3.8 Release
* update
* Update README.md
* Revert "Update README.md"
This reverts commit b353e36.
* update
* update
---------
Co-authored-by: Haicheng Wu <[email protected]>
Co-authored-by: Haicheng Wu <[email protected]>
* fix cuda 12.6 issues (NVIDIA#2066)
* fix a readme broken link (NVIDIA#2069)
* Update README.md
* Groupwise scaling along M for FP8 gemm (NVIDIA#2037)
* FP8 groupwise scaling along M
* small updates
---------
Co-authored-by: zl <[email protected]>
Co-authored-by: Haicheng Wu <[email protected]>
* bugfix generic-k code in top-k with softmax (NVIDIA#1993)
* bugfix generic-k code in top-k with softmax
* Update include/cutlass/epilogue/fusion/sm90_visitor_topk_softmax.hpp
Co-authored-by: Ali Hassani <[email protected]>
* Update examples/61_hopper_gemm_with_topk_and_softmax/61_hopper_gemm_with_topk_and_softmax.cu
Co-authored-by: Ali Hassani <[email protected]>
---------
Co-authored-by: Ali Hassani <[email protected]>
* [EVT] Add support for Row/Col broadcast PtrArray (NVIDIA#2033)
* Add group support to EVT row/col broadcast.
* small modifications
---------
Co-authored-by: Haicheng Wu <[email protected]>
* v3.8.0 update (NVIDIA#2082)
* 3.8 update
* fix Markus' name
---------
Co-authored-by: yuzhai <[email protected]>
* [WA] Fix compiling errors
---------
Co-authored-by: Saagar Jha <[email protected]>
Co-authored-by: Haicheng Wu <[email protected]>
Co-authored-by: Sergey Klevtsov <[email protected]>
Co-authored-by: Tri Dao <[email protected]>
Co-authored-by: Xinyu Yang <[email protected]>
Co-authored-by: sijialou <[email protected]>
Co-authored-by: Bogumil Sapinski Mobica <[email protected]>
Co-authored-by: Haicheng Wu <[email protected]>
Co-authored-by: Lei Mao <[email protected]>
Co-authored-by: 103yiran <[email protected]>
Co-authored-by: MaxAkaAltmer <[email protected]>
Co-authored-by: 侯奇 <[email protected]>
Co-authored-by: Lain <[email protected]>
Co-authored-by: Siyuan Fu <[email protected]>
Co-authored-by: Caleb_Du <[email protected]>
Co-authored-by: LiYu Lu <[email protected]>
Co-authored-by: azhurkevich <[email protected]>
Co-authored-by: chenwei <[email protected]>
Co-authored-by: Wenlei Bao <[email protected]>
Co-authored-by: LiuQiang <[email protected]>
Co-authored-by: dan_the_3rd <[email protected]>
Co-authored-by: Jack Kosaian <[email protected]>
Co-authored-by: Yujia Zhai <[email protected]>
Co-authored-by: yuzhai <[email protected]>
Co-authored-by: Andrew O'Neill <[email protected]>
Co-authored-by: Dongxu.Wang <[email protected]>
Co-authored-by: ZZK <[email protected]>
Co-authored-by: Driss Guessous <[email protected]>
Co-authored-by: ZincCat <[email protected]>
Co-authored-by: Manish Gupta <[email protected]>
Co-authored-by: bobliao <[email protected]>
Co-authored-by: mihir-awatramani <[email protected]>
Co-authored-by: Liang <[email protected]>
Co-authored-by: zl <[email protected]>
Co-authored-by: Tadej Ciglarič <[email protected]>
Co-authored-by: Ali Hassani <[email protected]>
Co-authored-by: Josh Fromm <[email protected]>
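
Note on the `ValueError` quoted for NVIDIA#1942: the failing expression calls `int()` on every dot-separated component of the version string, so a suffix such as `post1` in `12.6.2.post1` cannot be parsed. The commit message does not show the actual fix, so the following is only a minimal sketch of one tolerant approach, assuming it is acceptable to keep the leading numeric components and ignore everything after them. The helper name `parse_version_prefix` is hypothetical and is not part of the `cutlass` Python package.

```python
import re
from typing import List


def parse_version_prefix(version: str) -> List[int]:
    """Return the leading all-digit components of a dotted version string.

    Hypothetical helper for illustration only; not the fix applied upstream.
    """
    parts: List[int] = []
    # Drop any release-candidate suffix first, as the original expression did.
    for piece in version.split("rc")[0].split("."):
        # Stop at the first component that is not purely ASCII digits,
        # e.g. "post1", instead of letting int() raise ValueError.
        if not re.fullmatch(r"[0-9]+", piece):
            break
        parts.append(int(piece))
    return parts


print(parse_version_prefix("12.6.2.post1"))  # [12, 6, 2]
print(parse_version_prefix("12.4.0rc1"))     # [12, 4, 0]
```

Stopping at the first non-numeric component keeps major, minor, and patch comparisons working, at the cost of discarding the post-release number.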
1 parent 3d5428b commit 214ca64

2,224 files changed: +321604 -113201 lines changed

ACTIVE_DEVELOPERS.md

Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
+![ALT](./media/images/gemm-hierarchy-with-epilogue-no-labels.png "CUTLASS")
+
+[README](./README.md#documentation) > **Active Developers**
+
+# CUTLASS Developers **
+
+Andrew Kerr (CUTLASS founding member)<br />
+Dustyn Blasig<br />
+Albert Xu<br />
+Junkai Wu<br />
+Xiuxia Zhang<br />
+Haicheng Wu (CUTLASS founding member)<br />
+Jack Yang<br />
+Pradeep Ramani (CUTLASS 3.x founding member)<br />
+Aditya Atluri<br />
+Han Li<br />
+Nick Zhao<br />
+Ivan Yin<br />
+Yu-Jung Chen<br />
+Markus Hoehnerbach<br />
+Honghao Lu<br />
+Mihir Awatramani<br />
+Hao Sheng<br />
+Zekun Fan<br />
+Aniket Shivam<br />
+Siyu Liu<br />
+Richard Cai<br />
+Vikas Gupta<br />
+Ethan Yan<br />
+Vijay Thakkar (CUTLASS 3.x founding member)<br />
+Cris Cecka (CuTe and CUTLASS 3.x founding member)<br />
+Lawrence Ryan<br />
+Qun Song<br />
+Daniel Ricketts<br />
+dePaul Miller<br />
+Yuhan Li<br />
+Saman Ashkiani<br />
+Jack Chen<br />
+Shang Zhang<br />
+Petrick Liu<br />
+Questa Wang<br />
+Pramod Shenoy<br />
+Jack Kosaian<br />
+Yujia Zhai<br />
+Zhaodong Chen<br />
+Manas Sahni<br />
+Shunfan Shao<br />
+Fengqi Qiao<br />
+Serif Yesil<br />
+Aragorn Guan<br />
+Heidi He<br />
+Xiao Song<br />
+Sergey Klevtsov<br />
+Jiang Shao<br />
+Ruqing Xu<br />
+Mengyu Guo<br />
+Tao Xie<br />
+Linfeng Zheng<br />
+Harrison Barclay<br />
+Wenfei Tang<br />
+Diksha Gohlyan<br />
+Alexander Zhurkevich<br />
+Siyuan Fu<br />
+Hua Huang<br />
+Xiufan Liang<br />
+Ian Tramble<br />
+Ali Hassani<br />
+Shreya Gaur<br />
+
+** _The list is sorted in order of the author's first contribution to the CUTLASS project._
+
+# CUTLASS Product Manager
+Matthew Nicely<br />

CHANGELOG.md

Lines changed: 84 additions & 16 deletions
Large diffs are not rendered by default.
