1 file changed: +1 -6 lines changed
@@ -99,12 +99,7 @@ CUTLASS team is working on a fix.
 # Performance
 
 CUTLASS primitives are very efficient. When used to construct device-wide GEMM kernels,
-they exhibit nearly optimal utilization of peak theoretical throughput. The figure below
-shows CUTLASS 3.8's performance as a % of theoretical peak utilization
-on various input and output data types when run on NVIDIA Blackwell SM100 architecture GPU.
-
-<p align="center"><img src=media/images/cutlass-3.8-blackwell-gemm-peak-performance.svg></p>
-
+they exhibit nearly optimal utilization of peak theoretical throughput.
 The two figures below show the continual CUTLASS performance improvements
 on an [NVIDIA H100](https://www.nvidia.com/en-us/data-center/h100/) (NVIDIA Hopper architecture) since
 CUTLASS 3.1.