Commit b353e36

Update README.md
1 parent fca2114 commit b353e36

1 file changed, +1 -6 lines changed

README.md

Lines changed: 1 addition & 6 deletions
@@ -99,12 +99,7 @@ CUTLASS team is working on a fix.
 # Performance
 
 CUTLASS primitives are very efficient. When used to construct device-wide GEMM kernels,
-they exhibit nearly optimal utilization of peak theoretical throughput. The figure below
-shows CUTLASS 3.8's performance as a % of theoretical peak utilization
-on various input and output data types when run on NVIDIA Blackwell SM100 architecture GPU.
-
-<p align="center"><img src=media/images/cutlass-3.8-blackwell-gemm-peak-performance.svg></p>
-
+they exhibit nearly optimal utilization of peak theoretical throughput.
 The two figures below show the continual CUTLASS performance improvements
 on an [NVIDIA H100](https://www.nvidia.com/en-us/data-center/h100/) (NVIDIA Hopper architecture) since
 CUTLASS 3.1.
