GPU: H100
This PR:
Basic split-K GEMM kernel
Running:
Problem Size: 1024x512x1024x1
Tile shape (M, N, K): _128, _128, _128
ScaleGranularityM: 1 (ScaleMsPerTile: 128)
ScaleGranularityN: 1 (ScaleNsPerTile: 128)
Disposition: Passed
Rasterization: Heuristic with a maximum CTA swizzle of 1
Avg runtime: 0.0159489 ms
GFLOPS: 67323.9
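For readers checking the numbers: the reported GFLOPS figures are consistent with counting 2·M·N·K·L floating-point operations per GEMM, and the `ScaleMsPerTile` / `ScaleNsPerTile` values are consistent with dividing the tile extent by the scale granularity. The sketch below reproduces both from the log fields above. The helper names `gemm_gflops` and `scales_per_tile` are hypothetical, and the formulas are an assumption inferred from the logged values, not taken from the benchmark source.

```python
def gemm_gflops(m: int, n: int, k: int, l: int, runtime_ms: float) -> float:
    """GFLOPS for an M x N x K GEMM with batch count L (assumed 2*M*N*K*L flops)."""
    flops = 2.0 * m * n * k * l          # one multiply + one add per MAC
    seconds = runtime_ms * 1e-3          # log reports milliseconds
    return flops / seconds / 1e9


def scales_per_tile(tile_extent: int, scale_granularity: int) -> int:
    """Number of scale factors covering one tile along a dimension (assumed relation)."""
    return tile_extent // scale_granularity


if __name__ == "__main__":
    # First split-K run from this PR: 1024x512x1024x1 at 0.0159489 ms.
    print(f"GFLOPS: {gemm_gflops(1024, 512, 1024, 1, 0.0159489):.1f}")
    # Tile M = 128 with ScaleGranularityM = 1 and 128 respectively.
    print(scales_per_tile(128, 1), scales_per_tile(128, 128))
```

The recomputed throughput matches the logged value only up to the rounding of the printed runtime.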
Running:
Problem Size: 1024x512x1024x1
Tile shape (M, N, K): _128, _128, _128
ScaleGranularityM: 1 (ScaleMsPerTile: 128)
ScaleGranularityN: 128 (ScaleNsPerTile: 1)
Disposition: Passed
Rasterization: Heuristic with a maximum CTA swizzle of 1
Avg runtime: 0.0147137 ms
GFLOPS: 72975.5
Running:
Problem Size: 1024x512x1024x1
Tile shape (M, N, K): _128, _128, _128
ScaleGranularityM: 128 (ScaleMsPerTile: 1)
ScaleGranularityN: 1 (ScaleNsPerTile: 128)
Disposition: Passed
Rasterization: Heuristic with a maximum CTA swizzle of 1
Avg runtime: 0.015305 ms
GFLOPS: 70156.5
Running:
Problem Size: 1024x512x1024x1
Tile shape (M, N, K): _128, _128, _128
ScaleGranularityM: 128 (ScaleMsPerTile: 1)
ScaleGranularityN: 128 (ScaleNsPerTile: 1)
Disposition: Passed
Rasterization: Heuristic with a maximum CTA swizzle of 1
Avg runtime: 0.014349 ms
GFLOPS: 74830.6
StreamK GEMM kernel
Running:
Problem Size: 1024x512x1024x1
Tile shape (M, N, K): _128, _128, _128
ScaleGranularityM: 1 (ScaleMsPerTile: 128)
ScaleGranularityN: 1 (ScaleNsPerTile: 128)
Disposition: Passed
Rasterization: Heuristic with a maximum CTA swizzle of 1
Avg runtime: 0.0176837 ms
GFLOPS: 60719.1
Running:
Problem Size: 1024x512x1024x1
Tile shape (M, N, K): _128, _128, _128
ScaleGranularityM: 1 (ScaleMsPerTile: 128)
ScaleGranularityN: 128 (ScaleNsPerTile: 1)
Disposition: Passed
Rasterization: Heuristic with a maximum CTA swizzle of 1
Avg runtime: 0.0159611 ms
GFLOPS: 67272.3
Running:
Problem Size: 1024x512x1024x1
Tile shape (M, N, K): _128, _128, _128
ScaleGranularityM: 128 (ScaleMsPerTile: 1)
ScaleGranularityN: 1 (ScaleNsPerTile: 128)
Disposition: Passed
Rasterization: Heuristic with a maximum CTA swizzle of 1
Avg runtime: 0.0170682 ms
GFLOPS: 62909.1
Running:
Problem Size: 1024x512x1024x1
Tile shape (M, N, K): _128, _128, _128
ScaleGranularityM: 128 (ScaleMsPerTile: 1)
ScaleGranularityN: 128 (ScaleNsPerTile: 1)
Disposition: Passed
Rasterization: Heuristic with a maximum CTA swizzle of 1
Avg runtime: 0.0157621 ms
GFLOPS: 68121.7
Main:
Basic split-K GEMM kernel
Disposition: Passed
Problem Size: 1024x512x1024x1
Tile shape (M, N, K): _128, _128, _128
ScaleGranularityM: 1 (ScaleMsPerTile: 128)
ScaleGranularityN: 1 (ScaleNsPerTile: 128)
Rasterization: Heuristic with a maximum CTA swizzle of 1
Avg runtime: 0.0177991 ms
GFLOPS: 60325.7
Disposition: Passed
Problem Size: 1024x512x1024x1
Tile shape (M, N, K): _128, _128, _128
ScaleGranularityM: 1 (ScaleMsPerTile: 128)
ScaleGranularityN: 128 (ScaleNsPerTile: 1)
Rasterization: Heuristic with a maximum CTA swizzle of 1
Avg runtime: 0.0151617 ms
GFLOPS: 70819.4
Disposition: Passed
Problem Size: 1024x512x1024x1
Tile shape (M, N, K): _128, _128, _128
ScaleGranularityM: 128 (ScaleMsPerTile: 1)
ScaleGranularityN: 1 (ScaleNsPerTile: 128)
Rasterization: Heuristic with a maximum CTA swizzle of 1
Avg runtime: 0.0158391 ms
GFLOPS: 67790.6
Disposition: Passed
Problem Size: 1024x512x1024x1
Tile shape (M, N, K): _128, _128, _128
ScaleGranularityM: 128 (ScaleMsPerTile: 1)
ScaleGranularityN: 128 (ScaleNsPerTile: 1)
Rasterization: Heuristic with a maximum CTA swizzle of 1
Avg runtime: 0.0148361 ms
GFLOPS: 72373.6
StreamK GEMM kernel
Disposition: Passed
Problem Size: 1024x512x1024x1
Tile shape (M, N, K): _128, _128, _128
ScaleGranularityM: 1 (ScaleMsPerTile: 128)
ScaleGranularityN: 1 (ScaleNsPerTile: 128)
Rasterization: Heuristic with a maximum CTA swizzle of 1
Avg runtime: 0.0185101 ms
GFLOPS: 58008.4
Disposition: Passed
Problem Size: 1024x512x1024x1
Tile shape (M, N, K): _128, _128, _128
ScaleGranularityM: 1 (ScaleMsPerTile: 128)
ScaleGranularityN: 128 (ScaleNsPerTile: 1)
Rasterization: Heuristic with a maximum CTA swizzle of 1
Avg runtime: 0.0160838 ms
GFLOPS: 66759
Disposition: Passed
Problem Size: 1024x512x1024x1
Tile shape (M, N, K): _128, _128, _128
ScaleGranularityM: 128 (ScaleMsPerTile: 1)
ScaleGranularityN: 1 (ScaleNsPerTile: 128)
Rasterization: Heuristic with a maximum CTA swizzle of 1
Avg runtime: 0.0169781 ms
GFLOPS: 63242.8
Disposition: Passed
Problem Size: 1024x512x1024x1
Tile shape (M, N, K): _128, _128, _128
ScaleGranularityM: 128 (ScaleMsPerTile: 1)
ScaleGranularityN: 128 (ScaleNsPerTile: 1)
Rasterization: Heuristic with a maximum CTA swizzle of 1
Avg runtime: 0.0156655 ms
GFLOPS: 68541.8
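To make the PR-vs-main comparison easier to read, here is a small sketch that computes the runtime ratio (main / PR) for each configuration, using the average runtimes copied from the logs above. The dictionary layout and the name `speedup` are illustrative choices, and a single averaged run per configuration says nothing about run-to-run variance.

```python
# Avg runtime in ms, keyed by (ScaleGranularityM, ScaleGranularityN).
# Values copied from the logs above; problem size 1024x512x1024x1 throughout.
RUNTIMES_MS = {
    "split-K": {
        "pr":   {(1, 1): 0.0159489, (1, 128): 0.0147137, (128, 1): 0.015305,  (128, 128): 0.014349},
        "main": {(1, 1): 0.0177991, (1, 128): 0.0151617, (128, 1): 0.0158391, (128, 128): 0.0148361},
    },
    "StreamK": {
        "pr":   {(1, 1): 0.0176837, (1, 128): 0.0159611, (128, 1): 0.0170682, (128, 128): 0.0157621},
        "main": {(1, 1): 0.0185101, (1, 128): 0.0160838, (128, 1): 0.0169781, (128, 128): 0.0156655},
    },
}


def speedup(kernel: str, granularity: tuple) -> float:
    """Ratio main / PR runtime; values above 1.0 mean the PR is faster."""
    runs = RUNTIMES_MS[kernel]
    return runs["main"][granularity] / runs["pr"][granularity]


if __name__ == "__main__":
    for kernel, runs in RUNTIMES_MS.items():
        for gran in runs["pr"]:
            print(f"{kernel:8s} granularity M,N={gran}: {speedup(kernel, gran):.3f}x")
```

A ratio below 1.0 for a given configuration means main was faster than the PR in that run.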
Originally posted by @LucasWilkinson in #2095 (comment)