H100 groupwise scale #2153

@SunNy820828449

H100

This PR:

Basic split-K GEMM kernel
Running: 
  Problem Size: 1024x512x1024x1
  Tile shape (M, N, K): _128, _128, _128
  ScaleGranularityM: 1 (ScaleMsPerTile: 128)
  ScaleGranularityN: 1 (ScaleNsPerTile: 128)
  Disposition: Passed
  Rasterization: Heuristic with a maximum CTA swizzle of 1
  Avg runtime: 0.0159489 ms
  GFLOPS: 67323.9

Running: 
  Problem Size: 1024x512x1024x1
  Tile shape (M, N, K): _128, _128, _128
  ScaleGranularityM: 1 (ScaleMsPerTile: 128)
  ScaleGranularityN: 128 (ScaleNsPerTile: 1)
  Disposition: Passed
  Rasterization: Heuristic with a maximum CTA swizzle of 1
  Avg runtime: 0.0147137 ms
  GFLOPS: 72975.5

Running: 
  Problem Size: 1024x512x1024x1
  Tile shape (M, N, K): _128, _128, _128
  ScaleGranularityM: 128 (ScaleMsPerTile: 1)
  ScaleGranularityN: 1 (ScaleNsPerTile: 128)
  Disposition: Passed
  Rasterization: Heuristic with a maximum CTA swizzle of 1
  Avg runtime: 0.015305 ms
  GFLOPS: 70156.5

Running: 
  Problem Size: 1024x512x1024x1
  Tile shape (M, N, K): _128, _128, _128
  ScaleGranularityM: 128 (ScaleMsPerTile: 1)
  ScaleGranularityN: 128 (ScaleNsPerTile: 1)
  Disposition: Passed
  Rasterization: Heuristic with a maximum CTA swizzle of 1
  Avg runtime: 0.014349 ms
  GFLOPS: 74830.6
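
The ScaleMsPerTile and ScaleNsPerTile values printed above are consistent with dividing the tile extent by the scale granularity (128 / 1 = 128, 128 / 128 = 1). A minimal Python sketch of that relationship, assuming the granularity must evenly divide the tile extent; `scales_per_tile` is a hypothetical helper for illustration, not a CUTLASS API:

```python
def scales_per_tile(tile_extent: int, granularity: int) -> int:
    """Number of scale factors covering one tile dimension (illustrative only)."""
    if granularity <= 0 or tile_extent % granularity != 0:
        raise ValueError("granularity must be a positive divisor of the tile extent")
    return tile_extent // granularity


TILE_M = TILE_N = 128  # tile shape taken from the log above

# The four (ScaleGranularityM, ScaleGranularityN) combinations that were benchmarked.
for gran_m, gran_n in [(1, 1), (1, 128), (128, 1), (128, 128)]:
    ms = scales_per_tile(TILE_M, gran_m)
    ns = scales_per_tile(TILE_N, gran_n)
    print(f"GranM={gran_m} (ScaleMsPerTile={ms}), GranN={gran_n} (ScaleNsPerTile={ns})")
```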


StreamK GEMM kernel
Running: 
  Problem Size: 1024x512x1024x1
  Tile shape (M, N, K): _128, _128, _128
  ScaleGranularityM: 1 (ScaleMsPerTile: 128)
  ScaleGranularityN: 1 (ScaleNsPerTile: 128)
  Disposition: Passed
  Rasterization: Heuristic with a maximum CTA swizzle of 1
  Avg runtime: 0.0176837 ms
  GFLOPS: 60719.1

Running: 
  Problem Size: 1024x512x1024x1
  Tile shape (M, N, K): _128, _128, _128
  ScaleGranularityM: 1 (ScaleMsPerTile: 128)
  ScaleGranularityN: 128 (ScaleNsPerTile: 1)
  Disposition: Passed
  Rasterization: Heuristic with a maximum CTA swizzle of 1
  Avg runtime: 0.0159611 ms
  GFLOPS: 67272.3

Running: 
  Problem Size: 1024x512x1024x1
  Tile shape (M, N, K): _128, _128, _128
  ScaleGranularityM: 128 (ScaleMsPerTile: 1)
  ScaleGranularityN: 1 (ScaleNsPerTile: 128)
  Disposition: Passed
  Rasterization: Heuristic with a maximum CTA swizzle of 1
  Avg runtime: 0.0170682 ms
  GFLOPS: 62909.1

Running: 
  Problem Size: 1024x512x1024x1
  Tile shape (M, N, K): _128, _128, _128
  ScaleGranularityM: 128 (ScaleMsPerTile: 1)
  ScaleGranularityN: 128 (ScaleNsPerTile: 1)
  Disposition: Passed
  Rasterization: Heuristic with a maximum CTA swizzle of 1
  Avg runtime: 0.0157621 ms
  GFLOPS: 68121.7
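
The GFLOPS figures above match the usual GEMM estimate of 2·M·N·K·L floating-point operations divided by the average runtime. A small Python sketch of that arithmetic, assuming this is how the benchmark derives the number; `gemm_gflops` is a hypothetical helper, and because the logged runtimes are rounded, results agree with the log only to within a fraction of a GFLOP:

```python
def gemm_gflops(m: int, n: int, k: int, l: int, runtime_ms: float) -> float:
    """Estimate GEMM throughput in GFLOPS from a problem size and a runtime in ms."""
    flops = 2.0 * m * n * k * l        # one multiply and one add per inner-product term
    seconds = runtime_ms / 1000.0
    return flops / seconds / 1e9


# First split-K entry from the log: 1024x512x1024x1 at 0.0159489 ms (logged 67323.9 GFLOPS).
print(f"{gemm_gflops(1024, 512, 1024, 1, 0.0159489):.1f}")
```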

Main:


Basic split-K GEMM kernel
  Disposition: Passed
  Problem Size: 1024x512x1024x1
  Tile shape (M, N, K): _128, _128, _128
  ScaleGranularityM: 1 (ScaleMsPerTile: 128)
  ScaleGranularityN: 1 (ScaleNsPerTile: 128)
  Rasterization: Heuristic with a maximum CTA swizzle of 1
  Avg runtime: 0.0177991 ms
  GFLOPS: 60325.7

  Disposition: Passed
  Problem Size: 1024x512x1024x1
  Tile shape (M, N, K): _128, _128, _128
  ScaleGranularityM: 1 (ScaleMsPerTile: 128)
  ScaleGranularityN: 128 (ScaleNsPerTile: 1)
  Rasterization: Heuristic with a maximum CTA swizzle of 1
  Avg runtime: 0.0151617 ms
  GFLOPS: 70819.4

  Disposition: Passed
  Problem Size: 1024x512x1024x1
  Tile shape (M, N, K): _128, _128, _128
  ScaleGranularityM: 128 (ScaleMsPerTile: 1)
  ScaleGranularityN: 1 (ScaleNsPerTile: 128)
  Rasterization: Heuristic with a maximum CTA swizzle of 1
  Avg runtime: 0.0158391 ms
  GFLOPS: 67790.6

  Disposition: Passed
  Problem Size: 1024x512x1024x1
  Tile shape (M, N, K): _128, _128, _128
  ScaleGranularityM: 128 (ScaleMsPerTile: 1)
  ScaleGranularityN: 128 (ScaleNsPerTile: 1)
  Rasterization: Heuristic with a maximum CTA swizzle of 1
  Avg runtime: 0.0148361 ms
  GFLOPS: 72373.6


StreamK GEMM kernel
  Disposition: Passed
  Problem Size: 1024x512x1024x1
  Tile shape (M, N, K): _128, _128, _128
  ScaleGranularityM: 1 (ScaleMsPerTile: 128)
  ScaleGranularityN: 1 (ScaleNsPerTile: 128)
  Rasterization: Heuristic with a maximum CTA swizzle of 1
  Avg runtime: 0.0185101 ms
  GFLOPS: 58008.4

  Disposition: Passed
  Problem Size: 1024x512x1024x1
  Tile shape (M, N, K): _128, _128, _128
  ScaleGranularityM: 1 (ScaleMsPerTile: 128)
  ScaleGranularityN: 128 (ScaleNsPerTile: 1)
  Rasterization: Heuristic with a maximum CTA swizzle of 1
  Avg runtime: 0.0160838 ms
  GFLOPS: 66759

  Disposition: Passed
  Problem Size: 1024x512x1024x1
  Tile shape (M, N, K): _128, _128, _128
  ScaleGranularityM: 128 (ScaleMsPerTile: 1)
  ScaleGranularityN: 1 (ScaleNsPerTile: 128)
  Rasterization: Heuristic with a maximum CTA swizzle of 1
  Avg runtime: 0.0169781 ms
  GFLOPS: 63242.8

  Disposition: Passed
  Problem Size: 1024x512x1024x1
  Tile shape (M, N, K): _128, _128, _128
  ScaleGranularityM: 128 (ScaleMsPerTile: 1)
  ScaleGranularityN: 128 (ScaleNsPerTile: 1)
  Rasterization: Heuristic with a maximum CTA swizzle of 1
  Avg runtime: 0.0156655 ms
  GFLOPS: 68541.8
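
To compare the two runs more directly, the runtimes above can be turned into per-configuration ratios. The sketch below is only a reading aid under the assumption that both logs came from the same machine and settings; the dictionary layout and names are made up for illustration, and the runtimes are copied from the logs as printed:

```python
# Avg runtime in ms, keyed by kernel and (ScaleGranularityM, ScaleGranularityN).
pr = {
    "split-K":  {(1, 1): 0.0159489, (1, 128): 0.0147137, (128, 1): 0.015305,  (128, 128): 0.014349},
    "Stream-K": {(1, 1): 0.0176837, (1, 128): 0.0159611, (128, 1): 0.0170682, (128, 128): 0.0157621},
}
main = {
    "split-K":  {(1, 1): 0.0177991, (1, 128): 0.0151617, (128, 1): 0.0158391, (128, 128): 0.0148361},
    "Stream-K": {(1, 1): 0.0185101, (1, 128): 0.0160838, (128, 1): 0.0169781, (128, 128): 0.0156655},
}


def speedup(kernel: str, gran: tuple) -> float:
    """Ratio main / PR; values above 1.0 mean the PR run was faster."""
    return main[kernel][gran] / pr[kernel][gran]


for kernel in pr:
    for gran in pr[kernel]:
        print(f"{kernel:8s} GranM={gran[0]:<3d} GranN={gran[1]:<3d} main/PR = {speedup(kernel, gran):.3f}")
```

Differences of a percent or so on runtimes this short may be within run-to-run noise, so the ratios are best read as indicative rather than conclusive.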

Originally posted by @LucasWilkinson in #2095 (comment)
