This repository contains implementations of some common GPU kernels in CUDA C++.
Compile the kernels using:
mkdir build
make
Then run the kernels, for instance using:
./build/scan
This has only been tested with nvcc 12.9 on an L4 (compute capability 8.9).
The kernels can be profiled using Nsight Compute or Nsight Systems:
# compute
make ncu
# systems
make nsys
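The contents of the Makefile targets are not shown in this README. As a rough sketch, assuming they wrap the standard Nsight command-line tools, profiling the scan binary directly might look like the following. The output names scan_profile and scan_timeline are hypothetical, and the guards simply skip a tool that is not installed or a binary that has not been built yet.

```shell
# Sketch only: the real Makefile targets may pass different flags.
BIN=./build/scan   # binary name taken from the run example above

# Nsight Compute: per-kernel metrics, written to a report file (hypothetical name).
if command -v ncu >/dev/null 2>&1 && [ -x "$BIN" ]; then
    ncu -o scan_profile "$BIN"
fi

# Nsight Systems: whole-application timeline, written to a report file (hypothetical name).
if command -v nsys >/dev/null 2>&1 && [ -x "$BIN" ]; then
    nsys profile -o scan_timeline "$BIN"
fi
```

The resulting report files can then be opened in the Nsight Compute and Nsight Systems GUIs, respectively.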