This project is about optimizing the convolution operator on GPUs.
- CUDA core Implicit GEMM forward
- CUDA core Implicit GEMM backward
- CuTe Implicit GEMM
This blog provides a detailed introduction to the optimization steps.
/cuda Implementations on GPU
/implicitgemm Implicit GEMM convolution implementation
/implicitgemmbwd Implicit GEMM convolution backward implementation
/cudnn cuDNN test on GPU
/cute Convolution implemented with CuTe
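For readers new to the technique, the index mapping behind implicit GEMM can be sketched on the CPU as below. This is a minimal illustration, not the kernels in this repository: the function name `implicit_gemm_conv2d_cpu` is hypothetical, the layouts (NCHW input, KCRS weight, NKHW output) are assumed, and `u`, `v` are assumed to be the strides and `p`, `q` the paddings along height and width. The convolution is treated as a GEMM with M = k, N = n * outh * outw and K = c * r * s, and the input matrix is never materialized; each element is located by decoding the GEMM indices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU sketch of implicit GEMM convolution.
// Assumed layouts: input NCHW, weight KCRS, output NKHW.
// Assumed meanings: u, v = strides (height, width); p, q = paddings (height, width).
std::vector<float> implicit_gemm_conv2d_cpu(const std::vector<float>& input,
                                            const std::vector<float>& weight,
                                            int n, int c, int h, int w,
                                            int k, int r, int s,
                                            int u, int v, int p, int q) {
    const int outh = (h - r + 2 * p) / u + 1;
    const int outw = (w - s + 2 * q) / v + 1;
    assert(input.size() == static_cast<std::size_t>(n) * c * h * w);
    assert(weight.size() == static_cast<std::size_t>(k) * c * r * s);

    // GEMM view: C[M x N] = A[M x K] * B[K x N]
    const int gemm_m = k;                // output channels
    const int gemm_n = n * outh * outw;  // output pixels across the batch
    const int gemm_k = c * r * s;        // reduction over channels and filter taps

    std::vector<float> output(static_cast<std::size_t>(gemm_m) * gemm_n, 0.0f);
    for (int m = 0; m < gemm_m; ++m) {
        for (int col = 0; col < gemm_n; ++col) {
            // Decode the GEMM column into (batch, oh, ow).
            const int batch = col / (outh * outw);
            const int oh = (col % (outh * outw)) / outw;
            const int ow = col % outw;
            float acc = 0.0f;
            for (int kk = 0; kk < gemm_k; ++kk) {
                // Decode the reduction index into (channel, filter row, filter col).
                const int ch = kk / (r * s);
                const int fr = (kk % (r * s)) / s;
                const int fs = kk % s;
                const int ih = oh * u - p + fr;
                const int iw = ow * v - q + fs;
                // Out-of-range taps fall in the zero padding and contribute nothing.
                if (ih < 0 || ih >= h || iw < 0 || iw >= w) continue;
                const float a = weight[static_cast<std::size_t>(m) * gemm_k + kk];
                const float b =
                    input[((static_cast<std::size_t>(batch) * c + ch) * h + ih) * w + iw];
                acc += a * b;
            }
            output[((static_cast<std::size_t>(batch) * k + m) * outh + oh) * outw + ow] = acc;
        }
    }
    return output;
}
```

A GPU kernel would typically assign tiles of the M x N output to thread blocks and apply the same index decoding when loading operands.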
$ cd cuda/implicitgemm
$ bash implgemm.sh
If you want to change the version of the program, just change TARGET in the Makefile.
There is verification code in main.cu, which is commented out because it runs slowly.
// printf("===================start verfiy===================\n");
// direct_conv2dcpu(input, weight, output, n, c, h, w, k, r, s, u, v, p, q);
// int error = 0;
// for (int i = 0; i < n * k * outh * outw; i++)
// {
//     if (abs(output_host[i] - output[i]) > getPrecision(output[i]))
//     {
//         printf("error, postion:%d, gpuvalue:%f, cpuvalue:%f\n", i, output_host[i], output[i]);
//         error++;
//         break;
//     }
// }
// printf("================finish,error:%d=========================\n", error);
If you need to verify the results, uncomment the code above.
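The comparison step in the check above can be sketched as a standalone helper. This is only an illustration under stated assumptions: `count_mismatches` is a hypothetical name, and the combined relative and absolute tolerance (`rel_tol`, `abs_tol`) is an assumed stand-in, since the repository's `getPrecision` is not shown here. Unlike the loop in main.cu, which stops at the first error, this version counts every mismatching element.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: counts elements where the GPU result disagrees with the
// CPU reference. rel_tol and abs_tol are assumed values, not the repository's
// getPrecision(). Returns -1 if the two buffers differ in size.
int count_mismatches(const std::vector<float>& gpu, const std::vector<float>& cpu,
                     float rel_tol = 1e-3f, float abs_tol = 1e-5f) {
    if (gpu.size() != cpu.size()) return -1;
    int errors = 0;
    for (std::size_t i = 0; i < cpu.size(); ++i) {
        // Tolerance scales with the magnitude of the reference value.
        const float tol = abs_tol + rel_tol * std::fabs(cpu[i]);
        // Written as !(diff <= tol) so that a NaN difference is also flagged.
        if (!(std::fabs(gpu[i] - cpu[i]) <= tol)) ++errors;
    }
    return errors;
}
```

With the buffers from main.cu copied into vectors, a call such as `count_mismatches(gpu_result, cpu_result)` would return 0 when every element agrees within tolerance; `gpu_result` and `cpu_result` are placeholder names.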
- Triton Implicit GEMM
- Tensor core Implicit GEMM
- Winograd-based convolution