This repository is the official implementation of INT-FlashAttention.
- `flash_atten_*.py` contains the Triton implementations of the different FlashAttention algorithms.
- `benchmark.py` contains the performance benchmarks for the algorithms above.
- `configs.py` contains the configurations you may need to adjust for the Triton autotuner.
- `csrc` contains the CUDA implementation of our algorithm. To compile it, refer to the official FlashAttention repository.
- More details can be found in the folders.
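The core idea behind an INT8 attention kernel is to run the matrix multiplications on quantized integer operands and rescale the result back to floating point. The sketch below is illustrative only, not the repository's code: it shows the per-row symmetric quantization/dequantization pattern such kernels rely on, in plain NumPy, and the function name `quantize_per_row` is hypothetical.

```python
import numpy as np

def quantize_per_row(x, bits=8):
    """Symmetric per-row quantization (hypothetical helper, not repo code).

    Returns int8 values and per-row float scales such that
    x ~= q * scale.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for int8
    scale = np.abs(x).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 16)).astype(np.float32)
K = rng.standard_normal((4, 16)).astype(np.float32)

q_int8, q_scale = quantize_per_row(Q)
k_int8, k_scale = quantize_per_row(K)

# Integer matmul accumulated in int32, then rescaled back to float.
scores_int = q_int8.astype(np.int32) @ k_int8.astype(np.int32).T
scores_approx = scores_int * q_scale * k_scale.T

# Compare against the exact floating-point attention scores.
scores_exact = Q @ K.T
max_err = np.abs(scores_approx - scores_exact).max()
```

In an actual Triton or CUDA kernel the integer matmul maps to INT8 tensor-core instructions, and the per-row scales are carried alongside the FlashAttention softmax statistics; this sketch only demonstrates the numerical recipe.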