Online softmax https://arxiv.org/abs/1805.02867 (sketched below)
FlashAttention-2 https://arxiv.org/abs/2307.08691 (tiling idea sketched below)
Llama 2 https://arxiv.org/abs/2307.09288
Attention with linear biases (ALiBi) https://arxiv.org/abs/2108.12409 (sketched below)
KV cache / MQA (multi-query attention) https://arxiv.org/abs/2211.05102 (decode step sketched below)
GQA (grouped-query attention) coming soon ...
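
A minimal sketch of the online softmax idea from the first reference, as illustrative NumPy rather than code from this repo; the function name `online_softmax` is an assumption. One streaming pass maintains the running max `m` and the running normalizer `d`, rescaling `d` whenever `m` grows, so the max and the normalizer never need separate passes.

```python
import numpy as np

def online_softmax(x):
    # Running max m and running normalizer d = sum(exp(x_i - m)).
    # When a new element raises the max, rescale the accumulated d
    # by exp(m_old - m_new) before adding the new term.
    m, d = -np.inf, 0.0
    for xi in x:
        m_new = max(m, float(xi))
        d = d * np.exp(m - m_new) + np.exp(xi - m_new)
        m = m_new
    return np.exp(np.asarray(x) - m) / d
```

The result matches the usual two-pass `np.exp(x - x.max()) / np.exp(x - x.max()).sum()`, but max and normalizer come out of a single pass, which is what makes the tiling below possible.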
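
A sketch of the tiling that FlashAttention builds on that recurrence, again illustrative NumPy rather than the actual fused kernel: keys and values are processed in blocks, and each query row carries a running max, normalizer, and unnormalized output that get rescaled as new blocks arrive. Causal masking, dropout, and the backward pass are omitted; `tiled_attention` and `block` are made-up names.

```python
import numpy as np

def tiled_attention(Q, K, V, block=64):
    n, d = Q.shape
    m = np.full(n, -np.inf)       # running row max
    l = np.zeros(n)               # running softmax normalizer
    acc = np.zeros((n, d))        # running unnormalized output
    for j in range(0, K.shape[0], block):
        Kb, Vb = K[j:j + block], V[j:j + block]
        s = Q @ Kb.T / np.sqrt(d)                 # score tile, (n, b)
        m_new = np.maximum(m, s.max(axis=-1))
        p = np.exp(s - m_new[:, None])
        scale = np.exp(m - m_new)                 # rescale old stats
        l = l * scale + p.sum(axis=-1)
        acc = acc * scale[:, None] + p @ Vb
        m = m_new
    return acc / l[:, None]                       # normalize once at the end
```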
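
A sketch of the ALiBi bias, assuming the power-of-two head count behind the paper's geometric slope schedule; `alibi_bias` is an illustrative name. The bias is a per-head linear penalty on key distance, added to the attention scores before the causal mask and softmax, in place of positional embeddings.

```python
import numpy as np

def alibi_bias(num_heads, seq_len):
    # Head h gets slope 2^(-8 * (h + 1) / num_heads), the geometric
    # sequence from the paper (assumes num_heads is a power of two).
    slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)
    # slope * (j - i) for key j at or before query i: a linear
    # penalty on distance; positions after i are left to the mask.
    dist = np.minimum(np.arange(seq_len)[None, :] - np.arange(seq_len)[:, None], 0)
    return slopes[:, None, None] * dist   # (num_heads, seq_len, seq_len)
```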
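
A sketch of one decode step with a KV cache under multi-query attention, again illustrative NumPy with assumed names and shapes: every query head attends to the same single K/V head, so the cache stores one key and one value vector per past position instead of one per head, which is what shrinks the cache.

```python
import numpy as np

def mqa_decode_step(q, k_new, v_new, k_cache, v_cache):
    # q: (num_heads, d); k_new, v_new: (d,); caches: (t, d).
    # MQA: all query heads share the single cached K/V head.
    k_cache = np.concatenate([k_cache, k_new[None]], axis=0)   # (t + 1, d)
    v_cache = np.concatenate([v_cache, v_new[None]], axis=0)
    scores = q @ k_cache.T / np.sqrt(q.shape[-1])              # (num_heads, t + 1)
    scores -= scores.max(axis=-1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ v_cache, k_cache, v_cache                   # out: (num_heads, d)
```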