diff --git a/README.md b/README.md
index 9984b93257..67b33d947a 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@ This is a fork of https://github.com/Dao-AILab/flash-attention customized for vL
 
 We have the following customizations:
 
-- Build: Cmake, torch library.
+- Build: Cmake, torch library (this package is bundled into vLLM).
 - Size: reduced templating and removal of (training) kernels
 - Features: Small page size support (FA2), DCP support (FA3)
 - Performance: Some decode specific optimizations for sizes we care about; as well as mixed batch performance optimizations. Upstream is hesitant on specializing for inference.
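
Because the fork is built as a torch library and shipped inside the vLLM wheel, downstream code pulls it in through the vllm namespace instead of a separately installed flash-attn package. Below is a minimal sketch of that consumption path, assuming the bundled package re-exports the upstream varlen entry point as `vllm.vllm_flash_attn.flash_attn_varlen_func`; the import path and exact keyword signature are assumptions for illustration, not something this diff adds.

```python
# Hedged sketch: assumes the bundled fork exposes the upstream flash-attn
# varlen API under the vllm namespace. The import path and keyword names
# below are assumptions for illustration, not part of this diff.
import torch

from vllm.vllm_flash_attn import flash_attn_varlen_func  # assumed import path

# Packed (varlen) layout: two sequences of 16 tokens each, 8 heads, head_dim 128.
total_tokens, n_heads, head_dim = 32, 8, 128
q = torch.randn(total_tokens, n_heads, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn(total_tokens, n_heads, head_dim, dtype=torch.float16, device="cuda")
v = torch.randn(total_tokens, n_heads, head_dim, dtype=torch.float16, device="cuda")

# cu_seqlens mark where each sequence starts and ends in the packed layout.
cu_seqlens = torch.tensor([0, 16, 32], dtype=torch.int32, device="cuda")

out = flash_attn_varlen_func(
    q, k, v,
    cu_seqlens_q=cu_seqlens,
    cu_seqlens_k=cu_seqlens,
    max_seqlen_q=16,
    max_seqlen_k=16,
    causal=True,
)
print(out.shape)  # (total_tokens, n_heads, head_dim)
```

Bundling the build into the vLLM wheel keeps the kernels in lockstep with the vLLM release that uses them, rather than depending on whichever flash-attn version happens to be installed alongside it.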