README.md (2 changes: 1 addition & 1 deletion)
@@ -4,7 +4,7 @@ This is a fork of https://github.com/Dao-AILab/flash-attention customized for vLLM

 We have the following customizations:

-- Build: Cmake, torch library.
+- Build: CMake, torch library (this package is bundled into vLLM).
 - Size: reduced templating and removal of (training) kernels
 - Features: Small page size support (FA2), DCP support (FA3)
 - Performance: Some decode-specific optimizations for sizes we care about, as well as mixed-batch performance optimizations; upstream is hesitant to specialize for inference.
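For context on how the inference-focused bullets above are typically exercised, here is a minimal decode-step sketch. It assumes the fork keeps upstream flash-attention's `flash_attn_with_kvcache` entry point, that the built package imports as `vllm_flash_attn`, and that a small page size such as 16 is accepted on the FA2 path; none of these details are spelled out in the diff itself.

```python
# Minimal paged-KV decode sketch (assumptions: import path `vllm_flash_attn`
# and upstream's flash_attn_with_kvcache signature; verify against the built package).
import torch
from vllm_flash_attn import flash_attn_with_kvcache

batch, nheads, nheads_k, headdim = 4, 32, 8, 128
page_size = 16          # small page size, per the FA2 customization listed above
num_pages = 256
max_pages_per_seq = 8

# Paged KV cache: (num_pages, page_size, nheads_k, headdim)
k_cache = torch.randn(num_pages, page_size, nheads_k, headdim,
                      dtype=torch.float16, device="cuda")
v_cache = torch.randn_like(k_cache)

# One new query token per sequence (a decode step).
q = torch.randn(batch, 1, nheads, headdim, dtype=torch.float16, device="cuda")

# Which physical pages back each sequence, and how many tokens are already cached.
block_table = torch.randint(0, num_pages, (batch, max_pages_per_seq),
                            dtype=torch.int32, device="cuda")
cache_seqlens = torch.tensor([100, 37, 64, 5], dtype=torch.int32, device="cuda")

out = flash_attn_with_kvcache(
    q, k_cache, v_cache,
    cache_seqlens=cache_seqlens,
    block_table=block_table,
    causal=True,
)
print(out.shape)  # (batch, 1, nheads, headdim)
```

A small `page_size` matters because vLLM's paged KV cache allocates memory in small fixed-size blocks (commonly 16 tokens), which upstream FA2 did not accept directly; that gap is what the "small page size support (FA2)" bullet refers to.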