taco-project · peaceforeverCN · Nov 27, 2025
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -5,7 +5,52 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## [Unreleased]
+## [1.1.0] - 2025-11-27
+### Feature
+Universal:
+- Add op-level callbacks for local get/put operations ([#13](https://github.com/taco-project/FlexKV/pull/13))
+- Add support for distributed KV Cache sharing, including KV Cache sharing between CPU and SSD as well as distributed sharing of PCFS ([#17](https://github.com/taco-project/FlexKV/pull/17))
+- Add GDS (GPUDirect Storage) support ([#25](https://github.com/taco-project/FlexKV/pull/25))
+- Add TP16 support ([#26](https://github.com/taco-project/FlexKV/pull/26))
+- Support more KV cache layouts, now including vLLM, SGLang, and TensorRT-LLM ([#27](https://github.com/taco-project/FlexKV/pull/27))
+- Refactor GDS and add gtensor support ([#42](https://github.com/taco-project/FlexKV/pull/42))
+- Support constructing `TensorSharedHandle` directly from a CUDA IPC handle ([#44](https://github.com/taco-project/FlexKV/pull/44))
+
+
+Targeting vLLM:
+- Support DP > 1 when integrated with vLLM ([#18](https://github.com/taco-project/FlexKV/pull/18))
+- Add launch scripts for the vLLM adaptation ([#47](https://github.com/taco-project/FlexKV/pull/47))
+- Support TP16 for vLLM+FlexKV ([#59](https://github.com/taco-project/FlexKV/pull/59))
+
+Targeting TensorRT-LLM:
+- Support using FlexKV with TensorRT-LLM ([#48](https://github.com/taco-project/FlexKV/pull/48))
+- Support TP16 for TensorRT-LLM+FlexKV ([#53](https://github.com/taco-project/FlexKV/pull/53))
+
+### Optimization
+- Optimize MLA device-to-host (D2H) transfers ([#19](https://github.com/taco-project/FlexKV/pull/19))
+- Optimize SSD I/O ([#33](https://github.com/taco-project/FlexKV/pull/33))
+- Enhance cache eviction with a frequency-aware grace time mechanism ([#38](https://github.com/taco-project/FlexKV/pull/38))
+- Replace `std::map` with `std::unordered_map` in `RadixTree` ([#41](https://github.com/taco-project/FlexKV/pull/41))
+
+### Bugfix
+- Fix incorrect number of heads for DeepSeek in the vLLM integration ([#23](https://github.com/taco-project/FlexKV/pull/23))
+- Fix an error during put when the CPU match length is greater than the SSD match length ([#24](https://github.com/taco-project/FlexKV/pull/24))
+- Fix `benchmark_worker` ([#31](https://github.com/taco-project/FlexKV/pull/31))
+- Fix segfault caused by radix tree array out-of-bounds access ([#39](https://github.com/taco-project/FlexKV/pull/39))
+- Fix `cache_info` ([#40](https://github.com/taco-project/FlexKV/pull/40))
+- Fix the port used for GPU registration ([#45](https://github.com/taco-project/FlexKV/pull/45))
+- Fix SSD allocator ([#46](https://github.com/taco-project/FlexKV/pull/46))
+- Fix `num_kv_heads` initialization bug in the vLLM integration ([#67](https://github.com/taco-project/FlexKV/pull/67))
+- Fix `model_config` for non-MLA models ([#68](https://github.com/taco-project/FlexKV/pull/68))
+
+### Misc
+- Add documentation for:
+  - FlexKV + Dynamo ([#14](https://github.com/taco-project/FlexKV/pull/14))
+  - `flexkv_config.json` ([#15](https://github.com/taco-project/FlexKV/pull/15))
+  - FlexKV + TensorRT-LLM ([#52](https://github.com/taco-project/FlexKV/pull/52))
+- Configuration: simplify user configuration ([#37](https://github.com/taco-project/FlexKV/pull/37)) and make other minor updates ([#43](https://github.com/taco-project/FlexKV/pull/43))
+
+
 
 ## [1.0.0] - 2025-09-11
 

diff --git a/README.md b/README.md
@@ -6,6 +6,36 @@ FlexKV is a distributed KV store and multi-level cache management system develop
 
 FlexKV is released under the **Apache-2.0 License**. See the [LICENSE](LICENSE) file for details.
 
+
+## Main Changes in the Latest Version (1.1.0)
+### Feature
+Universal:
+- Add op-level callbacks for local get/put operations ([#13](https://github.com/taco-project/FlexKV/pull/13))
+- Add support for distributed KV Cache sharing, including KV Cache sharing between CPU and SSD as well as distributed sharing of PCFS ([#17](https://github.com/taco-project/FlexKV/pull/17))
+- Add GDS (GPUDirect Storage) support ([#25](https://github.com/taco-project/FlexKV/pull/25))
+- Add TP16 support ([#26](https://github.com/taco-project/FlexKV/pull/26))
+- Support more KV cache layouts, now including vLLM, SGLang, and TensorRT-LLM ([#27](https://github.com/taco-project/FlexKV/pull/27))
+- Refactor GDS and add gtensor support ([#42](https://github.com/taco-project/FlexKV/pull/42))
+- Support constructing `TensorSharedHandle` directly from a CUDA IPC handle ([#44](https://github.com/taco-project/FlexKV/pull/44))
+
+
+Targeting vLLM:
+- Support DP > 1 when integrated with vLLM ([#18](https://github.com/taco-project/FlexKV/pull/18))
+- Add launch scripts for the vLLM adaptation ([#47](https://github.com/taco-project/FlexKV/pull/47))
+- Support TP16 for vLLM+FlexKV ([#59](https://github.com/taco-project/FlexKV/pull/59))
+
+Targeting TensorRT-LLM:
+- Support using FlexKV with TensorRT-LLM ([#48](https://github.com/taco-project/FlexKV/pull/48))
+- Support TP16 for TensorRT-LLM+FlexKV ([#53](https://github.com/taco-project/FlexKV/pull/53))
+
+### Optimization
+- Optimize MLA device-to-host (D2H) transfers ([#19](https://github.com/taco-project/FlexKV/pull/19))
+- Optimize SSD I/O ([#33](https://github.com/taco-project/FlexKV/pull/33))
+- Enhance cache eviction with a frequency-aware grace time mechanism ([#38](https://github.com/taco-project/FlexKV/pull/38))
+- Replace `std::map` with `std::unordered_map` in `RadixTree` ([#41](https://github.com/taco-project/FlexKV/pull/41))
+
+For more details, see [CHANGELOG](CHANGELOG.md).
+
 ## How to Use
 
 ### Install Dependencies

diff --git a/README_zh.md b/README_zh.md
@@ -6,6 +6,35 @@ FlexKV, jointly developed by Tencent Cloud's TACO team and the community, targets ultra-large-scale LLM
 
 FlexKV is released under the **Apache-2.0 License**. See the [LICENSE](LICENSE) file for details.
 
+## Main Changes in the Latest Version (1.1.0)
+### Feature
+Universal:
+- Add op-level callbacks for local get/put operations ([#13](https://github.com/taco-project/FlexKV/pull/13))
+- Add support for distributed KV Cache sharing, including KV Cache sharing between CPU and SSD as well as distributed sharing of PCFS ([#17](https://github.com/taco-project/FlexKV/pull/17))
+- Add GDS (GPUDirect Storage) support ([#25](https://github.com/taco-project/FlexKV/pull/25))
+- Add TP16 support ([#26](https://github.com/taco-project/FlexKV/pull/26))
+- Support more KV cache layouts, now including vLLM, SGLang, and TensorRT-LLM ([#27](https://github.com/taco-project/FlexKV/pull/27))
+- Refactor GDS and add gtensor support ([#42](https://github.com/taco-project/FlexKV/pull/42))
+- Support constructing `TensorSharedHandle` directly from a CUDA IPC handle ([#44](https://github.com/taco-project/FlexKV/pull/44))
+
+
+Targeting vLLM:
+- Support DP > 1 in the vLLM integration ([#18](https://github.com/taco-project/FlexKV/pull/18))
+- Add launch scripts for the vLLM adaptation ([#47](https://github.com/taco-project/FlexKV/pull/47))
+- Support TP16 for vLLM+FlexKV ([#59](https://github.com/taco-project/FlexKV/pull/59))
+
+Targeting TensorRT-LLM:
+- Support using FlexKV with TensorRT-LLM ([#48](https://github.com/taco-project/FlexKV/pull/48))
+- Support TP16 for TensorRT-LLM+FlexKV ([#53](https://github.com/taco-project/FlexKV/pull/53))
+
+### Optimization
+- Optimize MLA device-to-host (D2H) transfers ([#19](https://github.com/taco-project/FlexKV/pull/19))
+- Optimize SSD I/O ([#33](https://github.com/taco-project/FlexKV/pull/33))
+- Enhance cache eviction with a frequency-aware grace time mechanism ([#38](https://github.com/taco-project/FlexKV/pull/38))
+- Replace `std::map` with `std::unordered_map` in `RadixTree` ([#41](https://github.com/taco-project/FlexKV/pull/41))
+
+For more details, see [CHANGELOG](CHANGELOG.md).
+
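 ## How to Use
 
 ### Install Dependencies
@@ -98,3 +127,28 @@ When FlexKV processes a *get* request:
 - **Acceleration framework support**: adapters for mainstream inference frameworks such as vLLM and SGLang will be released progressively
 - **Distributed query support**: provide scalable, distributed KVCache query capabilities
 - **Latency optimization**: further reduce *get* request latency through prefetching, compression, and other techniques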
+
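+## Changelog
+
+This project follows the [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) format and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+### [Unreleased]
+
+### [1.0.0] - 2025-09-11
+
+#### Added
+- C++ radix tree for fast matching; requires setting `"index_accel": true` in `cache_config`
+- Synchronous kernel launch
+- Major change: the cache engine is now a library used by accelerators (such as vLLM), replacing the previous server-client mode. This speeds up get and put operations when there is no matching KVCache. This release contains breaking API changes and is not backward compatible.
+- Add the `evict_ratio` parameter; set `"evict_ratio": 0.05` in `cache_config`
+- Reduce bubbles within kernel launches
+- Add vLLM 0.10.1.1 adapter
+
+#### Fixed
+- Cython release package
+
+### [0.1.0] - 2025-08-29
+
+#### Initialization
+- Initial version
+- Add license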