47 changes: 46 additions & 1 deletion CHANGELOG.md
@@ -5,7 +5,52 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]
## [1.1.0] - 2025-11-27
### Feature
Universal:
- Add op-level callback for local get/put ([#13](https://github.com/taco-project/FlexKV/pull/13))
- Add support for distributed KV Cache sharing, including KV Cache sharing between CPU and SSD as well as distributed sharing of PCFS ([#17](https://github.com/taco-project/FlexKV/pull/17))
- Add GDS (GPU Direct Storage) support ([#25](https://github.com/taco-project/FlexKV/pull/25))
- Add TP16 support ([#26](https://github.com/taco-project/FlexKV/pull/26))
- Support more KV cache layouts, now including vLLM, SGLang, and TensorRT-LLM ([#27](https://github.com/taco-project/FlexKV/pull/27))
- GDS refactor and gtensor support ([#42](https://github.com/taco-project/FlexKV/pull/42))
- Support constructing TensorSharedHandle directly from a CUDA IPC handle ([#44](https://github.com/taco-project/FlexKV/pull/44))


Targeting vLLM:
- Support DP > 1 when integrated with vLLM ([#18](https://github.com/taco-project/FlexKV/pull/18))
- Add launch scripts for the vLLM adaptation ([#47](https://github.com/taco-project/FlexKV/pull/47))
- Support TP16 for vLLM+FlexKV ([#59](https://github.com/taco-project/FlexKV/pull/59))

Targeting TensorRT-LLM:
- Support using FlexKV on TensorRT-LLM ([#48](https://github.com/taco-project/FlexKV/pull/48))
- Support TP16 for TensorRT-LLM+FlexKV ([#53](https://github.com/taco-project/FlexKV/pull/53))

### Optimization
- MLA D2H (device-to-host) transfer optimization ([#19](https://github.com/taco-project/FlexKV/pull/19))
- Optimize SSD I/O ([#33](https://github.com/taco-project/FlexKV/pull/33))
- Enhance cache eviction with frequency-aware grace time mechanism ([#38](https://github.com/taco-project/FlexKV/pull/38))
- Replace std::map with std::unordered_map in RadixTree ([#41](https://github.com/taco-project/FlexKV/pull/41))
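The RadixTree change in #41 swaps an ordered std::map for a hash-based std::unordered_map. As a rough illustration of why child ordering is not needed for prefix matching, here is a minimal Python sketch of a block-level prefix index whose children live in a hash map (a Python dict). It is a simplified, uncompressed trie rather than FlexKV's actual RadixTree, and the class names and the use of integer block hashes as keys are assumptions for illustration only.

```python
from typing import Dict, List


class TrieNode:
    """One node per cached block; children are keyed by a (hypothetical) block hash."""

    def __init__(self) -> None:
        # Hash map of children: average O(1) lookup, no ordering maintained,
        # which is all prefix matching needs.
        self.children: Dict[int, "TrieNode"] = {}


class PrefixIndex:
    """Hypothetical block-level prefix index, not FlexKV's RadixTree."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, block_hashes: List[int]) -> None:
        """Record a sequence of blocks as cached."""
        node = self.root
        for h in block_hashes:
            child = node.children.get(h)
            if child is None:
                child = TrieNode()
                node.children[h] = child
            node = child

    def match_len(self, block_hashes: List[int]) -> int:
        """Return how many leading blocks of the request are already cached."""
        node = self.root
        matched = 0
        for h in block_hashes:
            child = node.children.get(h)
            if child is None:
                break
            node = child
            matched += 1
        return matched


if __name__ == "__main__":
    index = PrefixIndex()
    index.insert([11, 22, 33])
    print(index.match_len([11, 22, 44]))  # 2
```

Matching only ever asks whether a given child exists, so hash lookups can stand in for ordered-map lookups; the sketch makes no claim about the speedup FlexKV measured.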

### Bugfix
- Fix wrong number of heads for DeepSeek in the vLLM integration ([#23](https://github.com/taco-project/FlexKV/pull/23))
- Fix an error during put when the CPU match length is greater than the SSD match length ([#24](https://github.com/taco-project/FlexKV/pull/24))
- Fix benchmark_worker ([#31](https://github.com/taco-project/FlexKV/pull/31))
- Fix segfault caused by radix tree array out-of-bounds access ([#39](https://github.com/taco-project/FlexKV/pull/39))
- Fix cache_info ([#40](https://github.com/taco-project/FlexKV/pull/40))
- Fix port for GPU registration ([#45](https://github.com/taco-project/FlexKV/pull/45))
- Fix SSD allocator ([#46](https://github.com/taco-project/FlexKV/pull/46))
- Fix num_kv_heads bug in vLLM initialization ([#67](https://github.com/taco-project/FlexKV/pull/67))
- Fix model_config for non-MLA models ([#68](https://github.com/taco-project/FlexKV/pull/68))

### Misc
- Add docs for FlexKV + Dynamo ([#14](https://github.com/taco-project/FlexKV/pull/14)), flexkv_config.json ([#15](https://github.com/taco-project/FlexKV/pull/15)), and FlexKV + TensorRT-LLM ([#52](https://github.com/taco-project/FlexKV/pull/52))
- Config: simplify user configuration ([#37](https://github.com/taco-project/FlexKV/pull/37)) and make other minor updates ([#43](https://github.com/taco-project/FlexKV/pull/43))



## [1.0.0] - 2025-09-11

30 changes: 30 additions & 0 deletions README.md
@@ -6,6 +6,36 @@ FlexKV is a distributed KV store and multi-level cache management system develop

FlexKV is released under the **Apache-2.0 License**. See the [LICENSE](LICENSE) file for details.


## Main Changes in the Latest Version (1.1.0)
### Feature
Universal:
- Add op-level callback for local get/put ([#13](https://github.com/taco-project/FlexKV/pull/13))
- Add support for distributed KV Cache sharing, including KV Cache sharing between CPU and SSD as well as distributed sharing of PCFS ([#17](https://github.com/taco-project/FlexKV/pull/17))
- Add GDS (GPU Direct Storage) support ([#25](https://github.com/taco-project/FlexKV/pull/25))
- Add TP16 support ([#26](https://github.com/taco-project/FlexKV/pull/26))
- Support more KV cache layouts, now including vLLM, SGLang, and TensorRT-LLM ([#27](https://github.com/taco-project/FlexKV/pull/27))
- GDS refactor and gtensor support ([#42](https://github.com/taco-project/FlexKV/pull/42))
- Support constructing TensorSharedHandle directly from a CUDA IPC handle ([#44](https://github.com/taco-project/FlexKV/pull/44))


Targeting vLLM:
- Support DP > 1 when integrated with vLLM ([#18](https://github.com/taco-project/FlexKV/pull/18))
- Add launch scripts for the vLLM adaptation ([#47](https://github.com/taco-project/FlexKV/pull/47))
- Support TP16 for vLLM+FlexKV ([#59](https://github.com/taco-project/FlexKV/pull/59))

Targeting TensorRT-LLM:
- Support using FlexKV on TensorRT-LLM ([#48](https://github.com/taco-project/FlexKV/pull/48))
- Support TP16 for TensorRT-LLM+FlexKV ([#53](https://github.com/taco-project/FlexKV/pull/53))

### Optimization
- MLA D2H (device-to-host) transfer optimization ([#19](https://github.com/taco-project/FlexKV/pull/19))
- Optimize SSD I/O ([#33](https://github.com/taco-project/FlexKV/pull/33))
- Enhance cache eviction with frequency-aware grace time mechanism ([#38](https://github.com/taco-project/FlexKV/pull/38))
- Replace std::map with std::unordered_map in RadixTree ([#41](https://github.com/taco-project/FlexKV/pull/41))

For more details, see [CHANGELOG](CHANGELOG.md).

## How to Use

### Install Dependencies
54 changes: 54 additions & 0 deletions README_zh.md
@@ -6,6 +6,35 @@ FlexKV is jointly developed and released by the Tencent Cloud TACO team and the community, targeting ultra-large-scale LLM

FlexKV is released under the **Apache-2.0 License**. See the [LICENSE](LICENSE) file for details.

## Main Changes in the Latest Version (1.1.0)
### Feature
Universal:
- Add op-level callback for local get/put ([#13](https://github.com/taco-project/FlexKV/pull/13))
- Add distributed KV Cache sharing support, including KV Cache sharing between CPU and SSD as well as distributed sharing of PCFS ([#17](https://github.com/taco-project/FlexKV/pull/17))
- Add GDS (GPU Direct Storage) support ([#25](https://github.com/taco-project/FlexKV/pull/25))
- TP16 support ([#26](https://github.com/taco-project/FlexKV/pull/26))
- Support more KV cache layouts, now including vLLM, SGLang, and TensorRT-LLM ([#27](https://github.com/taco-project/FlexKV/pull/27))
- GDS refactor and gtensor support ([#42](https://github.com/taco-project/FlexKV/pull/42))
- Support constructing TensorSharedHandle directly from a CUDA IPC handle ([#44](https://github.com/taco-project/FlexKV/pull/44))


Targeting vLLM:
- Support DP > 1 in the vLLM integration ([#18](https://github.com/taco-project/FlexKV/pull/18))
- Add launch scripts for the vLLM adaptation ([#47](https://github.com/taco-project/FlexKV/pull/47))
- Support TP16 for vLLM+FlexKV ([#59](https://github.com/taco-project/FlexKV/pull/59))

Targeting TensorRT-LLM:
- Support using FlexKV on TensorRT-LLM ([#48](https://github.com/taco-project/FlexKV/pull/48))
- Support TP16 for TensorRT-LLM+FlexKV ([#53](https://github.com/taco-project/FlexKV/pull/53))

### Optimization
- MLA D2H (device-to-host) transfer optimization ([#19](https://github.com/taco-project/FlexKV/pull/19))
- Optimize SSD I/O ([#33](https://github.com/taco-project/FlexKV/pull/33))
- Enhance cache eviction by introducing a frequency-aware grace time ([#38](https://github.com/taco-project/FlexKV/pull/38))
- Replace std::map with std::unordered_map in RadixTree ([#41](https://github.com/taco-project/FlexKV/pull/41))

For more details, see [CHANGELOG](CHANGELOG.md).

## How to Use

### Install Dependencies
@@ -98,3 +127,28 @@ When FlexKV processes a *get* request:
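- **Acceleration framework support**: adaptations for mainstream inference frameworks such as vLLM and SGLang will be released progressively
- **Distributed query support**: implement scalable distributed KVCache queries
- **Latency optimization**: further reduce *get* request latency through prefetching, compression, and other techniques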

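## Changelog

This project follows the [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) format and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

### [Unreleased]

### [1.0.0] - 2025-09-11

#### New Features
- C++ radix tree for fast matching; requires setting "index_accel": true in cache_config
- Synchronous kernel launch
- Major change: the cache engine is now a library used by accelerators (such as vLLM), replacing the original server-client mode. This can speed up get and put operations when there is no matching KVCache. This release contains breaking API changes and is not backward compatible.
- Add the evict_ratio parameter; requires setting "evict_ratio": 0.05 in cache_config
- Reduce bubbles inside kernel launches
- Add vLLM 0.10.1.1 adapter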
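The 1.0.0 notes name two cache_config settings, "index_accel": true and "evict_ratio": 0.05. The Python sketch below shows them as a plain dict and, assuming evict_ratio is the fraction of cached blocks released in one eviction pass (an interpretation the changelog does not spell out), how that ratio would translate into a block count. The dict layout and the blocks_to_evict helper are hypothetical, not FlexKV's API.

```python
import math
from typing import Any, Dict

# Hypothetical cache_config fragment. Only the two key names and values come
# from the changelog; how FlexKV actually loads cache_config is an assumption.
cache_config: Dict[str, Any] = {
    "index_accel": True,  # use the C++ radix tree for fast matching
    "evict_ratio": 0.05,  # eviction ratio (semantics assumed below)
}


def blocks_to_evict(num_cached_blocks: int, evict_ratio: float) -> int:
    """Hypothetical helper. Assumed semantics: free evict_ratio of the cached
    blocks per eviction pass, rounding up, evicting at least one block when any
    are cached and never more than are cached."""
    if num_cached_blocks <= 0:
        return 0
    wanted = max(1, math.ceil(num_cached_blocks * evict_ratio))
    return min(num_cached_blocks, wanted)


if __name__ == "__main__":
    print(blocks_to_evict(10_000, cache_config["evict_ratio"]))  # 500
```

Under this reading, a larger ratio frees more space per pass at the cost of dropping more potentially reusable KV blocks at once.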

#### Fixed
- Cython release package

### [0.1.0] - 2025-08-29

#### Initialization
- Initial release
- Add license