30 changes: 30 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,30 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [1.0.0] - 2025-09-11

### Added
- C++ radix tree for fast matching; to enable it, set "index_accel": true in cache_config
- Sync kernel launch
- Major change: the cache engine is now a library that an accelerator (e.g. vLLM) calls directly, instead of running in server-client mode.
  This speeds up get and put when no KVCache is matched. This version includes breaking API changes and is not backward compatible.
- Add evict_ratio; to use it, set "evict_ratio": 0.05 in cache_config
- Reduce the bubble inside the kernel launch
- Add vLLM 0.10.1.1 adapter
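
The two options named in the list above can be sketched as a configuration fragment. Only the keys `index_accel` and `evict_ratio` and the value `0.05` come from this changelog. Treating `cache_config` as a plain JSON-style mapping, the helper name `build_cache_config`, and the assumed 0 to 1 range for the ratio are illustrative assumptions, not FlexKV's actual API.

```python
import json


def build_cache_config(index_accel: bool = True, evict_ratio: float = 0.05) -> dict:
    """Hypothetical helper: collect the two changelog options in one mapping."""
    # Assumption: evict_ratio is a fraction, so it should lie in [0, 1].
    if not 0.0 <= evict_ratio <= 1.0:
        raise ValueError("evict_ratio is assumed to be a fraction between 0 and 1")
    return {"index_accel": index_accel, "evict_ratio": evict_ratio}


if __name__ == "__main__":
    # Print the fragment as JSON, the notation used in the changelog entries.
    print(json.dumps(build_cache_config(), indent=2))
```

How this mapping is passed to the cache engine is not shown in the changelog, so the sketch stops at building it.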

### Fixed
- Cython release package


## [0.1.0] - 2025-08-29

### Init
- Initial version
- Add license

1 change: 1 addition & 0 deletions VERSION
@@ -0,0 +1 @@
1.0.0
12 changes: 6 additions & 6 deletions docs/vllm_adapter/README_en.md
@@ -6,15 +6,15 @@ In commit [`0290841dce65ae9b036a23d733cf94e47e814934`](https://github.com/taco-p

This change involves significant API adjustments. Therefore, please note:

-- **Version >= `0.0.2`**: Use the **current version API**; the vLLM patch is located in `examples/vllm_adaption/`.
-- **Version == `0.0.1`**: Supports the **legacy version API**; the vLLM patch is located in `examples/vllm_adaption_legacy/`.
+- **Version >= `1.0.0`**: Use the **current version API**; the vLLM patch is located in `examples/vllm_adaption/`.
+- **Version == `0.1.0`**: Supports the **legacy version API**; the vLLM patch is located in `examples/vllm_adaption_legacy/`.

---

-## Current Version (>= 0.0.2)
+## Current Version (>= 1.0.0)

### Supported Versions
-- FlexKV >= `0.0.2`
+- FlexKV >= `1.0.0`
- vLLM versions >= `0.8.5` can generally follow this version for adaptation

### Example
@@ -63,10 +63,10 @@ VLLM_USE_V1=1 python -m vllm.entrypoints.cli.main serve Qwen3/Qwen3-32B \

```

-## Legacy Version (<= 0.0.1) – Not Recommended for Current Use
+## Legacy Version (<= 0.1.0) – Not Recommended for Current Use

### Supported Versions
-- FlexKV <= `0.0.1`
+- FlexKV <= `0.1.0`

### Example
Apply the patch `examples/vllm_adaption_legacy/flexkv_vllm_0_8_4.patch` to vLLM 0.8.4, then start FlexKV, vLLM, and the benchmark script:
12 changes: 6 additions & 6 deletions docs/vllm_adapter/README_zh.md
@@ -5,15 +5,15 @@
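**FlexKV has moved from client-server mode to a library that inference acceleration engines (such as vLLM) can call directly**, which reduces the overhead of inter-process message passing.
This change involved significant API adjustments. Therefore, please note:

-- **Version >= `0.0.2`**: Use the **current version API**; the vLLM patch is located in `examples/vllm_adaption/`.
-- **Version == `0.0.1`**: Supports only the **legacy version API**; the vLLM patch is located in `examples/vllm_adaption_legacy/`.
+- **Version >= `1.0.0`**: Use the **current version API**; the vLLM patch is located in `examples/vllm_adaption/`.
+- **Version == `0.1.0`**: Supports only the **legacy version API**; the vLLM patch is located in `examples/vllm_adaption_legacy/`.

---

-## Current Version (>= 0.0.2)
+## Current Version (>= 1.0.0)

### Supported Versions
-- FlexKV >= `0.0.2`
+- FlexKV >= `1.0.0`
- In principle, any vLLM version >= `0.8.5` can be adapted by following the example code

### Example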
@@ -62,10 +62,10 @@ VLLM_USE_V1=1 python -m vllm.entrypoints.cli.main serve Qwen3/Qwen3-32B \

```

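-## Legacy Version (<= 0.0.1), avoid using it where possible
+## Legacy Version (<= 0.1.0), avoid using it where possible

### Supported Versions
-- FlexKV <= `0.0.1`
+- FlexKV <= `0.1.0`

### Example
Apply the patch `examples/vllm_adaption_legacy/flexkv_vllm_0_8_4.patch` to vLLM 0.8.4, then start FlexKV, vLLM, and the test script separately: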
5 changes: 4 additions & 1 deletion setup.py
@@ -7,6 +7,9 @@
from setuptools.command.build_ext import build_ext
from torch.utils import cpp_extension

+def get_version():
+    with open(os.path.join(os.path.dirname(__file__), "VERSION")) as f:
+        return f.read().strip()

build_dir = "build"
os.makedirs(build_dir, exist_ok=True)
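
The `get_version()` helper added in this hunk returns whatever the `VERSION` file contains. As a minimal sketch of how such a read could also be validated, the following checks the string against a simplified `MAJOR.MINOR.PATCH` pattern. The function name `read_version`, the regular expression, and the decision to raise `ValueError` are assumptions for illustration and are not part of this PR.

```python
import re
import tempfile
from pathlib import Path

# Simplified MAJOR.MINOR.PATCH check. The full SemVer grammar also allows
# pre-release and build metadata, which this sketch deliberately ignores.
_VERSION_RE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def read_version(version_file: Path) -> str:
    """Return the stripped contents of version_file, or raise ValueError."""
    version = version_file.read_text(encoding="utf-8").strip()
    if not _VERSION_RE.fullmatch(version):
        raise ValueError(f"unexpected version string: {version!r}")
    return version


if __name__ == "__main__":
    # Demonstrate on a temporary file rather than the repository's VERSION.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "VERSION"
        path.write_text("1.0.0\n", encoding="utf-8")
        print(read_version(path))
```

Failing early on a malformed version string keeps a stray edit to `VERSION` from reaching a built package.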
@@ -130,7 +133,7 @@ def copy_shared_libraries(self):
setup(
    name="flexkv",
    description="A global KV-Cache manager for LLM inference",
-    version="0.1.0",
+    version=get_version(),
    packages=find_packages(exclude=("benchmarks", "csrc", "examples", "tests")),
    package_data={
        "flexkv": ["*.so", "lib/*.so", "lib/*.so.*"],