diff --git a/CHANGELOG.md b/CHANGELOG.md
new file mode 100644
index 0000000000..0fe668086a
--- /dev/null
+++ b/CHANGELOG.md
@@ -0,0 +1,30 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [Unreleased]
+
+## [1.0.0] - 2025-09-11
+
+### Added
+- C++ radix tree for fast match, need set "index_accel": true in cache_config
+- sync kernel launch
+- a huge change that move cache engine to a library for accelerator(vLLM e.g.) to use instead of server-client mode.
+  This accelerate the get and put when no KVCache is matched. This version includes breaking API changes and is not backward compatible.
+- add evict_ratio, need set "evict_ratio": 0.05 in cache_config
+- reducing the bubble inner the launch kernel
+- add vLLM 0.10.1.1 adapter
+
+### Fixed
+- cython release package
+
+
+## [0.1.0] - 2025-08-29
+
+### Init
+- init version
+- add license
+
diff --git a/VERSION b/VERSION
new file mode 100644
index 0000000000..3eefcb9dd5
--- /dev/null
+++ b/VERSION
@@ -0,0 +1 @@
+1.0.0
diff --git a/docs/vllm_adapter/README_en.md b/docs/vllm_adapter/README_en.md
index 79a38f62ab..781b3ad3ee 100644
--- a/docs/vllm_adapter/README_en.md
+++ b/docs/vllm_adapter/README_en.md
@@ -6,15 +6,15 @@ In commit [`0290841dce65ae9b036a23d733cf94e47e814934`](https://github.com/taco-p
 
 This change involves significant API adjustments. Therefore, please note:
 
-- **Version >= `0.0.2`**: Use the **current version API**; the vLLM patch is located in `examples/vllm_adaption/`.
-- **Version == `0.0.1`**: Supports the **legacy version API**; the vLLM patch is located in `examples/vllm_adaption_legacy/`.
+- **Version >= `1.0.0`**: Use the **current version API**; the vLLM patch is located in `examples/vllm_adaption/`.
+- **Version == `0.1.0`**: Supports the **legacy version API**; the vLLM patch is located in `examples/vllm_adaption_legacy/`.
 
 ---
 
-## Current Version (>= 0.0.2)
+## Current Version (>= 1.0.0)
 
 ### Supported Versions
-- FlexKV >= `0.0.2`
+- FlexKV >= `1.0.0`
 - vLLM versions >= `0.8.5` can generally follow this version for adaptation
 
 ### Example
@@ -63,10 +63,10 @@ VLLM_USE_V1=1 python -m vllm.entrypoints.cli.main serve Qwen3/Qwen3-32B \
 ```
 
 
-## Legacy Version (<= 0.0.1) – Not Recommended for Current Use
+## Legacy Version (<= 0.1.0) – Not Recommended for Current Use
 
 ### Supported Versions
-- FlexKV <= `0.0.1`
+- FlexKV <= `0.1.0`
 
 ### Example
 Apply the patch `examples/vllm_adaption_legacy/flexkv_vllm_0_8_4.patch` to vLLM 0.8.4, then start FlexKV, vLLM, and the benchmark script:
diff --git a/docs/vllm_adapter/README_zh.md b/docs/vllm_adapter/README_zh.md
index 81e291b5cc..0e7ce7687e 100644
--- a/docs/vllm_adapter/README_zh.md
+++ b/docs/vllm_adapter/README_zh.md
@@ -5,15 +5,15 @@
 **FlexKV 从 client-server 模式,变为推理加速引擎(如 vLLM)可直接调用的库函数**,以减少进程间消息传递的开销。
 这一变更引发了较大的 API 调整。因此,请注意:
 
-- **版本 >= `0.0.2`**:应使用 **当前版本 API**,vLLM patch位于 `examples/vllm_adaption/`。
-- **版本 == `0.0.1`**:仅支持 **Legacy 版本 API**, vLLM patch位于`examples/vllm_adaption_legacy/`。
+- **版本 >= `1.0.0`**:应使用 **当前版本 API**,vLLM patch位于 `examples/vllm_adaption/`。
+- **版本 == `0.1.0`**:仅支持 **Legacy 版本 API**, vLLM patch位于`examples/vllm_adaption_legacy/`。
 
 ---
 
-## 当前版本(>= 0.0.2)
+## 当前版本(>= 1.0.0)
 
 ### 适用版本
-- FlexKV >= `0.0.2`
+- FlexKV >= `1.0.0`
 - vLLM 原则上>= `0.8.5`版本均可参考示例代码进行修改
 
 ### 示例
@@ -62,10 +62,10 @@ VLLM_USE_V1=1 python -m vllm.entrypoints.cli.main serve Qwen3/Qwen3-32B \
 ```
 
 
-## Legacy版本(<= 0.0.1),目前的版本尽量不要使用
+## Legacy版本(<= 0.1.0),目前的版本尽量不要使用
 
 ### 适用版本
-- FlexKV <= `0.0.1`
+- FlexKV <= `0.1.0`
 
 ### 示例
 在 vLLM 0.8.4 版本中应用patch `examples/vllm_adaption_legacy/flexkv_vllm_0_8_4.patch`,分别启动 FlexKV、vLLM 和测试脚本:
diff --git a/setup.py b/setup.py
index 078650644e..fcd0f97a34 100755
--- a/setup.py
+++ b/setup.py
@@ -7,6 +7,9 @@ from setuptools.command.build_ext import build_ext
 from torch.utils import cpp_extension
 
 
+def get_version():
+    with open(os.path.join(os.path.dirname(__file__), "VERSION")) as f:
+        return f.read().strip()
 
 build_dir = "build"
 os.makedirs(build_dir, exist_ok=True)
@@ -130,7 +133,7 @@ def copy_shared_libraries(self):
 setup(
     name="flexkv",
     description="A global KV-Cache manager for LLM inference",
-    version="0.1.0",
+    version=get_version(),
     packages=find_packages(exclude=("benchmarks", "csrc", "examples", "tests")),
     package_data={
         "flexkv": ["*.so", "lib/*.so", "lib/*.so.*"],
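
Reviewer note: the `setup.py` hunk replaces the hard-coded version string with a value read from the new `VERSION` file. The sketch below reproduces that pattern in isolation so it can be exercised without building the package. The `root` parameter and the temporary directory are assumptions added for illustration only; the patch itself resolves the path from `os.path.dirname(__file__)`.

```python
import os
import tempfile


def get_version(root: str) -> str:
    """Return the contents of <root>/VERSION with surrounding whitespace removed.

    Hypothetical variant of the helper in the diff: it takes the directory
    as an argument instead of deriving it from __file__.
    """
    with open(os.path.join(root, "VERSION"), encoding="utf-8") as f:
        return f.read().strip()


if __name__ == "__main__":
    # Simulate a checkout that contains a VERSION file like the one added here.
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "VERSION"), "w", encoding="utf-8") as f:
            f.write("1.0.0\n")
        print(get_version(tmp))
```

Stripping the file contents matters because the `VERSION` file ends with a newline, which would otherwise end up in the package metadata.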