Closed

Changes from all commits · 53 commits
c2a26fa
add vllm 0.10.0 support
linhu-nv Sep 1, 2025
b8a9ff0
radix tree c++ impl (#70)
charliecgxu Aug 19, 2025
ef0720b
sync kernel launch
Luis-xu Aug 20, 2025
0290841
kvmanager refactor (#73)
zhuofan1123 Aug 27, 2025
875a99a
feat: add support release wheel (#77)
Aug 22, 2025
22fc69e
update unit tests for new version (#79)
zhuofan1123 Aug 22, 2025
42247c2
enable profile in release build
charliecgxu Aug 22, 2025
502e9aa
rename functions
zhuofan1123 Aug 22, 2025
ee91ca7
add evict_ratio in cache config, default is 0
peaceforeverCN Aug 21, 2025
d5ecff6
update benchmark worker (#82)
zhuofan1123 Aug 25, 2025
419156e
fix broken cpp radix tree support for cache engine (#84)
charliecgxu Aug 25, 2025
3581cda
ci: trigger on main and dev
Aug 25, 2025
0857111
fix direct io
zhuofan1123 Aug 26, 2025
1007125
quickfix for return type of reduce_tensor
linhu-nv Aug 27, 2025
32e7905
fix bug
gz944367214 Aug 28, 2025
f082e43
fix status bug
zhuofan1123 Aug 29, 2025
98902c7
Using ring buffer in transfer engine to manage the src and dst block …
Luis-xu Aug 27, 2025
c1f70fd
refine ring_buffer and apply it to all workers
Luis-xu Aug 28, 2025
0ca6304
rename PinnedMemoryRing to SharedMemoryRing
Luis-xu Aug 28, 2025
277b6a3
allow to exceed the max_block_num
zhuofan1123 Aug 29, 2025
d1aff96
refactor: use hash to allocate buffer && no wait for free slot
zhuofan1123 Aug 29, 2025
fa50901
allow different tp ranks have different num_gpu_blocks
linhu-nv Sep 2, 2025
82c6e2f
fix
linhu-nv Sep 3, 2025
828d36f
create arrays of gpu_block infos in c++ to avoid invalid ptrs
linhu-nv Sep 3, 2025
f27ef18
vllm v0.10.1.1 adapter
gz944367214 Sep 3, 2025
a77191c
fix bug
zhuofan1123 Sep 4, 2025
1e50623
server-client mode works now (#92)
linhu-nv Sep 5, 2025
38c83ce
[docs] change vllm adapter README
peaceforeverCN Sep 5, 2025
f17d6a8
[docs] add stable branch introduce
peaceforeverCN Sep 5, 2025
b80fe94
[doc] add CONTRIBUTING.md
peaceforeverCN Sep 5, 2025
0922651
Merge pull request #3 from taco-project/docs/move_vllm_patch
linhu-nv Sep 5, 2025
ca070af
[bugfix] fix incorrect num_tokens_to_get/put
zhuofan1123 Sep 8, 2025
8194c8c
Merge pull request #4 from zhuofan1123/zfl/dev
peaceforeverCN Sep 8, 2025
d79e4c1
[bugfix] fix incorrect num_tokens_to_get && format code
zhuofan1123 Sep 8, 2025
96591df
[bugfix] fix build issue
zhuofan1123 Sep 8, 2025
5dc9f4f
Merge pull request #5 from zhuofan1123/zfl/dev
linhu-nv Sep 8, 2025
c05db2c
fix vllm connector bug
gz944367214 Sep 9, 2025
a03978d
further_fix
gz944367214 Sep 9, 2025
15de1ce
modify default config
zhuofan1123 Sep 10, 2025
31ab4cf
update vllm patch
zhuofan1123 Sep 10, 2025
8cd535c
Merge pull request #6 from peaceforeverCN/dev
linhu-nv Sep 10, 2025
3371934
init xx_kv_layout_type from str
gz944367214 Sep 10, 2025
9c2b208
Merge pull request #8 from peaceforeverCN/zuogan/dev
peaceforeverCN Sep 11, 2025
33c0901
[doc] add version
peaceforeverCN Sep 11, 2025
3ef5869
Merge pull request #9 from taco-project/version_control
peaceforeverCN Sep 11, 2025
7cf7118
Merge pull request #10 from taco-project/dev
peaceforeverCN Sep 11, 2025
339bb73
Merge pull request #11 from taco-project/main
peaceforeverCN Sep 11, 2025
abc3468
ADD: dynamo+flexkv doc
shaonvidia Sep 15, 2025
4c82b9a
MOD:dynamo doc and main doc
shaonvidia Sep 15, 2025
099ec69
Merge pull request #14 from maureen1111/main
peaceforeverCN Sep 15, 2025
ecc1d59
[doc] add flexkv_config.json introduce
peaceforeverCN Sep 15, 2025
36fcf1c
Merge pull request #15 from taco-project/args_docs
peaceforeverCN Sep 15, 2025
54b21ed
Merge pull request #16 from taco-project/bugfix
peaceforeverCN Sep 15, 2025
73 changes: 73 additions & 0 deletions .github/workflows/publish.yml
@@ -0,0 +1,73 @@
# This workflow will upload a Python Package to Release asset
# For more information see: https://help.github.com/en/actions/language-and-framework-guides/using-python-with-github-actions
# Copied from vLLM github actions https://github.com/vllm-project/vllm/blob/main/.github/workflows/publish.yml
name: flexkv ci

on:
  pull_request:
    branches: [ "main", "dev" ]
  push:
    branches: [ "main", "dev" ]

# Needed to create wheel and upload assets
permissions:
  contents: write

jobs:
  build:
    name: Build Wheel
    runs-on: ${{ matrix.os }}

    strategy:
      fail-fast: false
      matrix:
        os: ['ubuntu-22.04']
        python-version: ['3.10']
        pytorch-version: ['2.6.0']
        cuda-version: ['12.4']

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Linux Env
        if: ${{ runner.os == 'Linux' }}
        run: |
          bash -x .github/workflows/scripts/env.sh

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: 'pip'

      - name: Install CUDA ${{ matrix.cuda-version }}
        run: |
          bash -x .github/workflows/scripts/cuda-install.sh ${{ matrix.cuda-version }} ${{ matrix.os }}

      - name: Install PyTorch ${{ matrix.pytorch-version }} with CUDA ${{ matrix.cuda-version }}
        run: |
          bash -x .github/workflows/scripts/pytorch-install.sh ${{ matrix.python-version }} ${{ matrix.pytorch-version }} ${{ matrix.cuda-version }}

      - name: Build wheel
        shell: bash
        env:
          TORCH_CUDA_ARCH_LIST: "8.9 9.0+PTX"
          MAX_JOBS: 4
        run: |
          ./build.sh --release

      - name: Get Date and Time
        run: |
          echo "date=$(date +'%Y-%m-%d')" >> $GITHUB_ENV
          echo "time=$(date +'%H-%M-%S')" >> $GITHUB_ENV

      - name: Upload to cos
        uses: shallwefootball/s3-upload-action@master
        with:
          aws_key_id: ${{ secrets.COS_SECRET_ID }}
          aws_secret_access_key: ${{ secrets.COS_SECRET_KEY }}
          aws_bucket: ${{ secrets.COS_BUCKET }}
          endpoint: ${{ secrets.COS_ENDPOINT }}
          source_dir: dist
          destination_dir: flexkv/${{ env.date }}/${{ env.time }}
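The `Get Date and Time` step appends `key=value` lines to the file behind `$GITHUB_ENV`, which is what lets the later upload step expand `${{ env.date }}` and `${{ env.time }}` into `destination_dir`. A minimal local sketch of that mechanism (the temp file stands in for the runner's real `GITHUB_ENV` file; the parsing step only emulates what the runner does between steps):

```shell
# Stand-in for the file GitHub Actions exposes as $GITHUB_ENV
GITHUB_ENV=$(mktemp)

# Same commands as in the workflow step
echo "date=$(date +'%Y-%m-%d')" >> "$GITHUB_ENV"
echo "time=$(date +'%H-%M-%S')" >> "$GITHUB_ENV"

# The runner parses key=value lines between steps; emulate that here
date_val=$(grep '^date=' "$GITHUB_ENV" | cut -d= -f2)
time_val=$(grep '^time=' "$GITHUB_ENV" | cut -d= -f2)

# Reconstruct the upload destination used by the final step
destination_dir="flexkv/${date_val}/${time_val}"
echo "$destination_dir"
rm -f "$GITHUB_ENV"
```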
24 changes: 24 additions & 0 deletions .github/workflows/scripts/cuda-install.sh
@@ -0,0 +1,24 @@
#!/bin/bash
# Copied from vLLM github actions https://github.com/vllm-project/vllm/blob/main/.github/workflows/scripts/cuda-install.sh

# Replace '.' with '-' ex: 11.8 -> 11-8
cuda_version=$(echo "$1" | tr "." "-")
# Removes '-' and '.' ex: ubuntu-20.04 -> ubuntu2004
OS=$(echo "$2" | tr -d ".\-")

# Installs CUDA
wget -nv "https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/cuda-keyring_1.1-1_all.deb"
sudo dpkg -i cuda-keyring_1.1-1_all.deb
rm cuda-keyring_1.1-1_all.deb
sudo apt -qq update
sudo apt -y install "cuda-${cuda_version}" "cuda-nvcc-${cuda_version}" "cuda-libraries-dev-${cuda_version}"
sudo apt clean

# Test nvcc
PATH=/usr/local/cuda-$1/bin:${PATH}
nvcc --version

# Log gcc, g++, c++ versions
gcc --version
g++ --version
c++ --version
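The two `tr` calls at the top of this script turn the workflow's matrix values into the identifiers NVIDIA's apt repository expects: the CUDA version's dot becomes a dash for the package suffix, and the Ubuntu version loses its dot and dash for the repo path. A quick check of both transformations with the values from the workflow matrix:

```shell
# CUDA version: '.' -> '-' for apt package names (12.4 -> 12-4)
cuda_version=$(echo "12.4" | tr "." "-")

# OS string: delete '.' and '-' for the repo URL path (ubuntu-22.04 -> ubuntu2204)
OS=$(echo "ubuntu-22.04" | tr -d ".\-")

echo "$cuda_version $OS"   # -> 12-4 ubuntu2204
```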
21 changes: 21 additions & 0 deletions .github/workflows/scripts/env.sh
@@ -0,0 +1,21 @@
#!/bin/bash
# Copied from vLLM github actions https://github.com/vllm-project/vllm/blob/main/.github/workflows/scripts/env.sh

# This file installs common linux environment tools

export LANG=C.UTF-8

sudo apt-get update && \
sudo apt-get install -y --no-install-recommends \
software-properties-common

sudo apt-get install -y --no-install-recommends \
build-essential \
liburing-dev \
git \
cmake

# Remove github bloat files to free up disk space
sudo rm -rf "/usr/local/share/boost"
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
sudo rm -rf "/usr/share/dotnet"
16 changes: 16 additions & 0 deletions .github/workflows/scripts/pytorch-install.sh
@@ -0,0 +1,16 @@
#!/bin/bash
# Copied from vLLM github actions https://github.com/vllm-project/vllm/blob/main/.github/workflows/scripts/pytorch-install.sh

python_executable=python$1
pytorch_version=$2
cuda_version=$3

# Install torch
$python_executable -m pip install numpy ninja cython wheel typing typing-extensions dataclasses setuptools && conda clean -ya
$python_executable -m pip install torch=="${pytorch_version}+cu${cuda_version//./}" --extra-index-url "https://download.pytorch.org/whl/cu${cuda_version//./}"

# Print version information
$python_executable --version
$python_executable -c "import torch; print('PyTorch:', torch.__version__)"
$python_executable -c "import torch; print('CUDA:', torch.version.cuda)"
$python_executable -c "from torch.utils import cpp_extension; print (cpp_extension.CUDA_HOME)"
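The `${cuda_version//./}` expansion used twice above is bash-specific (not POSIX `sh`): it deletes every `.` so `12.4` becomes the `cu124` tag that appears in both the pinned wheel spec and the PyTorch extra index URL. A sketch with the matrix values from the workflow:

```shell
pytorch_version=2.6.0
cuda_version=12.4

# ${var//pattern/} replaces all matches with nothing, i.e. strips the dots
wheel_spec="torch==${pytorch_version}+cu${cuda_version//./}"
index_url="https://download.pytorch.org/whl/cu${cuda_version//./}"

echo "$wheel_spec"   # -> torch==2.6.0+cu124
echo "$index_url"
```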
3 changes: 3 additions & 0 deletions .gitignore
@@ -70,3 +70,6 @@ cover/

# mypy
.mypy_cache/

# VSCode
.vscode/
30 changes: 30 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,30 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [1.0.0] - 2025-09-11

### Added
- C++ radix tree for fast matching; set `"index_accel": true` in `cache_config` to enable it
- synchronous kernel launch
- a major change that moves the cache engine into a library for the accelerator (e.g., vLLM) to use, replacing the server-client mode.
  This accelerates *get* and *put* when no KVCache is matched. This version includes breaking API changes and is not backward compatible.
- `evict_ratio` in `cache_config` (default is 0); set e.g. `"evict_ratio": 0.05` to enable it
- reduced the bubble inside kernel launch
- vLLM 0.10.1.1 adapter

### Fixed
- cython release package


## [0.1.0] - 2025-08-29

### Added
- initial version
- license
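The 1.0.0 entries introduce two `cache_config` knobs: `index_accel` (enables the C++ radix tree) and `evict_ratio`. A hedged sketch of a JSON fragment carrying both, assuming the JSON config format the docs' `flexkv_config.json` suggests; the file name and location here are illustrative, not from the source:

```shell
# Sketch of a cache_config with the two knobs named in the changelog.
# Only "index_accel" and "evict_ratio" come from the changelog; the
# file path is a stand-in.
cat > /tmp/flexkv_cache_config.json <<'EOF'
{
  "index_accel": true,
  "evict_ratio": 0.05
}
EOF
cat /tmp/flexkv_cache_config.json
```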

13 changes: 13 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,13 @@
# Contributing to FlexKV

Thank you for your interest in contributing to FlexKV!

## PR Title and Classification
Use a prefixed PR title to indicate the type of changes. Please use one of the following:

- `[bugfix]` for bugfixes
- `[feature]` for new features
- `[test]` for test cases
- `[ci/build]` for build or continuous integration improvements
- `[doc]` for documentation fixes
- `[misc]` for PRs that do not fit the above categories. Please use this sparingly.
31 changes: 16 additions & 15 deletions README.md
@@ -8,28 +8,27 @@ FlexKV is released under the **Apache-2.0 License**. See the [LICENSE](LICENSE)

## How to Use

-### Build FlexKV
+### Install Dependencies

```bash
-./build.sh
+apt install liburing-dev
+apt install libxxhash-dev
```

-### Use FlexKV with vLLM (v0.8.4)
-
-Apply the patch `examples/vllm_adaption/flexkv_vllm_0_8_4.patch` to vLLM 0.8.4, then start FlexKV, vLLM, and the benchmark script:
+### Build FlexKV

```bash
-# Start FlexKV as server
-bash benchmarks/flexkv_benchmark/run_flexkv_server.sh
+./build.sh
+#./build.sh --release for cython package
```

-# Start vLLM as client
-bash benchmarks/flexkv_benchmark/serving_vllm.sh
+### Use FlexKV with vLLM

-# Start benchmark
-bash benchmarks/flexkv_benchmark/multiturn_benchmark.sh
-```
+See [docs/vllm_adapter/README_en.md](docs/vllm_adapter/README_en.md)

+### FlexKV Integration with Dynamo

-> **Note**: The current script is only compatible with the `main` branch. Support for the latest features in the `dev` branch is under development.
+See [docs/dynamo_integration/README_en.md](docs/dynamo_integration/README_en.md)

## Design Architecture

@@ -84,8 +83,10 @@ FlexKV performs:
- *put* requests can be called asynchronously; the time to copy data from GPU to CPU memory can overlap with subsequent computation. Data transfers between CPU memory, SSD, and scalable storage are fully handled asynchronously by the TransferEngine and are transparent to the main process.

## Branch
-- main is the stable branch, maintaining commits that have been tested.
-- dev is the development branch, maintaining newer features.
+- The main branch is the stable branch, maintaining commits that have been tested. Please pull from main if you need stable code.
+- The dev branch is the development branch, containing newer features. Please branch from and merge into dev if you need new features or are developing new functionality.
+- The bugfix branch maintains fixes for urgent bugs and documentation that requires prompt updates. If you need to fix a bug or update documentation urgently, please branch from and merge into bugfix.
+- The stable branch marks the previous state of the main branch, intended only for rollback or extremely conservative use cases (e.g., production deployment). Its use is discouraged.

## Roadmap

29 changes: 15 additions & 14 deletions README_zh.md
@@ -8,28 +8,27 @@ FlexKV is released under the **Apache-2.0 License**; for details, see [LICENSE](LICE

## How to Use

+### Install Dependencies
+
+```bash
+apt install liburing-dev
+apt install libxxhash-dev
+```
+
### Build FlexKV

```bash
./build.sh
+#./build.sh --release for cython package
```

### Using FlexKV with vLLM as an Example

-Apply the patch `examples/vllm_adaption/flexkv_vllm_0_8_4.patch` to vLLM 0.8.4, then start FlexKV, vLLM, and the test scripts separately:
+See [docs/vllm_adapter/README_zh.md](docs/vllm_adapter/README_zh.md)

-```bash
-# Start FlexKV as the server
-bash benchmarks/flexkv_benchmark/run_flexkv_server.sh
-
-# Start vLLM as the client
-bash benchmarks/flexkv_benchmark/serving_vllm.sh
-
-# Start the performance test
-bash benchmarks/flexkv_benchmark/multiturn_benchmark.sh
-```
+### FlexKV Integration with Dynamo

-> **Note**: The current scripts only work with the `main` branch. Support for the latest `dev`-branch features is under development.
+See [docs/dynamo_integration/README_zh.md](docs/dynamo_integration/README_zh.md)

## Design Architecture

@@ -84,8 +83,10 @@ When FlexKV handles a *get* request:
- *put* requests can be called asynchronously; the time to copy from GPU to CPU memory can overlap with subsequent computation. Transfers between CPU memory, SSD, and extended storage are then handled entirely by the TransferEngine, transparently to the main process.

## Branch
-- main is the stable branch, maintaining tested commits.
-- dev is the development branch, maintaining newer features.
+- main is the stable branch, maintaining tested commits. Please pull from this branch if you need stable code.
+- dev is the development branch, maintaining newer features. Please branch from and merge into dev if you need or are developing new features.
+- bugfix is the bug-fix branch, maintaining bugs that must be resolved immediately and documentation that must be updated immediately. Please branch from and merge into bugfix for such changes.
+- stable marks the previous position of the main branch, used only for rollback and extremely conservative scenarios (e.g., productization). Use of this branch is discouraged.

## Roadmap

1 change: 1 addition & 0 deletions VERSION
@@ -0,0 +1 @@
1.0.0