Commit 339bb73

Merge pull request #11 from taco-project/main
set bugfix to main
2 parents c2a26fa + 7cf7118 commit 339bb73

65 files changed, with 5619 additions and 2888 deletions

.github/workflows/publish.yml

Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
+# This workflow will upload a Python Package to Release asset
+# For more information see: https://help.github.com/en/actions/language-and-framework-guides/using-python-with-github-actions
+# Copied from vLLM github actions https://github.com/vllm-project/vllm/blob/main/.github/workflows/publish.yml
+name: flexkv ci
+
+on:
+  pull_request:
+    branches: [ "main", "dev"]
+  push:
+    branches: [ "main", "dev"]
+
+# Needed to create wheel and upload assets
+permissions:
+  contents: write
+
+jobs:
+  build:
+    name: Build Wheel
+    runs-on: ${{ matrix.os }}
+
+    strategy:
+      fail-fast: false
+      matrix:
+        os: ['ubuntu-22.04']
+        python-version: ['3.10']
+        pytorch-version: ['2.6.0']
+        cuda-version: ['12.4']
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Set up Linux Env
+        if: ${{ runner.os == 'Linux' }}
+        run: |
+          bash -x .github/workflows/scripts/env.sh
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+          cache: 'pip'
+
+      - name: Install CUDA ${{ matrix.cuda-version }}
+        run: |
+          bash -x .github/workflows/scripts/cuda-install.sh ${{ matrix.cuda-version }} ${{ matrix.os }}
+
+      - name: Install PyTorch ${{ matrix.pytorch-version }} with CUDA ${{ matrix.cuda-version }}
+        run: |
+          bash -x .github/workflows/scripts/pytorch-install.sh ${{ matrix.python-version }} ${{ matrix.pytorch-version }} ${{ matrix.cuda-version }}
+
+      - name: Build wheel
+        shell: bash
+        env:
+          TORCH_CUDA_ARCH_LIST: "8.9 9.0+PTX"
+          MAX_JOBS: 4
+        run: |
+          ./build.sh --release
+
+      - name: Get Date and Time
+        run: |
+          echo "date=$(date +'%Y-%m-%d')" >> $GITHUB_ENV
+          echo "time=$(date +'%H-%M-%S')" >> $GITHUB_ENV
+
+      - name: Upload to cos
+        uses: shallwefootball/s3-upload-action@master
+        with:
+          aws_key_id: ${{ secrets.COS_SECRET_ID }}
+          aws_secret_access_key: ${{ secrets.COS_SECRET_KEY }}
+          aws_bucket: ${{ secrets.COS_BUCKET }}
+          endpoint: ${{ secrets.COS_ENDPOINT }}
+          source_dir: dist
+          destination_dir: flexkv/${{ env.date }}/${{ env.time }}
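
The last two workflow steps cooperate through `$GITHUB_ENV`: "Get Date and Time" appends `date=` and `time=` lines to that file, and "Upload to cos" reads them back as `env.date` and `env.time` to build `destination_dir`. The following is a minimal local sketch of that hand-off, assuming a temporary file stands in for the file GitHub Actions provides. The `env_file` and `destination_dir` variables and the `read_env` helper are hypothetical names used only for illustration; they are not part of the workflow.

```shell
#!/bin/sh
# Stand-in for the file GitHub Actions exposes as $GITHUB_ENV (assumption for local use).
env_file=$(mktemp)

# The same two writes as the "Get Date and Time" step.
echo "date=$(date +'%Y-%m-%d')" >> "$env_file"
echo "time=$(date +'%H-%M-%S')" >> "$env_file"

# Hypothetical helper: read one key back, roughly what later steps see as env.<key>.
read_env() {
    sed -n "s/^$1=//p" "$env_file"
}

# Same layout as destination_dir in the upload step.
destination_dir="flexkv/$(read_env date)/$(read_env time)"
echo "$destination_dir"

rm -f "$env_file"
```

Because the time component has one-second resolution, two builds finishing in the same second would target the same directory; the sketch does not attempt to guard against that.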
.github/workflows/scripts/cuda-install.sh

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+#!/bin/bash
+# Copied from vLLM github actions https://github.com/vllm-project/vllm/blob/main/.github/workflows/scripts/cuda-install.sh
+
+# Replace '.' with '-' ex: 11.8 -> 11-8
+cuda_version=$(echo "$1" | tr "." "-")
+# Removes '-' and '.' ex: ubuntu-20.04 -> ubuntu2004
+OS=$(echo "$2" | tr -d ".\-")
+
+# Installs CUDA
+wget -nv "https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/cuda-keyring_1.1-1_all.deb"
+sudo dpkg -i cuda-keyring_1.1-1_all.deb
+rm cuda-keyring_1.1-1_all.deb
+sudo apt -qq update
+sudo apt -y install "cuda-${cuda_version}" "cuda-nvcc-${cuda_version}" "cuda-libraries-dev-${cuda_version}"
+sudo apt clean
+
+# Test nvcc
+PATH=/usr/local/cuda-$1/bin:${PATH}
+nvcc --version
+
+# Log gcc, g++, c++ versions
+gcc --version
+g++ --version
+c++ --version
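
The script above receives the CUDA version and the runner OS label as `$1` and `$2` and rewrites them into the forms used in the apt package names and the NVIDIA repository URL. The sketch below isolates just those two transformations, using the values from the workflow matrix. The function names `cuda_pkg_suffix` and `repo_os` are illustrative and do not appear in the script, and nothing is downloaded or installed.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names) mirroring the two tr calls in the script.

# "12.4" -> "12-4": the apt package names use '-' where the version uses '.'.
cuda_pkg_suffix() {
    echo "$1" | tr "." "-"
}

# "ubuntu-22.04" -> "ubuntu2204": drop '.' and '-' to form the repo directory name.
repo_os() {
    echo "$1" | tr -d ".-"
}

suffix=$(cuda_pkg_suffix "12.4")
os=$(repo_os "ubuntu-22.04")

# Print what the script would request, without touching the network or apt.
echo "packages: cuda-${suffix} cuda-nvcc-${suffix} cuda-libraries-dev-${suffix}"
echo "keyring:  https://developer.download.nvidia.com/compute/cuda/repos/${os}/x86_64/cuda-keyring_1.1-1_all.deb"
```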

.github/workflows/scripts/env.sh

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+#!/bin/bash
+# Copied from vLLM github actions https://github.com/vllm-project/vllm/blob/main/.github/workflows/scripts/env.sh
+
+# This file installs common linux environment tools
+
+export LANG=C.UTF-8
+
+sudo apt-get update && \
+sudo apt-get install -y --no-install-recommends \
+    software-properties-common
+
+sudo apt-get install -y --no-install-recommends \
+    build-essential \
+    liburing-dev \
+    git \
+    cmake
+
+# Remove github bloat files to free up disk space
+sudo rm -rf "/usr/local/share/boost"
+sudo rm -rf "$AGENT_TOOLSDIRECTORY"
+sudo rm -rf "/usr/share/dotnet"
.github/workflows/scripts/pytorch-install.sh

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+#!/bin/bash
+# Copied from vLLM github actions https://github.com/vllm-project/vllm/blob/main/.github/workflows/scripts/pytorch-install.sh
+
+python_executable=python$1
+pytorch_version=$2
+cuda_version=$3
+
+# Install torch
+$python_executable -m pip install numpy ninja cython wheel typing typing-extensions dataclasses setuptools && conda clean -ya
+$python_executable -m pip install torch=="${pytorch_version}+cu${cuda_version//./}" --extra-index-url "https://download.pytorch.org/whl/cu${cuda_version//./}"
+
+# Print version information
+$python_executable --version
+$python_executable -c "import torch; print('PyTorch:', torch.__version__)"
+$python_executable -c "import torch; print('CUDA:', torch.version.cuda)"
+$python_executable -c "from torch.utils import cpp_extension; print (cpp_extension.CUDA_HOME)"
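
The install line in the script above derives both the local version tag and the wheel index from the CUDA version by deleting the dot, so `12.4` becomes `cu124`. The sketch below prints the resulting pip arguments instead of installing anything. `cuda_tag`, `torch_spec`, and `torch_index_url` are hypothetical helper names, and `tr -d '.'` is swapped in for the script's bash-only `${cuda_version//./}` expansion so that the sketch also runs under plain `sh`.

```shell
#!/bin/sh
# Hypothetical helpers reproducing the string handling of the torch install line.

# "12.4" -> "cu124"
cuda_tag() {
    echo "cu$(echo "$1" | tr -d '.')"
}

# Arguments: PyTorch version, CUDA version -> "torch==<version>+<cuda tag>"
torch_spec() {
    echo "torch==$1+$(cuda_tag "$2")"
}

# Argument: CUDA version -> matching wheel index on download.pytorch.org
torch_index_url() {
    echo "https://download.pytorch.org/whl/$(cuda_tag "$1")"
}

# Values from the workflow matrix: PyTorch 2.6.0 with CUDA 12.4.
echo "pip install $(torch_spec 2.6.0 12.4) --extra-index-url $(torch_index_url 12.4)"
```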

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -70,3 +70,6 @@ cover/
 
 # mypy
 .mypy_cache/
+
+# VSCode
+.vscode/

CHANGELOG.md

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [Unreleased]
+
+## [1.0.0] - 2025-09-11
+
+### Added
+- C++ radix tree for fast matching; set "index_accel": true in cache_config to enable it
+- sync kernel launch
+- A major change that moves the cache engine into a library that accelerators (e.g., vLLM) call directly, replacing the server-client mode.
+  This speeds up get and put when no KVCache is matched. This version includes breaking API changes and is not backward compatible.
+- Add evict_ratio; set "evict_ratio": 0.05 in cache_config to use it
+- Reduce bubbles within the launch kernel
+- Add vLLM 0.10.1.1 adapter
+
+### Fixed
+- cython release package
+
+
+## [0.1.0] - 2025-08-29
+
+### Init
+- init version
+- add license
+
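
The 1.0.0 entries name two `cache_config` keys, `index_accel` and `evict_ratio`. The sketch below writes out a fragment containing only those two keys with the values quoted in the changelog. It assumes `cache_config` can be expressed as JSON, which the quoted keys suggest but the changelog does not state; the temporary file and the `config_file` variable are hypothetical, and any other keys a real configuration needs are deliberately left out.

```shell
#!/bin/sh
# Hypothetical location for the fragment; only the two keys below come from the changelog.
config_file=$(mktemp)

cat > "$config_file" <<'EOF'
{
  "index_accel": true,
  "evict_ratio": 0.05
}
EOF

cat "$config_file"
```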

CONTRIBUTING.md

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
# Contributing to Mooncake
2+
3+
Thank you for your interest in contributing to FlexKV!
4+
5+
## PR Title and Classification
6+
Use a prefixed PR title to indicate the type of changes. Please use one of the following:
7+
8+
- `[bugfix]` for bugfixes
9+
- `[feature]` for new features
10+
- `[test]` for test cases
11+
- `[ci/build]` for build or continuous integration improvements
12+
- `[doc]` for documentation fixes
13+
- `[misc]` for PRs that do not fit the above categories. Please use this sparingly.
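
The prefix convention above is simple enough to check mechanically. The sketch below is one possible check, not something this commit adds: `has_valid_prefix` is a hypothetical helper that succeeds when a title starts with one of the six listed prefixes.

```shell
#!/bin/sh
# Hypothetical checker: succeeds when a PR title starts with one of the listed prefixes.
has_valid_prefix() {
    case "$1" in
        "[bugfix]"*|"[feature]"*|"[test]"*|"[ci/build]"*|"[doc]"*|"[misc]"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Try one prefixed and one unprefixed title.
for title in "[bugfix] fix cython release package" "fix cython release package"; do
    if has_valid_prefix "$title"; then
        echo "ok:      $title"
    else
        echo "missing: $title"
    fi
done
```

The prefixes are quoted inside the `case` patterns so that the square brackets match literally instead of being read as character classes.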

README.md

Lines changed: 3 additions & 16 deletions
@@ -14,23 +14,9 @@ FlexKV is released under the **Apache-2.0 License**. See the [LICENSE](LICENSE)
 ./build.sh
 ```
 
-### Use FlexKV with vLLM (v0.8.4)
+### Use FlexKV with vLLM
 
-Apply the patch `examples/vllm_adaption/flexkv_vllm_0_8_4.patch` to vLLM 0.8.4, then start FlexKV, vLLM, and the benchmark script:
-
-```bash
-# Start FlexKV as server
-bash benchmarks/flexkv_benchmark/run_flexkv_server.sh
-
-# Start vLLM as client
-bash benchmarks/flexkv_benchmark/serving_vllm.sh
-
-# Start benchmark
-bash benchmarks/flexkv_benchmark/multiturn_benchmark.sh
-```
-Apply the patch `examples/vllm_adaption/flexkv_vllm_0_10_0.patch` to vLLM 0.10.0, and use the same testing method as above.
-
-> **Note**: The current script is only compatible with the `main` branch. Support for the latest features in the `dev` branch is under development.
+See [docs/vllm_adapter/README_en.md](docs/vllm_adapter/README_en.md)
 
 ## Design Architecture
 
@@ -88,6 +74,7 @@ FlexKV performs:
 - The main branch is the stable branch, which maintains already tested commits. Please pull from main branch if you need stable code.
 - The dev branch is the development branch, which contains newer features. Please branch from and merge into dev if you need new features or are developing new functionality.
 - The bugfix branch is for bug fixes, maintaining urgent bugs that need immediate resolution or documentation that requires prompt updates. If you need to fix a bug or update documentation urgently, please branch from and merge into the bugfix branch.
+- The stable branch refers to the previous main branch state, intended only for rollback or extremely conservative use cases (e.g., production deployment). Its use is discouraged.
 
 ## Roadmap
 

README_zh.md

Lines changed: 2 additions & 15 deletions
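@@ -16,21 +16,7 @@ FlexKV is released under the **Apache-2.0 License**; for details see [LICENSE](LICE
 
 ### Using FlexKV with vLLM as an Example
 
-Apply the patch `examples/vllm_adaption/flexkv_vllm_0_8_4.patch` to vLLM 0.8.4, then start FlexKV, vLLM, and the test script in turn:
-
-```bash
-# Start FlexKV as the server
-bash benchmarks/flexkv_benchmark/run_flexkv_server.sh
-
-# Start vLLM as the client
-bash benchmarks/flexkv_benchmark/serving_vllm.sh
-
-# Start the performance benchmark
-bash benchmarks/flexkv_benchmark/multiturn_benchmark.sh
-```
-Apply the patch `examples/vllm_adaption/flexkv_vllm_0_10_0.patch` to vLLM 0.10.0; the testing method is the same as above.
-
-> **Note**: The current scripts only support the `main` branch. Scripts supporting the latest features on the `dev` branch are under development.
+[docs/vllm_adapter/README_zh.md](docs/vllm_adapter/README_zh.md)
 
 ## Design Architecture
 
@@ -88,6 +74,7 @@ FlexKV, when handling a *get* request:
 - main is the stable branch and holds commits that have already been tested. Pull from this branch if you need stable code.
 - dev is the development branch and holds newer features. To use or develop new features, branch from and merge into this branch.
 - bugfix is the bug-fix branch and holds bugs that need immediate resolution or documentation that needs immediate updates. To fix such bugs or update such documentation, branch from and merge into this branch.
+- stable is the position of the main branch at the previous version, intended only for rollback and extremely conservative scenarios (e.g., productization). Use of this version is discouraged.
 
 ## Roadmap
 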

VERSION

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+1.0.0