Merged
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -1,4 +1,4 @@
# Contributing to Mooncake
# Contributing to FlexKV

Thank you for your interest in contributing to FlexKV!

8 changes: 8 additions & 0 deletions README.md
@@ -8,10 +8,18 @@ FlexKV is released under the **Apache-2.0 License**. See the [LICENSE](LICENSE)

## How to Use

### Install Dependencies

```bash
apt install liburing-dev
apt install libxxhash-dev
```

### Build FlexKV

```bash
./build.sh
# Use ./build.sh --release to build the Cython package
```

### Use FlexKV with vLLM
8 changes: 8 additions & 0 deletions README_zh.md
@@ -8,10 +8,18 @@ FlexKV is released under the **Apache-2.0 License**. For details, see [LICENSE](LICE

## How to Use

### Install Dependencies

```bash
apt install liburing-dev
apt install libxxhash-dev
```

### Build FlexKV

```bash
./build.sh
# Use ./build.sh --release to build the Cython package
```

### Using FlexKV with vLLM (Example)
4 changes: 3 additions & 1 deletion docs/dynamo_integration/README_en.md
@@ -39,7 +39,7 @@ git apply /your/path/to/FlexKV/examples/vllm_adaption/vllm_0_10_1_1-flexkv-conne

### FlexKV Verification

Please refer to the test scripts in [vLLM online serving](https://github.com/taco-project/FlexKV/blob/dev/docs/vllm_adapter/README_zh.md#%E7%A4%BA%E4%BE%8B).
Please refer to the test scripts in [vLLM online serving](../../docs/vllm_adapter/README_zh.md#%E7%A4%BA%E4%BE%8B).

## 2. Dynamo Modifications

@@ -123,6 +123,8 @@ for i in $(seq 0 $((NUM_WORKERS-1))); do
done
```

> Note: The `flexkv_config.json` configuration is provided as a simple example only. For full parameter options, please refer to [`docs/flexkv_config_reference/README_en.md`](../../docs/flexkv_config_reference/README_en.md)

### Verification

You can verify that the Dynamo service has started correctly with the following command:
4 changes: 3 additions & 1 deletion docs/dynamo_integration/README_zh.md
@@ -39,7 +39,7 @@ git apply /your/path/to/FlexKV/examples/vllm_adaption/vllm_0_10_1_1-flexkv-conne

### FlexKV Verification

Please refer to the test scripts in [vLLM online serving](https://github.com/taco-project/FlexKV/blob/dev/docs/vllm_adapter/README_zh.md#%E7%A4%BA%E4%BE%8B).
Please refer to the test scripts in [vLLM online serving](../../docs/vllm_adapter/README_zh.md#%E7%A4%BA%E4%BE%8B).


## 2. Dynamo Configuration Changes
@@ -124,6 +124,8 @@ for i in $(seq 0 $((NUM_WORKERS-1))); do
done
```

> Note: The `flexkv_config.json` configuration is only a simple example. For full parameter options, please refer to [`docs/flexkv_config_reference/README_zh.md`](../../docs/flexkv_config_reference/README_zh.md)

### Verification

You can verify that the Dynamo service has started correctly with the following command:
147 changes: 147 additions & 0 deletions docs/flexkv_config_reference/README_en.md
@@ -0,0 +1,147 @@
# FlexKV Configuration Guide

This guide explains how to configure and use the FlexKV online serving configuration file (`flexkv_config.json`), including the meaning of all parameters, recommended values, and typical usage scenarios.

---

## Recommended Configuration

Below is a production-grade recommended configuration that balances performance and stability:

```json
{
"enable_flexkv": true,
"server_recv_port": "ipc:///tmp/flexkv_test",
"cache_config": {
"enable_cpu": true,
"enable_ssd": true,
"enable_remote": false,
"use_gds": false,
"enable_trace": false,
"ssd_cache_iouring_entries": 512,
"tokens_per_block": 64,
"num_cpu_blocks": 233000,
"num_ssd_blocks": 4096000,
"ssd_cache_dir": "/data/flexkv_ssd/",
"evict_ratio": 0.05,
"index_accel": true
},
"num_log_interval_requests": 2000
}
```
- `num_cpu_blocks` and `num_ssd_blocks` represent the total number of blocks in CPU memory and SSD respectively. These values must be configured according to your machine specs and model size. See [Cache Capacity Configuration](#cache-capacity-config) for calculation details.
- `ssd_cache_dir` specifies the directory where SSD-stored KV cache files are saved.

---

## Configuration File Structure Overview

The FlexKV configuration file is a JSON file, primarily consisting of four parts:

- `enable_flexkv`: Whether to enable FlexKV (must be set to `true` to take effect).
- `server_recv_port`: The IPC port on which the FlexKV service listens.
- `cache_config`: The core cache configuration object, containing all cache behavior parameters.
- `num_log_interval_requests`: Log statistics interval (outputs performance log every N requests).

---

## Complete `cache_config` Parameter Reference (from [`flexkv/common/config.py`](../../flexkv/common/config.py))

### Basic Configuration

| Parameter Name | Type | Default | Description |
|----------------|------|---------|-------------|
| `tokens_per_block` | int | 16 | Number of tokens per KV block. Must match the `block_size` used in the acceleration framework (e.g., vLLM). |
| `enable_cpu` | bool | true | Whether to enable CPU memory as a cache layer. Strongly recommended to enable. |
| `enable_ssd` | bool | false | Whether to enable SSD as a cache layer. Recommended if NVMe SSD is available. |
| `enable_remote` | bool | false | Whether to enable remote cache (e.g., scalable cloud storage). Requires remote cache engine and custom implementation. |
| `use_gds` | bool | false | Whether to use GPU Direct Storage (GDS) to accelerate SSD I/O. Not currently supported. |
| `index_accel` | bool | false | Whether to enable C++ RadixTree. Recommended to enable. |

---

### KV Cache Layout Types (Generally No Need to Modify)

| Parameter Name | Type | Default | Description |
|----------------|------|---------|-------------|
| `gpu_kv_layout_type` | enum | LAYERWISE | Organization of KV cache on GPU (layer-wise or block-wise). Must match vLLM’s layout (currently `LAYERWISE`). |
| `cpu_kv_layout_type` | enum | BLOCKWISE | Organization on CPU. Recommended to use `BLOCKWISE`. Does not need to match vLLM. |
| `ssd_kv_layout_type` | enum | BLOCKWISE | Organization on SSD. Recommended to use `BLOCKWISE`. Does not need to match vLLM. |
| `remote_kv_layout_type` | enum | BLOCKWISE | Organization for remote cache. Must be defined according to remote backend’s layout. |

> Note: Do not modify layout types unless you have specific performance requirements.

---

### Cache Capacity Configuration <a id="cache-capacity-config"></a>

| Parameter Name | Type | Default | Description |
|----------------|------|---------|-------------|
| `num_cpu_blocks` | int | 1000000 | Number of blocks allocated in CPU memory. Adjust based on available RAM. |
| `num_ssd_blocks` | int | 10000000 | Number of blocks allocated on SSD. |
| `num_remote_blocks` | int \| None | None | Number of blocks allocated in remote cache. |

> Note: Block size in all cache levels (CPU/SSD/Remote) matches the GPU block size. Estimate cache capacities based on GPU KV cache memory usage and block count.

> Note: `block_size = num_layers * kv_dim * tokens_per_block * num_heads * head_size * dtype_size`, where `kv_dim` is 2 (one plane each for K and V).
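As a sanity check on this formula, here is a quick back-of-the-envelope calculation. The model dimensions below are illustrative assumptions, not values taken from FlexKV or any particular model:

```python
# Estimate the per-block size and a cache block budget.
# All model dimensions here are illustrative assumptions.
num_layers = 64          # transformer layers
kv_dim = 2               # one plane each for K and V
tokens_per_block = 64    # must match the framework's block_size
num_heads = 8            # KV heads (with GQA, this is the KV-head count)
head_size = 128
dtype_size = 2           # bytes per element, e.g. fp16/bf16

block_size = num_layers * kv_dim * tokens_per_block * num_heads * head_size * dtype_size
print(block_size)        # 16777216 bytes = 16 MiB per block

# Derive num_cpu_blocks from a hypothetical RAM budget for the CPU cache layer:
cpu_budget_bytes = 500 * 1024**3   # assume 500 GiB reserved for the CPU cache
num_cpu_blocks = cpu_budget_bytes // block_size
print(num_cpu_blocks)    # 32000
```

The same arithmetic, applied against your SSD capacity, gives a starting point for `num_ssd_blocks`.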

---

### CPU-GPU Transfer Optimization

| Parameter Name | Type | Default | Description |
|----------------|------|---------|-------------|
| `use_ce_transfer_h2d` | bool | false | Whether to use CUDA Copy Engine for Host→Device transfers. Reduces SM usage but may slightly reduce bandwidth. Real-world difference is minimal. |
| `use_ce_transfer_d2h` | bool | false | Whether to use CUDA Copy Engine for Device→Host transfers. |
| `transfer_sms_h2d` | int | 8 | Number of SMs (Streaming Multiprocessors) allocated for H2D transfers. |
| `transfer_sms_d2h` | int | 8 | Number of SMs allocated for D2H transfers. |

---

### SSD Cache Configuration

| Parameter Name | Type | Default | Description |
|----------------|------|---------|-------------|
| `max_blocks_per_file` | int | 32000 | Maximum number of blocks per SSD file. `-1` means unlimited. |
| `ssd_cache_dir` | str \| List[str] | None | **Required.** Path to SSD cache directory, e.g., `"/data/flexkv_ssd/"`. |
| `ssd_cache_iouring_entries` | int | 0 | io_uring queue depth. Recommended: `512` for significantly improved concurrent I/O performance. |
| `ssd_cache_iouring_flags` | int | 0 | io_uring flags. Keep as `0` in most cases. |

> Note: To maximize bandwidth across multiple SSDs, bind each SSD to a separate directory and specify them as a list:
> `"ssd_cache_dir": ["/data0/flexkv_ssd/", "/data1/flexkv_ssd/"]`.
> KV blocks will be evenly distributed across all SSDs.

> Note: Setting `ssd_cache_iouring_entries` to `0` disables io_uring. Not recommended.
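Putting the two notes above together, a `cache_config` fragment for a host with two NVMe drives might look like the following (paths and block count are placeholders to adapt to your hardware):

```json
{
  "cache_config": {
    "enable_ssd": true,
    "ssd_cache_dir": ["/data0/flexkv_ssd/", "/data1/flexkv_ssd/"],
    "ssd_cache_iouring_entries": 512,
    "num_ssd_blocks": 4096000
  }
}
```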

---

### Remote Cache Configuration (Skip if not enabled)

| Parameter Name | Type | Default | Description |
|----------------|------|---------|-------------|
| `remote_cache_size_mode` | str | "file_size" | Allocate remote cache space by file size or block count. |
| `remote_file_size` | int \| None | None | Size (in bytes) of each remote file. |
| `remote_file_num` | int \| None | None | Number of remote files. |
| `remote_file_prefix` | str \| None | None | Prefix for remote file names. |
| `remote_cache_path` | str \| List[str] | None | Remote cache path (e.g., Redis URL, S3 path). |
| `remote_config_custom` | dict \| None | None | Custom remote cache configurations (e.g., timeout, authentication). |

---

### Tracing and Logging

| Parameter Name | Type | Default | Description |
|----------------|------|---------|-------------|
| `enable_trace` | bool | true | Whether to enable performance tracing. Disable (`false`) in production to reduce overhead. |
| `trace_file_path` | str | "./flexkv_trace.log" | Path to trace log file. |
| `trace_max_file_size_mb` | int | 100 | Maximum size (MB) per trace log file. |
| `trace_max_files` | int | 5 | Maximum number of trace log files to retain. |
| `trace_flush_interval_ms` | int | 1000 | Trace log flush interval (milliseconds). |

---

### Cache Eviction Policy

| Parameter Name | Type | Default | Description |
|----------------|------|---------|-------------|
| `evict_ratio` | float | 0.0 | Ratio of blocks to proactively evict from CPU/SSD per eviction cycle. `0.0` = evict only the minimal necessary blocks (more eviction cycles may impact performance). Recommended: `0.05` (evict 5% of least recently used blocks per cycle). |
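To illustrate the trade-off, here is a small sketch of how a non-zero `evict_ratio` turns many single-block eviction cycles into fewer batched ones. This is not FlexKV's actual eviction code; the names and structure are invented for illustration:

```python
# Sketch: batched LRU eviction driven by an evict_ratio parameter.
# Illustrative only; not FlexKV's implementation.
import math
from collections import OrderedDict

def evict(lru: "OrderedDict[int, bytes]", needed: int, evict_ratio: float) -> int:
    """Evict at least `needed` blocks. With evict_ratio > 0, evict a whole
    batch of ratio * capacity LRU blocks so the next few cache misses do
    not each trigger another eviction cycle."""
    batch = max(needed, math.ceil(evict_ratio * len(lru)))
    batch = min(batch, len(lru))
    for _ in range(batch):
        lru.popitem(last=False)  # drop the least recently used block
    return batch

cache = OrderedDict((i, b"") for i in range(1000))
evicted = evict(cache, needed=1, evict_ratio=0.05)
print(evicted)  # 50: one cycle frees 5% of capacity instead of a single block
```

With `evict_ratio = 0.0` the same call would free exactly one block, so a burst of misses would pay the eviction-cycle cost once per block.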
145 changes: 145 additions & 0 deletions docs/flexkv_config_reference/README_zh.md
@@ -0,0 +1,145 @@
# FlexKV Configuration Guide

This guide explains how to configure and use the FlexKV online serving configuration file (`flexkv_config.json`), covering the meaning of all parameters, recommended values, and typical usage scenarios.

---

## Recommended Configuration

Below is a production-grade recommended configuration that balances performance and stability:

```json
{
"enable_flexkv": true,
"server_recv_port": "ipc:///tmp/flexkv_test",
"cache_config": {
"enable_cpu": true,
"enable_ssd": true,
"enable_remote": false,
"use_gds": false,
"enable_trace": false,
"ssd_cache_iouring_entries": 512,
"tokens_per_block": 64,
"num_cpu_blocks": 233000,
"num_ssd_blocks": 4096000,
"ssd_cache_dir": "/data/flexkv_ssd/",
"evict_ratio": 0.05,
"index_accel": true
},
"num_log_interval_requests": 2000
}
```
- `num_cpu_blocks` and `num_ssd_blocks` are the total numbers of blocks in CPU memory and on SSD, respectively. Configure them according to your machine specs and model size; see [Cache Capacity Configuration](#cache-capacity-config) below for the calculation.
- `ssd_cache_dir` is the directory where the SSD-resident KV cache files are stored.

---

## Configuration File Structure Overview

The FlexKV configuration file is a JSON file, primarily consisting of four parts:

- `enable_flexkv`: Whether to enable FlexKV (must be set to `true` to take effect)
- `server_recv_port`: The IPC port on which the FlexKV service listens
- `cache_config`: The core cache configuration object, containing all cache behavior parameters
- `num_log_interval_requests`: Log statistics interval (a performance log is emitted every N requests)

---

## Complete `cache_config` Parameter Reference (from [`flexkv/common/config.py`](../../flexkv/common/config.py))

### Basic Configuration

| Parameter Name | Type | Default | Description |
|----------------|------|---------|-------------|
| `tokens_per_block` | int | 16 | Number of tokens per KV block. Must match the `block_size` used in the acceleration framework (e.g., vLLM). |
| `enable_cpu` | bool | true | Whether to enable CPU memory as a cache layer. Strongly recommended to enable. |
| `enable_ssd` | bool | false | Whether to enable SSD as a cache layer. Recommended if an NVMe SSD is available. |
| `enable_remote` | bool | false | Whether to enable a remote cache (e.g., scalable cloud storage). Requires a remote cache and a custom remote cache engine. |
| `use_gds` | bool | false | Whether to use GPU Direct Storage (GDS) to accelerate SSD I/O. Not currently supported. |
| `index_accel` | bool | false | Whether to enable the C++ RadixTree index. Recommended to enable. |

---

### KV Cache Layout Types (Generally No Need to Modify)

| Parameter Name | Type | Default | Description |
|----------------|------|---------|-------------|
| `gpu_kv_layout_type` | enum | LAYERWISE | How the KV cache is organized on the GPU (layer-wise or block-wise). vLLM currently organizes the GPU KV cache as `LAYERWISE`, so FlexKV's `gpu_kv_layout_type` must match vLLM. |
| `cpu_kv_layout_type` | enum | BLOCKWISE | Block-wise organization on the CPU. `BLOCKWISE` is recommended; does not need to match vLLM. |
| `ssd_kv_layout_type` | enum | BLOCKWISE | Block-wise organization on SSD. `BLOCKWISE` is recommended; does not need to match vLLM. |
| `remote_kv_layout_type` | enum | BLOCKWISE | Block-wise organization for the remote cache. Must be defined according to the remote backend's layout. |

> Note: Do not modify the layout types unless you have specific performance requirements.

---

### Cache Capacity Configuration <a id="cache-capacity-config"></a>

| Parameter Name | Type | Default | Description |
|----------------|------|---------|-------------|
| `num_cpu_blocks` | int | 1000000 | Number of cache blocks in CPU memory. Adjust based on available RAM. |
| `num_ssd_blocks` | int | 10000000 | Number of cache blocks on SSD. |
| `num_remote_blocks` | int \| None | None | Number of cache blocks in the remote cache. |

> Note: The block size in every FlexKV cache level matches the GPU block size, so you can estimate the block count for each level from the GPU KV cache memory usage and block count.

> Note: `block_size = num_layers * kv_dim * tokens_per_block * num_heads * head_size * dtype_size`, where `kv_dim` is 2 (one plane each for K and V).

---

### CPU-GPU Transfer Optimization

| Parameter Name | Type | Default | Description |
|----------------|------|---------|-------------|
| `use_ce_transfer_h2d` | bool | false | Whether to use the CUDA copy engine for Host→Device transfers. The copy engine reduces GPU SM usage for transfers at a slight cost in transfer speed; in practice the difference is small. |
| `use_ce_transfer_d2h` | bool | false | Whether to use the CUDA copy engine for Device→Host transfers. |
| `transfer_sms_h2d` | int | 8 | Number of SMs (streaming multiprocessors) used for H2D transfers. |
| `transfer_sms_d2h` | int | 8 | Number of SMs used for D2H transfers. |

---

### SSD Cache Configuration

| Parameter Name | Type | Default | Description |
|----------------|------|---------|-------------|
| `max_blocks_per_file` | int | 32000 | Maximum number of blocks per SSD file. `-1` means unlimited. |
| `ssd_cache_dir` | str \| List[str] | None | **Required.** SSD cache directory path, e.g., `"/data/flexkv_ssd/"`. |
| `ssd_cache_iouring_entries` | int | 0 | io_uring queue depth. `512` is recommended to improve concurrent I/O performance; in our tests, this is substantially faster than not using io_uring. |
| `ssd_cache_iouring_flags` | int | 0 | io_uring flags. Keep as `0` in most cases. |

> Note: To fully utilize the bandwidth of multiple SSDs, bind each SSD to a separate directory and initialize with a list, e.g., `"ssd_cache_dir": ["/data0/flexkv_ssd/", "/data1/flexkv_ssd/"]`. KV cache blocks are distributed evenly across all SSDs.

> Note: Setting `ssd_cache_iouring_entries` to `0` disables io_uring; this is not recommended.

---

### Remote Cache Configuration (Skip if not enabled)

| Parameter Name | Type | Default | Description |
|----------------|------|---------|-------------|
| `remote_cache_size_mode` | str | "file_size" | Allocate remote cache space by file size or by block count. |
| `remote_file_size` | int \| None | None | Size (in bytes) of each remote file. |
| `remote_file_num` | int \| None | None | Number of remote files. |
| `remote_file_prefix` | str \| None | None | Prefix for remote file names. |
| `remote_cache_path` | str \| List[str] | None | Remote cache path (e.g., Redis URL, S3 path). |
| `remote_config_custom` | dict \| None | None | Custom remote cache settings (e.g., timeouts, authentication). |

---

### Tracing and Logging

| Parameter Name | Type | Default | Description |
|----------------|------|---------|-------------|
| `enable_trace` | bool | true | Whether to enable performance tracing. Disable (`false`) in production to reduce overhead. |
| `trace_file_path` | str | "./flexkv_trace.log" | Path to the trace log file. |
| `trace_max_file_size_mb` | int | 100 | Maximum size (MB) per trace file. |
| `trace_max_files` | int | 5 | Maximum number of trace files to retain. |
| `trace_flush_interval_ms` | int | 1000 | Trace log flush interval (milliseconds). |

---

### Cache Eviction Policy

| Parameter Name | Type | Default | Description |
|----------------|------|---------|-------------|
| `evict_ratio` | float | 0.0 | Fraction of blocks proactively evicted from CPU/SSD per eviction cycle. `0.0` evicts only the minimum necessary blocks, which triggers more eviction cycles and can hurt performance. `0.05` is recommended, i.e., evict the 5% least recently used blocks per cycle. |
2 changes: 2 additions & 0 deletions docs/vllm_adapter/README_en.md
@@ -63,6 +63,8 @@ VLLM_USE_V1=1 python -m vllm.entrypoints.cli.main serve Qwen3/Qwen3-32B \

```

> Note: The `flexkv_config.json` configuration is provided as a simple example only. For full parameter options, please refer to [`docs/flexkv_config_reference/README_en.md`](../../docs/flexkv_config_reference/README_en.md)

## Legacy Version (<= 0.1.0) – Not Recommended for Current Use

### Supported Versions
2 changes: 2 additions & 0 deletions docs/vllm_adapter/README_zh.md
@@ -62,6 +62,8 @@ VLLM_USE_V1=1 python -m vllm.entrypoints.cli.main serve Qwen3/Qwen3-32B \

```

> Note: The `flexkv_config.json` configuration is only a simple example. For full parameter options, please refer to [`docs/flexkv_config_reference/README_zh.md`](../../docs/flexkv_config_reference/README_zh.md)

## Legacy Version (<= 0.1.0) – Not Recommended for Current Use

### Supported Versions