30 changes: 30 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,30 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [1.0.0] - 2025-09-11

### Added
- C++ radix tree index for fast prefix matching; set `"index_accel": true` in `cache_config` to enable it
- Synchronous kernel launch
- Major change: the cache engine is now a library that accelerators (e.g., vLLM) use directly, replacing the server-client mode.
  This speeds up `get` and `put` when no KVCache is matched. This version includes breaking API changes and is not backward compatible.
- `evict_ratio` option; set it via `"evict_ratio": 0.05` in `cache_config`
- Reduced the bubble inside kernel launches
- vLLM 0.10.1.1 adapter
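The two new `cache_config` switches above can be sketched together. The snippet below is illustrative only: `index_accel` and `evict_ratio` come from this changelog, while the surrounding key (`tokens_per_block`) mirrors `benchmarks/example_config.json` and may differ in your version:

```python
# Hypothetical cache_config fragment. Only "index_accel" and "evict_ratio"
# are documented in this changelog; the other keys are illustrative.
cache_config = {
    "tokens_per_block": 16,   # mirrors benchmarks/example_config.json
    "index_accel": True,      # enable the C++ radix tree for fast matching
    "evict_ratio": 0.05,      # fraction of cached blocks to evict at a time
}

# Sanity-check the example values.
assert isinstance(cache_config["index_accel"], bool)
assert 0.0 < cache_config["evict_ratio"] < 1.0
```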

### Fixed
- Cython release package build


## [0.1.0] - 2025-08-29

### Added
- Initial version
- License

13 changes: 13 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,13 @@
# Contributing to FlexKV

Thank you for your interest in contributing to FlexKV!

## PR Title and Classification
Use a prefixed PR title to indicate the type of changes. Please use one of the following:

- `[bugfix]` for bugfixes
- `[feature]` for new features
- `[test]` for test cases
- `[ci/build]` for build or continuous integration improvements
- `[doc]` for documentation fixes
- `[misc]` for PRs that do not fit the above categories. Please use this sparingly.
27 changes: 13 additions & 14 deletions README.md
@@ -8,29 +8,27 @@ FlexKV is released under the **Apache-2.0 License**. See the [LICENSE](LICENSE

## How to Use

### Build FlexKV
### Install Dependencies

```bash
./build.sh
apt install liburing-dev
apt install libxxhash-dev
```

### Use FlexKV with vLLM (v0.8.4)

Apply the patch `examples/vllm_adaption/flexkv_vllm_0_8_4.patch` to vLLM 0.8.4, then start FlexKV, vLLM, and the benchmark script:
### Build FlexKV

```bash
# Start FlexKV as server
bash benchmarks/flexkv_benchmark/run_flexkv_server.sh
./build.sh
# ./build.sh --release to build the Cython release package
```

# Start vLLM as client
bash benchmarks/flexkv_benchmark/serving_vllm.sh
### Use FlexKV with vLLM

# Start benchmark
bash benchmarks/flexkv_benchmark/multiturn_benchmark.sh
```
Apply the patch `examples/vllm_adaption/flexkv_vllm_0_10_0.patch` to vLLM 0.10.0, then test with the same method as above.
See [docs/vllm_adapter/README_en.md](docs/vllm_adapter/README_en.md) for details.

### FlexKV Integration with Dynamo

> **Note**: The current script is only compatible with the `main` branch. Support for the latest features in the `dev` branch is under development.
See [docs/dynamo_integration/README_en.md](docs/dynamo_integration/README_en.md)

## Design Architecture

@@ -88,6 +86,7 @@ FlexKV performs:
- The main branch is the stable branch, which maintains already tested commits. Please pull from main branch if you need stable code.
- The dev branch is the development branch, which contains newer features. Please branch from and merge into dev if you need new features or are developing new functionality.
- The bugfix branch is for bug fixes, maintaining urgent bugs that need immediate resolution or documentation that requires prompt updates. If you need to fix a bug or update documentation urgently, please branch from and merge into the bugfix branch.
- The stable branch refers to the previous main branch state, intended only for rollback or extremely conservative use cases (e.g., production deployment). Its use is discouraged.

## Roadmap

25 changes: 12 additions & 13 deletions README_zh.md
@@ -8,29 +8,27 @@ FlexKV is released under the **Apache-2.0 License**; for details see [LICENSE](LICE

## How to Use

### Install Dependencies

```bash
apt install liburing-dev
apt install libxxhash-dev
```

### Build FlexKV

```bash
./build.sh
# ./build.sh --release to build the Cython release package
```

### Use FlexKV with vLLM

Apply the patch `examples/vllm_adaption/flexkv_vllm_0_8_4.patch` to vLLM 0.8.4, then start FlexKV, vLLM, and the benchmark script:
See [docs/vllm_adapter/README_zh.md](docs/vllm_adapter/README_zh.md)

```bash
# Start FlexKV as the server
bash benchmarks/flexkv_benchmark/run_flexkv_server.sh

# Start vLLM as the client
bash benchmarks/flexkv_benchmark/serving_vllm.sh

# Run the benchmark
bash benchmarks/flexkv_benchmark/multiturn_benchmark.sh
```
Apply the patch `examples/vllm_adaption/flexkv_vllm_0_10_0.patch` to vLLM 0.10.0; the testing method is the same as above.
### FlexKV Integration with Dynamo

> **Note**: The current script is only compatible with the `main` branch. Support for the latest features of the `dev` branch is under development.
See [docs/dynamo_integration/README_zh.md](docs/dynamo_integration/README_zh.md)

## Design Architecture

@@ -88,6 +86,7 @@ When handling a *get* request, FlexKV:
- main is the stable branch, maintaining commits that have already been tested. Pull from this branch if you need stable code.
- dev is the development branch, maintaining newer features. Branch from and merge into dev if you need new features or are developing new functionality.
- bugfix is the bug-fix branch, maintaining urgent bugs that need immediate resolution or documentation that requires prompt updates. If you need to fix a bug or update documentation urgently, branch from and merge into bugfix.
- stable marks the previous position of the main branch, intended only for rollback and extremely conservative use cases (e.g., productization). Its use is discouraged.

## Roadmap

1 change: 1 addition & 0 deletions VERSION
@@ -0,0 +1 @@
1.0.0
1 change: 0 additions & 1 deletion benchmarks/example_config.json
@@ -14,7 +14,6 @@
"enable_remote": false,
"tokens_per_block": 16,
"use_gds": false,
"use_pinned_memory": true,
"gpu_kv_layout_type": "LAYERWISE",
"cpu_kv_layout_type": "BLOCKWISE",
"ssd_kv_layout_type": "BLOCKWISE",
43 changes: 30 additions & 13 deletions csrc/bindings.cpp
@@ -29,6 +29,7 @@

namespace py = pybind11;

#ifdef CUDA_AVAILABLE
void transfer_kv_blocks_binding(
torch::Tensor &gpu_block_id_tensor, torch::Tensor &gpu_layer_ptrs_tensor,
int64_t gpu_kv_stride_in_bytes, int64_t gpu_block_stride_in_bytes,
@@ -60,7 +61,9 @@ void transfer_kv_blocks_binding(
throw std::runtime_error(cudaGetErrorString(err));
}
}
#endif

#ifdef CUDA_AVAILABLE
void transfer_kv_blocks_ssd_binding(
flexkv::SSDIOCTX &ioctx,
const torch::Tensor &cpu_layer_id_list, int64_t cpu_tensor_ptr,
@@ -82,6 +85,7 @@ void transfer_kv_blocks_ssd_binding(
block_stride_in_bytes, is_read, num_blocks_per_file, round_robin,
num_threads_per_device, is_mla);
}
#endif
#ifdef FLEXKV_ENABLE_CFS
void transfer_kv_blocks_remote(
const py::list &file_nodeid_list, const torch::Tensor &cpu_layer_id_list,
@@ -162,6 +166,7 @@ void shared_transfer_kv_blocks_remote_read_binding(
#endif

PYBIND11_MODULE(c_ext, m) {
#ifdef CUDA_AVAILABLE
m.def("transfer_kv_blocks", &transfer_kv_blocks_binding,
"Transfer multi-layer KV-cache between CPU and GPU");
m.def("transfer_kv_blocks_ssd", &transfer_kv_blocks_ssd_binding,
@@ -174,6 +179,7 @@ PYBIND11_MODULE(c_ext, m) {
py::arg("block_stride_in_bytes"), py::arg("is_read"),
py::arg("num_blocks_per_file"), py::arg("round_robin") = 1,
py::arg("num_threads_per_device") = 16, py::arg("is_mla") = false);
#endif
#ifdef FLEXKV_ENABLE_CFS
m.def("transfer_kv_blocks_remote", &transfer_kv_blocks_remote,
"Transfer KV blocks between remote and CPU memory",
@@ -249,6 +255,7 @@ PYBIND11_MODULE(c_ext, m) {
m.def("call_pcfs_write", &flexkv::call_pcfs_write,
"Call Pcfs::write from C++", py::arg("file_nodeid"), py::arg("offset"),
py::arg("buffer"), py::arg("size"), py::arg("thread_id"));
#ifdef CUDA_AVAILABLE
m.def("shared_transfer_kv_blocks_remote_read",
&shared_transfer_kv_blocks_remote_read_binding,
"Shared transfer KV blocks from remote PCFS to CPU memory",
@@ -266,6 +273,7 @@
py::arg("total_layers"),
py::arg("is_mla") = false,
py::arg("num_threads_per_file") = 8);
#endif
#endif

py::class_<flexkv::CRadixTreeIndex>(m, "CRadixTreeIndex")
@@ -297,7 +305,7 @@ PYBIND11_MODULE(c_ext, m) {
.def("has_block_node_ids", &flexkv::CRadixNode::has_block_node_ids);
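The `CRadixTreeIndex` bindings above expose prefix matching over hashed token blocks. As a rough, self-contained sketch of the underlying idea (this is not the FlexKV API; all names below are illustrative):

```python
# Toy radix/prefix-tree match over token-block hashes -- illustrates the
# kind of "fast match" a radix tree index provides for KV-cache lookup.
class ToyRadixIndex:
    def __init__(self):
        self.children = {}   # block_hash -> ToyRadixIndex

    def insert(self, block_hashes):
        """Insert a sequence of block hashes as a path in the tree."""
        node = self
        for h in block_hashes:
            node = node.children.setdefault(h, ToyRadixIndex())

    def match_prefix(self, block_hashes):
        """Return how many leading blocks are already cached."""
        node, matched = self, 0
        for h in block_hashes:
            if h not in node.children:
                break
            node = node.children[h]
            matched += 1
        return matched

index = ToyRadixIndex()
index.insert([101, 102, 103])
matched = index.match_prefix([101, 102, 999])  # shares a 2-block prefix
```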

py::class_<flexkv::CMatchResult, std::shared_ptr<flexkv::CMatchResult>>(m, "CMatchResult")
.def(py::init<int, int, int, flexkv::CRadixNode *, flexkv::CRadixNode *, std::vector<int64_t> *>())
.def(py::init<int, int, int, flexkv::CRadixNode *, flexkv::CRadixNode *, torch::Tensor, torch::Tensor>())
.def_readonly("last_ready_node", &flexkv::CMatchResult::last_ready_node)
.def_readonly("last_node", &flexkv::CMatchResult::last_node)
.def_readonly("physical_blocks", &flexkv::CMatchResult::physical_blocks)
@@ -318,14 +326,14 @@

// RedisMetaChannel binding
py::class_<flexkv::RedisMetaChannel>(m, "RedisMetaChannel")
.def(py::init<const std::string&, int, uint32_t, const std::string&, const std::string&>(),
py::arg("host"), py::arg("port"), py::arg("node_id"), py::arg("local_ip"), py::arg("blocks_key") = std::string("blocks"))
.def(py::init<const std::string&, int, uint32_t, const std::string&, const std::string&, const std::string&>(),
py::arg("host"), py::arg("port"), py::arg("node_id"), py::arg("local_ip"), py::arg("blocks_key") = std::string("blocks"), py::arg("password") = std::string(""))
.def("connect", &flexkv::RedisMetaChannel::connect)
.def("get_node_id", &flexkv::RedisMetaChannel::get_node_id)
.def("get_local_ip", &flexkv::RedisMetaChannel::get_local_ip)
.def("make_block_key", &flexkv::RedisMetaChannel::make_block_key, py::arg("node_id"), py::arg("hash"))
.def("publish_one", [](flexkv::RedisMetaChannel &ch, const flexkv::BlockMeta &m){ ch.publish(m); })
.def("publish_batch", [](flexkv::RedisMetaChannel &ch, const std::vector<flexkv::BlockMeta> &metas, size_t batch_size){ ch.publish(metas, batch_size); }, py::arg("metas"), py::arg("batch_size")=100)
.def("publish_one", [](flexkv::RedisMetaChannel &ch, const flexkv::BlockMeta &m){ return ch.publish(m); })
.def("publish_batch", [](flexkv::RedisMetaChannel &ch, const std::vector<flexkv::BlockMeta> &metas, size_t batch_size){ return ch.publish(metas, batch_size); }, py::arg("metas"), py::arg("batch_size")=100)
.def("load", [](flexkv::RedisMetaChannel &ch, size_t max_items){ std::vector<flexkv::BlockMeta> out; ch.load(out, max_items); return out; }, py::arg("max_items"))
.def("renew_node_leases", &flexkv::RedisMetaChannel::renew_node_leases, py::arg("node_id"), py::arg("new_lt"), py::arg("batch_size")=200)
.def("list_keys", [](flexkv::RedisMetaChannel &ch, const std::string &pattern){ std::vector<std::string> keys; ch.list_keys(pattern, keys); return keys; }, py::arg("pattern"))
Expand All @@ -334,19 +342,18 @@ PYBIND11_MODULE(c_ext, m) {
.def("hmget_field_for_keys", [](flexkv::RedisMetaChannel &ch, const std::vector<std::string> &keys, const std::string &field){ std::vector<std::string> values; ch.hmget_field_for_keys(keys, field, values); return values; }, py::arg("keys"), py::arg("field"))
.def("hmget_two_fields_for_keys", [](flexkv::RedisMetaChannel &ch, const std::vector<std::string> &keys, const std::string &f1, const std::string &f2){ std::vector<std::pair<std::string,std::string>> out; ch.hmget_two_fields_for_keys(keys, f1, f2, out); return out; }, py::arg("keys"), py::arg("field1"), py::arg("field2"))
.def("load_metas_by_keys", [](flexkv::RedisMetaChannel &ch, const std::vector<std::string> &keys){ std::vector<flexkv::BlockMeta> out; ch.load_metas_by_keys(keys, out); return out; }, py::arg("keys"))
.def("update_block_state_batch", [](flexkv::RedisMetaChannel &ch, uint32_t node_id, const std::vector<int64_t> &hashes, flexkv::NodeState state, size_t batch_size){ std::deque<int64_t> dq(hashes.begin(), hashes.end()); ch.update_block_state_batch(node_id, &dq, state, batch_size); }, py::arg("node_id"), py::arg("hashes"), py::arg("state"), py::arg("batch_size")=200)
.def("delete_blockmeta_batch", [](flexkv::RedisMetaChannel &ch, uint32_t node_id, const std::vector<int64_t> &hashes, size_t batch_size){ std::deque<int64_t> dq(hashes.begin(), hashes.end()); ch.delete_blockmeta_batch(node_id, &dq, batch_size); }, py::arg("node_id"), py::arg("hashes"), py::arg("batch_size")=200);
.def("update_block_state_batch", [](flexkv::RedisMetaChannel &ch, uint32_t node_id, const std::vector<int64_t> &hashes, int state, size_t batch_size){ std::deque<int64_t> dq(hashes.begin(), hashes.end()); return ch.update_block_state_batch(node_id, &dq, state, batch_size); }, py::arg("node_id"), py::arg("hashes"), py::arg("state"), py::arg("batch_size")=200)
.def("delete_blockmeta_batch", [](flexkv::RedisMetaChannel &ch, uint32_t node_id, const std::vector<int64_t> &hashes, size_t batch_size){ std::deque<int64_t> dq(hashes.begin(), hashes.end()); return ch.delete_blockmeta_batch(node_id, &dq, batch_size); }, py::arg("node_id"), py::arg("hashes"), py::arg("batch_size")=200);

// LocalRadixTree bindings (derived from CRadixTreeIndex)
py::class_<flexkv::LocalRadixTree, flexkv::CRadixTreeIndex>(m, "LocalRadixTree")
.def(py::init<int, int, uint32_t, uint32_t, uint32_t, uint32_t, size_t>(),
.def(py::init<int, int, uint32_t, uint32_t, uint32_t, uint32_t>(),
py::arg("tokens_per_block"),
py::arg("max_num_blocks") = 1000000,
py::arg("lease_ttl_ms") = 100000,
py::arg("renew_lease_ms") = 0,
py::arg("refresh_batch_size") = 256,
py::arg("idle_sleep_ms") = 10,
py::arg("lt_pool_initial_capacity") = 0)
py::arg("idle_sleep_ms") = 10)
.def("set_meta_channel", &flexkv::LocalRadixTree::set_meta_channel, py::arg("channel"))
.def("start", &flexkv::LocalRadixTree::start, py::arg("channel"))
.def("stop", &flexkv::LocalRadixTree::stop)
@@ -374,14 +381,14 @@

// DistributedRadixTree bindings (remote reference tree manager)
py::class_<flexkv::DistributedRadixTree>(m, "DistributedRadixTree")
.def(py::init<int, int, uint32_t, size_t, size_t, uint32_t, uint32_t>(),
.def(py::init<int, int, uint32_t, size_t, uint32_t, uint32_t, uint32_t>(),
py::arg("tokens_per_block"),
py::arg("max_num_blocks"),
py::arg("node_id"),
py::arg("lt_pool_initial_capacity") = 0,
py::arg("refresh_batch_size") = 128,
py::arg("rebuild_interval_ms") = 1000,
py::arg("idle_sleep_ms") = 10)
py::arg("idle_sleep_ms") = 10,
py::arg("lease_renew_ms") = 5000)
.def("start", &flexkv::DistributedRadixTree::start, py::arg("channel"))
.def("stop", &flexkv::DistributedRadixTree::stop)
.def("remote_tree_refresh", &flexkv::DistributedRadixTree::remote_tree_refresh, py::return_value_policy::reference)
@@ -392,4 +399,14 @@
.def("unlock", &flexkv::DistributedRadixTree::unlock, py::arg("node"))
.def("is_empty", &flexkv::DistributedRadixTree::is_empty)
.def("set_ready", &flexkv::DistributedRadixTree::set_ready, py::arg("node"), py::arg("ready") = true, py::arg("ready_length") = -1);

// RefRadixTree bindings (for type information)
py::class_<flexkv::RefRadixTree, flexkv::CRadixTreeIndex>(m, "RefRadixTree")
.def(py::init<int, int, uint32_t, flexkv::LockFreeQueue<flexkv::CRadixNode*>*>(),
py::arg("tokens_per_block"),
py::arg("max_num_blocks") = 1000000,
py::arg("lease_renew_ms") = 5000,
py::arg("renew_lease_queue") = nullptr)
.def("dec_ref_cnt", &flexkv::RefRadixTree::dec_ref_cnt)
.def("inc_ref_cnt", &flexkv::RefRadixTree::inc_ref_cnt);
}
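The `lease_ttl_ms` / `renew_lease_ms` parameters bound above implement lease-based expiry for distributed block metadata: a block stays visible only while its lease is periodically renewed. A minimal, self-contained sketch of that idea (not the FlexKV implementation; all names are illustrative):

```python
import time

class LeasedBlock:
    """Toy lease-based block metadata: the block is considered live only
    while its lease has not expired (cf. lease_ttl_ms / renew_lease_ms)."""

    def __init__(self, block_hash, ttl_ms):
        self.block_hash = block_hash
        self.ttl_ms = ttl_ms
        self.expires_at = time.monotonic() + ttl_ms / 1000.0

    def renew(self):
        # A background thread would call this every renew_lease_ms.
        self.expires_at = time.monotonic() + self.ttl_ms / 1000.0

    def is_live(self):
        return time.monotonic() < self.expires_at

block = LeasedBlock(block_hash=0x1234, ttl_ms=100)
assert block.is_live()
time.sleep(0.15)                 # let the lease lapse without renewal
expired = not block.is_live()    # now considered evictable
block.renew()                    # renewal makes it live again
```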
4 changes: 2 additions & 2 deletions csrc/block_meta.h
@@ -2,7 +2,7 @@

#include <cstdint>

#include "radix_tree.h" // for NodeState
#include "lease_meta_mempool.h" // for NODE_STATE_* macros

namespace flexkv {

@@ -12,7 +12,7 @@ struct BlockMeta {
uint32_t nid; // node id
int64_t hash; // current block hash
uint32_t lt; // lease time
NodeState state; // lease state
int state; // lease state
};

} // namespace flexkv