feat: FlexKV integration in Dynamo #5858
Pull request overview
This PR integrates FlexKV as a first-class KV transfer connector for the Dynamo vLLM backend, with accompanying examples and documentation updates. It adds a new flexkv connector option, wiring it into the vLLM KV transfer configuration, and documents how to deploy and use it in aggregated and disaggregated serving setups.
Changes:
- Extend vLLM backend argument parsing to recognize `flexkv` as a valid connector and map it to `FlexKVDynamoConnectorV1` in `KVTransferConfig`.
- Add example launch scripts for aggregated, aggregated+router, and disaggregated FlexKV-powered vLLM deployments, including corresponding Kubernetes README updates.
- Update architecture and backend docs, hidden toctree, KVBM intro, and attributions to reference FlexKV and provide a dedicated FlexKV integration guide.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| components/src/dynamo/vllm/args.py | Adds `flexkv` to `VALID_CONNECTORS`, CLI help, and `create_kv_transfer_config`, mapping it to `FlexKVDynamoConnectorV1` so vLLM can instantiate the FlexKV connector via the `--connector` flag. |
| examples/backends/vllm/launch/agg_flexkv.sh | New single-worker aggregated example that runs `dynamo.frontend` and a vLLM worker configured with `--connector flexkv` and FlexKV-related env vars. |
| examples/backends/vllm/launch/agg_flexkv_router.sh | New aggregated + KV router example wiring `dynamo.frontend` in `--router-mode kv` with two FlexKV-enabled vLLM workers and explicit `--kv-events-config`. |
| examples/backends/vllm/launch/disagg_flexkv.sh | New disaggregated example that runs separate decode and FlexKV-enabled prefill workers, using both `nixl` and `flexkv` connectors to support KV transfer and offload. |
| examples/backends/vllm/deploy/README.md | Updates the documented `--connector` options to include `flexkv` so Kubernetes deployment templates reflect the new backend. |
| docs/kvbm/kvbm_intro.rst | Adds a toctree entry linking to the new FlexKV integration documentation from the KVBM intro page. |
| docs/hidden_toctree.rst | Registers `backends/vllm/flexkv_integration.md` in the hidden docs toctree so it’s included in the built documentation. |
| docs/design_docs/architecture.md | Extends the “Memory bottlenecks” discussion to reference FlexKV as one of the KV offloading systems. |
| docs/backends/vllm/flexkv_integration.md | New guide explaining what FlexKV is, how to enable it via `DYNAMO_USE_FLEXKV`, and how to run aggregated and disaggregated FlexKV deployments with links to example scripts (note: contains minor spelling nits already flagged). |
| docs/backends/vllm/README.md | Updates the vLLM backend feature matrix and configuration section to include FlexKV as a supported connector and link to the FlexKV integration guide. |
| ATTRIBUTIONS-Python.md | Adds third-party attribution and Apache-2.0 license text for the `flexkv` Python dependency (version 1.2.1). |
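
To make the `args.py` row concrete, here is a minimal sketch of how a `--connector` value could be validated and mapped to a connector class name. It is not the code from this PR: `KVTransferSketch`, `CONNECTOR_CLASSES`, and `create_kv_transfer_sketch` are hypothetical stand-ins, the `kv_role` default is an assumption, and the list of connectors is a guessed subset. Only the names `flexkv`, `FlexKVDynamoConnectorV1`, `LMCacheConnectorV1`, `NixlConnector`, and `kv_connector_module_path` are taken from this conversation.

```python
import argparse
from dataclasses import dataclass
from typing import Optional

# Hypothetical subset of connector names; the real VALID_CONNECTORS may differ.
VALID_CONNECTORS = ("nixl", "lmcache", "flexkv")

# Connector name -> connector class name (class names as quoted in this PR).
CONNECTOR_CLASSES = {
    "nixl": "NixlConnector",
    "lmcache": "LMCacheConnectorV1",
    "flexkv": "FlexKVDynamoConnectorV1",
}


@dataclass
class KVTransferSketch:
    """Stand-in for vLLM's KVTransferConfig; NOT the real class."""

    kv_connector: str
    kv_role: str = "kv_both"  # assumed default, for illustration only
    kv_connector_module_path: Optional[str] = None


def create_kv_transfer_sketch(connector, module_path=None):
    """Validate the connector name and build the stand-in config."""
    if connector not in VALID_CONNECTORS:
        raise ValueError(f"unknown connector: {connector!r}")
    return KVTransferSketch(
        kv_connector=CONNECTOR_CLASSES[connector],
        kv_connector_module_path=module_path,
    )


def parse_args(argv):
    """Parse the two flags used by the examples in this PR."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", required=True)
    parser.add_argument("--connector", choices=VALID_CONNECTORS, default=None)
    return parser.parse_args(argv)


args = parse_args(["--model", "my-model", "--connector", "flexkv"])
cfg = create_kv_transfer_sketch(args.connector)
print(cfg.kv_connector)
```

If the connector class is not registered inside vLLM itself, a real configuration would presumably also need a module path so the class can be imported. The sketch leaves `module_path` as `None` because the actual path is not stated in this conversation.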
Walkthrough
This change adds FlexKV support to the Dynamo vLLM backend, including Apache 2.0 license attribution, connector configuration handling, comprehensive documentation on FlexKV integration with architecture references, and example deployment scripts for both aggregated and disaggregated serving modes with multi-worker setup capabilities.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 3 passed
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/src/dynamo/vllm/args.py (1)
139-144: ⚠️ Potential issue | 🟡 Minor
Run Black to fix formatting drift.
CI reports Black reformatting this file; please run Black on `components/src/dynamo/vllm/args.py` (likely the long help string here).
🤖 Fix all issues with AI agents
In `@ATTRIBUTIONS-Python.md`:
- Around line 10982-11169: The ATTRIBUTIONS entry for "flexkv (1.2.1)" is
incorrect — either remove it or verify and correct its source/version; search
the project dependency declarations (pyproject.toml) for "flexkv" and any
references in the vllm args/connector code (the module that defines the optional
connector in vllm/args.py) to determine whether flexkv is actually a transitive
dependency or a GitHub-only dependency, then: if not used, remove the "flexkv
(1.2.1)" block from ATTRIBUTIONS-Python.md; if used from GitHub, replace the
PyPI version with the GitHub URL and commit metadata in the attribution entry;
if used from a specific tag/commit, update the version string to that tag/commit
and ensure attribution text reflects the source.
In `@components/src/dynamo/vllm/args.py`:
- Around line 533-536: The diff sets connector_cfg to use a non-existent class
"FlexKVDynamoConnectorV1", which will fail at runtime; update the branch that
handles connector == "flexkv" to either set a valid built-in connector (e.g.,
use "DynamoConnector" or "NixlConnector") or supply a kv_connector_module_path
that points to the module implementing FlexKVDynamoConnectorV1; modify the
connector_cfg assignment (the same place LMCacheConnectorV1 is used) to include
kv_connector_module_path if you intend to keep a custom class, or replace
"FlexKVDynamoConnectorV1" with an existing class name so the connector is
discoverable at runtime.
In `@docs/backends/vllm/flexkv_integration.md`:
- Around line 15-31: The section headings contain misspellings: change
"Aggregrated Serving" to "Aggregated Serving" and "Disaggregrated Serving" to
"Disaggregated Serving" in the document; update both header lines that precede
the FlexKV usage block and the disagg example reference so the corrected
headings read "Aggregated Serving" and "Disaggregated Serving" while leaving the
code snippet (`python -m dynamo.vllm --model $YOUR_MODEL --connector flexkv`)
and example file references (`agg_flexkv.sh`, `disagg_flexkv.sh`) unchanged.
In `@examples/backends/vllm/launch/agg_flexkv_router.sh`:
- Around line 1-3: Add the project's standard copyright/header block immediately
after the existing shebang line ("#!/bin/bash"), which must remain the first
line of the script, so the file matches the repo header conventions, and then
mark the file executable (e.g., run chmod +x on
examples/backends/vllm/launch/agg_flexkv_router.sh) so CI stops flagging it as
non-executable.
In `@examples/backends/vllm/launch/agg_flexkv.sh`:
- Around line 1-3: The script examples/backends/vllm/launch/agg_flexkv.sh is
missing the required copyright/header and isn’t marked executable; add the
project's copyright/header block immediately after the shebang (which must stay
on the first line) and update file permissions to be executable (e.g., chmod +x)
so CI recognizes the script as executable.
In `@examples/backends/vllm/launch/disagg_flexkv.sh`:
- Around line 1-3: Add an SPDX copyright header immediately after the existing
shebang line (#!/bin/bash) in the script
(examples/backends/vllm/launch/disagg_flexkv.sh) and commit that change, and
ensure the file is executable by setting its mode (e.g., run git update-index
--chmod=+x examples/backends/vllm/launch/disagg_flexkv.sh or chmod +x before
committing) so CI no longer flags a missing header or non-executable script;
keep the existing trap and set -e lines intact and place the SPDX lines between
the shebang and the first command.
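
The three launch-script prompts above ask for the same things: a shebang on the first line, a license header near the top, and the executable bit. A minimal local check could look like the sketch below. `check_launch_script` is a hypothetical helper, not part of the repo or its CI, and the ten-line window for the SPDX tag is an arbitrary assumption.

```python
import os
import stat
import tempfile


def check_launch_script(path):
    """Return a list of problems found in a shell script (empty if none)."""
    problems = []
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    # The kernel only honours the shebang when it is the very first line.
    if not lines or not lines[0].startswith("#!"):
        problems.append("shebang is not the first line")
    # Look for an SPDX tag near the top; the 10-line window is arbitrary.
    if not any("SPDX-License-Identifier" in line for line in lines[:10]):
        problems.append("missing SPDX header")
    # Check the owner-executable bit (meaningful on POSIX file systems).
    if not os.stat(path).st_mode & stat.S_IXUSR:
        problems.append("not executable")
    return problems


# Demo on a throwaway file: check once before and once after chmod.
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "demo.sh")
    with open(script, "w", encoding="utf-8") as f:
        f.write("#!/bin/bash\n# SPDX-License-Identifier: Apache-2.0\nset -e\n")
    print(check_launch_script(script))
    os.chmod(script, 0o755)
    print(check_launch_script(script))
```

Because the shebang has to stay on line 1, the header goes directly beneath it rather than above it.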
🧹 Nitpick comments (2)
examples/backends/vllm/launch/agg_flexkv_router.sh (2)
17-21: Quote the `$MODEL` variable.
While the current value is safe, unquoted variables can cause word splitting or globbing issues if the value contains spaces or special characters.
Proposed fix

     python -m dynamo.vllm \
    -    --model $MODEL \
    +    --model "$MODEL" \
         --connector flexkv \

27-31: Quote the `$MODEL` variable here as well.
Same issue as the first worker invocation.
Proposed fix

     python -m dynamo.vllm \
    -    --model $MODEL \
    +    --model "$MODEL" \
         --connector flexkv \
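
The word-splitting concern behind this nitpick can be illustrated with Python's `shlex`, which tokenizes a command line in roughly the way a POSIX shell does. This is only an approximation (it performs no variable expansion or globbing), and the model value below is a hypothetical one chosen because it contains spaces.

```python
import shlex

# Hypothetical model value containing spaces (not from the PR).
model = "my org/my model"

# Unquoted: the value is split into several argv entries.
unquoted = shlex.split(f"python -m dynamo.vllm --model {model} --connector flexkv")

# Quoted: shlex.quote wraps the value so it survives as one argument.
quoted = shlex.split(
    f"python -m dynamo.vllm --model {shlex.quote(model)} --connector flexkv"
)

print(len(unquoted), unquoted)
print(len(quoted), quoted)
```

In the unquoted form, `--model` receives only the first fragment and the remaining fragments become stray arguments, which is the failure that quoting `"$MODEL"` in the script avoids.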
Hi @YconquestY! We're currently migrating our docs from Sphinx ( Could you please also apply your changes to the corresponding file(s) in
Note: The new file Thanks for helping keep both doc sets in sync during the migration! |
Migrate KVBM documentation to a new three-tier structure:
- Tier 1: README.md (Quick Start) - overview, link to user guide, feature matrix, architecture
- Tier 2: kvbm_guide.md (Guide) - installation, configuration, deployment for all user paths (pip wheel, via trtllm/vllm or via the dynamo integrations with other kv offloading solutions)
- Tier 3: kvbm_design.md (Design) - architecture deep dive, components, data flows, framework integrations

Create integrations folder with:
- flexkv_integration.md - new FlexKV integration guide from PR #5858
- lmcache_integration.md - migrated from backends/vllm/
- sglang_hicache.md - migrated from backends/sglang/

Add AGENTS.md for KVBM component to guide AI agents.
Update docs/index.rst to add "KV Cache Offloading" as first item under User Guides.

Signed-off-by: akshatha-k <akshutk@gmail.com>
222a9a5 to cb5b03a
Hi @Jont828, thanks for the review. The FlexKV doc is updated as requested :)
Hi @YconquestY - I only see the vLLM changes. Did you also want to add the changes for FlexKV with TRTLLM?
Hi @akshatha-k. FlexKV does support TensorRT-LLM and SGLang, and yes, we will support Dynamo + TensorRT-LLM/SGLang + FlexKV in a future PR. This PR addresses Dynamo + vLLM + FlexKV only.
22cfaf6 to 07f9b38
In vLLM we don't support |
07f9b38 to cc0dd50
840d1b5 to b28766e
/ok to test b28766e
Signed-off-by: zhuofan1123 <zhuofanl@nvidia.com>
b28766e to 72a025c
/ok to test 72a025c
Overview:
This PR enables using FlexKV as the KV cache offloading backend for the Dynamo vLLM worker.
Details:
FlexKV is a scalable, distributed runtime for KV cache offloading. It acts as a unified KV caching layer for inference engines like vLLM, TensorRT-LLM and SGLang.
Where should the reviewer start?
The only code change happens in `components/src/dynamo/vllm/args.py`.
Related Issues:
None