feat: FlexKV integration in Dynamo by YconquestY · Pull Request #5858 · ai-dynamo/dynamo

YconquestY · 2026-02-01T10:45:58Z

Overview:

This PR enables FlexKV as a KV cache offloading backend for the Dynamo vLLM worker.

Details:

FlexKV is a scalable, distributed runtime for KV cache offloading. It acts as a unified KV caching layer for inference engines like vLLM, TensorRT-LLM and SGLang.

Where should the reviewer start?

The only code change is in `components/src/dynamo/vllm/args.py`.

Related Issues:

None

Summary by CodeRabbit

New Features
- Added FlexKV as a new connector option for vLLM deployment

Documentation
- Added comprehensive FlexKV integration guide with setup instructions
- Updated vLLM deployment documentation to include FlexKV configuration
- Added example deployment scripts for aggregated and disaggregated FlexKV setups
- Added license attribution for FlexKV package

copy-pr-bot · 2026-02-01T10:46:02Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions · 2026-02-01T10:46:05Z

🌿 Fern Docs Preview: https://nvidia-preview-63ededf4-a7dd-4f85-8034-d5ea95d8bb40.docs.buildwithfern.com/dynamo/dev

Copilot

Pull request overview

This PR integrates FlexKV as a first-class KV transfer connector for the Dynamo vLLM backend, with accompanying examples and documentation updates. It adds a new flexkv connector option, wiring it into the vLLM KV transfer configuration, and documents how to deploy and use it in aggregated and disaggregated serving setups.

Changes:

- Extend vLLM backend argument parsing to recognize `flexkv` as a valid connector and map it to `FlexKVDynamoConnectorV1` in `KVTransferConfig`.
- Add example launch scripts for aggregated, aggregated+router, and disaggregated FlexKV-powered vLLM deployments, including corresponding Kubernetes README updates.
- Update architecture and backend docs, hidden toctree, KVBM intro, and attributions to reference FlexKV and provide a dedicated FlexKV integration guide.
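
The connector mapping described in the first change can be pictured with a minimal sketch. This is not the code from `args.py`: a plain dict stands in for vLLM's `KVTransferConfig`, the membership of `VALID_CONNECTORS` (including `none`) is assumed from the option list quoted in the review summaries, and only the `flexkv` branch is modelled.

```python
from typing import Optional

# Assumed contents, based on the connector options named in this PR's summaries.
VALID_CONNECTORS = {"nixl", "lmcache", "kvbm", "flexkv", "none"}


def create_kv_transfer_config(connector: str) -> Optional[dict]:
    """Return KV-transfer settings for one connector name (hypothetical shape)."""
    name = connector.lower()
    if name not in VALID_CONNECTORS:
        raise ValueError(
            f"unknown connector {connector!r}; expected one of {sorted(VALID_CONNECTORS)}"
        )
    if name == "none":
        # No KV transfer configured.
        return None
    if name == "flexkv":
        # Class name and role as reported by the review summaries for this PR.
        return {"kv_connector": "FlexKVDynamoConnectorV1", "kv_role": "kv_both"}
    # The remaining connectors are deliberately left out of this sketch.
    raise NotImplementedError(f"connector {name!r} is not modelled here")
```

Calling `create_kv_transfer_config("flexkv")` yields the connector class name and the `kv_both` role; an unrecognized name fails early with a `ValueError` instead of reaching vLLM.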

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `components/src/dynamo/vllm/args.py` | Adds `flexkv` to `VALID_CONNECTORS`, CLI help, and `create_kv_transfer_config`, mapping it to `FlexKVDynamoConnectorV1` so vLLM can instantiate the FlexKV connector via the `--connector` flag. |
| `examples/backends/vllm/launch/agg_flexkv.sh` | New single-worker aggregated example that runs `dynamo.frontend` and a vLLM worker configured with `--connector flexkv` and FlexKV-related env vars. |
| `examples/backends/vllm/launch/agg_flexkv_router.sh` | New aggregated + KV router example wiring `dynamo.frontend` in `--router-mode kv` with two FlexKV-enabled vLLM workers and explicit `--kv-events-config`. |
| `examples/backends/vllm/launch/disagg_flexkv.sh` | New disaggregated example that runs separate decode and FlexKV-enabled prefill workers, using both `nixl` and `flexkv` connectors to support KV transfer and offload. |
| `examples/backends/vllm/deploy/README.md` | Updates the documented `--connector` options to include `flexkv` so Kubernetes deployment templates reflect the new backend. |
| `docs/kvbm/kvbm_intro.rst` | Adds a toctree entry linking to the new FlexKV integration documentation from the KVBM intro page. |
| `docs/hidden_toctree.rst` | Registers `backends/vllm/flexkv_integration.md` in the hidden docs toctree so it’s included in the built documentation. |
| `docs/design_docs/architecture.md` | Extends the “Memory bottlenecks” discussion to reference FlexKV as one of the KV offloading systems. |
| `docs/backends/vllm/flexkv_integration.md` | New guide explaining what FlexKV is, how to enable it via `DYNAMO_USE_FLEXKV`, and how to run aggregated and disaggregated FlexKV deployments with links to example scripts (note: contains minor spelling nits already flagged). |
| `docs/backends/vllm/README.md` | Updates the vLLM backend feature matrix and configuration section to include FlexKV as a supported connector and link to the FlexKV integration guide. |
| `ATTRIBUTIONS-Python.md` | Adds third-party attribution and Apache-2.0 license text for the `flexkv` Python dependency (version `1.2.1`). |

coderabbitai · 2026-02-01T10:52:59Z

Walkthrough

This change adds FlexKV support to the Dynamo vLLM backend, including Apache 2.0 license attribution, connector configuration handling, comprehensive documentation on FlexKV integration with architecture references, and example deployment scripts for both aggregated and disaggregated serving modes with multi-worker setup capabilities.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| License Attribution: `ATTRIBUTIONS-Python.md` | Adds `flexkv` (1.2.1) package entry with Apache 2.0 license text and metadata URLs (homepage, source). |
| Connector Configuration: `components/src/dynamo/vllm/args.py` | Adds `flexkv` to the `VALID_CONNECTORS` set and creates a configuration branch mapping the `flexkv` connector to `FlexKVDynamoConnectorV1` with `kv_role` `kv_both`. |
| Documentation Updates: `docs/backends/vllm/README.md`, `docs/design_docs/architecture.md`, `docs/kvbm/kvbm_intro.rst`, `examples/backends/vllm/deploy/README.md` | Updates feature tables, CLI documentation, and architecture references to include `flexkv` as a documented option alongside `nixl`, `lmcache`, `kvbm`, and `none`. |
| Integration Guide: `docs/backends/vllm/flexkv_integration.md` | New documentation file covering FlexKV enablement via `DYNAMO_USE_FLEXKV`, aggregated/disaggregated serving patterns, and configuration guidance with setup script references. |
| Navigation Updates: `docs/hidden_toctree.rst` | Adds `flexkv_integration.md` to the documentation toctree for discoverability. |
| Deployment Scripts: `examples/backends/vllm/launch/agg_flexkv.sh`, `examples/backends/vllm/launch/agg_flexkv_router.sh`, `examples/backends/vllm/launch/disagg_flexkv.sh` | Introduces three bash launcher scripts demonstrating aggregated single/multi-worker and disaggregated configurations with FlexKV, CPU cache settings, IPC ports, CUDA device binding, and KV event publisher configuration. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 FlexKV hops into the scene with grace,
New connectors now join the race,
Docs and scripts in place so fine,
License terms aligned in a line,
Dynamo's backend now shines with flexkv divine! ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description check | ✅ Passed | The description covers the template's required sections (Overview, Details, Where to start, Related Issues) with meaningful content explaining FlexKV's purpose and directing reviewers to the key code changes. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Title check | ✅ Passed | The title 'feat: FlexKV integration in Dynamo' accurately and concisely summarizes the main change, adding FlexKV support across configuration, documentation, and example scripts. |

coderabbitai

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

components/src/dynamo/vllm/args.py (1)

139-144: ⚠️ Potential issue | 🟡 Minor

Run Black to fix formatting drift.

CI reports Black reformatting this file; please run Black on components/src/dynamo/vllm/args.py (likely the long help string here).

🤖 Fix all issues with AI agents

In `@ATTRIBUTIONS-Python.md`:
- Around line 10982-11169: The ATTRIBUTIONS entry for "flexkv (1.2.1)" is
incorrect — either remove it or verify and correct its source/version; search
the project dependency declarations (pyproject.toml) for "flexkv" and any
references in the vllm args/connector code (the module that defines the optional
connector in vllm/args.py) to determine whether flexkv is actually a transitive
dependency or a GitHub-only dependency, then: if not used, remove the "flexkv
(1.2.1)" block from ATTRIBUTIONS-Python.md; if used from GitHub, replace the
PyPI version with the GitHub URL and commit metadata in the attribution entry;
if used from a specific tag/commit, update the version string to that tag/commit
and ensure attribution text reflects the source.

In `@components/src/dynamo/vllm/args.py`:
- Around line 533-536: The diff sets connector_cfg to use a non-existent class
"FlexKVDynamoConnectorV1", which will fail at runtime; update the branch that
handles connector == "flexkv" to either set a valid built-in connector (e.g.,
use "DynamoConnector" or "NixlConnector") or supply a kv_connector_module_path
that points to the module implementing FlexKVDynamoConnectorV1; modify the
connector_cfg assignment (the same place LMCacheConnectorV1 is used) to include
kv_connector_module_path if you intend to keep a custom class, or replace
"FlexKVDynamoConnectorV1" with an existing class name so the connector is
discoverable at runtime.

In `@docs/backends/vllm/flexkv_integration.md`:
- Around line 15-31: The section headings contain misspellings: change
"Aggregrated Serving" to "Aggregated Serving" and "Disaggregrated Serving" to
"Disaggregated Serving" in the document; update both header lines that precede
the FlexKV usage block and the disagg example reference so the corrected
headings read "Aggregated Serving" and "Disaggregated Serving" while leaving the
code snippet (`python -m dynamo.vllm --model $YOUR_MODEL --connector flexkv`)
and example file references (`agg_flexkv.sh`, `disagg_flexkv.sh`) unchanged.

In `@examples/backends/vllm/launch/agg_flexkv_router.sh`:
- Around line 1-3: Add the project's standard copyright/header block near the
top of the script, immediately after the existing shebang line "#!/bin/bash"
(the shebang must remain the first line of the file), so the file matches the
repo header conventions, and then mark the file executable (e.g., run chmod +x
on examples/backends/vllm/launch/agg_flexkv_router.sh) so CI stops flagging it
as non-executable.

In `@examples/backends/vllm/launch/agg_flexkv.sh`:
- Around line 1-3: The script examples/backends/vllm/launch/agg_flexkv.sh is
missing the required copyright/header and isn’t marked executable; add the
project's copyright/header block immediately after the shebang (which must stay
on the first line of agg_flexkv.sh) and update file permissions to be
executable (e.g., chmod +x) so CI recognizes the script as executable.

In `@examples/backends/vllm/launch/disagg_flexkv.sh`:
- Around line 1-3: Add an SPDX copyright header immediately after the existing
shebang line (#!/bin/bash) in the script
(examples/backends/vllm/launch/disagg_flexkv.sh) and commit that change, and
ensure the file is executable by setting its mode (e.g., run git update-index
--chmod=+x examples/backends/vllm/launch/disagg_flexkv.sh or chmod +x before
committing) so CI no longer flags a missing header or non-executable script;
keep the existing trap and set -e lines intact and place the SPDX lines between
the shebang and the first command.
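
The three launch-script items above ask for the same two properties: a license header right after the shebang and an executable file mode. A minimal local check might look like the sketch below. The function name and the `SPDX-License-Identifier` marker are assumptions for illustration; the repository's actual CI check and exact header text are not shown in this thread.

```python
import os
import stat


def check_launch_script(path: str, header_marker: str = "SPDX-License-Identifier") -> list:
    """Return a list of problems found; an empty list means the script passes."""
    problems = []
    with open(path, encoding="utf-8") as fh:
        # Only the first few lines matter for the shebang and header.
        head = [fh.readline() for _ in range(5)]
    if not head[0].startswith("#!"):
        problems.append("first line is not a shebang")
    if not any(header_marker in line for line in head[1:]):
        problems.append("no license header right after the shebang")
    # Any of the user/group/other execute bits counts as executable here.
    if not os.stat(path).st_mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
        problems.append("file is not executable")
    return problems
```

Running it over the three `*_flexkv*.sh` scripts before pushing would surface both findings without waiting for CI.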

🧹 Nitpick comments (2)

examples/backends/vllm/launch/agg_flexkv_router.sh (2)

17-21: Quote the `$MODEL` variable.

While the current value is safe, unquoted variables can cause word splitting or globbing issues if the value contains spaces or special characters.

Proposed fix:

 python -m dynamo.vllm \
-    --model $MODEL \
+    --model "$MODEL" \
     --connector flexkv \

27-31: Quote the `$MODEL` variable here as well.

Same issue as the first worker invocation.

Proposed fix:

 python -m dynamo.vllm \
-    --model $MODEL \
+    --model "$MODEL" \
     --connector flexkv \
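
The word-splitting risk behind this nitpick can be reproduced with Python's `shlex` module, which tokenizes a command line using POSIX shell rules (it does not model globbing). The model path below is a hypothetical value containing a space, chosen only to trigger the problem.

```python
import shlex

model = "/models/my model"  # hypothetical value containing a space

# Unquoted: the value is split into two separate arguments.
unquoted = shlex.split(f"python -m dynamo.vllm --model {model} --connector flexkv")

# Quoted: the value survives as a single argument.
quoted = shlex.split(
    f"python -m dynamo.vllm --model {shlex.quote(model)} --connector flexkv"
)

print(unquoted[4:6])  # the path arrives as two tokens
print(quoted[4])      # the path arrives as one token
```

With the unquoted form, the worker would receive `/models/my` as the model and a stray positional argument `model`, which is the failure the proposed `"$MODEL"` quoting prevents.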

Jont828 · 2026-02-02T21:35:00Z

Hi @YconquestY! We're currently migrating our docs from Sphinx (docs/) to Fern (fern/).

Could you please also apply your changes to the corresponding file(s) in fern/pages/? Here's an idea on where to migrate the files:

| docs/ file | fern/ equivalent |
| --- | --- |
| `docs/backends/vllm/README.md` | `fern/pages/backends/vllm/README.md` |
| `docs/backends/vllm/flexkv_integration.md` | `fern/pages/backends/vllm/flexkv-integration.md` (new file) |
| `docs/design_docs/architecture.md` | `fern/pages/design-docs/architecture.md` |
| `docs/kvbm/kvbm_intro.rst` | `fern/pages/kvbm/kvbm-intro.md` |

Note: The new file flexkv-integration.md should be added to the hidden "Backend Details > vLLM" section in fern/versions/next.yml. This is just a suggestion, though, feel free to go with something else if you have a better idea.

Thanks for helping keep both doc sets in sync during the migration!

Migrate KVBM documentation to a new three-tier structure:
- Tier 1: README.md (Quick Start) - overview, link to user guide, feature matrix, architecture
- Tier 2: kvbm_guide.md (Guide) - installation, configuration, deployment for all user paths (pip wheel, via trtllm/vllm or via the dynamo integrations with other kv offloading solutions)
- Tier 3: kvbm_design.md (Design) - architecture deep dive, components, data flows, framework integrations

Create integrations folder with:
- flexkv_integration.md - new FlexKV integration guide from PR #5858
- lmcache_integration.md - migrated from backends/vllm/
- sglang_hicache.md - migrated from backends/sglang/

Add AGENTS.md for KVBM component to guide AI agents. Update docs/index.rst to add "KV Cache Offloading" as first item under User Guides.

Signed-off-by: akshatha-k <akshutk@gmail.com>

YconquestY · 2026-02-11T09:14:05Z

Hi @Jont828, thanks for the review. FlexKV doc is updated as requested :)

akshatha-k · 2026-02-11T19:16:39Z

Hi @YconquestY - I only see the vLLM changes, did you also want to add the changes for flexKV with TRTLLM?

YconquestY · 2026-02-12T04:07:31Z

Hi @akshatha-k. FlexKV does support TensorRT-LLM and SGLang. And yes, we will support Dynamo + TensorRT-LLM/SGLang + FlexKV in a future PR. This PR addresses Dynamo + vLLM + FlexKV only.

examples/backends/vllm/launch/disagg_flexkv.sh

alec-flowers · 2026-03-02T03:02:24Z

In vLLM we no longer support the `--connector x` syntax; we use `--kv-transfer-config` exclusively. A connector name should be mentioned only to print the syntax the user needs to adopt.
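
As a rough illustration of the syntax this comment refers to, the sketch below renders a shell-safe `--kv-transfer-config` argument from a connector class name. The helper name is hypothetical, and the payload uses only the `kv_connector` and `kv_role` fields with the values reported earlier in this thread; whether the merged scripts pass additional fields (for example a connector module path) is not shown here.

```python
import json
import shlex


def kv_transfer_config_flag(kv_connector: str, kv_role: str = "kv_both") -> str:
    """Render a shell-safe `--kv-transfer-config '<json>'` argument (illustrative)."""
    payload = json.dumps(
        {"kv_connector": kv_connector, "kv_role": kv_role}, sort_keys=True
    )
    # shlex.quote wraps the JSON so the shell passes it through as one argument.
    return f"--kv-transfer-config {shlex.quote(payload)}"


flag = kv_transfer_config_flag("FlexKVDynamoConnectorV1")
print(flag)
```

A message of this shape is the kind of hint a deprecated `--connector` path could print to point users at the replacement flag.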

alec-flowers · 2026-03-02T05:48:12Z

/ok to test b28766e

Signed-off-by: zhuofan1123 <zhuofanl@nvidia.com>

alec-flowers · 2026-03-03T03:46:10Z

/ok to test 72a025c

YconquestY requested a review from a team as a code owner February 1, 2026 10:45

Copilot AI review requested due to automatic review settings February 1, 2026 10:45

YconquestY requested review from a team as code owners February 1, 2026 10:45

pull-request-size bot added the size/L label Feb 1, 2026

github-actions bot added the external-contribution, documentation, and backend::vllm labels Feb 1, 2026

Copilot started reviewing on behalf of YconquestY February 1, 2026 10:46

YconquestY changed the title ~~FlexKV integration in Dynamo~~ feat: FlexKV integration in Dynamo Feb 1, 2026

github-actions bot added the feat label Feb 1, 2026

Copilot AI reviewed Feb 1, 2026

coderabbitai bot reviewed Feb 1, 2026

akshatha-k mentioned this pull request Feb 3, 2026

docs: restructure KVBM documentation into three-tier format #5905 (merged)

linhu-nv force-pushed the feature/support_flexkv branch from 222a9a5 to cb5b03a Compare February 11, 2026 06:47

linhu-nv force-pushed the feature/support_flexkv branch 2 times, most recently from 22cfaf6 to 07f9b38 Compare February 26, 2026 09:03

ziqifan617 approved these changes Mar 2, 2026

alec-flowers reviewed Mar 2, 2026

examples/backends/vllm/launch/disagg_flexkv.sh (outdated)

alec-flowers reviewed Mar 2, 2026

examples/backends/vllm/launch/disagg_flexkv.sh (outdated)

linhu-nv force-pushed the feature/support_flexkv branch from 07f9b38 to cc0dd50 Compare March 2, 2026 05:19

alec-flowers approved these changes Mar 2, 2026

linhu-nv force-pushed the feature/support_flexkv branch from 840d1b5 to b28766e Compare March 2, 2026 05:41

copy-pr-bot bot temporarily deployed to GITLAB March 2, 2026 05:48 Inactive

YconquestY and others added 8 commits March 3, 2026 09:59

| Commit | Message |
| --- | --- |
| 52e3f1b | Add README and use cases for FlexKV |
| e0ae7a8 | add more docs and fix typos |
| c2635d1 | add flexKV connector |
| 454d6dd | update usage of flexkv |
| efbc762 | update doc |
| 87e2c8b | update flexkvconnector |
| a91e12b | use new kvcache connector syntax |
| 72a025c | chore: add DCO signoff |

Each commit carries `Signed-off-by: zhuofan1123 <zhuofanl@nvidia.com>`.

linhu-nv force-pushed the feature/support_flexkv branch from b28766e to 72a025c Compare March 3, 2026 02:01

copy-pr-bot bot had a problem deploying to GITLAB March 3, 2026 03:46 Failure

alec-flowers enabled auto-merge (squash) March 3, 2026 03:46

alec-flowers merged commit 4d7c984 into ai-dynamo:main Mar 3, 2026
58 of 59 checks passed

8 participants
