feat: FlexKV integration in Dynamo #5858

Merged
alec-flowers merged 8 commits into ai-dynamo:main from YconquestY:feature/support_flexkv
Mar 3, 2026

Conversation

@YconquestY
Contributor

@YconquestY YconquestY commented Feb 1, 2026

Overview:

This PR enables FlexKV as the KV cache offloading backend for the Dynamo vLLM worker.

Details:

FlexKV is a scalable, distributed runtime for KV cache offloading. It acts as a unified KV caching layer for inference engines like vLLM, TensorRT-LLM and SGLang.

Where should the reviewer start?

The only code change happens in components/src/dynamo/vllm/args.py.

Related Issues:

None

Summary by CodeRabbit

  • New Features

    • Added FlexKV as a new connector option for vLLM deployment
  • Documentation

    • Added comprehensive FlexKV integration guide with setup instructions
    • Updated vLLM deployment documentation to include FlexKV configuration
    • Added example deployment scripts for aggregated and disaggregated FlexKV setups
    • Added license attribution for FlexKV package

@YconquestY YconquestY requested a review from a team as a code owner February 1, 2026 10:45
Copilot AI review requested due to automatic review settings February 1, 2026 10:45
@YconquestY YconquestY requested review from a team as code owners February 1, 2026 10:45
@copy-pr-bot

copy-pr-bot bot commented Feb 1, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions
Contributor

github-actions bot commented Feb 1, 2026

@github-actions github-actions bot added the external-contribution, documentation, and backend::vllm labels Feb 1, 2026
@YconquestY YconquestY changed the title FlexKV integration in Dynamo feat: FlexKV integration in Dynamo Feb 1, 2026
@github-actions github-actions bot added the feat label Feb 1, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR integrates FlexKV as a first-class KV transfer connector for the Dynamo vLLM backend, with accompanying examples and documentation updates. It adds a new flexkv connector option, wiring it into the vLLM KV transfer configuration, and documents how to deploy and use it in aggregated and disaggregated serving setups.

Changes:

  • Extend vLLM backend argument parsing to recognize flexkv as a valid connector and map it to FlexKVDynamoConnectorV1 in KVTransferConfig.
  • Add example launch scripts for aggregated, aggregated+router, and disaggregated FlexKV-powered vLLM deployments, including corresponding Kubernetes README updates.
  • Update architecture and backend docs, hidden toctree, KVBM intro, and attributions to reference FlexKV and provide a dedicated FlexKV integration guide.
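
The first change in the list above can be pictured with a minimal Python sketch. It is not the code from args.py: it uses a plain dictionary in place of vLLM's KVTransferConfig, and the helper name `connector_config` and the lookup table `CONNECTOR_CLASSES` are hypothetical. Only the connector names, the class name `FlexKVDynamoConnectorV1`, and the `kv_both` role come from this thread.

```python
# Connector names documented in this PR. The set mirrors the idea of
# VALID_CONNECTORS but is not copied from args.py.
VALID_CONNECTORS = {"nixl", "lmcache", "kvbm", "flexkv", "none"}

# Hypothetical lookup table: only the flexkv entry is described in this thread.
CONNECTOR_CLASSES = {"flexkv": "FlexKVDynamoConnectorV1"}


def connector_config(connector: str) -> dict:
    """Return a KV-transfer config dict for a connector name (illustrative)."""
    name = connector.lower()
    if name not in VALID_CONNECTORS:
        raise ValueError(f"unknown connector: {connector!r}")
    if name not in CONNECTOR_CLASSES:
        raise NotImplementedError(f"{name!r} is outside the scope of this sketch")
    return {"kv_connector": CONNECTOR_CLASSES[name], "kv_role": "kv_both"}


print(connector_config("flexkv"))
```

Validating against a fixed set first keeps a typo such as `flexkvv` from silently falling through to a default connector.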

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| components/src/dynamo/vllm/args.py | Adds flexkv to VALID_CONNECTORS, CLI help, and create_kv_transfer_config, mapping it to FlexKVDynamoConnectorV1 so vLLM can instantiate the FlexKV connector via the --connector flag. |
| examples/backends/vllm/launch/agg_flexkv.sh | New single-worker aggregated example that runs dynamo.frontend and a vLLM worker configured with --connector flexkv and FlexKV-related env vars. |
| examples/backends/vllm/launch/agg_flexkv_router.sh | New aggregated + KV router example wiring dynamo.frontend in --router-mode kv with two FlexKV-enabled vLLM workers and explicit --kv-events-config. |
| examples/backends/vllm/launch/disagg_flexkv.sh | New disaggregated example that runs separate decode and FlexKV-enabled prefill workers, using both nixl and flexkv connectors to support KV transfer and offload. |
| examples/backends/vllm/deploy/README.md | Updates the documented --connector options to include flexkv so Kubernetes deployment templates reflect the new backend. |
| docs/kvbm/kvbm_intro.rst | Adds a toctree entry linking to the new FlexKV integration documentation from the KVBM intro page. |
| docs/hidden_toctree.rst | Registers backends/vllm/flexkv_integration.md in the hidden docs toctree so it’s included in the built documentation. |
| docs/design_docs/architecture.md | Extends the “Memory bottlenecks” discussion to reference FlexKV as one of the KV offloading systems. |
| docs/backends/vllm/flexkv_integration.md | New guide explaining what FlexKV is, how to enable it via DYNAMO_USE_FLEXKV, and how to run aggregated and disaggregated FlexKV deployments with links to example scripts (note: contains minor spelling nits already flagged). |
| docs/backends/vllm/README.md | Updates the vLLM backend feature matrix and configuration section to include FlexKV as a supported connector and link to the FlexKV integration guide. |
| ATTRIBUTIONS-Python.md | Adds third-party attribution and Apache-2.0 license text for the flexkv Python dependency (version 1.2.1). |

@coderabbitai
Contributor

coderabbitai bot commented Feb 1, 2026

Walkthrough

This change adds FlexKV support to the Dynamo vLLM backend, including Apache 2.0 license attribution, connector configuration handling, comprehensive documentation on FlexKV integration with architecture references, and example deployment scripts for both aggregated and disaggregated serving modes with multi-worker setup capabilities.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| License Attribution: ATTRIBUTIONS-Python.md | Adds flexkv (1.2.1) package entry with Apache 2.0 license text and metadata URLs (homepage, source). |
| Connector Configuration: components/src/dynamo/vllm/args.py | Adds flexkv to VALID_CONNECTORS set and creates configuration branch mapping flexkv connector to FlexKVDynamoConnectorV1 with kv_role kv_both. |
| Documentation Updates: docs/backends/vllm/README.md, docs/design_docs/architecture.md, docs/kvbm/kvbm_intro.rst, examples/backends/vllm/deploy/README.md | Updates feature tables, CLI documentation, and architecture references to include flexkv as documented option alongside nixl, lmcache, kvbm, and none. |
| Integration Guide: docs/backends/vllm/flexkv_integration.md | New documentation file covering FlexKV enablement via DYNAMO_USE_FLEXKV, aggregated/disaggregated serving patterns, and configuration guidance with setup script references. |
| Navigation Updates: docs/hidden_toctree.rst | Adds flexkv_integration.md to documentation toctree for discoverability. |
| Deployment Scripts: examples/backends/vllm/launch/agg_flexkv.sh, examples/backends/vllm/launch/agg_flexkv_router.sh, examples/backends/vllm/launch/disagg_flexkv.sh | Introduces three bash launcher scripts demonstrating aggregated single/multi-worker and disaggregated configurations with FlexKV, CPU cache settings, IPC ports, CUDA device binding, and KV event publisher configuration. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 FlexKV hops into the scene with grace,
New connectors now join the race,
Docs and scripts in place so fine,
License terms aligned in a line,
Dynamo's backend now shines with flexkv divine! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description check | ✅ Passed | The description covers the template's required sections (Overview, Details, Where to start, Related Issues) with meaningful content explaining FlexKV's purpose and directing reviewers to the key code changes. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Title check | ✅ Passed | The title 'feat: FlexKV integration in Dynamo' accurately and concisely summarizes the main change: adding FlexKV support across configuration, documentation, and example scripts. |

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/src/dynamo/vllm/args.py (1)

139-144: ⚠️ Potential issue | 🟡 Minor

Run Black to fix formatting drift.

CI reports Black reformatting this file; please run Black on components/src/dynamo/vllm/args.py (likely the long help string here).

🤖 Fix all issues with AI agents
In `@ATTRIBUTIONS-Python.md`:
- Around line 10982-11169: The ATTRIBUTIONS entry for "flexkv (1.2.1)" is
incorrect — either remove it or verify and correct its source/version; search
the project dependency declarations (pyproject.toml) for "flexkv" and any
references in the vllm args/connector code (the module that defines the optional
connector in vllm/args.py) to determine whether flexkv is actually a transitive
dependency or a GitHub-only dependency, then: if not used, remove the "flexkv
(1.2.1)" block from ATTRIBUTIONS-Python.md; if used from GitHub, replace the
PyPI version with the GitHub URL and commit metadata in the attribution entry;
if used from a specific tag/commit, update the version string to that tag/commit
and ensure attribution text reflects the source.

In `@components/src/dynamo/vllm/args.py`:
- Around line 533-536: The diff sets connector_cfg to use a non-existent class
"FlexKVDynamoConnectorV1", which will fail at runtime; update the branch that
handles connector == "flexkv" to either set a valid built-in connector (e.g.,
use "DynamoConnector" or "NixlConnector") or supply a kv_connector_module_path
that points to the module implementing FlexKVDynamoConnectorV1; modify the
connector_cfg assignment (the same place LMCacheConnectorV1 is used) to include
kv_connector_module_path if you intend to keep a custom class, or replace
"FlexKVDynamoConnectorV1" with an existing class name so the connector is
discoverable at runtime.

In `@docs/backends/vllm/flexkv_integration.md`:
- Around line 15-31: The section headings contain misspellings: change
"Aggregrated Serving" to "Aggregated Serving" and "Disaggregrated Serving" to
"Disaggregated Serving" in the document; update both header lines that precede
the FlexKV usage block and the disagg example reference so the corrected
headings read "Aggregated Serving" and "Disaggregated Serving" while leaving the
code snippet (`python -m dynamo.vllm --model $YOUR_MODEL --connector flexkv`)
and example file references (`agg_flexkv.sh`, `disagg_flexkv.sh`) unchanged.

In `@examples/backends/vllm/launch/agg_flexkv_router.sh`:
- Around line 1-3: Add the project's standard copyright/header block to the top
of the script (above or immediately before the existing shebang line
"#!/bin/bash") so the file matches the repo header conventions, and then mark
the file executable (e.g., run chmod +x on
examples/backends/vllm/launch/agg_flexkv_router.sh) so CI stops flagging it as
non-executable.

In `@examples/backends/vllm/launch/agg_flexkv.sh`:
- Around line 1-3: The script examples/backends/vllm/launch/agg_flexkv.sh is
missing the required copyright/header and isn’t marked executable; add the
project's copyright/header block to the top of agg_flexkv.sh (above the shebang)
and update file permissions to be executable (e.g., chmod +x) so CI recognizes
the script as executable.

In `@examples/backends/vllm/launch/disagg_flexkv.sh`:
- Around line 1-3: Add an SPDX copyright header immediately after the existing
shebang line (#!/bin/bash) in the script
(examples/backends/vllm/launch/disagg_flexkv.sh) and commit that change, and
ensure the file is executable by setting its mode (e.g., run git update-index
--chmod=+x examples/backends/vllm/launch/disagg_flexkv.sh or chmod +x before
committing) so CI no longer flags a missing header or non-executable script;
keep the existing trap and set -e lines intact and place the SPDX lines between
the shebang and the first command.
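
The header and permission checks requested for the three launch scripts can be sketched in Python as follows. This is an illustration, not the repository's CI check: the function name `check_launch_script` is hypothetical, the SPDX line written into the demo file is a placeholder rather than the project's actual header, and the sketch assumes a POSIX file system. It also assumes the shebang has to stay on line 1 with the header after it, as the disagg_flexkv.sh comment describes.

```python
import os
import stat
import tempfile


def check_launch_script(path: str) -> list[str]:
    """Return a list of problems found in a shell script (empty if none)."""
    problems = []
    with open(path, encoding="utf-8") as fh:
        head = [fh.readline() for _ in range(5)]
    # The kernel only honours a shebang on the very first line.
    if not head[0].startswith("#!"):
        problems.append("shebang is not on the first line")
    if not any("SPDX-License-Identifier" in line for line in head[1:]):
        problems.append("no SPDX header directly after the shebang")
    if not os.stat(path).st_mode & stat.S_IXUSR:
        problems.append("file is not executable")
    return problems


# Demonstration on a throwaway file rather than the real scripts.
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "demo.sh")
    with open(script, "w", encoding="utf-8") as fh:
        fh.write("#!/bin/bash\n# SPDX-License-Identifier: Apache-2.0\nset -e\n")
    before = check_launch_script(script)
    # Setting the owner-execute bit has the same effect as `chmod u+x`.
    os.chmod(script, os.stat(script).st_mode | stat.S_IXUSR)
    after = check_launch_script(script)
    print(before, after)
```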
🧹 Nitpick comments (2)
examples/backends/vllm/launch/agg_flexkv_router.sh (2)

17-21: Quote the $MODEL variable.

While the current value is safe, unquoted variables can cause word splitting or globbing issues if the value contains spaces or special characters.

Proposed fix
 python -m dynamo.vllm \
-    --model $MODEL \
+    --model "$MODEL" \
     --connector flexkv \

27-31: Quote the $MODEL variable here as well.

Same issue as the first worker invocation.

Proposed fix
 python -m dynamo.vllm \
-    --model $MODEL \
+    --model "$MODEL" \
     --connector flexkv \
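
To see why the quoting matters, here is a small Python sketch that approximates shell tokenization with the standard-library `shlex` module. It is an illustration under assumptions: `shlex.split` models POSIX word splitting but not globbing, the helper `build_command` is hypothetical, and the model name containing a space is made up for the demonstration.

```python
import shlex


def build_command(model: str, quoted: bool) -> str:
    """Build the worker command line with or without quoting the model."""
    arg = shlex.quote(model) if quoted else model
    return f"python -m dynamo.vllm --model {arg} --connector flexkv"


model = "my model"  # made-up value with a space in it

unquoted = shlex.split(build_command(model, quoted=False))
quoted = shlex.split(build_command(model, quoted=True))

print(unquoted)  # the model name is split into two separate arguments
print(quoted)    # the model name survives as a single argument
```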

@Jont828
Contributor

Jont828 commented Feb 2, 2026

Hi @YconquestY! We're currently migrating our docs from Sphinx (docs/) to Fern (fern/).

Could you please also apply your changes to the corresponding file(s) in fern/pages/? Here's an idea on where to migrate the files:

| docs/ file | fern/ equivalent |
| --- | --- |
| docs/backends/vllm/README.md | fern/pages/backends/vllm/README.md |
| docs/backends/vllm/flexkv_integration.md | fern/pages/backends/vllm/flexkv-integration.md (new file) |
| docs/design_docs/architecture.md | fern/pages/design-docs/architecture.md |
| docs/kvbm/kvbm_intro.rst | fern/pages/kvbm/kvbm-intro.md |

Note: The new file flexkv-integration.md should be added to the hidden "Backend Details > vLLM" section in fern/versions/next.yml. This is just a suggestion, though; feel free to go with something else if you have a better idea.
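
The four suggested paths appear to follow one naming rule, sketched below in Python. The rule is inferred from those four rows only and is an assumption, not a documented migration policy; the function name `fern_equivalent` is hypothetical.

```python
from pathlib import PurePosixPath


def fern_equivalent(docs_path: str) -> str:
    """Map a docs/ path to its suggested fern/pages/ location (inferred rule)."""
    path = PurePosixPath(docs_path)
    if path.parts[0] != "docs":
        raise ValueError(f"not a docs/ path: {docs_path}")
    # Directory names and file names switch from underscores to hyphens.
    parts = [part.replace("_", "-") for part in path.parts[1:]]
    target = PurePosixPath("fern/pages", *parts)
    # reStructuredText sources become Markdown.
    if target.suffix == ".rst":
        target = target.with_suffix(".md")
    return str(target)


for source in (
    "docs/backends/vllm/README.md",
    "docs/backends/vllm/flexkv_integration.md",
    "docs/design_docs/architecture.md",
    "docs/kvbm/kvbm_intro.rst",
):
    print(source, "->", fern_equivalent(source))
```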

Thanks for helping keep both doc sets in sync during the migration!

akshatha-k added a commit that referenced this pull request Feb 3, 2026
Migrate KVBM documentation to a new three-tier structure:
- Tier 1: README.md (Quick Start) - overview, link to user guide, feature matrix, architecture
- Tier 2: kvbm_guide.md (Guide) - installation, configuration, deployment for all user paths (pip wheel, via trtllm/vllm or via the dynamo integrations with other kv offloading solutions)
- Tier 3: kvbm_design.md (Design) - architecture deep dive, components, data flows, framework integrations

Create integrations folder with:
- flexkv_integration.md - new FlexKV integration guide from PR #5858
- lmcache_integration.md - migrated from backends/vllm/
- sglang_hicache.md - migrated from backends/sglang/

Add AGENTS.md for KVBM component to guide AI agents.

Update docs/index.rst to add "KV Cache Offloading" as first item under User Guides.
akshatha-k added a commit that referenced this pull request Feb 4, 2026
Migrate KVBM documentation to a new three-tier structure:
- Tier 1: README.md (Quick Start) - overview, link to user guide, feature matrix, architecture
- Tier 2: kvbm_guide.md (Guide) - installation, configuration, deployment for all user paths (pip wheel, via trtllm/vllm or via the dynamo integrations with other kv offloading solutions)
- Tier 3: kvbm_design.md (Design) - architecture deep dive, components, data flows, framework integrations

Create integrations folder with:
- flexkv_integration.md - new FlexKV integration guide from PR #5858
- lmcache_integration.md - migrated from backends/vllm/
- sglang_hicache.md - migrated from backends/sglang/

Add AGENTS.md for KVBM component to guide AI agents.

Update docs/index.rst to add "KV Cache Offloading" as first item under User Guides.

Signed-off-by: akshatha-k <akshutk@gmail.com>
@linhu-nv linhu-nv force-pushed the feature/support_flexkv branch from 222a9a5 to cb5b03a Compare February 11, 2026 06:47
@YconquestY
Contributor Author

Hi @Jont828, thanks for the review. FlexKV doc is updated as requested :)

@akshatha-k
Contributor

Hi @YconquestY - I only see the vLLM changes. Did you also want to add the changes for FlexKV with TRTLLM?

@YconquestY
Contributor Author

Hi @akshatha-k. FlexKV does support TensorRT-LLM and SGLang. And yes, we will support Dynamo + TensorRT-LLM/SGLang + FlexKV in a future PR. This PR addresses Dynamo + vLLM + FlexKV only.

@linhu-nv linhu-nv force-pushed the feature/support_flexkv branch 2 times, most recently from 22cfaf6 to 07f9b38 Compare February 26, 2026 09:03
@alec-flowers
Contributor

alec-flowers commented Mar 2, 2026

In vLLM we don't support the --connector x syntax anymore; we solely use --kv-transfer-config. The only reason to mention a connector at all is to print out the syntax the user needs to adopt.
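
As a rough illustration of the syntax this comment refers to, the Python sketch below serializes a connector configuration into a `--kv-transfer-config` argument. It is a sketch under assumptions: the helper `kv_transfer_args` is hypothetical, the exact configuration the merged PR documents is not shown in this thread, and the field names `kv_role` and `kv_connector_module_path` are the ones quoted in the reviews above. Whether FlexKV needs a module path, and which one, is not stated here, so it is left unset.

```python
import json
import shlex


def kv_transfer_args(connector_class, role="kv_both", module_path=None):
    """Build the argv fragment for --kv-transfer-config (illustrative)."""
    config = {"kv_connector": connector_class, "kv_role": role}
    # Optional: lets the engine import a connector class that is not built in.
    if module_path is not None:
        config["kv_connector_module_path"] = module_path
    return ["--kv-transfer-config", json.dumps(config)]


args = kv_transfer_args("FlexKVDynamoConnectorV1")

# shlex.join quotes the JSON so it can be pasted into a shell command line.
print('python -m dynamo.vllm --model "$MODEL" ' + shlex.join(args))
```

Passing the arguments as a list and quoting only when printing avoids hand-escaping the JSON braces and double quotes.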

@linhu-nv linhu-nv force-pushed the feature/support_flexkv branch from 07f9b38 to cc0dd50 Compare March 2, 2026 05:19
@linhu-nv linhu-nv force-pushed the feature/support_flexkv branch from 840d1b5 to b28766e Compare March 2, 2026 05:41
@alec-flowers
Contributor

/ok to test b28766e

YconquestY and others added 8 commits March 3, 2026 09:59
Signed-off-by: zhuofan1123 <zhuofanl@nvidia.com>
Signed-off-by: zhuofan1123 <zhuofanl@nvidia.com>
Signed-off-by: zhuofan1123 <zhuofanl@nvidia.com>
Signed-off-by: zhuofan1123 <zhuofanl@nvidia.com>
Signed-off-by: zhuofan1123 <zhuofanl@nvidia.com>
Signed-off-by: zhuofan1123 <zhuofanl@nvidia.com>
Signed-off-by: zhuofan1123 <zhuofanl@nvidia.com>
Signed-off-by: zhuofan1123 <zhuofanl@nvidia.com>
@linhu-nv linhu-nv force-pushed the feature/support_flexkv branch from b28766e to 72a025c Compare March 3, 2026 02:01
@alec-flowers
Contributor

/ok to test 72a025c

@alec-flowers alec-flowers enabled auto-merge (squash) March 3, 2026 03:46
@alec-flowers alec-flowers merged commit 4d7c984 into ai-dynamo:main Mar 3, 2026
58 of 59 checks passed

Labels

backend::vllm, documentation, external-contribution, feat, size/L

8 participants