docs: restructure KVBM documentation into three-tier format by akshatha-k · Pull Request #5905 · ai-dynamo/dynamo

akshatha-k · 2026-02-03T05:07:47Z

Summary

Migrate KVBM documentation to a new three-tier structure:
- Tier 1: README.md (Quick Start) - overview, link to user guide, feature matrix, architecture
- Tier 2: kvbm_guide.md (Guide) - installation, configuration, and deployment for all user paths (pip wheel, via TensorRT-LLM/vLLM, or via the Dynamo integrations with other KV offloading solutions)
- Tier 3: kvbm_design.md (Design) - architecture deep dive, components, data flows, framework integrations
Create integrations folder with:
- flexkv_integration.md - new FlexKV integration guide from PR #5858 (feat: FlexKV integration in Dynamo)
- lmcache_integration.md - migrated from backends/vllm/
- sglang_hicache.md - migrated from backends/sglang/
Add AGENTS.md for KVBM component to guide AI agents
Update docs/index.rst to add "KV Cache Offloading" as first item under User Guides

Test plan

Verify docs website renders correctly with new structure
Verify all links work (internal cross-references and external links)
Verify all diagrams render properly
Verify KVBM usage guides are correct and up to date

Summary by CodeRabbit

Release Notes

Documentation
- Reorganized KV Cache Offloading (KVBM) documentation with a new user guide, design documentation, and deployment guidance.
- Added comprehensive FlexKV integration documentation.
- Enhanced SGLang Hierarchical Cache documentation with configuration, deployment, benchmarking, and troubleshooting guidance.
- Restructured documentation navigation for improved accessibility and clarity.

copy-pr-bot · 2026-02-03T05:07:51Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-02-03T05:11:50Z

Walkthrough

This pull request restructures KVBM documentation by consolidating scattered backend-specific guides into unified integration documentation, removing fragmented setup files, and introducing comprehensive design and operational guides. Updates navigation structure via toctree changes to reflect the new documentation organization.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| KVBM Documentation Consolidation: `docs/kvbm/kvbm_architecture.md`, `docs/kvbm/kvbm_components.md`, `docs/kvbm/kvbm_design_deepdive.md`, `docs/kvbm/kvbm_integrations.md`, `docs/kvbm/kvbm_motivation.md`, `docs/kvbm/kvbm_reading.md`, `docs/kvbm/trtllm-setup.md`, `docs/kvbm/vllm-setup.md` | Removes 8 documentation files (894 lines total), including fragmented architecture, components, design deep-dive, integration guides, and framework-specific setup instructions. |
| KVBM Documentation Consolidation: `docs/kvbm/README.md`, `docs/kvbm/kvbm_design.md`, `docs/kvbm/kvbm_guide.md` | Adds 3 new files (850 lines total) providing unified KVBM overview, design documentation, and operational guide for deployment and configuration. |
| Integration Documentation: `docs/integrations/flexkv_integration.md`, `docs/integrations/sglang_hicache.md` | Adds 2 comprehensive integration guides (692 lines total) for FlexKV and SGLang HiCache, consolidating previously scattered backend-specific examples into integration-focused documentation. |
| Documentation Navigation: `docs/index.rst`, `docs/kvbm/kvbm_intro.rst`, `docs/backends/sglang/sgl-hicache-example.md` | Updates toctree structure and navigation references to reflect new documentation organization; removes SGLang HiCache example from backends directory (65 lines). |
| Content Refinements: `docs/integrations/lmcache_integration.md` | Minor textual updates, punctuation corrections, and list reflowing without altering functionality or configuration guidance (+35/−17 lines). |
| AI Agent Documentation: `lib/bindings/kvbm/AGENTS.md` | Adds new comprehensive guide (288 lines) for AI agents, detailing KVBM architecture, usage patterns, key commands, and integration points. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A burrow of docs, once scattered and wide,
Now nestled together, organized with pride!
New guides like carrots, so golden and clear,
For FlexKV and HiCache to users so dear.
The old paths are pruned, the new ones take flight,
Architecture and guides, all gleaming so bright! ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly describes the main change: restructuring KVBM documentation into a three-tier format, which is the primary objective of this PR. |
| Description check | ✅ Passed | The description covers all required template sections with substantive details: Overview (summary of three-tier structure), Details (specific file changes and additions), implied reviewer guidance, and test plan checklist. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment `@coderabbitai help` to get the list of available commands and usage tips.

coderabbitai

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

docs/kvbm/kvbm_intro.rst (1)

23-23: ⚠️ Potential issue | 🟡 Minor

Tighten spacing/wording in the memory list item.

“GPU memory(in future) ,” reads as a typo. Consider “GPU memory (future)” and remove the extra space before the comma.

🤖 Fix all issues with AI agents

In `@docs/integrations/flexkv_integration.md`:
- Around line 70-74: The docs reference three missing launch
scripts—agg_flexkv.sh, agg_flexkv_router.sh, and disagg_flexkv.sh—so either add
those scripts to the launch scripts set with the expected
aggregator/router/disaggregator start logic (matching the style and env var
usage of the other launch scripts) or update the documentation to point to the
actual script names or remove the calls; make the change by adding the three
shell scripts with clear usage/help text or editing the doc examples to use the
correct existing script names so the examples run as shown.

In `@docs/integrations/sglang_hicache.md`:
- Line 1: Remove the trailing whitespace at the end of line 1 in
docs/integrations/sglang_hicache.md (the empty/HTML comment line at the top),
save the file, and commit the change so the pre-commit hook and pipeline no
longer fail.
- Line 445: Update the broken markdown link in
docs/integrations/sglang_hicache.md by replacing the removed target
"../kvbm/kvbm_architecture.md" with the new file "../kvbm/kvbm_design.md" so the
"**[KVBM Architecture](... )**" link points to the existing kvbm_design.md;
ensure the link text ("KVBM Architecture") remains the same and the relative
path is correct.
- Around line 122-124: Update the fenced code block that contains the formula
"Host KV Cache Size = Device KV Cache Size × hicache-ratio" to include a
language specifier (use "text") after the opening triple backticks so the block
is rendered correctly; locate the code block in
docs/integrations/sglang_hicache.md by searching for that exact formula and
change the opening fence from ``` to ```text.
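
The formula referenced in that last item lends itself to a quick worked example. This is a minimal sketch, assuming sizes are measured in bytes and that `hicache-ratio` acts as a plain multiplier; the function name and the 40 GiB / ratio 2 figures are hypothetical and chosen only for illustration:

```python
GIB = 1024 ** 3  # bytes per GiB


def host_kv_cache_size(device_kv_cache_bytes: int, hicache_ratio: float) -> int:
    """Host KV Cache Size = Device KV Cache Size x hicache-ratio (in bytes)."""
    if device_kv_cache_bytes < 0 or hicache_ratio < 0:
        raise ValueError("sizes and ratio must be non-negative")
    return int(device_kv_cache_bytes * hicache_ratio)


# Hypothetical example: 40 GiB of device KV cache with a ratio of 2.
host_bytes = host_kv_cache_size(40 * GIB, 2.0)
print(host_bytes / GIB)  # 80.0
```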

In `@docs/kvbm/kvbm_design.md`:
- Around line 103-105: The fenced code block containing the formula
"block_stride_in_bytes = align_up(num_layers × layer_stride, alignment);" should
include a language specifier (e.g., text) so renderers treat it as plain
text/pseudocode; update the triple-backtick fence to include `text` (or convert
the single-line formula to inline code) around the expression involving
block_stride_in_bytes, align_up, num_layers, layer_stride, and alignment.
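
For readers unfamiliar with the formula, here is a small sketch of what it likely computes. It assumes `align_up` rounds a value up to the next multiple of `alignment`, which is the conventional meaning but is not spelled out in this thread; the layer count, stride, and alignment values are hypothetical:

```python
def align_up(value: int, alignment: int) -> int:
    """Round `value` up to the nearest multiple of `alignment` (assumed semantics)."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    return ((value + alignment - 1) // alignment) * alignment


def block_stride_in_bytes(num_layers: int, layer_stride: int, alignment: int) -> int:
    """block_stride_in_bytes = align_up(num_layers x layer_stride, alignment)."""
    return align_up(num_layers * layer_stride, alignment)


# Hypothetical values: 32 layers of 1001 bytes each, 256-byte alignment.
# 32 * 1001 = 32032, which rounds up to 126 * 256 = 32256.
print(block_stride_in_bytes(32, 1001, 256))  # 32256
```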

In `@docs/kvbm/kvbm_guide.md`:
- Line 1: Remove the trailing whitespace found on line 1 of the
docs/kvbm/kvbm_guide.md file: edit the file to delete the extra space at the end
of the first line (so it ends with the last visible character, not a space),
save and recommit so the pre-commit hook can pass; no code changes required
beyond trimming that whitespace.
- Around line 46-50: Update the broken link to the KVBM bindings README by
replacing the lowercase filename reference "readme.md" with the actual uppercase
filename "README.md" in the link text that points to the "KVBM bindings README"
(the line containing "To build KVBM from source, see the detailed instructions
in the [KVBM bindings
README](../../lib/bindings/kvbm/readme.md#build-from-source)"). Ensure the
Markdown link now uses "README.md" so it works on case-sensitive filesystems.
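
The `readme.md` versus `README.md` issue is easy to miss on case-insensitive filesystems. The following is a rough sketch of a checker that flags relative Markdown links whose target does not exist with exactly matching case. It is not part of this PR or of any existing tooling in the repository; the function names and the simplified link regex are assumptions made for illustration:

```python
import os
import re

# Simplified pattern for inline Markdown links: [text](target)
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# URLs with a scheme (http:, mailto:, ...) are skipped, not checked.
SCHEME_RE = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*:")


def exists_with_exact_case(path: str) -> bool:
    """Return True only if every path component matches a directory entry exactly."""
    path = os.path.normpath(path)
    drive, rest = os.path.splitdrive(path)
    current = drive + (os.sep if rest.startswith(os.sep) else "")
    current = current or "."
    for part in rest.split(os.sep):
        if not part or part == ".":
            continue
        if part == "..":
            current = os.path.join(current, part)
            continue
        try:
            if part not in os.listdir(current):
                return False
        except (FileNotFoundError, NotADirectoryError):
            return False
        current = os.path.join(current, part)
    return True


def broken_links(md_file: str) -> list[str]:
    """List relative link targets in `md_file` that do not resolve with exact case."""
    base = os.path.dirname(md_file)
    with open(md_file, encoding="utf-8") as fh:
        text = fh.read()
    bad = []
    for target in LINK_RE.findall(text):
        if SCHEME_RE.match(target) or target.startswith("#"):
            continue  # external URL or same-page anchor
        rel = target.split("#", 1)[0]  # drop the anchor fragment
        if not exists_with_exact_case(os.path.join(base, rel)):
            bad.append(target)
    return bad
```

Calling `broken_links("docs/kvbm/kvbm_guide.md")` from the repository root would return the offending targets, so a link to `readme.md` is reported even on a filesystem that would otherwise resolve it to `README.md`.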

docs/integrations/flexkv_integration.md

docs/integrations/sglang_hicache.md

docs/kvbm/kvbm_design.md

docs/kvbm/kvbm_guide.md

Migrate KVBM documentation to a new three-tier structure:
- Tier 1: README.md (Quick Start) - overview, link to user guide, feature matrix, architecture
- Tier 2: kvbm_guide.md (Guide) - installation, configuration, deployment for all user paths (pip wheel, via trtllm/vllm or via the dynamo integrations with other kv offloading solutions)
- Tier 3: kvbm_design.md (Design) - architecture deep dive, components, data flows, framework integrations
Create integrations folder with:
- flexkv_integration.md - new FlexKV integration guide from PR #5858
- lmcache_integration.md - migrated from backends/vllm/
- sglang_hicache.md - migrated from backends/sglang/
Add AGENTS.md for KVBM component to guide AI agents.
Update docs/index.rst to add "KV Cache Offloading" as first item under User Guides.
Signed-off-by: akshatha-k <akshutk@gmail.com>

Update fern directory with new three-tier KVBM documentation:
- Add fern/pages/kvbm/README.md (Quick Start)
- Add fern/pages/kvbm/kvbm-guide.md (Guide)
- Add fern/pages/kvbm/kvbm-design.md (Design)
Create fern/pages/integrations/ with:
- lmcache-integration.md
- flexkv-integration.md
- sglang-hicache.md
Update fern/versions/next.yml navigation:
- Add "KV Cache Offloading" to User Guides section
- Update KVBM section with new structure
- Add Integrations section
Delete old fern KVBM files that were replaced.
Signed-off-by: akshatha-k <akshutk@gmail.com>

- Remove references to non-existent FlexKV launch scripts (agg_flexkv.sh, agg_flexkv_router.sh, disagg_flexkv.sh) from docs and fern
- Add language specifier 'text' to code blocks showing formulas in sglang_hicache.md and kvbm_design.md
- Fix broken link: kvbm_architecture.md → kvbm_design.md in sglang_hicache.md
- Fix case sensitivity: readme.md → README.md in kvbm_guide.md
Signed-off-by: akshatha-k <akshutk@gmail.com>

Move the Grafana screenshot to the central images directory and update the reference in kvbm_guide.md to use the new path.
Signed-off-by: akshatha-k <akshutk@gmail.com>

Update links to reflect new documentation structure:
- kvbm_architecture.md → README.md (KVBM overview)
- LMCache_Integration.md → integrations/lmcache_integration.md
- trtllm-setup.md → kvbm_guide.md#run-kvbm-in-dynamo-with-tensorrt-llm
Affected files:
- README.md
- docs/backends/sglang/README.md
- docs/backends/trtllm/README.md
- docs/backends/vllm/README.md
- docs/backends/vllm/prometheus.md
- fern/pages/backends/sglang/README.md
Signed-off-by: akshatha-k <akshutk@gmail.com>

Update links in fern documentation to match new KVBM structure:
- Update KVBM and LMCache links in fern/pages/backends/
- Fix cross-references in fern/pages/integrations/sglang-hicache.md
Signed-off-by: akshatha-k <akshutk@gmail.com>

dagil-nvidia · 2026-02-04T23:17:49Z

/ok to test ce8103b

This commit reverts all changes made to the fern/ directory, restoring it to its state before this PR's documentation changes. Signed-off-by: akshatha-k <akshutk@gmail.com>

Signed-off-by: akshatha-k <akshutk@gmail.com>

dagil-nvidia · 2026-02-05T15:55:55Z

/ok to test a1405da

dagil-nvidia · 2026-02-05T21:27:17Z

/ok to test 130ed97

- Remove trailing whitespace in kvbm_guide.md (pre-commit failure)
- Remove deleted kvbm/vllm-setup.md and kvbm/trtllm-setup.md from hidden_toctree.rst
- Remove moved backends/vllm/LMCache_Integration.md from hidden_toctree.rst
- Fix spacing in kvbm_intro.rst: "GPU memory(in future) ," → "GPU memory (future),"
Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

dagil-nvidia · 2026-02-05T21:38:37Z

/ok to test 9edc4ba

nv-kmcgill53

I tried to inline everything as best I could. I approve of the new layout; I just have some changes to the content.

lib/bindings/kvbm/AGENTS.md

docs/kvbm/kvbm_design.md

- Remove lib/bindings/kvbm/AGENTS.md from this PR (submit separately)
- Add blank lines before image captions for proper rendering
- Add G4 (Remote Storage) to storage pools section
- Add tier labels (G3, G4) to Host → Disk offload descriptions
Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

dagil-nvidia · 2026-02-05T22:06:22Z

/ok to test cf7598a

Co-authored-by: Kyle McGill <101670481+nv-kmcgill53@users.noreply.github.com> Signed-off-by: dagil-nvidia <dagil@nvidia.com>

dagil-nvidia · 2026-02-05T22:17:52Z

/ok to test e86c9ed

…o#5905) Signed-off-by: akshatha-k <akshutk@gmail.com> Signed-off-by: Dan Gil <dagil@nvidia.com> Signed-off-by: dagil-nvidia <dagil@nvidia.com> Co-authored-by: dagil-nvidia <dagil@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Kyle McGill <101670481+nv-kmcgill53@users.noreply.github.com>

akshatha-k requested review from harryskim and ziqifan617 February 3, 2026 05:07

pull-request-size bot added the size/XXL label Feb 3, 2026

github-actions bot added docs documentation Improvements or additions to documentation labels Feb 3, 2026

coderabbitai bot reviewed Feb 3, 2026


akshatha-k force-pushed the caching-doc-fixes branch 2 times, most recently from 1389998 to 69fa1fe Compare February 4, 2026 01:58

akshatha-k added 6 commits February 3, 2026 18:00

docs: move kvbm_metrics_grafana.png to docs/images

547bcde


docs: fix broken links in fern pages

ce8103b


akshatha-k force-pushed the caching-doc-fixes branch from 69fa1fe to ce8103b Compare February 4, 2026 02:00

akshatha-k added 2 commits February 4, 2026 15:42

Revert: Remove all fern documentation changes from this PR

e676237


docs: restore sgl-hicache-example.md

2d4dfb2

Signed-off-by: akshatha-k <akshutk@gmail.com>

akshatha-k force-pushed the caching-doc-fixes branch from 5ef845f to 2d4dfb2 Compare February 4, 2026 23:43

akshatha-k added the kvbm label Feb 5, 2026

docs: replace sglang_hicache.md with simplified content

a1405da

Signed-off-by: akshatha-k <akshutk@gmail.com>

copy-pr-bot bot temporarily deployed to GITLAB February 5, 2026 15:56 Inactive

Merge branch 'main' into caching-doc-fixes

130ed97

copy-pr-bot bot had a problem deploying to GITLAB February 5, 2026 21:27 Failure

nv-kmcgill53 reviewed Feb 5, 2026


dagil-nvidia requested a review from nv-kmcgill53 February 5, 2026 22:06

nv-kmcgill53 approved these changes Feb 5, 2026


dagil-nvidia and others added 3 commits February 5, 2026 16:12

Apply suggestion from @nv-kmcgill53

fff6447

Co-authored-by: Kyle McGill <101670481+nv-kmcgill53@users.noreply.github.com> Signed-off-by: dagil-nvidia <dagil@nvidia.com>

Apply suggestion from @nv-kmcgill53

19aafc7

Co-authored-by: Kyle McGill <101670481+nv-kmcgill53@users.noreply.github.com> Signed-off-by: dagil-nvidia <dagil@nvidia.com>

Apply suggestion from @nv-kmcgill53

e86c9ed

Co-authored-by: Kyle McGill <101670481+nv-kmcgill53@users.noreply.github.com> Signed-off-by: dagil-nvidia <dagil@nvidia.com>

dagil-nvidia enabled auto-merge (squash) February 5, 2026 22:21

dagil-nvidia disabled auto-merge February 5, 2026 22:55

dagil-nvidia merged commit 4c3eba2 into main Feb 6, 2026
64 checks passed

dagil-nvidia deleted the caching-doc-fixes branch February 6, 2026 00:02

dagil-nvidia mentioned this pull request Feb 6, 2026

docs: cleanup of docs refactor for components, integrations, and features #6019

Merged

8 tasks
