docs: restructure KVBM documentation into three-tier format #5905

Merged
dagil-nvidia merged 15 commits into main from caching-doc-fixes on Feb 6, 2026

@akshatha-k
Contributor

@akshatha-k akshatha-k commented Feb 3, 2026

Summary

  • Migrate KVBM documentation to a new three-tier structure:

    • Tier 1: README.md (Quick Start) - overview, link to user guide, feature matrix, architecture
    • Tier 2: kvbm_guide.md (Guide) - installation, configuration, and deployment for all user paths (pip wheel, via TensorRT-LLM/vLLM, or via Dynamo integrations with other KV offloading solutions)
    • Tier 3: kvbm_design.md (Design) - architecture deep dive, components, data flows, framework integrations
  • Create integrations folder with:

    • flexkv_integration.md - new FlexKV integration guide from PR #5858 (feat: FlexKV integration in Dynamo)
    • lmcache_integration.md - migrated from backends/vllm/
    • sglang_hicache.md - migrated from backends/sglang/
  • Add AGENTS.md for KVBM component to guide AI agents

  • Update docs/index.rst to add "KV Cache Offloading" as first item under User Guides

Test plan

  • Verify docs website renders correctly with new structure
  • Verify all links work (internal cross-references and external links)
  • Verify all diagrams render properly
  • Verify KVBM usage guides are correct and up to date

Summary by CodeRabbit

Release Notes

  • Documentation
    • Reorganized KV Cache Offloading (KVBM) documentation with a new user guide, design documentation, and deployment guidance.
    • Added comprehensive FlexKV integration documentation.
    • Enhanced SGLang Hierarchical Cache documentation with configuration, deployment, benchmarking, and troubleshooting guidance.
    • Restructured documentation navigation for improved accessibility and clarity.

@copy-pr-bot

copy-pr-bot bot commented Feb 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added docs documentation Improvements or additions to documentation labels Feb 3, 2026
@coderabbitai
Contributor

coderabbitai bot commented Feb 3, 2026

Walkthrough

This pull request restructures KVBM documentation by consolidating scattered backend-specific guides into unified integration documentation, removing fragmented setup files, and introducing comprehensive design and operational guides. It also updates the navigation structure via toctree changes to reflect the new documentation organization.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| KVBM Documentation Consolidation | docs/kvbm/kvbm_architecture.md, docs/kvbm/kvbm_components.md, docs/kvbm/kvbm_design_deepdive.md, docs/kvbm/kvbm_integrations.md, docs/kvbm/kvbm_motivation.md, docs/kvbm/kvbm_reading.md, docs/kvbm/trtllm-setup.md, docs/kvbm/vllm-setup.md | Removes 8 documentation files (894 lines total), including fragmented architecture, components, design deep-dive, integration guides, and framework-specific setup instructions. |
| KVBM Documentation Consolidation | docs/kvbm/README.md, docs/kvbm/kvbm_design.md, docs/kvbm/kvbm_guide.md | Adds 3 new files (850 lines total) providing unified KVBM overview, design documentation, and operational guide for deployment and configuration. |
| Integration Documentation | docs/integrations/flexkv_integration.md, docs/integrations/sglang_hicache.md | Adds 2 comprehensive integration guides (692 lines total) for FlexKV and SGLang HiCache, consolidating previously scattered backend-specific examples into integration-focused documentation. |
| Documentation Navigation | docs/index.rst, docs/kvbm/kvbm_intro.rst, docs/backends/sglang/sgl-hicache-example.md | Updates toctree structure and navigation references to reflect new documentation organization; removes SGLang HiCache example from backends directory (65 lines). |
| Content Refinements | docs/integrations/lmcache_integration.md | Minor textual updates, punctuation corrections, and list reflowing without altering functionality or configuration guidance (+35/−17 lines). |
| AI Agent Documentation | lib/bindings/kvbm/AGENTS.md | Adds new comprehensive guide (288 lines) for AI agents, detailing KVBM architecture, usage patterns, key commands, and integration points. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A burrow of docs, once scattered and wide,
Now nestled together, organized with pride!
New guides like carrots, so golden and clear,
For FlexKV and HiCache to users so dear.
The old paths are pruned, the new ones take flight,
Architecture and guides, all gleaming so bright! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly describes the main change: restructuring KVBM documentation into a three-tier format, which is the primary objective of this PR. |
| Description check | ✅ Passed | The description covers all required template sections with substantive details: Overview (summary of three-tier structure), Details (specific file changes and additions), implied reviewer guidance, and test plan checklist. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/kvbm/kvbm_intro.rst (1)

23-23: ⚠️ Potential issue | 🟡 Minor

Tighten spacing/wording in the memory list item.

“GPU memory(in future) ,” reads as a typo. Consider “GPU memory (future)” and remove the extra space before the comma.

🤖 Fix all issues with AI agents
In `@docs/integrations/flexkv_integration.md`:
- Around line 70-74: The docs reference three missing launch
scripts—agg_flexkv.sh, agg_flexkv_router.sh, and disagg_flexkv.sh—so either add
those scripts to the launch scripts set with the expected
aggregator/router/disaggregator start logic (matching the style and env var
usage of the other launch scripts) or update the documentation to point to the
actual script names or remove the calls; make the change by adding the three
shell scripts with clear usage/help text or editing the doc examples to use the
correct existing script names so the examples run as shown.

In `@docs/integrations/sglang_hicache.md`:
- Line 1: Remove the trailing whitespace at the end of line 1 in
docs/integrations/sglang_hicache.md (the empty/HTML comment line at the top),
save the file, and commit the change so the pre-commit hook and pipeline no
longer fail.
- Line 445: Update the broken markdown link in
docs/integrations/sglang_hicache.md by replacing the removed target
"../kvbm/kvbm_architecture.md" with the new file "../kvbm/kvbm_design.md" so the
"**[KVBM Architecture](... )**" link points to the existing kvbm_design.md;
ensure the link text ("KVBM Architecture") remains the same and the relative
path is correct.
- Around line 122-124: Update the fenced code block that contains the formula
"Host KV Cache Size = Device KV Cache Size × hicache-ratio" to include a
language specifier (use "text") after the opening triple backticks so the block
is rendered correctly; locate the code block in
docs/integrations/sglang_hicache.md by searching for that exact formula and
change the opening fence from ``` to ```text.

In `@docs/kvbm/kvbm_design.md`:
- Around line 103-105: The fenced code block containing the formula
"block_stride_in_bytes = align_up(num_layers × layer_stride, alignment);" should
include a language specifier (e.g., text) so renderers treat it as plain
text/pseudocode; update the triple-backtick fence to include `text` (or convert
the single-line formula to inline code) around the expression involving
block_stride_in_bytes, align_up, num_layers, layer_stride, and alignment.

In `@docs/kvbm/kvbm_guide.md`:
- Line 1: Remove the trailing whitespace found on line 1 of the
docs/kvbm/kvbm_guide.md file: edit the file to delete the extra space at the end
of the first line (so it ends with the last visible character, not a space),
save and recommit so the pre-commit hook can pass; no code changes required
beyond trimming that whitespace.
- Around line 46-50: Update the broken link to the KVBM bindings README by
replacing the lowercase filename reference "readme.md" with the actual uppercase
filename "README.md" in the link text that points to the "KVBM bindings README"
(the line containing "To build KVBM from source, see the detailed instructions
in the [KVBM bindings
README](../../lib/bindings/kvbm/readme.md#build-from-source)"). Ensure the
Markdown link now uses "README.md" so it works on case-sensitive filesystems.
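The two formulas quoted in the review comments (the SGLang HiCache host-cache sizing rule and the KVBM block-stride rule) are simple enough to sanity-check by hand. The sketch below is illustrative only: it assumes `align_up` rounds a byte count up to the next multiple of `alignment`, and the helper names and example numbers are hypothetical rather than taken from the Dynamo or SGLang codebases.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the nearest multiple of alignment (assumed semantics)."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    return ((value + alignment - 1) // alignment) * alignment


def block_stride_in_bytes(num_layers: int, layer_stride: int, alignment: int) -> int:
    # block_stride_in_bytes = align_up(num_layers × layer_stride, alignment)
    return align_up(num_layers * layer_stride, alignment)


def host_kv_cache_size(device_kv_cache_size: float, hicache_ratio: float) -> float:
    # Host KV Cache Size = Device KV Cache Size × hicache-ratio
    return device_kv_cache_size * hicache_ratio


# Hypothetical numbers for illustration only.
print(block_stride_in_bytes(num_layers=32, layer_stride=1000, alignment=4096))  # → 32768
print(host_kv_cache_size(40, 2.0))  # → 80.0
```

Here 32 × 1000 = 32000 bytes is not a multiple of 4096, so the stride is padded up to 8 × 4096 = 32768 bytes; padding each block to an aligned boundary trades a little memory for aligned transfers.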

@akshatha-k akshatha-k force-pushed the caching-doc-fixes branch 2 times, most recently from 1389998 to 69fa1fe, February 4, 2026 01:58

Migrate KVBM documentation to a new three-tier structure:
- Tier 1: README.md (Quick Start) - overview, link to user guide, feature matrix, architecture
- Tier 2: kvbm_guide.md (Guide) - installation, configuration, deployment for all user paths (pip wheel, via trtllm/vllm or via the dynamo integrations with other kv offloading solutions)
- Tier 3: kvbm_design.md (Design) - architecture deep dive, components, data flows, framework integrations

Create integrations folder with:
- flexkv_integration.md - new FlexKV integration guide from PR #5858
- lmcache_integration.md - migrated from backends/vllm/
- sglang_hicache.md - migrated from backends/sglang/

Add AGENTS.md for KVBM component to guide AI agents.

Update docs/index.rst to add "KV Cache Offloading" as first item under User Guides.

Signed-off-by: akshatha-k <akshutk@gmail.com>

Update fern directory with new three-tier KVBM documentation:
- Add fern/pages/kvbm/README.md (Quick Start)
- Add fern/pages/kvbm/kvbm-guide.md (Guide)
- Add fern/pages/kvbm/kvbm-design.md (Design)

Create fern/pages/integrations/ with:
- lmcache-integration.md
- flexkv-integration.md
- sglang-hicache.md

Update fern/versions/next.yml navigation:
- Add "KV Cache Offloading" to User Guides section
- Update KVBM section with new structure
- Add Integrations section

Delete old fern KVBM files that were replaced.

Signed-off-by: akshatha-k <akshutk@gmail.com>

- Remove references to non-existent FlexKV launch scripts (agg_flexkv.sh,
  agg_flexkv_router.sh, disagg_flexkv.sh) from docs and fern
- Add language specifier 'text' to code blocks showing formulas in
  sglang_hicache.md and kvbm_design.md
- Fix broken link: kvbm_architecture.md → kvbm_design.md in sglang_hicache.md
- Fix case sensitivity: readme.md → README.md in kvbm_guide.md

Signed-off-by: akshatha-k <akshutk@gmail.com>

Move the Grafana screenshot to the central images directory and update
the reference in kvbm_guide.md to use the new path.

Signed-off-by: akshatha-k <akshutk@gmail.com>

Update links to reflect new documentation structure:
- kvbm_architecture.md → README.md (KVBM overview)
- LMCache_Integration.md → integrations/lmcache_integration.md
- trtllm-setup.md → kvbm_guide.md#run-kvbm-in-dynamo-with-tensorrt-llm

Affected files:
- README.md
- docs/backends/sglang/README.md
- docs/backends/trtllm/README.md
- docs/backends/vllm/README.md
- docs/backends/vllm/prometheus.md
- fern/pages/backends/sglang/README.md

Signed-off-by: akshatha-k <akshutk@gmail.com>

Update links in fern documentation to match new KVBM structure:
- Update KVBM and LMCache links in fern/pages/backends/
- Fix cross-references in fern/pages/integrations/sglang-hicache.md

Signed-off-by: akshatha-k <akshutk@gmail.com>

@dagil-nvidia
Collaborator

/ok to test ce8103b

This commit reverts all changes made to the fern/ directory,
restoring it to its state before this PR's documentation changes.

Signed-off-by: akshatha-k <akshutk@gmail.com>
Signed-off-by: akshatha-k <akshutk@gmail.com>
Signed-off-by: akshatha-k <akshutk@gmail.com>
@dagil-nvidia
Collaborator

/ok to test a1405da

@dagil-nvidia
Collaborator

/ok to test 130ed97

- Remove trailing whitespace in kvbm_guide.md (pre-commit failure)
- Remove deleted kvbm/vllm-setup.md and kvbm/trtllm-setup.md from hidden_toctree.rst
- Remove moved backends/vllm/LMCache_Integration.md from hidden_toctree.rst
- Fix spacing in kvbm_intro.rst: "GPU memory(in future) ," → "GPU memory (future),"

Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

@dagil-nvidia
Collaborator

/ok to test 9edc4ba

Contributor

@nv-kmcgill53 nv-kmcgill53 left a comment


I tried to inline everything as best I could. I approve of the new layout; I just have some changes to the content.

- Remove lib/bindings/kvbm/AGENTS.md from this PR (submit separately)
- Add blank lines before image captions for proper rendering
- Add G4 (Remote Storage) to storage pools section
- Add tier labels (G3, G4) to Host → Disk offload descriptions

Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

@dagil-nvidia
Copy link
Copy Markdown
Collaborator

/ok to test cf7598a

dagil-nvidia and others added 3 commits February 5, 2026 16:12
Co-authored-by: Kyle McGill <101670481+nv-kmcgill53@users.noreply.github.com>
Signed-off-by: dagil-nvidia <dagil@nvidia.com>
Co-authored-by: Kyle McGill <101670481+nv-kmcgill53@users.noreply.github.com>
Signed-off-by: dagil-nvidia <dagil@nvidia.com>
Co-authored-by: Kyle McGill <101670481+nv-kmcgill53@users.noreply.github.com>
Signed-off-by: dagil-nvidia <dagil@nvidia.com>

@dagil-nvidia
Collaborator

/ok to test e86c9ed

@dagil-nvidia dagil-nvidia enabled auto-merge (squash) February 5, 2026 22:21
@dagil-nvidia dagil-nvidia disabled auto-merge February 5, 2026 22:55
@dagil-nvidia dagil-nvidia merged commit 4c3eba2 into main Feb 6, 2026
64 checks passed
@dagil-nvidia dagil-nvidia deleted the caching-doc-fixes branch February 6, 2026 00:02
soodoshll pushed a commit to soodoshll/dynamo that referenced this pull request Feb 12, 2026
…o#5905)

Signed-off-by: akshatha-k <akshutk@gmail.com>
Signed-off-by: Dan Gil <dagil@nvidia.com>
Signed-off-by: dagil-nvidia <dagil@nvidia.com>
Co-authored-by: dagil-nvidia <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Kyle McGill <101670481+nv-kmcgill53@users.noreply.github.com>

Labels

docs documentation Improvements or additions to documentation kvbm size/XXL


3 participants