docs: Migrate router documentation to three-tier structure #5979

Merged
dagil-nvidia merged 16 commits into main from router-doc-fixes
Feb 6, 2026

@akshatha-k
Contributor

@akshatha-k akshatha-k commented Feb 5, 2026

Summary

  • Reorganize router documentation into a clearer three-tier structure:
    • README.md: Quick start guide (what is this, get running in 60 seconds)
    • router_guide.md: Production configuration, tuning, disaggregated serving
    • router_examples.md: Python API, K8s examples, custom routing patterns
    • router_design.md: Architecture details, algorithms, event transport modes
  • Move kv_events.md to integrations/kv_events_custom_engines.md
  • Update all references across the codebase (20 files)
  • Add "KV Aware Routing" to User Guides in docs index

Test plan

  • Verify all internal links work
  • Verify diagrams render correctly
  • Review content accuracy
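
The first test-plan item could be partly automated. The following is a minimal sketch, not the project's actual tooling: the helper name `find_broken_links` and the choice to check only relative inline links are assumptions made for illustration. It scans every `.md` file under a root directory and reports relative link targets that do not exist on disk. Link-like text inside code fences is also matched, so results may need a manual pass.

```python
import re
from pathlib import Path

# Matches inline Markdown links such as [text](target); image links match too.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def find_broken_links(root: Path) -> list[tuple[Path, str]]:
    """Return (file, target) pairs whose relative link target does not exist.

    Hypothetical helper for illustration; not part of the repository.
    """
    broken = []
    for md_file in sorted(root.rglob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        for target in LINK_RE.findall(text):
            # Skip external URLs, mail links, and pure in-page anchors.
            if target.startswith(("http://", "https://", "mailto:", "#")):
                continue
            # Drop any "#section" fragment; only the file path is checked.
            path_part = target.split("#", 1)[0]
            if not (md_file.parent / path_part).exists():
                broken.append((md_file, target))
    return broken
```

Calling `find_broken_links(Path("docs"))` from the repository root would list candidate broken links; anchors such as `#run-kvbm-in-dynamo-with-tensorrt-llm` are deliberately not validated in this sketch.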

Summary by CodeRabbit

Release Notes

  • Documentation
    • Added comprehensive Router design documentation covering architecture, KV cache optimization, and event handling.
    • Added Router Guide and Examples documentation for setup, usage patterns, and API reference.
    • Added KVBM User Guide and Design documentation for KV Block Manager features and configuration.
    • Added new integration guides for FlexKV and SGLang HiCache.
    • Reorganized and consolidated documentation structure for improved navigation.
    • Updated documentation links throughout to reflect new structure.

Migrate KVBM documentation to a new three-tier structure:
- Tier 1: README.md (Quick Start) - overview, link to user guide, feature matrix, architecture
- Tier 2: kvbm_guide.md (Guide) - installation, configuration, deployment for all user paths (pip wheel, via trtllm/vllm or via the dynamo integrations with other kv offloading solutions)
- Tier 3: kvbm_design.md (Design) - architecture deep dive, components, data flows, framework integrations

Create integrations folder with:
- flexkv_integration.md - new FlexKV integration guide from PR #5858
- lmcache_integration.md - migrated from backends/vllm/
- sglang_hicache.md - migrated from backends/sglang/

Add AGENTS.md for KVBM component to guide AI agents.

Update docs/index.rst to add "KV Cache Offloading" as first item under User Guides.

Signed-off-by: akshatha-k <akshutk@gmail.com>

Update fern directory with new three-tier KVBM documentation:
- Add fern/pages/kvbm/README.md (Quick Start)
- Add fern/pages/kvbm/kvbm-guide.md (Guide)
- Add fern/pages/kvbm/kvbm-design.md (Design)

Create fern/pages/integrations/ with:
- lmcache-integration.md
- flexkv-integration.md
- sglang-hicache.md

Update fern/versions/next.yml navigation:
- Add "KV Cache Offloading" to User Guides section
- Update KVBM section with new structure
- Add Integrations section

Delete old fern KVBM files that were replaced.

Signed-off-by: akshatha-k <akshutk@gmail.com>

- Remove references to non-existent FlexKV launch scripts (agg_flexkv.sh,
  agg_flexkv_router.sh, disagg_flexkv.sh) from docs and fern
- Add language specifier 'text' to code blocks showing formulas in
  sglang_hicache.md and kvbm_design.md
- Fix broken link: kvbm_architecture.md → kvbm_design.md in sglang_hicache.md
- Fix case sensitivity: readme.md → README.md in kvbm_guide.md

Signed-off-by: akshatha-k <akshutk@gmail.com>

Move the Grafana screenshot to the central images directory and update
the reference in kvbm_guide.md to use the new path.

Signed-off-by: akshatha-k <akshutk@gmail.com>

Update links to reflect new documentation structure:
- kvbm_architecture.md → README.md (KVBM overview)
- LMCache_Integration.md → integrations/lmcache_integration.md
- trtllm-setup.md → kvbm_guide.md#run-kvbm-in-dynamo-with-tensorrt-llm

Affected files:
- README.md
- docs/backends/sglang/README.md
- docs/backends/trtllm/README.md
- docs/backends/vllm/README.md
- docs/backends/vllm/prometheus.md
- fern/pages/backends/sglang/README.md

Signed-off-by: akshatha-k <akshutk@gmail.com>

Update links in fern documentation to match new KVBM structure:
- Update KVBM and LMCache links in fern/pages/backends/
- Fix cross-references in fern/pages/integrations/sglang-hicache.md

Signed-off-by: akshatha-k <akshutk@gmail.com>

This commit reverts all changes made to the fern/ directory,
restoring it to its state before this PR's documentation changes.

Signed-off-by: akshatha-k <akshutk@gmail.com>

Signed-off-by: akshatha-k <akshutk@gmail.com>

Reorganize router documentation into a clearer three-tier structure:
- README.md: Quick start guide (what is this, get running in 60 seconds)
- router_guide.md: Production configuration, tuning, disaggregated serving
- router_examples.md: Python API, K8s examples, custom routing patterns
- router_design.md: Architecture details, algorithms, event transport modes

Also moves kv_events.md to integrations/kv_events_custom_engines.md and
updates all references across the codebase.

Signed-off-by: akshatha-k <akshutk@gmail.com>

@akshatha-k akshatha-k requested review from a team as code owners February 5, 2026 00:32
@copy-pr-bot

copy-pr-bot bot commented Feb 5, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added docs documentation Improvements or additions to documentation backend::vllm Relates to the vllm backend backend::trtllm Relates to the trtllm backend router Relates to routing, KV-aware routing, etc. labels Feb 5, 2026
@akshatha-k akshatha-k removed size/XXL backend::vllm Relates to the vllm backend backend::trtllm Relates to the trtllm backend labels Feb 5, 2026
@coderabbitai
Contributor

coderabbitai bot commented Feb 5, 2026

Walkthrough

This PR restructures KVBM and router documentation by consolidating fragmented files into centralized guides (kvbm_guide.md, router_guide.md), creating comprehensive design documents (kvbm_design.md, router_design.md), adding new integration documentation (FlexKV, SGLang HiCache), and systematically updating cross-references throughout the codebase to reflect new file locations.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **KVBM Documentation Consolidation**<br>docs/kvbm/kvbm_architecture.md, kvbm_components.md, kvbm_motivation.md, kvbm_reading.md, vllm-setup.md, trtllm-setup.md, kvbm_integrations.md, kvbm_design_deepdive.md (deleted)<br>docs/kvbm/README.md, kvbm_guide.md, kvbm_design.md (created) | Consolidated KVBM documentation from 8 dispersed files into centralized README, comprehensive guide, and design document. Removed redundant/superseded docs; centralized setup, configuration, and architecture guidance. |
| **Router Documentation Consolidation**<br>docs/router/kv_cache_routing.md (deleted)<br>docs/router/README.md (refactored), router_guide.md, router_examples.md (created)<br>docs/design_docs/router_design.md (created) | Restructured router documentation: removed monolithic KV cache routing doc; refactored README to high-level overview; created router_guide for configuration and tuning, router_examples for API and deployment patterns, and router_design for architecture and event handling. |
| **Cross-Reference Updates**<br>README.md, benchmarks/router/README.md, components/src/dynamo/router/README.md, deploy/inference-gateway/README.md, docs/backends/sglang/README.md, docs/backends/trtllm/README.md, docs/backends/vllm/README.md, docs/backends/vllm/prometheus.md, docs/design_docs/architecture.md, docs/reference/feature-matrix.md, docs/index.rst, examples/backends/trtllm/deploy/README.md, examples/backends/vllm/deploy/README.md, examples/basics/multinode/README.md, lib/bindings/kvbm/README.md | Updated documentation links from deprecated paths (kv_cache_routing.md, kvbm_architecture.md, vllm-setup.md, trtllm-setup.md) to new consolidated locations (README.md, kvbm_guide.md, router_guide.md, router_design.md). Links updated in feature matrices, deployment guides, and architecture docs. |
| **New Integration Documentation**<br>docs/integrations/flexkv_integration.md, docs/integrations/sglang_hicache.md (created)<br>docs/integrations/lmcache_integration.md, docs/integrations/kv_events_custom_engines.md (updated) | Added comprehensive FlexKV and SGLang HiCache integration guides with setup, configuration, and deployment examples. Updated LMCache integration with minor formatting and link adjustments; added router-related references to KV events documentation. |
| **KVBM Bindings Documentation**<br>lib/bindings/kvbm/AGENTS.md (created) | New documentation file covering KVBM bindings architecture, command workflows, design decisions, environment variables, and integration patterns with inference frameworks. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 The warren's docs were scattered wide and far,
Through kvbm and router paths, like stars!
Now consolidated, organized, and bright—
Fresh guides and designs make everything right! ✨
A tidy burrow of knowledge, clean and complete,
Documentation restructured, oh what a treat!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: reorganizing router documentation into a three-tier structure, which is the primary objective of this PR. |
| Description check | ✅ Passed | The description provides a clear summary of changes, lists the four-tier documentation structure with each tier's purpose, mentions moving kv_events.md, notes updating 20 files, and includes a test plan with verification checkboxes. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

🤖 Fix all issues with AI agents
In `@docs/design_docs/router_design.md`:
- Line 125: Update the wording to use standard spelling "built-in" instead of
"builtin" in the sentence that references Python's built-in hash (the phrase
containing `hash()` and `PYTHONHASHSEED=0`), e.g., change "Python's builtin
`hash()`" to "Python's built-in `hash()`" so the docs for KvIndexer/router and
engine configuration use correct terminology.

In `@docs/integrations/sglang_hicache.md`:
- Around line 50-69: The example uses the same port for both the worker and
frontend (the worker invocation "python -m dynamo.sglang" uses --port 8000 and
the frontend invocation "python -m dynamo.frontend" uses --http-port 8000),
which will collide; update the frontend command to a different HTTP port (e.g.,
--http-port 8001) and make sure the frontend is configured to point to the
worker backend at 0.0.0.0:8000 using the frontend's backend connection option
(e.g., --backend-url or --backend-host/--backend-port if available); update both
commands in the docs so the worker remains on --port 8000 and the frontend
listens on the new HTTP port and connects to the worker host:port.
- Around line 379-383: Update the SGLang version note in the HiCache checklist:
change the line that reads "Check SGLang version supports HiCache (v0.5.3+
recommended)" to clarify minimum and recommended versions (e.g., "SGLang v0.5.2+
(HiCache introduced in v0.5.2); v0.5.3+ recommended for stability"), or adjust
it to match the current upstream recommendation—ensure the checklist entry
referencing SGLang and the string "--enable-hierarchical-cache" reflects the
corrected minimum/recommended versions.

In `@docs/router/router_guide.md`:
- Around line 62-64: The fenced code block containing the command "python -m
dynamo.frontend --router-mode kv" lacks a language identifier; update the block
opener from ``` to ```bash so the snippet becomes a bash-highlighted block and
retains the exact command text inside the fence to enable proper syntax
highlighting in docs/router/router_guide.md.
- Around line 152-154: Update the fenced code block that contains the log
example "Formula for worker_1: 125.3 = 1.0 * 100.5 + 25.0 (cached_blocks: 15)"
to include a language identifier (e.g., use ```text) so the snippet renders
correctly; locate the block in router_guide.md containing that exact log string
and change the opening fence to ```text while keeping the content and closing
fence unchanged.
- Line 61: Replace the unhyphenated compound adjective in the sentence "To
enable KV cache aware routing start the frontend node like this:" by hyphenating
the phrase so it reads "To enable KV-cache-aware routing start the frontend node
like this:" (and apply the same "KV-cache-aware" hyphenation wherever "KV cache
aware" appears in the document).

In `@lib/bindings/kvbm/AGENTS.md`:
- Around line 11-72: The markdown file's fenced code blocks (the directory tree
blocks showing lib/bindings/kvbm/ and the nested python/ and src/ trees) are
missing language identifiers; update each triple-backtick block (the top-level
tree block and the subsequent blocks that list python/, vllm_integration/,
trtllm_integration/, and src/ subtrees) to include an appropriate language tag
such as text (e.g., ```text), bash, or yaml so markdownlint warnings are
resolved; locate the blocks by searching for the directory listings starting
with "lib/bindings/kvbm/" and the nested "python/" and "src/" sections and add
the language identifier to each opening ``` fence.
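
Several of the comments above concern fenced code blocks that lack a language identifier. As a rough illustration of how such fences could be located, here is a minimal sketch; the function name `find_unlabeled_fences` is hypothetical, and the logic is a simplification that handles only backtick fences and assumes fences are properly paired. It is not the markdownlint rule itself.

```python
FENCE = "`" * 3  # the three-backtick fence marker


def find_unlabeled_fences(markdown: str) -> list[int]:
    """Return 1-based line numbers of opening fences that carry no language tag.

    Hypothetical helper for illustration; tilde fences and nested fences
    are not handled.
    """
    unlabeled = []
    inside_block = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            continue
        if inside_block:
            # This fence closes the block that is currently open.
            inside_block = False
        else:
            inside_block = True
            # Nothing after the backticks means no language identifier.
            if stripped.strip("`") == "":
                unlabeled.append(number)
    return unlabeled
```

Running it over the contents of a file such as `router_guide.md` would return the line numbers where the opening fence could be changed to a labeled one, for example `bash` or `text`.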
🧹 Nitpick comments (3)
docs/router/README.md (1)

78-80: Consider varying sentence structure.

Three consecutive sentences begin with "For", which can feel repetitive. Consider rephrasing for better flow.

✍️ Proposed rewording
-For A/B testing and advanced K8s setup, see the [KV Router A/B Benchmarking Guide](../benchmarks/kv-router-ab-testing.md).
-
-For more configuration options and tuning guidelines, see the [Router Guide](router_guide.md).
+For A/B testing and advanced K8s setup, see the [KV Router A/B Benchmarking Guide](../benchmarks/kv-router-ab-testing.md).
+
+Additional configuration options and tuning guidelines are available in the [Router Guide](router_guide.md).
docs/design_docs/router_design.md (2)

33-33: Consider more concise wording.

"based off of" is wordy. Use "based on" for cleaner prose.

✍️ Proposed fix
-The leading Large Language Models (LLMs) today are auto-regressive and based off of the [transformer architecture](https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf).
+The leading Large Language Models (LLMs) today are auto-regressive and based on the [transformer architecture](https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf).

78-78: Consider more concise wording.

Replace "in an efficient way" with "efficiently" for more concise prose.

✍️ Proposed fix
-Every inference framework will have a KV Cache for each worker. A popular inference framework library is [vLLM](https://github.com/vllm-project/vllm) where a key contribution was [PagedAttention](https://arxiv.org/abs/2309.06180), which allowed them to manage KV Cache in an efficient way by chunking requests into blocks.
+Every inference framework will have a KV Cache for each worker. A popular inference framework library is [vLLM](https://github.com/vllm-project/vllm) where a key contribution was [PagedAttention](https://arxiv.org/abs/2309.06180), which allowed them to manage KV Cache efficiently by chunking requests into blocks.

Signed-off-by: akshatha-k <akshutk@gmail.com>
@github-actions github-actions bot added backend::vllm Relates to the vllm backend backend::trtllm Relates to the trtllm backend labels Feb 5, 2026
@dagil-nvidia
Collaborator

/ok to test 6cf32aa

dagil-nvidia and others added 2 commits February 5, 2026 15:39
Signed-off-by: dagil-nvidia <dagil@nvidia.com>

- Fix "builtin" → "built-in" spelling (router_design.md)
- Fix "KV cache aware" → "KV cache-aware" hyphenation (router_guide.md)
- Add language identifiers to fenced code blocks (router_guide.md, AGENTS.md)
- Remove trailing whitespace (kvbm_guide.md, router_guide.md)
- Update hidden_toctree.rst: remove deleted docs, add renamed/new docs
- Add router_design.md to Design Docs toctree in index.rst

Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@dagil-nvidia
Collaborator

/ok to test 1296914

Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@dagil-nvidia
Collaborator

/ok to test 0f64bca

@dagil-nvidia dagil-nvidia requested a review from PeaBrane February 5, 2026 22:09
dagil-nvidia and others added 2 commits February 5, 2026 18:10
- hidden_toctree.rst: keep Router integration entries
- index.rst: keep KV Aware Routing in User Guides
- kvbm_design.md: accept main's KVBM version (G4 storage, formatting)

Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

Update LoRA README cross-reference from deleted kv_cache_routing.md
to its replacement router_guide.md.

Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@dagil-nvidia
Collaborator

/ok to test d032a29

dagil-nvidia added a commit that referenced this pull request Feb 6, 2026
Changes anticipating Frontend (#6002) and Router (#5979) merges:

Frontend cleanup:
- Delete docs/frontends/kserve.md and openapi.json (moved to components/frontend/)
- Delete docs/_sections/frontends.rst (obsolete, only pointed to kserve.md)
- Update index.rst: Frontends -> Frontend at components/frontend/README
- Add redirect in conf.py: frontends/kserve -> components/frontend/frontend_guide
- Update README.md: openapi.json path to docs/reference/api/

Toctree cleanup:
- Remove frontends/kserve.md and _sections/frontends.rst from hidden_toctree.rst
- Remove router/kv_cache_routing.md and router/kv_events.md from hidden_toctree.rst
  (these files will be deleted/moved by Router PR)

Note: router old-path references in other files are left for Router PR to handle
to avoid merge conflicts.

Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@dagil-nvidia
Collaborator

/ok to test 2e6ed34

@dagil-nvidia dagil-nvidia merged commit 80e7baf into main Feb 6, 2026
66 of 67 checks passed
@dagil-nvidia dagil-nvidia deleted the router-doc-fixes branch February 6, 2026 01:55
soodoshll pushed a commit to soodoshll/dynamo that referenced this pull request Feb 12, 2026
…#5979)

Signed-off-by: akshatha-k <akshutk@gmail.com>
Signed-off-by: dagil-nvidia <dagil@nvidia.com>
Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: dagil-nvidia <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Labels

  • backend::trtllm: Relates to the trtllm backend
  • backend::vllm: Relates to the vllm backend
  • docs
  • documentation: Improvements or additions to documentation
  • router: Relates to routing, KV-aware routing, etc.
  • size/XXL

4 participants