docs: Migrate router documentation to three-tier structure by akshatha-k · Pull Request #5979 · ai-dynamo/dynamo

akshatha-k · 2026-02-05T00:32:36Z

Summary

- Reorganize router documentation into a clearer three-tier structure:
  - README.md: Quick start guide (what is this, get running in 60 seconds)
  - router_guide.md: Production configuration, tuning, disaggregated serving
  - router_examples.md: Python API, K8s examples, custom routing patterns
  - router_design.md: Architecture details, algorithms, event transport modes
- Move kv_events.md to integrations/kv_events_custom_engines.md
- Update all references across the codebase (20 files)
- Add "KV Aware Routing" to User Guides in docs index

Test plan

- Verify all internal links work
- Verify diagrams render correctly
- Review content accuracy
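
The link check in the test plan can be partly automated. The following is a minimal sketch, not the check used in this PR: it assumes links are written in inline Markdown form `[text](target)`, resolves only relative targets against the filesystem, and ignores anchors, external URLs, and reference-style links. The function name `find_broken_links` is hypothetical.

```python
import re
from pathlib import Path

# Inline Markdown links: [text](target) or [text](target "title").
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)[^)]*\)")
# Anything that starts with a URL scheme (https:, mailto:, ...) is treated as external.
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def find_broken_links(root):
    """Return (file, target) pairs for relative links whose target path is missing."""
    root = Path(root)
    broken = []
    for md_file in sorted(root.rglob("*.md")):
        text = md_file.read_text(encoding="utf-8", errors="replace")
        for target in LINK_RE.findall(text):
            # Skip external URLs and same-page anchors.
            if SCHEME_RE.match(target) or target.startswith("#"):
                continue
            # Drop any "#section" suffix; only the file path is checked.
            path_part = target.split("#", 1)[0]
            if not path_part:
                continue
            if not (md_file.parent / path_part).exists():
                broken.append((str(md_file.relative_to(root)), target))
    return broken
```

Calling `find_broken_links("docs")` from the repository root would list each Markdown file together with the relative link target that could not be found. Anchor fragments are not validated, so a renamed heading would still pass.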

Summary by CodeRabbit

Release Notes

Documentation
- Added comprehensive Router design documentation covering architecture, KV cache optimization, and event handling.
- Added Router Guide and Examples documentation for setup, usage patterns, and API reference.
- Added KVBM User Guide and Design documentation for KV Block Manager features and configuration.
- Added new integration guides for FlexKV and SGLang HiCache.
- Reorganized and consolidated documentation structure for improved navigation.
- Updated documentation links throughout to reflect new structure.

Migrate KVBM documentation to a new three-tier structure:
- Tier 1: README.md (Quick Start) - overview, link to user guide, feature matrix, architecture
- Tier 2: kvbm_guide.md (Guide) - installation, configuration, deployment for all user paths (pip wheel, via trtllm/vllm or via the dynamo integrations with other kv offloading solutions)
- Tier 3: kvbm_design.md (Design) - architecture deep dive, components, data flows, framework integrations

Create integrations folder with:
- flexkv_integration.md - new FlexKV integration guide from PR #5858
- lmcache_integration.md - migrated from backends/vllm/
- sglang_hicache.md - migrated from backends/sglang/

Add AGENTS.md for KVBM component to guide AI agents.

Update docs/index.rst to add "KV Cache Offloading" as first item under User Guides.

Signed-off-by: akshatha-k <akshutk@gmail.com>

Update fern directory with new three-tier KVBM documentation:
- Add fern/pages/kvbm/README.md (Quick Start)
- Add fern/pages/kvbm/kvbm-guide.md (Guide)
- Add fern/pages/kvbm/kvbm-design.md (Design)

Create fern/pages/integrations/ with:
- lmcache-integration.md
- flexkv-integration.md
- sglang-hicache.md

Update fern/versions/next.yml navigation:
- Add "KV Cache Offloading" to User Guides section
- Update KVBM section with new structure
- Add Integrations section

Delete old fern KVBM files that were replaced.

Signed-off-by: akshatha-k <akshutk@gmail.com>

- Remove references to non-existent FlexKV launch scripts (agg_flexkv.sh, agg_flexkv_router.sh, disagg_flexkv.sh) from docs and fern
- Add language specifier 'text' to code blocks showing formulas in sglang_hicache.md and kvbm_design.md
- Fix broken link: kvbm_architecture.md → kvbm_design.md in sglang_hicache.md
- Fix case sensitivity: readme.md → README.md in kvbm_guide.md

Signed-off-by: akshatha-k <akshutk@gmail.com>

Move the Grafana screenshot to the central images directory and update the reference in kvbm_guide.md to use the new path. Signed-off-by: akshatha-k <akshutk@gmail.com>

Update links to reflect new documentation structure:
- kvbm_architecture.md → README.md (KVBM overview)
- LMCache_Integration.md → integrations/lmcache_integration.md
- trtllm-setup.md → kvbm_guide.md#run-kvbm-in-dynamo-with-tensorrt-llm

Affected files:
- README.md
- docs/backends/sglang/README.md
- docs/backends/trtllm/README.md
- docs/backends/vllm/README.md
- docs/backends/vllm/prometheus.md
- fern/pages/backends/sglang/README.md

Signed-off-by: akshatha-k <akshutk@gmail.com>

Update links in fern documentation to match new KVBM structure:
- Update KVBM and LMCache links in fern/pages/backends/
- Fix cross-references in fern/pages/integrations/sglang-hicache.md

Signed-off-by: akshatha-k <akshutk@gmail.com>

This commit reverts all changes made to the fern/ directory, restoring it to its state before this PR's documentation changes. Signed-off-by: akshatha-k <akshutk@gmail.com>

Signed-off-by: akshatha-k <akshutk@gmail.com>

Reorganize router documentation into a clearer three-tier structure:
- README.md: Quick start guide (what is this, get running in 60 seconds)
- router_guide.md: Production configuration, tuning, disaggregated serving
- router_examples.md: Python API, K8s examples, custom routing patterns
- router_design.md: Architecture details, algorithms, event transport modes

Also moves kv_events.md to integrations/kv_events_custom_engines.md and updates all references across the codebase.

Signed-off-by: akshatha-k <akshutk@gmail.com>

copy-pr-bot · 2026-02-05T00:32:39Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-02-05T00:39:50Z

Walkthrough

This PR restructures KVBM and router documentation by consolidating fragmented files into centralized guides (kvbm_guide.md, router_guide.md), creating comprehensive design documents (kvbm_design.md, router_design.md), adding new integration documentation (FlexKV, SGLang HiCache), and systematically updating cross-references throughout the codebase to reflect new file locations.

Changes

| Cohort / File(s) | Summary |
|---|---|
| KVBM Documentation Consolidation: `docs/kvbm/kvbm_architecture.md`, `kvbm_components.md`, `kvbm_motivation.md`, `kvbm_reading.md`, `vllm-setup.md`, `trtllm-setup.md`, `kvbm_integrations.md`, `kvbm_design_deepdive.md` (deleted); `docs/kvbm/README.md`, `kvbm_guide.md`, `kvbm_design.md` (created) | Consolidated KVBM documentation from 8 dispersed files into centralized README, comprehensive guide, and design document. Removed redundant/superseded docs; centralized setup, configuration, and architecture guidance. |
| Router Documentation Consolidation: `docs/router/kv_cache_routing.md` (deleted); `docs/router/README.md` (refactored), `router_guide.md`, `router_examples.md` (created); `docs/design_docs/router_design.md` (created) | Restructured router documentation: removed monolithic KV cache routing doc; refactored README to high-level overview; created router_guide for configuration and tuning, router_examples for API and deployment patterns, and router_design for architecture and event handling. |
| Cross-Reference Updates: `README.md`, `benchmarks/router/README.md`, `components/src/dynamo/router/README.md`, `deploy/inference-gateway/README.md`, `docs/backends/sglang/README.md`, `docs/backends/trtllm/README.md`, `docs/backends/vllm/README.md`, `docs/backends/vllm/prometheus.md`, `docs/design_docs/architecture.md`, `docs/reference/feature-matrix.md`, `docs/index.rst`, `examples/backends/trtllm/deploy/README.md`, `examples/backends/vllm/deploy/README.md`, `examples/basics/multinode/README.md`, `lib/bindings/kvbm/README.md` | Updated documentation links from deprecated paths (kv_cache_routing.md, kvbm_architecture.md, vllm-setup.md, trtllm-setup.md) to new consolidated locations (README.md, kvbm_guide.md, router_guide.md, router_design.md). Links updated in feature matrices, deployment guides, and architecture docs. |
| New Integration Documentation: `docs/integrations/flexkv_integration.md`, `docs/integrations/sglang_hicache.md` (created); `docs/integrations/lmcache_integration.md`, `docs/integrations/kv_events_custom_engines.md` (updated) | Added comprehensive FlexKV and SGLang HiCache integration guides with setup, configuration, and deployment examples. Updated LMCache integration with minor formatting and link adjustments; added router-related references to KV events documentation. |
| KVBM Bindings Documentation: `lib/bindings/kvbm/AGENTS.md` (created) | New documentation file covering KVBM bindings architecture, command workflows, design decisions, environment variables, and integration patterns with inference frameworks. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 The warren's docs were scattered wide and far,
Through kvbm and router paths, like stars!
Now consolidated, organized, and bright—
Fresh guides and designs make everything right! ✨
A tidy burrow of knowledge, clean and complete,
Documentation restructured, oh what a treat!

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: reorganizing router documentation into a three-tier structure, which is the primary objective of this PR. |
| Description check | ✅ Passed | The description provides a clear summary of changes, lists the four documents in the three-tier structure with each one's purpose, mentions moving kv_events.md, notes updating 20 files, and includes a test plan with verification checkboxes. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


coderabbitai

Actionable comments posted: 7

🤖 Fix all issues with AI agents

In `@docs/design_docs/router_design.md`:
- Line 125: Update the wording to use standard spelling "built-in" instead of
"builtin" in the sentence that references Python's built-in hash (the phrase
containing `hash()` and `PYTHONHASHSEED=0`), e.g., change "Python's builtin
`hash()`" to "Python's built-in `hash()`" so the docs for KvIndexer/router and
engine configuration use correct terminology.
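
For context on why that sentence pairs `hash()` with `PYTHONHASHSEED=0`: by default CPython salts the hashes of `str` and `bytes` objects with a per-process random value, so two processes can disagree on `hash()` of the same string, and setting `PYTHONHASHSEED` to a fixed integer makes the result repeatable. The sketch below only demonstrates that interpreter behavior; the helper name `child_hash` and the sample string are hypothetical and are not taken from the router docs.

```python
import os
import subprocess
import sys


def child_hash(value, seed=None):
    """Return hash(value) as computed by a fresh Python interpreter."""
    env = dict(os.environ)
    # Start from a clean slate so an inherited setting does not hide the effect.
    env.pop("PYTHONHASHSEED", None)
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({value!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    # With a fixed seed, separate processes agree on the hash of the same string.
    print(child_hash("block-0", seed=0) == child_hash("block-0", seed=0))
    # Without a seed, the two values will almost always differ between processes.
    print(child_hash("block-0"), child_hash("block-0"))
```

Any component that compares string hashes computed in different processes would need every process to run with the same seed for the values to line up.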

In `@docs/integrations/sglang_hicache.md`:
- Around line 50-69: The example uses the same port for both the worker and
frontend (the worker invocation "python -m dynamo.sglang" uses --port 8000 and
the frontend invocation "python -m dynamo.frontend" uses --http-port 8000),
which will collide; update the frontend command to a different HTTP port (e.g.,
--http-port 8001) and make sure the frontend is configured to point to the
worker backend at 0.0.0.0:8000 using the frontend's backend connection option
(e.g., --backend-url or --backend-host/--backend-port if available); update both
commands in the docs so the worker remains on --port 8000 and the frontend
listens on the new HTTP port and connects to the worker host:port.
- Around line 379-383: Update the SGLang version note in the HiCache checklist:
change the line that reads "Check SGLang version supports HiCache (v0.5.3+
recommended)" to clarify minimum and recommended versions (e.g., "SGLang v0.5.2+
(HiCache introduced in v0.5.2); v0.5.3+ recommended for stability"), or adjust
it to match the current upstream recommendation—ensure the checklist entry
referencing SGLang and the string "--enable-hierarchical-cache" reflects the
corrected minimum/recommended versions.
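
The port collision described in the first comment above is ordinary TCP behavior rather than anything specific to these commands. As a minimal sketch using only the standard library (the function name `second_bind_fails` is hypothetical, and no Dynamo or SGLang process is involved), a second listener on an address and port that is already in use is rejected by the operating system:

```python
import errno
import socket


def second_bind_fails(host="127.0.0.1"):
    """Bind one listening TCP socket, then try to bind another to the same port."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))  # port 0 lets the OS pick a free port
        first.listen()
        port = first.getsockname()[1]
        try:
            second.bind((host, port))
        except OSError as exc:
            # The OS reports "address already in use" for the second listener.
            return exc.errno == errno.EADDRINUSE
        return False
    finally:
        first.close()
        second.close()
```

This is why the two services in the example need distinct ports, whichever process starts first.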

In `@docs/router/router_guide.md`:
- Around line 62-64: The fenced code block containing the command "python -m
dynamo.frontend --router-mode kv" lacks a language identifier; update the block
opener from ``` to ```bash so the snippet becomes a bash-highlighted block and
retains the exact command text inside the fence to enable proper syntax
highlighting in docs/router/router_guide.md.
- Around line 152-154: Update the fenced code block that contains the log
example "Formula for worker_1: 125.3 = 1.0 * 100.5 + 25.0 (cached_blocks: 15)"
to include a language identifier (e.g., use ```text) so the snippet renders
correctly; locate the block in router_guide.md containing that exact log string
and change the opening fence to ```text while keeping the content and closing
fence unchanged.
- Line 61: Replace the unhyphenated compound adjective in the sentence "To
enable KV cache aware routing start the frontend node like this:" by hyphenating
the phrase so it reads "To enable KV-cache-aware routing start the frontend node
like this:" (and apply the same "KV-cache-aware" hyphenation wherever "KV cache
aware" appears in the document).

In `@lib/bindings/kvbm/AGENTS.md`:
- Around line 11-72: The markdown file's fenced code blocks (the directory tree
blocks showing lib/bindings/kvbm/ and the nested python/ and src/ trees) are
missing language identifiers; update each triple-backtick block (the top-level
tree block and the subsequent blocks that list python/, vllm_integration/,
trtllm_integration/, and src/ subtrees) to include an appropriate language tag
such as text (e.g., ```text), bash, or yaml so markdownlint warnings are
resolved; locate the blocks by searching for the directory listings starting
with "lib/bindings/kvbm/" and the nested "python/" and "src/" sections and add
the language identifier to each opening ``` fence.
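
Fences without a language identifier, as flagged in several of the comments above, can be located mechanically. The following is a rough sketch under simplifying assumptions: it follows the CommonMark rule that a closing fence uses the same character, is at least as long as the opener, and carries no info string, but it does not handle fences nested inside lists or block quotes. The function name `fences_without_language` is hypothetical, and this is not the markdownlint implementation.

```python
import re

# A fence line: up to three spaces of indent, then 3+ backticks or tildes, then an optional info string.
FENCE_RE = re.compile(r"^\s{0,3}(`{3,}|~{3,})(.*)$")


def fences_without_language(markdown_text):
    """Return 1-based line numbers of opening code fences that have no info string."""
    missing = []
    open_fence = None  # marker of the block we are currently inside, if any
    for lineno, line in enumerate(markdown_text.splitlines(), start=1):
        match = FENCE_RE.match(line)
        if not match:
            continue
        marker, info = match.group(1), match.group(2).strip()
        if open_fence is None:
            # This line opens a new code block.
            open_fence = marker
            if not info:
                missing.append(lineno)
        elif marker[0] == open_fence[0] and len(marker) >= len(open_fence) and not info:
            # This line closes the current code block.
            open_fence = None
    return missing
```

Running it over the text of each changed Markdown file would give the line numbers where a tag such as `text` or `bash` still needs to be added.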

🧹 Nitpick comments (3)

docs/router/README.md (1)

78-80: Consider varying sentence structure.

Three consecutive sentences begin with "For", which can feel repetitive. Consider rephrasing for better flow.

✍️ Proposed rewording

-For A/B testing and advanced K8s setup, see the [KV Router A/B Benchmarking Guide](../benchmarks/kv-router-ab-testing.md).
-
-For more configuration options and tuning guidelines, see the [Router Guide](router_guide.md).
+For A/B testing and advanced K8s setup, see the [KV Router A/B Benchmarking Guide](../benchmarks/kv-router-ab-testing.md).
+
+Additional configuration options and tuning guidelines are available in the [Router Guide](router_guide.md).

docs/design_docs/router_design.md (2)

33-33: Consider more concise wording.

"based off of" is wordy. Use "based on" for cleaner prose.

✍️ Proposed fix

-The leading Large Language Models (LLMs) today are auto-regressive and based off of the [transformer architecture](https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf).
+The leading Large Language Models (LLMs) today are auto-regressive and based on the [transformer architecture](https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf).

78-78: Consider more concise wording.

Replace "in an efficient way" with "efficiently" for more concise prose.

✍️ Proposed fix

-Every inference framework will have a KV Cache for each worker. A popular inference framework library is [vLLM](https://github.com/vllm-project/vllm) where a key contribution was [PagedAttention](https://arxiv.org/abs/2309.06180), which allowed them to manage KV Cache in an efficient way by chunking requests into blocks.
+Every inference framework will have a KV Cache for each worker. A popular inference framework library is [vLLM](https://github.com/vllm-project/vllm) where a key contribution was [PagedAttention](https://arxiv.org/abs/2309.06180), which allowed them to manage KV Cache efficiently by chunking requests into blocks.


Signed-off-by: akshatha-k <akshutk@gmail.com>

dagil-nvidia · 2026-02-05T15:56:31Z

/ok to test 6cf32aa

Signed-off-by: dagil-nvidia <dagil@nvidia.com>

- Fix "builtin" → "built-in" spelling (router_design.md)
- Fix "KV cache aware" → "KV cache-aware" hyphenation (router_guide.md)
- Add language identifiers to fenced code blocks (router_guide.md, AGENTS.md)
- Remove trailing whitespace (kvbm_guide.md, router_guide.md)
- Update hidden_toctree.rst: remove deleted docs, add renamed/new docs
- Add router_design.md to Design Docs toctree in index.rst

Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

dagil-nvidia · 2026-02-05T21:53:26Z

/ok to test 1296914

Signed-off-by: Dan Gil <dagil@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com>

dagil-nvidia · 2026-02-05T22:08:57Z

/ok to test 0f64bca

- hidden_toctree.rst: keep Router integration entries
- index.rst: keep KV Aware Routing in User Guides
- kvbm_design.md: accept main's KVBM version (G4 storage, formatting)

Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

Update LoRA README cross-reference from deleted kv_cache_routing.md to its replacement router_guide.md. Signed-off-by: Dan Gil <dagil@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com>

dagil-nvidia · 2026-02-06T00:16:55Z

/ok to test d032a29

Changes anticipating Frontend (#6002) and Router (#5979) merges:

Frontend cleanup:
- Delete docs/frontends/kserve.md and openapi.json (moved to components/frontend/)
- Delete docs/_sections/frontends.rst (obsolete, only pointed to kserve.md)
- Update index.rst: Frontends -> Frontend at components/frontend/README
- Add redirect in conf.py: frontends/kserve -> components/frontend/frontend_guide
- Update README.md: openapi.json path to docs/reference/api/

Toctree cleanup:
- Remove frontends/kserve.md and _sections/frontends.rst from hidden_toctree.rst
- Remove router/kv_cache_routing.md and router/kv_events.md from hidden_toctree.rst (these files will be deleted/moved by Router PR)

Note: router old-path references in other files are left for Router PR to handle to avoid merge conflicts.

Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

dagil-nvidia · 2026-02-06T01:04:15Z

/ok to test 2e6ed34

…#5979) Signed-off-by: akshatha-k <akshutk@gmail.com> Signed-off-by: dagil-nvidia <dagil@nvidia.com> Signed-off-by: Dan Gil <dagil@nvidia.com> Co-authored-by: dagil-nvidia <dagil@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com>

akshatha-k added 9 commits February 3, 2026 18:00

docs: move kvbm_metrics_grafana.png to docs/images

547bcde


docs: fix broken links in fern pages

ce8103b


Revert: Remove all fern documentation changes from this PR

e676237


docs: restore sgl-hicache-example.md

2d4dfb2

Signed-off-by: akshatha-k <akshutk@gmail.com>

akshatha-k requested review from a team as code owners February 5, 2026 00:32

pull-request-size bot added the size/XXL label Feb 5, 2026

github-actions bot added docs documentation Improvements or additions to documentation backend::vllm Relates to the vllm backend backend::trtllm Relates to the trtllm backend router Relates to routing, KV-aware routing, etc. labels Feb 5, 2026

akshatha-k removed size/XXL backend::vllm Relates to the vllm backend backend::trtllm Relates to the trtllm backend labels Feb 5, 2026

coderabbitai bot reviewed Feb 5, 2026


docs: replace sglang_hicache.md with simplified content

6cf32aa

Signed-off-by: akshatha-k <akshutk@gmail.com>

pull-request-size bot added the size/XXL label Feb 5, 2026

github-actions bot added backend::vllm Relates to the vllm backend backend::trtllm Relates to the trtllm backend labels Feb 5, 2026

dagil-nvidia and others added 2 commits February 5, 2026 15:39

Merge branch 'main' into router-doc-fixes

3feac5c

Signed-off-by: dagil-nvidia <dagil@nvidia.com>

copy-pr-bot bot had a problem deploying to GITLAB February 5, 2026 21:53 Failure

fix: remove out-of-scope AGENTS.md from docs PR

0f64bca

Signed-off-by: Dan Gil <dagil@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com>

dagil-nvidia requested a review from PeaBrane February 5, 2026 22:09

PeaBrane approved these changes Feb 5, 2026


dagil-nvidia and others added 2 commits February 5, 2026 18:10

fix: update broken link to deleted kv_cache_routing.md

d032a29


copy-pr-bot bot temporarily deployed to GITLAB February 6, 2026 00:17 Inactive

dagil-nvidia enabled auto-merge (squash) February 6, 2026 00:17

hhzhang16 approved these changes Feb 6, 2026


copy-pr-bot bot temporarily deployed to GITLAB February 6, 2026 00:34 Inactive

dagil-nvidia mentioned this pull request Feb 6, 2026

docs: cleanup of docs refactor for components, integrations, and features #6019

Merged

8 tasks

Merge branch 'main' into router-doc-fixes

2e6ed34

copy-pr-bot bot temporarily deployed to GITLAB February 6, 2026 01:04 Inactive

copy-pr-bot bot temporarily deployed to GITLAB February 6, 2026 01:11 Inactive

dagil-nvidia merged commit 80e7baf into main Feb 6, 2026
66 of 67 checks passed

dagil-nvidia deleted the router-doc-fixes branch February 6, 2026 01:55

dagil-nvidia merged 16 commits into main from router-doc-fixes

4 participants
