docs: Migrate router documentation to three-tier structure #5979
dagil-nvidia merged 16 commits into main
Migrate KVBM documentation to a new three-tier structure:
- Tier 1: README.md (Quick Start) - overview, link to user guide, feature matrix, architecture
- Tier 2: kvbm_guide.md (Guide) - installation, configuration, deployment for all user paths (pip wheel, via trtllm/vllm or via the dynamo integrations with other kv offloading solutions)
- Tier 3: kvbm_design.md (Design) - architecture deep dive, components, data flows, framework integrations

Create integrations folder with:
- flexkv_integration.md - new FlexKV integration guide from PR #5858
- lmcache_integration.md - migrated from backends/vllm/
- sglang_hicache.md - migrated from backends/sglang/

Add AGENTS.md for KVBM component to guide AI agents.

Update docs/index.rst to add "KV Cache Offloading" as first item under User Guides.

Signed-off-by: akshatha-k <akshutk@gmail.com>
Update fern directory with new three-tier KVBM documentation:
- Add fern/pages/kvbm/README.md (Quick Start)
- Add fern/pages/kvbm/kvbm-guide.md (Guide)
- Add fern/pages/kvbm/kvbm-design.md (Design)

Create fern/pages/integrations/ with:
- lmcache-integration.md
- flexkv-integration.md
- sglang-hicache.md

Update fern/versions/next.yml navigation:
- Add "KV Cache Offloading" to User Guides section
- Update KVBM section with new structure
- Add Integrations section

Delete old fern KVBM files that were replaced.

Signed-off-by: akshatha-k <akshutk@gmail.com>
- Remove references to non-existent FlexKV launch scripts (agg_flexkv.sh, agg_flexkv_router.sh, disagg_flexkv.sh) from docs and fern
- Add language specifier 'text' to code blocks showing formulas in sglang_hicache.md and kvbm_design.md
- Fix broken link: kvbm_architecture.md → kvbm_design.md in sglang_hicache.md
- Fix case sensitivity: readme.md → README.md in kvbm_guide.md

Signed-off-by: akshatha-k <akshutk@gmail.com>
Move the Grafana screenshot to the central images directory and update the reference in kvbm_guide.md to use the new path.

Signed-off-by: akshatha-k <akshutk@gmail.com>
Update links to reflect new documentation structure:
- kvbm_architecture.md → README.md (KVBM overview)
- LMCache_Integration.md → integrations/lmcache_integration.md
- trtllm-setup.md → kvbm_guide.md#run-kvbm-in-dynamo-with-tensorrt-llm

Affected files:
- README.md
- docs/backends/sglang/README.md
- docs/backends/trtllm/README.md
- docs/backends/vllm/README.md
- docs/backends/vllm/prometheus.md
- fern/pages/backends/sglang/README.md

Signed-off-by: akshatha-k <akshutk@gmail.com>
Update links in fern documentation to match new KVBM structure:
- Update KVBM and LMCache links in fern/pages/backends/
- Fix cross-references in fern/pages/integrations/sglang-hicache.md

Signed-off-by: akshatha-k <akshutk@gmail.com>
This commit reverts all changes made to the fern/ directory, restoring it to its state before this PR's documentation changes.

Signed-off-by: akshatha-k <akshutk@gmail.com>
Signed-off-by: akshatha-k <akshutk@gmail.com>
Reorganize router documentation into a clearer three-tier structure:
- README.md: Quick start guide (what is this, get running in 60 seconds)
- router_guide.md: Production configuration, tuning, disaggregated serving
- router_examples.md: Python API, K8s examples, custom routing patterns
- router_design.md: Architecture details, algorithms, event transport modes

Also moves kv_events.md to integrations/kv_events_custom_engines.md and updates all references across the codebase.

Signed-off-by: akshatha-k <akshutk@gmail.com>
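A rename like this tends to leave stale links behind. The following is a minimal sketch, not part of the PR, of how references to the old router file names could be flagged in a Markdown file. The `RENAMED` map uses only the renames stated in this PR's commit messages; the function name `find_stale_links` and the name-only matching (ignoring relative directories) are assumptions made for illustration.

```python
import re

# Old file name -> replacement, as stated in this PR's commit messages.
# Assumption: matching on the file name alone is good enough for a first pass;
# the correct relative path depends on where the referencing file lives.
RENAMED = {
    "kv_events.md": "integrations/kv_events_custom_engines.md",
    "kv_cache_routing.md": "router_guide.md",
}

# Matches inline Markdown links: [text](target) or [text](target#anchor).
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s#]+)(#[^)\s]*)?\)")


def find_stale_links(markdown_text):
    """Return (line_number, old_target, suggested_name) for links to renamed files."""
    stale = []
    for lineno, line in enumerate(markdown_text.splitlines(), start=1):
        for match in LINK_RE.finditer(line):
            target = match.group(1)
            basename = target.rsplit("/", 1)[-1]
            if basename in RENAMED:
                stale.append((lineno, target, RENAMED[basename]))
    return stale
```

The function only reports candidates; choosing the right relative path for each replacement is left to the person editing the file.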
Walkthrough

This PR restructures KVBM and router documentation by consolidating fragmented files into centralized guides (kvbm_guide.md, router_guide.md), creating comprehensive design documents (kvbm_design.md, router_design.md), adding new integration documentation (FlexKV, SGLang HiCache), and systematically updating cross-references throughout the codebase to reflect new file locations.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 7
🤖 Fix all issues with AI agents
In `@docs/design_docs/router_design.md`:
- Line 125: Update the wording to use standard spelling "built-in" instead of
"builtin" in the sentence that references Python's built-in hash (the phrase
containing `hash()` and `PYTHONHASHSEED=0`), e.g., change "Python's builtin
`hash()`" to "Python's built-in `hash()`" so the docs for KvIndexer/router and
engine configuration use correct terminology.
In `@docs/integrations/sglang_hicache.md`:
- Around line 50-69: The example uses the same port for both the worker and
frontend (the worker invocation "python -m dynamo.sglang" uses --port 8000 and
the frontend invocation "python -m dynamo.frontend" uses --http-port 8000),
which will collide; update the frontend command to a different HTTP port (e.g.,
--http-port 8001) and make sure the frontend is configured to point to the
worker backend at 0.0.0.0:8000 using the frontend's backend connection option
(e.g., --backend-url or --backend-host/--backend-port if available); update both
commands in the docs so the worker remains on --port 8000 and the frontend
listens on the new HTTP port and connects to the worker host:port.
- Around line 379-383: Update the SGLang version note in the HiCache checklist:
change the line that reads "Check SGLang version supports HiCache (v0.5.3+
recommended)" to clarify minimum and recommended versions (e.g., "SGLang v0.5.2+
(HiCache introduced in v0.5.2); v0.5.3+ recommended for stability"), or adjust
it to match the current upstream recommendation—ensure the checklist entry
referencing SGLang and the string "--enable-hierarchical-cache" reflects the
corrected minimum/recommended versions.
In `@docs/router/router_guide.md`:
- Around line 62-64: The fenced code block containing the command "python -m
dynamo.frontend --router-mode kv" lacks a language identifier; update the block
opener from ``` to ```bash so the snippet becomes a bash-highlighted block and
retains the exact command text inside the fence to enable proper syntax
highlighting in docs/router/router_guide.md.
- Around line 152-154: Update the fenced code block that contains the log
example "Formula for worker_1: 125.3 = 1.0 * 100.5 + 25.0 (cached_blocks: 15)"
to include a language identifier (e.g., use ```text) so the snippet renders
correctly; locate the block in router_guide.md containing that exact log string
and change the opening fence to ```text while keeping the content and closing
fence unchanged.
- Line 61: Replace the unhyphenated compound adjective in the sentence "To
enable KV cache aware routing start the frontend node like this:" by hyphenating
the phrase so it reads "To enable KV-cache-aware routing start the frontend node
like this:" (and apply the same "KV-cache-aware" hyphenation wherever "KV cache
aware" appears in the document).
In `@lib/bindings/kvbm/AGENTS.md`:
- Around line 11-72: The markdown file's fenced code blocks (the directory tree
blocks showing lib/bindings/kvbm/ and the nested python/ and src/ trees) are
missing language identifiers; update each triple-backtick block (the top-level
tree block and the subsequent blocks that list python/, vllm_integration/,
trtllm_integration/, and src/ subtrees) to include an appropriate language tag
such as text (e.g., ```text), bash, or yaml so markdownlint warnings are
resolved; locate the blocks by searching for the directory listings starting
with "lib/bindings/kvbm/" and the nested "python/" and "src/" sections and add
the language identifier to each opening ``` fence.
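Several of the items above ask for a language identifier on fenced code blocks. As a rough sketch of how such blocks could be located before review, the following scans Markdown text for opening fences with an empty info string. The function name `unlabeled_fences` is hypothetical, and the scan is deliberately simplified: it handles only backtick fences and assumes fences are not nested, so it is not a substitute for markdownlint.

```python
FENCE = "`" * 3  # a Markdown code fence marker, built here to avoid a literal fence


def unlabeled_fences(markdown_text):
    """Return 1-based line numbers of opening fences that carry no language tag."""
    missing = []
    in_block = False
    for lineno, line in enumerate(markdown_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            continue
        if in_block:
            # This fence closes the current block.
            in_block = False
            continue
        # This fence opens a block; whatever follows the backticks is the info string.
        in_block = True
        info = stripped.lstrip("`").strip()
        if not info:
            missing.append(lineno)
    return missing


# Example: the first block has no language tag, the second is tagged bash.
doc = "\n".join([
    "To enable routing:",
    FENCE,
    "python -m dynamo.frontend --router-mode kv",
    FENCE,
    FENCE + "bash",
    "echo hi",
    FENCE,
])
print(unlabeled_fences(doc))
```

In this sample only the fence on line 2 is reported, since the second block already carries a `bash` tag.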
🧹 Nitpick comments (3)
docs/router/README.md (1)
78-80: Consider varying sentence structure. Three consecutive sentences begin with "For", which can feel repetitive. Consider rephrasing for better flow.
✍️ Proposed rewording
-For A/B testing and advanced K8s setup, see the [KV Router A/B Benchmarking Guide](../benchmarks/kv-router-ab-testing.md).
-
-For more configuration options and tuning guidelines, see the [Router Guide](router_guide.md).
+For A/B testing and advanced K8s setup, see the [KV Router A/B Benchmarking Guide](../benchmarks/kv-router-ab-testing.md).
+
+Additional configuration options and tuning guidelines are available in the [Router Guide](router_guide.md).
docs/design_docs/router_design.md (2)
33-33: Consider more concise wording. "based off of" is wordy. Use "based on" for cleaner prose.
✍️ Proposed fix
-The leading Large Language Models (LLMs) today are auto-regressive and based off of the [transformer architecture](https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf).
+The leading Large Language Models (LLMs) today are auto-regressive and based on the [transformer architecture](https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf).
78-78: Consider more concise wording. Replace "in an efficient way" with "efficiently" for more concise prose.
✍️ Proposed fix
-Every inference framework will have a KV Cache for each worker. A popular inference framework library is [vLLM](https://github.com/vllm-project/vllm) where a key contribution was [PagedAttention](https://arxiv.org/abs/2309.06180), which allowed them to manage KV Cache in an efficient way by chunking requests into blocks.
+Every inference framework will have a KV Cache for each worker. A popular inference framework library is [vLLM](https://github.com/vllm-project/vllm) where a key contribution was [PagedAttention](https://arxiv.org/abs/2309.06180), which allowed them to manage KV Cache efficiently by chunking requests into blocks.
Signed-off-by: akshatha-k <akshutk@gmail.com>
/ok to test 6cf32aa
Signed-off-by: dagil-nvidia <dagil@nvidia.com>
- Fix "builtin" → "built-in" spelling (router_design.md)
- Fix "KV cache aware" → "KV cache-aware" hyphenation (router_guide.md)
- Add language identifiers to fenced code blocks (router_guide.md, AGENTS.md)
- Remove trailing whitespace (kvbm_guide.md, router_guide.md)
- Update hidden_toctree.rst: remove deleted docs, add renamed/new docs
- Add router_design.md to Design Docs toctree in index.rst

Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
/ok to test 1296914
Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
/ok to test 0f64bca
- hidden_toctree.rst: keep Router integration entries
- index.rst: keep KV Aware Routing in User Guides
- kvbm_design.md: accept main's KVBM version (G4 storage, formatting)

Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Update LoRA README cross-reference from deleted kv_cache_routing.md to its replacement router_guide.md.

Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
/ok to test d032a29
Changes anticipating Frontend (#6002) and Router (#5979) merges:

Frontend cleanup:
- Delete docs/frontends/kserve.md and openapi.json (moved to components/frontend/)
- Delete docs/_sections/frontends.rst (obsolete, only pointed to kserve.md)
- Update index.rst: Frontends -> Frontend at components/frontend/README
- Add redirect in conf.py: frontends/kserve -> components/frontend/frontend_guide
- Update README.md: openapi.json path to docs/reference/api/

Toctree cleanup:
- Remove frontends/kserve.md and _sections/frontends.rst from hidden_toctree.rst
- Remove router/kv_cache_routing.md and router/kv_events.md from hidden_toctree.rst (these files will be deleted/moved by Router PR)

Note: router old-path references in other files are left for Router PR to handle to avoid merge conflicts.

Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
/ok to test 2e6ed34
…#5979)

Signed-off-by: akshatha-k <akshutk@gmail.com>
Signed-off-by: dagil-nvidia <dagil@nvidia.com>
Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: dagil-nvidia <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Summary
- README.md: Quick start guide (what is this, get running in 60 seconds)
- router_guide.md: Production configuration, tuning, disaggregated serving
- router_examples.md: Python API, K8s examples, custom routing patterns
- router_design.md: Architecture details, algorithms, event transport modes
- kv_events.md moved to integrations/kv_events_custom_engines.md

Test plan
Summary by CodeRabbit
Release Notes