The NVIDIA Dynamo Platform is a high-performance, low-latency inference framework designed to serve all AI models across any framework, architecture, or deployment scale.

.. admonition:: 💎 Discover the latest developments!

   This guide is a snapshot of a specific point in time. For the latest information, examples, and Release Assets, see the Dynamo GitHub repository.


.. toctree::
   :hidden:
   :caption: Getting Started

   Quickstart <self>
   Support Matrix <reference/support-matrix.md>
   Feature Matrix <reference/feature-matrix.md>
   Release Artifacts <reference/release-artifacts.md>
   Examples <_sections/examples>

.. toctree::
   :hidden:
   :caption: Kubernetes Deployment

   Deployment Guide <_sections/k8s_deployment>
   Observability (K8s) <_sections/k8s_observability>
   Multinode <_sections/k8s_multinode>

.. toctree::
   :hidden:
   :caption: User Guides

   KV Cache Offloading <kvbm/kvbm_guide.md>
   Tool Calling <agents/tool-calling.md>
   Multimodality Support <multimodal/index.md>
   Finding Best Initial Configs <performance/aiconfigurator.md>
   Benchmarking <benchmarks/benchmarking.md>
   Tuning Disaggregated Performance <performance/tuning.md>
   Writing Python Workers in Dynamo <development/backend-guide.md>
   Observability (Local) <_sections/observability>
   Fault Tolerance <_sections/fault_tolerance>
   Glossary <reference/glossary.md>

.. toctree::
   :hidden:
   :caption: Components

   Backends <_sections/backends>
   Frontends <_sections/frontends>
   Router <router/README>
   Planner <planner/planner_intro>
   KVBM <kvbm/kvbm_intro>

.. toctree::
   :hidden:
   :caption: Design Docs

   Overall Architecture <design_docs/architecture.md>
   Architecture Flow <design_docs/dynamo_flow.md>
   Disaggregated Serving <design_docs/disagg_serving.md>
   Distributed Runtime <design_docs/distributed_runtime.md>
   Request Plane <design_docs/request_plane.md>
   Event Plane <design_docs/event_plane.md>