Merged

Changes from 2 commits
2 changes: 1 addition & 1 deletion — .gitignore

```diff
@@ -146,7 +146,7 @@ venv.bak/

 # mkdocs documentation
 /site
-docs/getting_started/examples
+docs/examples

 # mypy
 .mypy_cache/
```
8 changes: 3 additions & 5 deletions — docs/.nav.yml

```diff
@@ -5,11 +5,9 @@ nav:
   - getting_started/quickstart.md
   - getting_started/installation
   - Examples:
-    - Offline Inference: getting_started/examples/offline_inference
-    - Online Serving: getting_started/examples/online_serving
-    - Others:
-      - LMCache: getting_started/examples/lmcache
-      - getting_started/examples/other/*
+    - Offline Inference: examples/offline_inference
+    - Online Serving: examples/online_serving
+    - Others: examples/others
   - Quick Links:
     - User Guide: usage/README.md
     - Developer Guide: contributing/README.md
```

Review comment (Member), on the `- Examples:` line:

> Now that there's no LMCache special case, we can just do this and remove the three lines below

Suggested change:

```diff
-  - Examples:
+  - Examples: examples
```

Reply (Member, PR author):

> I wanted the titles to be capitalized correctly.
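As a hedged sanity check (not part of the PR), the three new nav entries above resolve relative to `docs/`, so they only work if the example-generation hook creates those directories under `docs/examples`. A minimal simulation, using a temporary directory as a stand-in for the repo:

```python
from pathlib import Path
import tempfile

# The nav targets introduced by this change in docs/.nav.yml.
nav_targets = [
    "examples/offline_inference",
    "examples/online_serving",
    "examples/others",
]

with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp) / "docs"
    # Simulate the directories the generate_examples.py hook is expected
    # to create under docs/examples after this PR.
    for target in nav_targets:
        (docs / target).mkdir(parents=True)
    # Any entry whose directory is missing would produce a broken nav item.
    missing = [t for t in nav_targets if not (docs / t).is_dir()]

print(missing)  # []
```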
4 changes: 2 additions & 2 deletions — docs/design/v1/metrics.md

```diff
@@ -61,7 +61,7 @@ These are documented under [Inferencing and Serving -> Production Metrics](../..

 ### Grafana Dashboard

-vLLM also provides [a reference example](https://docs.vllm.ai/en/latest/getting_started/examples/prometheus_grafana.html) for how to collect and store these metrics using Prometheus and visualize them using a Grafana dashboard.
+vLLM also provides [a reference example](https://docs.vllm.ai/en/latest/examples/prometheus_grafana.html) for how to collect and store these metrics using Prometheus and visualize them using a Grafana dashboard.

 The subset of metrics exposed in the Grafana dashboard gives us an indication of which metrics are especially important:
@@ -673,7 +673,7 @@ v0 has support for OpenTelemetry tracing:
 - [OpenTelemetry blog
   post](https://opentelemetry.io/blog/2024/llm-observability/)
 - [User-facing
-  docs](https://docs.vllm.ai/en/latest/getting_started/examples/opentelemetry.html)
+  docs](https://docs.vllm.ai/en/latest/examples/opentelemetry.html)
 - [Blog
   post](https://medium.com/@ronen.schaffer/follow-the-trail-supercharging-vllm-with-opentelemetry-distributed-tracing-aa655229b46f)
 - [IBM product
```
2 changes: 1 addition & 1 deletion — docs/mkdocs/hooks/generate_examples.py

```diff
@@ -9,7 +9,7 @@
 ROOT_DIR = Path(__file__).parent.parent.parent.parent
 ROOT_DIR_RELATIVE = '../../../../..'
 EXAMPLE_DIR = ROOT_DIR / "examples"
-EXAMPLE_DOC_DIR = ROOT_DIR / "docs/getting_started/examples"
+EXAMPLE_DOC_DIR = ROOT_DIR / "docs/examples"
 print(ROOT_DIR.resolve())
 print(EXAMPLE_DIR.resolve())
 print(EXAMPLE_DOC_DIR.resolve())
```
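To illustrate how the updated constant composes (a sketch only; `repo` here is a stand-in for the real `ROOT_DIR`, which the hook derives from `__file__`):

```python
from pathlib import Path

# "repo" stands in for the actual repository root computed in the hook.
ROOT_DIR = Path("repo")
EXAMPLE_DIR = ROOT_DIR / "examples"
# After this PR the generated example docs land directly under docs/,
# where they previously lived under docs/getting_started/.
EXAMPLE_DOC_DIR = ROOT_DIR / "docs/examples"

print(EXAMPLE_DOC_DIR.as_posix())  # repo/docs/examples
```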
2 changes: 1 addition & 1 deletion — docs/models/extensions/tensorizer.md

```diff
@@ -10,7 +10,7 @@ shorter Pod startup times and CPU memory usage. Tensor encryption is also suppor

 For more information on CoreWeave's Tensorizer, please refer to
 [CoreWeave's Tensorizer documentation](https://github.com/coreweave/tensorizer). For more information on serializing a vLLM model, as well a general usage guide to using Tensorizer with vLLM, see
-the [vLLM example script](https://docs.vllm.ai/en/latest/getting_started/examples/tensorize_vllm_model.html).
+the [vLLM example script](https://docs.vllm.ai/en/latest/examples/tensorize_vllm_model.html).

 !!! note
     Note that to use this feature you will need to install `tensorizer` by running `pip install vllm[tensorizer]`.
```
6 changes: 3 additions & 3 deletions — docs/training/rlhf.md

```diff
@@ -6,6 +6,6 @@ vLLM can be used to generate the completions for RLHF. The best way to do this i

 See the following basic examples to get started if you don't want to use an existing library:

-- [Training and inference processes are located on separate GPUs (inspired by OpenRLHF)](https://docs.vllm.ai/en/latest/getting_started/examples/rlhf.html)
-- [Training and inference processes are colocated on the same GPUs using Ray](https://docs.vllm.ai/en/latest/getting_started/examples/rlhf_colocate.html)
-- [Utilities for performing RLHF with vLLM](https://docs.vllm.ai/en/latest/getting_started/examples/rlhf_utils.html)
+- [Training and inference processes are located on separate GPUs (inspired by OpenRLHF)](../../examples/offline_inference/rlhf.py)
+- [Training and inference processes are colocated on the same GPUs using Ray](../../examples/offline_inference/rlhf_colocate.py)
+- [Utilities for performing RLHF with vLLM](../../examples/offline_inference/rlhf_utils.py)
```
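A hedged illustration (not part of the PR) of why the rewritten rlhf.md links begin with `../../`: a relative link in `docs/training/rlhf.md` is resolved against the file's own directory, which sits two levels below the repo root.

```python
import os.path

# docs/training/rlhf.md lives two directories below the repo root, so a
# link to the examples tree must climb out of docs/training first.
doc_dir = "docs/training"
target = "examples/offline_inference/rlhf.py"
rel = os.path.relpath(target, start=doc_dir)
print(rel)  # ../../examples/offline_inference/rlhf.py
```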
File renamed without changes.