Merged

Changes from 2 commits
2 changes: 1 addition & 1 deletion — .gitignore

```diff
@@ -146,7 +146,7 @@ venv.bak/

 # mkdocs documentation
 /site
-docs/getting_started/examples
+docs/examples

 # mypy
 .mypy_cache/
```
8 changes: 3 additions & 5 deletions — docs/.nav.yml

```diff
@@ -5,11 +5,9 @@ nav:
   - getting_started/quickstart.md
   - getting_started/installation
   - Examples:
-    - Offline Inference: getting_started/examples/offline_inference
-    - Online Serving: getting_started/examples/online_serving
-    - Others:
-      - LMCache: getting_started/examples/lmcache
-      - getting_started/examples/other/*
+    - Offline Inference: examples/offline_inference
+    - Online Serving: examples/online_serving
+    - Others: examples/others
   - Quick Links:
     - User Guide: usage/README.md
     - Developer Guide: contributing/README.md
```

Review comment (Member), on the `- Examples:` line:

> Now that there's no LMCache special case, we can just do this and remove the three lines below

Suggested change:

```diff
-  - Examples:
+  - Examples: examples
```

Reply (Member, PR author):

> I wanted the titles to be capitalized correctly.
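As a hedged sanity check (not part of the PR), the three new nav entries above resolve relative to `docs/`, so they only work if the example-generation hook creates those directories under `docs/examples`. A minimal simulation, using a temporary directory as a stand-in for the repo:

```python
from pathlib import Path
import tempfile

# The nav targets introduced by this change in docs/.nav.yml.
nav_targets = [
    "examples/offline_inference",
    "examples/online_serving",
    "examples/others",
]

with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp) / "docs"
    # Simulate the directories the generate_examples.py hook is expected
    # to create under docs/examples after this PR.
    for target in nav_targets:
        (docs / target).mkdir(parents=True)
    # Any entry whose directory is missing would produce a broken nav item.
    missing = [t for t in nav_targets if not (docs / t).is_dir()]

print(missing)  # []
```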
4 changes: 2 additions & 2 deletions — docs/design/v1/metrics.md

```diff
@@ -61,7 +61,7 @@ These are documented under [Inferencing and Serving -> Production Metrics](../..

 ### Grafana Dashboard

-vLLM also provides [a reference example](https://docs.vllm.ai/en/latest/getting_started/examples/prometheus_grafana.html) for how to collect and store these metrics using Prometheus and visualize them using a Grafana dashboard.
+vLLM also provides [a reference example](https://docs.vllm.ai/en/latest/examples/prometheus_grafana.html) for how to collect and store these metrics using Prometheus and visualize them using a Grafana dashboard.

 The subset of metrics exposed in the Grafana dashboard gives us an indication of which metrics are especially important:
@@ -673,7 +673,7 @@ v0 has support for OpenTelemetry tracing:
 - [OpenTelemetry blog
   post](https://opentelemetry.io/blog/2024/llm-observability/)
 - [User-facing
-  docs](https://docs.vllm.ai/en/latest/getting_started/examples/opentelemetry.html)
+  docs](https://docs.vllm.ai/en/latest/examples/opentelemetry.html)
 - [Blog
   post](https://medium.com/@ronen.schaffer/follow-the-trail-supercharging-vllm-with-opentelemetry-distributed-tracing-aa655229b46f)
 - [IBM product
```
2 changes: 1 addition & 1 deletion — docs/mkdocs/hooks/generate_examples.py

```diff
@@ -9,7 +9,7 @@
 ROOT_DIR = Path(__file__).parent.parent.parent.parent
 ROOT_DIR_RELATIVE = '../../../../..'
 EXAMPLE_DIR = ROOT_DIR / "examples"
-EXAMPLE_DOC_DIR = ROOT_DIR / "docs/getting_started/examples"
+EXAMPLE_DOC_DIR = ROOT_DIR / "docs/examples"
 print(ROOT_DIR.resolve())
 print(EXAMPLE_DIR.resolve())
 print(EXAMPLE_DOC_DIR.resolve())
```
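To illustrate how the updated constant composes (a sketch only; `repo` here is a stand-in for the real `ROOT_DIR`, which the hook derives from `__file__`):

```python
from pathlib import Path

# "repo" stands in for the actual repository root computed in the hook.
ROOT_DIR = Path("repo")
EXAMPLE_DIR = ROOT_DIR / "examples"
# After this PR the generated example docs land directly under docs/,
# where they previously lived under docs/getting_started/.
EXAMPLE_DOC_DIR = ROOT_DIR / "docs/examples"

print(EXAMPLE_DOC_DIR.as_posix())  # repo/docs/examples
```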
2 changes: 1 addition & 1 deletion — docs/models/extensions/tensorizer.md

```diff
@@ -10,7 +10,7 @@ shorter Pod startup times and CPU memory usage. Tensor encryption is also suppor

 For more information on CoreWeave's Tensorizer, please refer to
 [CoreWeave's Tensorizer documentation](https://github.com/coreweave/tensorizer). For more information on serializing a vLLM model, as well a general usage guide to using Tensorizer with vLLM, see
-the [vLLM example script](https://docs.vllm.ai/en/latest/getting_started/examples/tensorize_vllm_model.html).
+the [vLLM example script](https://docs.vllm.ai/en/latest/examples/tensorize_vllm_model.html).

 !!! note
     Note that to use this feature you will need to install `tensorizer` by running `pip install vllm[tensorizer]`.
```
6 changes: 3 additions & 3 deletions — docs/training/rlhf.md

```diff
@@ -6,6 +6,6 @@ vLLM can be used to generate the completions for RLHF. The best way to do this i

 See the following basic examples to get started if you don't want to use an existing library:

-- [Training and inference processes are located on separate GPUs (inspired by OpenRLHF)](https://docs.vllm.ai/en/latest/getting_started/examples/rlhf.html)
-- [Training and inference processes are colocated on the same GPUs using Ray](https://docs.vllm.ai/en/latest/getting_started/examples/rlhf_colocate.html)
-- [Utilities for performing RLHF with vLLM](https://docs.vllm.ai/en/latest/getting_started/examples/rlhf_utils.html)
+- [Training and inference processes are located on separate GPUs (inspired by OpenRLHF)](../../examples/offline_inference/rlhf.py)
+- [Training and inference processes are colocated on the same GPUs using Ray](../../examples/offline_inference/rlhf_colocate.py)
+- [Utilities for performing RLHF with vLLM](../../examples/offline_inference/rlhf_utils.py)
```
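A hedged illustration (not part of the PR) of why the rewritten rlhf.md links begin with `../../`: a relative link in `docs/training/rlhf.md` is resolved against the file's own directory, which sits two levels below the repo root.

```python
import os.path

# docs/training/rlhf.md lives two directories below the repo root, so a
# link to the examples tree must climb out of docs/training first.
doc_dir = "docs/training"
target = "examples/offline_inference/rlhf.py"
rel = os.path.relpath(target, start=doc_dir)
print(rel)  # ../../examples/offline_inference/rlhf.py
```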
File renamed without changes.