
Commit ac4d911

wayzeng authored and lulmer committed
[Doc] Update V1 user guide for fp8 kv cache support (vllm-project#15585)
Signed-off-by: weizeng <[email protected]>
Signed-off-by: Louis Ulmer <[email protected]>
1 parent ca15761 commit ac4d911


1 file changed (+1, -3 lines)


docs/source/getting_started/v1_user_guide.md

Lines changed: 1 addition & 3 deletions
@@ -47,9 +47,9 @@ This living user guide outlines a few known **important changes and limitations*
 | **Logprobs Calculation** | <nobr>🟢 Functional</nobr> |
 | **LoRA** | <nobr>🟢 Functional ([PR #13096](https://github.com/vllm-project/vllm/pull/13096))</nobr>|
 | **Multimodal Models** | <nobr>🟢 Functional</nobr> |
+| **FP8 KV Cache** | <nobr>🟢 Functional on Hopper devices ([PR #15191](https://github.com/vllm-project/vllm/pull/15191))</nobr>|
 | **Spec Decode** | <nobr>🚧 WIP ([PR #13933](https://github.com/vllm-project/vllm/pull/13933))</nobr>|
 | **Prompt Logprobs with Prefix Caching** | <nobr>🟡 Planned ([RFC #13414](https://github.com/vllm-project/vllm/issues/13414))</nobr>|
-| **FP8 KV Cache** | <nobr>🟡 Planned</nobr> |
 | **Structured Output Alternative Backends** | <nobr>🟡 Planned</nobr> |
 | **Embedding Models** | <nobr>🟡 Planned ([RFC #12249](https://github.com/vllm-project/vllm/issues/12249))</nobr> |
 | **Mamba Models** | <nobr>🟡 Planned</nobr> |
@@ -134,8 +134,6 @@ in progress.
 
 #### Features to Be Supported
 
-- **FP8 KV Cache**: While vLLM V1 introduces new FP8 kernels for model weight quantization, support for an FP8 key–value cache is not yet available. Users must continue using FP16 (or other supported precisions) for the KV cache.
-
 - **Structured Output Alternative Backends**: Structured output alternative backends (outlines, guidance) support is planned. V1 currently
 supports only the `xgrammar:no_fallback` mode, meaning that it will error out if the output schema is unsupported by xgrammar.
 Details about the structured outputs can be found
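The removed bullet contrasts an FP8 KV cache with FP16. As a rough, back-of-the-envelope sketch (not vLLM code) of what that precision change means for memory: the per-token KV cache footprint scales linearly with bytes per element, so storing keys and values in FP8 (1 byte) instead of FP16 (2 bytes) halves it. The layer and head dimensions below are hypothetical placeholders, and any per-tensor quantization scales are ignored.

```python
# Hypothetical model dimensions for illustration only; substitute your model's config.
NUM_LAYERS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128


def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, bytes_per_elem: int) -> int:
    """Bytes of KV cache one token occupies across all layers.

    The factor of 2 accounts for storing both a key vector and a value
    vector per KV head per layer. Quantization scales are ignored.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


# FP16 uses 2 bytes per element; FP8 uses 1 byte per element.
fp16_bytes = kv_cache_bytes_per_token(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, bytes_per_elem=2)
fp8_bytes = kv_cache_bytes_per_token(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, bytes_per_elem=1)

print(f"FP16: {fp16_bytes // 1024} KiB/token")  # prints "FP16: 128 KiB/token"
print(f"FP8:  {fp8_bytes // 1024} KiB/token")   # prints "FP8:  64 KiB/token"
```

For a fixed KV cache memory budget, the halved per-token footprint roughly doubles the number of tokens that fit, which is the usual motivation for an FP8 KV cache.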
