
Commit be25ee4

hongxiayang authored and LeiWang1999 committed
[Doc][AMD][ROCm]Added tips to refer to mi300x tuning guide for mi300x users (vllm-project#6754)
Signed-off-by: LeiWang1999 <[email protected]>
1 parent c48eff4 commit be25ee4

File tree

1 file changed: +7 −0 lines changed


docs/source/getting_started/amd-installation.rst

Lines changed: 7 additions & 0 deletions
@@ -142,3 +142,10 @@ Alternatively, wheels intended for vLLM use can be accessed under the releases.
 - Triton flash attention does not currently support sliding window attention. If using half precision, please use CK flash-attention for sliding window support.
 - To use CK flash-attention or PyTorch naive attention, please use this flag ``export VLLM_USE_TRITON_FLASH_ATTN=0`` to turn off triton flash attention.
 - The ROCm version of PyTorch, ideally, should match the ROCm driver version.
+
+
+.. tip::
+   - For MI300x (gfx942) users, to achieve optimal performance, please refer to `MI300x tuning guide <https://rocm.docs.amd.com/en/latest/how-to/tuning-guides/mi300x/index.html>`_ for performance optimization and tuning tips on system and workflow level.
+     For vLLM, please refer to `vLLM performance optimization <https://rocm.docs.amd.com/en/latest/how-to/tuning-guides/mi300x/workload.html#vllm-performance-optimization>`_.
+
+
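As a usage sketch (not part of the commit itself), the ``VLLM_USE_TRITON_FLASH_ATTN=0`` flag referenced in the context lines above would be applied in the shell before launching vLLM; the launch command shown in the comment is illustrative only:

```shell
# Turn off Triton flash attention so vLLM falls back to CK
# flash-attention or PyTorch naive attention, as the docs describe.
export VLLM_USE_TRITON_FLASH_ATTN=0

# Then start vLLM as usual, e.g. (placeholder model name):
#   python -m vllm.entrypoints.openai.api_server --model <model>
echo "VLLM_USE_TRITON_FLASH_ATTN=$VLLM_USE_TRITON_FLASH_ATTN"
```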
