[ROCm] [CI] Add ROCm Dockerfile and AMD CI (vllm v0.12.0) #280
DarkLight1337 merged 39 commits into vllm-project:main
Conversation
Signed-off-by: tjtanaa <[email protected]>
docker/Dockerfile.rocm (outdated)

    ARG ROCM_BASE_IMAGE=rocm/vllm-dev
    ARG ROCM_BASE_TAG=nightly_main_20251005

Please simplify this. Suggested change:

    ARG BASE_IMAGE=rocm/vllm-dev:nightly_main_20251005
docker/Dockerfile.rocm (outdated)

    ARG ROCM_BASE_IMAGE=rocm/vllm-dev
    ARG ROCM_BASE_TAG=nightly_main_20251005
    FROM ${ROCM_BASE_IMAGE}:${ROCM_BASE_TAG}

Suggested change:

    FROM ${BASE_IMAGE}
docker/Dockerfile.rocm (outdated)

    ARG ROCM_BASE_TAG=nightly_main_20251005
    FROM ${ROCM_BASE_IMAGE}:${ROCM_BASE_TAG}

    ARG APP_DIR=/workspace/vllm-omni

Please use the same /app directory as the original vLLM image. Suggested change:

    ARG COMMON_WORKDIR=/app
docker/Dockerfile.rocm (outdated)

    ARG VLLM_VERSION=v0.11.0
    ARG PYTORCH_ROCM_ARCH="gfx942;gfx950"

    WORKDIR ${APP_DIR}

Suggested change:

    WORKDIR ${COMMON_WORKDIR}
docker/Dockerfile.rocm (outdated)

    rm -rf /var/lib/apt/lists/*

    # Step 2: Reinstall vllm from source
    RUN cd ../ && python3 -m pip uninstall -y vllm && \

Suggested change:

    RUN python3 -m pip uninstall -y vllm && \
docker/Dockerfile.rocm (outdated)

    ENV VLLM_ROCM_USE_AITER=1
    ENV VLLM_ROCM_USE_AITER_MHA=1
    ENV VLLM_ROCM_USE_AITER_LINEAR=0
    ENV VLLM_ROCM_USE_AITER_RMSNORM=0

We should not put this in the common Dockerfile. These environment variables, if needed, can be set when invoking/running the unit tests.
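Setting the toggles per invocation might look like this (a sketch only; the real command would be the actual test run, e.g. a `pytest` invocation, which is illustrative here). Environment assignments prefixed to a command are visible only to that command's process tree, so they never leak into the shared image:

```shell
# Scope the AITER toggles to a single invocation instead of baking them
# into the common Dockerfile. A subshell echoes two of the values to show
# they reach the child process; in real use this would be the test command.
VLLM_ROCM_USE_AITER=1 VLLM_ROCM_USE_AITER_MHA=1 \
VLLM_ROCM_USE_AITER_LINEAR=0 VLLM_ROCM_USE_AITER_RMSNORM=0 \
sh -c 'echo "AITER=${VLLM_ROCM_USE_AITER} RMSNORM=${VLLM_ROCM_USE_AITER_RMSNORM}"'
# prints: AITER=1 RMSNORM=0
```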
docker/Dockerfile.rocm (outdated)

    rm -rf vllm/.git

    # Step 3: Copy vllm-omni code and install without uv
    COPY . ${APP_DIR}

Please adjust to use {WORKDIR}.
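Taken together, the review suggestions above amount to a Dockerfile along these lines. This is only a sketch: the base image tag, `/app` directory, and the uninstall/copy steps come from the comments above, while the install command and step ordering are illustrative assumptions, not the merged file.

```dockerfile
# Sketch of docker/Dockerfile.rocm incorporating the review suggestions.
ARG BASE_IMAGE=rocm/vllm-dev:nightly_main_20251005
FROM ${BASE_IMAGE}

# Use the same /app directory as the original vLLM image.
ARG COMMON_WORKDIR=/app
WORKDIR ${COMMON_WORKDIR}

# Step 2: Reinstall vllm from source (no `cd ../` needed).
RUN python3 -m pip uninstall -y vllm

# Step 3: Copy vllm-omni code and install (install command is illustrative).
COPY . ${COMMON_WORKDIR}
RUN python3 -m pip install -e .
```

Note that `ARG BASE_IMAGE` must be declared before `FROM` to be usable there, while `COMMON_WORKDIR` is declared after `FROM` because it is only used inside the build stage.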
Co-authored-by: Hongxia Yang <[email protected]>
Signed-off-by: tjtanaa <[email protected]>
(18 Dec 2025) Passed https://buildkite.com/vllm/vllm-omni-amd-ci/builds/37/steps/canvas
This PR also resolves #387.

@hongxiayang your requested review has been addressed. @hsliuustc0106 could you help review this PR? Thank you. The unit tests for TeaCache and DiTCache are also passing.
vllm_omni/diffusion/layers/rope.py (outdated)

    self.is_neox_style = is_neox_style
    self.interleaved = not is_neox_style
    self.triton_rotary_emb = None
    if is_rocm():
Could you explain why we don't need this in https://github.com/vllm-project/vllm/blob/ff2168bca3a195b835c64a5c9012d7b6a9f34e61/vllm/model_executor/layers/rotary_embedding/common.py#L122
You are right. We can simplify the conditions and not use is_rocm(), following the implementation on the vLLM side. Initially I added is_rocm() so that each platform could handle the case itself; I do not know which libraries the other platforms provide, and paths can be missed when authors are not aware of what another platform has.
@ZJY0516 Sorry, I am not sure I understand your question correctly. This is a new abstraction, https://github.com/vllm-project/vllm/blob/ff2168bca3a195b835c64a5c9012d7b6a9f34e61/vllm/model_executor/layers/rotary_embedding/common.py#L122, that was added after vLLM v0.12.0. If we want to adopt this code, I suggest we do it in another PR.
Head branch was pushed to by a user without write access
Purpose
Setting up CI and AMD Docker images
AMD CI
The vLLM Omni AMD CI is set up following the vLLM AMD CI. Given that the whole setup (agents and agent environment) is the same, I have adapted the bootstrap files from the vLLM AMD CI to make them work with the vLLM Omni AMD CI.
I suggest that we keep the bootstrap code confined to this repo for now for easier maintainability. Let me know if anyone thinks otherwise.
The difference between the vLLM Omni AMD CI and the vLLM AMD CI is that we try to make everything upstream compatible. The AMD CI docker image is stored in the same AWS repo as the other hardware.
The tests that are currently enabled:
The remaining test is under investigation:
AMD Docker images
Build the docker image. The same image is also used for CI.
How to launch the docker image.
Updated the installation step to include docker build and docker launch instructions.
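As a sketch, building and launching might look like the following. The image name and run flags here are assumptions following common ROCm container practice (`/dev/kfd` and `/dev/dri` expose the GPUs); the installation docs added by this PR carry the authoritative commands.

```shell
# Illustrative only -- see the PR's installation docs for the exact commands.
docker build -f docker/Dockerfile.rocm -t vllm-omni-rocm .

docker run -it --network=host \
    --device=/dev/kfd --device=/dev/dri \
    --group-add video --ipc=host \
    vllm-omni-rocm
```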
Test Plan
Built locally and was able to run the examples without failure.
Passed the vllm-omni AMD CI.
Latest Run: https://buildkite.com/vllm/vllm-omni-amd-ci/builds/43/steps/canvas You can also find it in the CI check at the bottom of this page.
Test Result
Ran the Qwen-image examples
Extra details for a future PR
The create_new_process_for_each_test decorator does not seem to be working. When it is in spawn mode, the body of a test decorated with create_new_process_for_each_test does not run, and the test is reported as PASSED immediately. The tests on CUDA and NPU run correctly because on those two platforms create_new_process_for_each_test evaluates to using fork, while vLLM internally runs as spawn.
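The silent-pass failure mode above can be sketched in isolation (hypothetical helper names; this is not the vLLM Omni decorator): a decorator that runs a test body in a child process must check the child's exit code, otherwise a child that crashes, or never executes the body at all, still looks like a passing test.

```python
import multiprocessing as mp


def run_in_new_process(fn):
    """Run fn in a fresh process and fail loudly if the child fails.

    Sketch only. This uses the fork start method, under which the child
    inherits fn directly; under spawn, the child re-imports the test
    module and must be able to pickle and locate fn, which is where the
    silent-pass behaviour described above can creep in.
    """
    def wrapper(*args, **kwargs):
        ctx = mp.get_context("fork")
        p = ctx.Process(target=fn, args=args, kwargs=kwargs)
        p.start()
        p.join()
        # Without this check, a child that exits early (or never runs
        # the body) would be reported as a passing test.
        if p.exitcode != 0:
            raise AssertionError(f"child exited with code {p.exitcode}")
    return wrapper
```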