
[ROCm] [CI] Add ROCm Dockerfile and AMD CI (vllm v0.12.0) #280

Merged
DarkLight1337 merged 39 commits into vllm-project:main from EmbeddedLLM:dockerfile-amd-2
Dec 21, 2025

Conversation

@tjtanaa
Contributor

@tjtanaa tjtanaa commented Dec 11, 2025


Purpose

Setting up CI and AMD Docker images

AMD CI

The vLLM Omni AMD CI is set up following the vLLM AMD CI. Since the whole setup (agents and agent environment) is the same, I have adapted the bootstrap files from the vLLM AMD CI to work with the vLLM Omni AMD CI.

I suggest that we keep the bootstrap code confined to this repo for now, for easier maintainability. Let me know if anyone thinks otherwise.

The difference between the vLLM Omni AMD CI and the vLLM AMD CI is that we try to keep everything upstream compatible. The AMD CI Docker image is stored in the same AWS repository as the images for the other hardware.

The tests that are currently enabled:

  • "Diffusion Model Test"
  • "Diffusion Cache Backend Test"
  • "Omni Model Test Qwen2-5-Omni"

The remaining test is under investigation:

  • "Omni Model Test Qwen3-Omni"

AMD Docker images

Build the Docker image (the same image is also used for CI):

docker build -f docker/Dockerfile.rocm -t vllm-omni-rocm .

How to launch the docker image

#!/bin/bash
docker run -it \
   --privileged \
   --network=host \
   --group-add=video \
   --ipc=host \
   --cap-add=SYS_PTRACE \
   --security-opt seccomp=unconfined \
   --device /dev/kfd \
   --device /dev/dri \
   --name vllm-omni-rocm \
   vllm-omni-rocm \
   bash

Updated the installation step to include the Docker build and Docker launch instructions.

Test Plan

Built locally and was able to run the examples without failure.

Passed the vllm-omni AMD CI.
Latest run: https://buildkite.com/vllm/vllm-omni-amd-ci/builds/43/steps/canvas (also visible in the CI checks at the bottom of this page).

Test Result

Ran the Qwen-Image examples:

cd examples/offline_inference/qwen_image/

python text_to_image.py \
  --prompt "a cup of coffee on the table" \
  --seed 42 \
  --cfg_scale 4.0 \
  --num_images_per_prompt 1 \
  --num_inference_steps 50 \
  --height 1024 \
  --width 1024 \
  --output outputs/coffee.png

Extra details for future PR

The create_new_process_for_each_test decorator does not seem to be working. In spawn mode, the body of a test decorated with create_new_process_for_each_test does not run, and the test is reported as PASSED immediately.
The tests on CUDA and NPU run correctly because on those two platforms create_new_process_for_each_test evaluates to fork, while vLLM internally runs with spawn.
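A minimal sketch (assumed shape, not vLLM's actual implementation) of the fork-based per-test-process pattern that works here. With "fork", the child inherits the parent's memory, so the wrapped test body runs without being pickled; with "spawn", the child re-imports the module, and a decorator that does not correctly round-trip the real test function can exit 0 without ever running the body.

```python
import multiprocessing

def run_test_in_child(fn):
    # Sketch of a create_new_process_for_each_test-style decorator
    # (illustrative, not vLLM's actual code). The forked child inherits
    # the parent process state, so fn runs directly in the child.
    def wrapper():
        ctx = multiprocessing.get_context("fork")
        p = ctx.Process(target=fn)
        p.start()
        p.join()
        # Fail the test if the child process failed.
        assert p.exitcode == 0, f"child exited with {p.exitcode}"
    return wrapper

def sample_test():
    # Runs inside the forked child.
    print("test body executed")

if __name__ == "__main__":
    run_test_in_child(sample_test)()
```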


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.


Signed-off-by: tjtanaa <[email protected]>
Comment on lines 1 to 2
ARG ROCM_BASE_IMAGE=rocm/vllm-dev
ARG ROCM_BASE_TAG=nightly_main_20251005
Contributor

Please simplify this.

Suggested change
ARG ROCM_BASE_IMAGE=rocm/vllm-dev
ARG ROCM_BASE_TAG=nightly_main_20251005
ARG BASE_IMAGE=rocm/vllm-dev:nightly_main_20251005

@@ -0,0 +1,41 @@
ARG ROCM_BASE_IMAGE=rocm/vllm-dev
ARG ROCM_BASE_TAG=nightly_main_20251005
FROM ${ROCM_BASE_IMAGE}:${ROCM_BASE_TAG}
Contributor

Suggested change
FROM ${ROCM_BASE_IMAGE}:${ROCM_BASE_TAG}
FROM ${BASE_IMAGE}

ARG ROCM_BASE_TAG=nightly_main_20251005
FROM ${ROCM_BASE_IMAGE}:${ROCM_BASE_TAG}

ARG APP_DIR=/workspace/vllm-omni
Contributor

Please use the same /app directory as original vllm image

Suggested change
ARG APP_DIR=/workspace/vllm-omni
ARG COMMON_WORKDIR=/app

ARG VLLM_VERSION=v0.11.0
ARG PYTORCH_ROCM_ARCH="gfx942;gfx950"

WORKDIR ${APP_DIR}
Contributor

Suggested change
WORKDIR ${APP_DIR}
WORKDIR ${COMMON_WORKDIR}

rm -rf /var/lib/apt/lists/*

# Step 2: Reinstall vllm from source
RUN cd ../ && python3 -m pip uninstall -y vllm && \
Contributor

Suggested change
RUN cd ../ && python3 -m pip uninstall -y vllm && \
RUN python3 -m pip uninstall -y vllm && \

Comment on lines 36 to 39
ENV VLLM_ROCM_USE_AITER=1
ENV VLLM_ROCM_USE_AITER_MHA=1
ENV VLLM_ROCM_USE_AITER_LINEAR=0
ENV VLLM_ROCM_USE_AITER_RMSNORM=0
Contributor

we should not put this in the common Dockerfile.

Suggested change
ENV VLLM_ROCM_USE_AITER=1
ENV VLLM_ROCM_USE_AITER_MHA=1
ENV VLLM_ROCM_USE_AITER_LINEAR=0
ENV VLLM_ROCM_USE_AITER_RMSNORM=0

Contributor

These environment variables, if needed, can be set when invoking/running the unit tests.
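For example, the env-prefix form keeps the AITER toggles scoped to a single command instead of baking them into the shared Dockerfile. The command below just checks that the variable is visible; in CI it would be the actual pytest invocation:

```shell
# Scope the AITER toggles to one command's environment (the python3 -c
# check is illustrative; replace it with the real test command).
VLLM_ROCM_USE_AITER=1 \
VLLM_ROCM_USE_AITER_MHA=1 \
VLLM_ROCM_USE_AITER_LINEAR=0 \
VLLM_ROCM_USE_AITER_RMSNORM=0 \
python3 -c 'import os; print(os.environ["VLLM_ROCM_USE_AITER"])'
```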

rm -rf vllm/.git

# Step 3: Copy vllm-omni code and install without uv
COPY . ${APP_DIR}
Contributor

adjust to use {WORKDIR}

tjtanaa and others added 15 commits December 12, 2025 11:54
Co-authored-by: Hongxia Yang <[email protected]>
Signed-off-by: tjtanaa <[email protected]>
Signed-off-by: tjtanaa <[email protected]>
Signed-off-by: tjtanaa <[email protected]>
… is rocm to handle unit tests that are failing

Signed-off-by: tjtanaa <[email protected]>
@tjtanaa tjtanaa changed the title [ROCm] Add ROCm Dockerfile [ROCm] [CI] Add ROCm Dockerfile and AMD CI Dec 17, 2025
@tjtanaa
Contributor Author

tjtanaa commented Dec 18, 2025

@tjtanaa
Contributor Author

tjtanaa commented Dec 20, 2025

This PR also resolves #387

@tjtanaa tjtanaa changed the title [ROCm] [CI] Add ROCm Dockerfile and AMD CI [ROCm] [CI] Add ROCm Dockerfile and AMD CI (vllm v0.12.0 Dec 20, 2025
@tjtanaa
Contributor Author

tjtanaa commented Dec 20, 2025

@hongxiayang your requested review has been addressed.

@hsliuustc0106 Could you help to review this PR? Thank you. The unit tests for the Teacache and DiTCache are also passing.

@tjtanaa tjtanaa changed the title [ROCm] [CI] Add ROCm Dockerfile and AMD CI (vllm v0.12.0 [ROCm] [CI] Add ROCm Dockerfile and AMD CI (vllm v0.12.0) Dec 20, 2025
@ywang96 ywang96 added the ready label to trigger buildkite CI label Dec 20, 2025
@ywang96 ywang96 enabled auto-merge (squash) December 20, 2025 06:56
self.is_neox_style = is_neox_style
self.interleaved = not is_neox_style
self.triton_rotary_emb = None
if is_rocm():
Collaborator

Contributor Author

@tjtanaa tjtanaa Dec 20, 2025

You are right. We can simplify the conditions and avoid is_rocm(), following the implementation on the vLLM side. Initially I added is_rocm() so that other platforms could handle the case themselves; I do not know which libraries the other platforms have, and paths can be missed when authors are not aware of what libraries another platform provides.
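A hypothetical sketch of the capability-based dispatch being discussed here, where the backend choice keys off an advertised capability rather than a hard-coded is_rocm() check (all names are illustrative, not the actual vllm-omni API):

```python
def select_rope_backend(platform: str, has_triton_rope: bool) -> str:
    # Hypothetical helper: any platform that advertises a compatible
    # Triton rope kernel can opt in without this function needing to
    # enumerate platform-specific libraries.
    if has_triton_rope and platform in ("rocm", "cuda"):
        return "triton"
    # Fall back to the reference rotary embedding path otherwise.
    return "reference"

if __name__ == "__main__":
    print(select_rope_backend("rocm", True))   # Triton path when available
    print(select_rope_backend("npu", True))    # falls back to reference
```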

Contributor Author

@ZJY0516 Sorry, I'm not sure I understand your question correctly. This is a new abstraction https://github.com/vllm-project/vllm/blob/ff2168bca3a195b835c64a5c9012d7b6a9f34e61/vllm/model_executor/layers/rotary_embedding/common.py#L122 that was added after vLLM v0.12.0. If we want to adopt this code, I suggest we do so in another PR.

auto-merge was automatically disabled December 20, 2025 19:46

Head branch was pushed to by a user without write access

@DarkLight1337 DarkLight1337 merged commit 13584fd into vllm-project:main Dec 21, 2025
7 checks passed
wtomin pushed a commit to wtomin/vllm-omni that referenced this pull request Dec 22, 2025
yenuo26 pushed a commit to yenuo26/vllm-omni that referenced this pull request Dec 29, 2025
princepride pushed a commit to princepride/vllm-omni that referenced this pull request Jan 10, 2026

Labels

ready label to trigger buildkite CI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants