Conversation

@Aman-Dwivedi

Added AMD GPU support.
Updated the requirements for ROCm and added functions in setup.py to detect AMD GPUs. An example script, originally by yiakwy-xpu-ml-framework-team, has also been added.

@avjves
Contributor

avjves commented Sep 22, 2025

Hi! This adds a custom requirements file specifically for ROCm - is there a reason for that? Also, the yunchang branch/version it installs is a year old, with only some changes on top. Yunchang already supports AMD GPUs in the upstream repo via flash_attn or AITER (the latest way to call FA on AMD GPUs), so this looks like a regression.

Also, this breaks the changes made by PR #559 due to the duplicate imports in xfuser/core/long_ctx_attention/ring/ring_flash_attn.py. Currently it's gated like this:
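(The referenced snippet did not survive this transcript. Purely as an illustrative sketch, not the actual xfuser code, gating an optional attention backend typically looks like this:)

```python
# Illustrative sketch only -- not the actual xfuser code.
# Optional attention backends are usually imported behind a guard so that
# a second, unconditional import elsewhere causes exactly the kind of
# duplicate-import conflict described above.
try:
    from flash_attn import flash_attn_func  # real package; may be absent
    HAS_FLASH_ATTN = True
except ImportError:
    flash_attn_func = None
    HAS_FLASH_ATTN = False
```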

Collaborator

@feifeibear feifeibear left a comment


LGTM

@feifeibear
Collaborator

@Aman-Dwivedi could you please check the duplicate imports problem mentioned before?

@avjves
Contributor

avjves commented Sep 23, 2025

This line in the requirements:

yunchang @ git+https://github.com/yiakwy-xpu-ml-framework-team/xDiT-long-context-attention-fork.git@add_amd_gpu_suppport

is also a big problem in general for AMD GPUs. Could it also be removed? 😄

@eppaneamd
Contributor

@feifeibear kindly note that this PR should be revisited and its merits re-evaluated.

@Aman-Dwivedi could you elaborate on why this PR is needed and how xDiT and yunchang are currently not working for AMD GPUs? Why is gfx942 the only allowed GPU arch? Have you tried newer images than rocm/pytorch:rocm6.2_ubuntu22.04_py3.10_pytorch_release_2.3.0, such as rocm7.0_ubuntu22.04_py3.10_pytorch_release_2.8.0?
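(For illustration only: rather than hard-coding gfx942, a setup-time check could enumerate the gfx architectures the node actually reports. `detect_amd_gfx_archs` below is a hypothetical helper parsing `rocminfo`-style output, not code from this PR.)

```python
import re

def detect_amd_gfx_archs(rocminfo_output: str) -> set[str]:
    """Hypothetical helper: collect every gfx architecture name that
    appears in `rocminfo`-style output (e.g. gfx942 on MI300X)."""
    return set(re.findall(r"\bgfx[0-9]+[a-z]*\b", rocminfo_output))
```

On an MI300X node this should return {"gfx942"}, and the same check keeps working as newer architectures appear instead of allow-listing one.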

@Aman-Dwivedi
Author

Aman-Dwivedi commented Oct 12, 2025

@avjves @eppaneamd This PR is in response to #437. I cherry-picked the commit mentioned in the issue and made some changes to run it smoothly on the AMD cluster. I was able to run an example. I have since tried running the main branch. After facing a lot of hurdles I can run it now, but I don't think it is running properly: it produced a gibberish image, attached below. I tried following this readme (https://github.com/feifeibear/long-context-attention/blob/main/docs/install_amd.md) for installing yunchang, but it resulted in errors. Is there another, updated readme on making xDiT run on an AMD cluster? Is the garbled image due to some error on my part, or is there an issue with xDiT?
(attached image: garbled xDiT output)

@avjves
Contributor

avjves commented Oct 15, 2025

@Aman-Dwivedi
xDiT and yunchang should both work OOB with AMD GPUs, though I haven't really used pipeline parallelism myself. There are some recent AMD-related commits that are not yet in releases, so I'd recommend building both from source. It should be enough to run pip3 install -e . inside the cloned repositories. You should also install AITER or flash_attn to speed up attention. Instructions for AITER are here (https://github.com/ROCm/aiter?tab=readme-ov-file#installation).

Was the above image generated with the run_amdgpu_1x8.sh script included in this PR? If not, can you post the command you used? :)

@Aman-Dwivedi
Author

@avjves
I tried checking out and building both xDiT and yunchang from source, along with AITER. I am still facing some errors (pasted below). I also found a doc for installing yunchang on AMD GPUs (https://github.com/feifeibear/long-context-attention/blob/main/docs/install_amd.md). Initially, I tried installing yunchang directly using pip install ., but it resulted in the error below. Following the doc resulted in some other errors and I was not able to install using that method either. Is the doc still up to date with the newer version of yunchang, or would directly installing yunchang on AMD GPUs work?

For generating the above image I used the below command:
python -m torch.distributed.run --nproc_per_node=1 examples/pixartalpha_example.py --model PixArt-alpha/PixArt-XL-2-1024-MS --height 512 --width 512 --prompt "a cute dog" --num_inference_steps 10 --guidance_scale 4.5

Error:
(xdit) yangzhou@chi-mi300x-041:~/aman/xDiT$ torchrun --nproc_per_node=8 \
    examples/pixartalpha_example.py \
    --model models/PixArt-XL-2-1024-MS \
    --pipefusion_parallel_degree 2 \
    --ulysses_degree 2 \
    --num_inference_steps 20 \
    --warmup_steps 0 \
    --prompt "A cute dog" \
    --use_cfg_parallel
W1016 20:03:19.944000 1048711 site-packages/torch/distributed/run.py:803]
W1016 20:03:19.944000 1048711 site-packages/torch/distributed/run.py:803] *****************************************
W1016 20:03:19.944000 1048711 site-packages/torch/distributed/run.py:803] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W1016 20:03:19.944000 1048711 site-packages/torch/distributed/run.py:803] *****************************************
[aiter] import [module_aiter_enum] under /home/yangzhou/aman/aiter/aiter/jit/module_aiter_enum.so
    (line repeated once per rank, 8 times in total)
INFO 10-16 20:03:54 [envs.py:196] Using AITER as the attention library
    (line repeated once per rank)
WARNING 10-16 20:03:55 [args.py:377] Distributed environment is not initialized. Initializing...
Traceback (most recent call last):
  File "/home/yangzhou/aman/xDiT/examples/pixartalpha_example.py", line 83, in <module>
    main()
  File "/home/yangzhou/aman/xDiT/examples/pixartalpha_example.py", line 21, in main
    engine_config, input_config = engine_args.create_config()
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yangzhou/aman/xDiT/xfuser/config/args.py", line 380, in create_config
    init_distributed_environment()
  File "/home/yangzhou/aman/xDiT/xfuser/core/distributed/parallel_state.py", line 221, in init_distributed_environment
    backend = envs.get_torch_distributed_backend()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yangzhou/aman/xDiT/xfuser/envs.py", line 133, in get_torch_distributed_backend
    raise NotImplementedError(
NotImplementedError: No Accelerators(AMD/NV/MTT GPU, AMD MI instinct accelerators) available
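(For context on the traceback: the failing check roughly amounts to the following, shown as an illustrative sketch rather than the actual xfuser code. On ROCm builds of PyTorch, AMD GPUs are still reported through torch.cuda, so a healthy install takes the nccl path.)

```python
# Illustrative sketch of the failing backend selection -- not actual xfuser code.
def get_torch_distributed_backend(cuda_available: bool) -> str:
    # PyTorch's ROCm build exposes AMD GPUs via torch.cuda.is_available(),
    # and "nccl" maps to RCCL there, so NV and AMD share the same branch.
    if cuda_available:
        return "nccl"
    raise NotImplementedError(
        "No Accelerators(AMD/NV/MTT GPU, AMD MI instinct accelerators) available"
    )
```

The error therefore means PyTorch saw no devices at all, pointing at a broken PyTorch/ROCm install rather than at xDiT itself.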

@avjves
Contributor

avjves commented Oct 17, 2025

@Aman-Dwivedi Aha, it seems PR #566, which was just merged, accidentally broke support for AMD devices. I have a PR open to fix that, #577, but you can cherry-pick the changes from there if you want to test it prior to it being merged. After that, as long as you have a working PyTorch environment, you should be good. EDIT: that PR is already merged :)

I haven't really tested pipeline parallelism myself, and running your above command with the fixes still runs into an error, though I don't believe that to be AMD-specific. Pure sequence parallelism with CFG at least works OOB:

torchrun --nproc_per_node=8 examples/pixartalpha_example.py --model PixArt-alpha/PixArt-XL-2-1024-MS  --ulysses_degree 4 --num_inference_steps 20 --warmup_steps 0 --prompt "A cute dog" --use_cfg_parallel
(attached image: pixart_alpha_result_dp1_cfg2_ulysses4_ringNone_pp1_patchNone_tc_False_0)
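(A quick sanity check on these flags, written as a hypothetical helper rather than xDiT's own validation: the product of all parallel degrees must equal the number of launched processes.)

```python
def implied_world_size(dp: int = 1, cfg: int = 1, ulysses: int = 1,
                       ring: int = 1, pipefusion: int = 1) -> int:
    """Hypothetical helper: total ranks implied by the parallel degrees."""
    return dp * cfg * ulysses * ring * pipefusion

# --use_cfg_parallel doubles the config (cfg=2), so with --ulysses_degree 4
# the command above needs 2 * 4 = 8 processes, matching --nproc_per_node=8.
```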

@jcaraban
Collaborator

Closing because this PR lost focus and doesn't seem to fix what it claims.
xDiT indeed runs OOB with ROCm devices, as long as PyTorch is installed correctly.

We can extend the README to make this clearer. We could also add a Docker image with a ready ROCm environment.
@Aman-Dwivedi please let me know if you still face specific issues.

@jcaraban jcaraban closed this Oct 29, 2025
@avjves
Contributor

avjves commented Oct 29, 2025

@Aman-Dwivedi

Here's a small Dockerfile to run PixArt:

FROM ubuntu
WORKDIR /app
RUN apt update && apt install python3-pip git -y && pip3 install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/rocm7.0 --break-system-packages
RUN git clone https://github.com/feifeibear/long-context-attention.git && cd long-context-attention && pip install -e . --break-system-packages
RUN git clone https://github.com/xdit-project/xDiT.git && cd xDiT && pip install -e . --break-system-packages
CMD cd xDiT && torchrun --nproc_per_node=8 examples/pixartalpha_example.py --model PixArt-alpha/PixArt-XL-2-1024-MS  --ulysses_degree 4 --num_inference_steps 20 --warmup_steps 0 --prompt "A cute dog" --use_cfg_parallel

Build that and run it:

docker run --ipc host --device /dev/dri --device /dev/kfd --privileged --shm-size 128G -v $PWD/results:/app/xDiT/results <built_image_tag>

After it's done, the picture should be in the results folder :)

@Aman-Dwivedi
Author

Aman-Dwivedi commented Oct 29, 2025

@avjves Thanks for sharing this. I was able to run xDiT both within a node and across multiple nodes. Thank you so much for your help. I have tried it out with AITER and it works; I haven't tried flash attention. I agree with @jcaraban about extending the README. Since pipeline parallelism does not work, the README demo command could be updated to drop pipefusion_parallel_degree. Once again, thank you so much for all your help!
Also, could you close issue #437? That was my initial motivation for building AMD support, but clearly it is already there.
