Rocm fix #568
Conversation
Hi! This makes a custom requirements file for ROCm specifically - is there a reason for it? Also, that yunchang branch / version it installs is a year old with some changes. Yunchang already supports AMD GPUs in the upstream repo via flash_attn or AITER (the latest way to call FA with AMD GPUs). This seems like a regression. Also, this breaks the changes made by PR #559 due to the duplicate imports in xfuser/core/long_ctx_attention/ring/ring_flash_attn.py. Currently it's gated like this:
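For context, optional-import gating of this kind usually follows a try/except pattern. The sketch below is illustrative only; the symbol names are assumptions, not the exact code in `xfuser/core/long_ctx_attention/ring/ring_flash_attn.py`:

```python
# Illustrative sketch of gating an optional flash_attn import
# (the exact guard in ring_flash_attn.py may differ):
try:
    from flash_attn import flash_attn_func  # flash-attention kernel entry point
    HAS_FLASH_ATTN = True
except ImportError:
    # Fall back gracefully when flash_attn is not installed
    # (e.g. some AMD setups that use AITER instead).
    flash_attn_func = None
    HAS_FLASH_ATTN = False
```

Duplicating such an import block unguarded is what breaks the #559 changes.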
feifeibear
left a comment
LGTM
@Aman-Dwivedi could you please check the duplicate imports problem mentioned before?
This line in the requirements: is also a big problem in general for AMD GPUs. Could it also be removed? 😄
@feifeibear kindly note that this PR should be revisited and its merits re-evaluated. @Aman-Dwivedi could you elaborate on why this PR is needed, and how xDiT and yunchang are not working for AMD GPUs currently? Why is gfx942 the only allowed GPU arch? Have you tried newer images than
@avjves @eppaneamd This PR is in response to #437. I cherry-picked the commit mentioned in the issue and made some changes to run it smoothly on the AMD cluster. I was able to run an example. I have tried running the main branch since then. After facing a lot of hurdles I am able to run it now, but I don't think it is running properly: it produced a gibberish image, attached below. I tried following this readme (https://github.com/feifeibear/long-context-attention/blob/main/docs/install_amd.md) for installing yunchang, but it resulted in errors. Is there another, updated readme on making xDiT run on an AMD cluster? Is the image garbled due to some error on my part, or is there an issue with xDiT?
@Aman-Dwivedi Was the above image generated with the
@avjves For generating the above image I used the command below: Error:
@Aman-Dwivedi Aha, it seems PR #566, which was just merged, accidentally broke support for AMD devices. I have a PR open to fix that, #577, but you can cherry-pick the changes from there if you want to test it prior to it being merged. After that, as long as you have a working PyTorch environment, you should be good. EDIT: PR already merged :) I haven't really tested pipeline parallelism myself, and running your above command with the fixes still runs into an error, though I don't believe that to be AMD-specific. Pure sequence parallelism with cfg works at least OOB:
Closing because this PR lost focus and doesn't seem to fix what it claims.
We can extend the README to make this clearer. We could also add a Docker image with a ready ROCm environment.
Here's a small Dockerfile to run PixArt: Build that and run it; after it's done the picture should be in
@avjves Thanks for sharing this. I was able to run xDiT both across multiple nodes and within a node. Thank you so much for your help. I have tried it out with AITER and it works; I haven't tried it with flash attention. I agree with @jcaraban about extending the README. Since pipeline parallelism does not work, the README could be updated so that the demo command does not include pipefusion_parallel_degree. Once again, thank you so much for all your help!


Added AMD GPU support.
Updated requirements for ROCm and added functions in setup.py to detect AMD GPUs. An example script has also been added by yiakwy-xpu-ml-framework-team.
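AMD GPU detection of the kind described can be sketched as follows. This is a minimal illustration under assumed conventions (the function name and exact checks are not necessarily what this PR implements):

```python
import shutil


def has_amd_gpu():
    """Heuristically detect a ROCm/AMD environment (illustrative sketch)."""
    # A ROCm build of PyTorch reports a HIP version string.
    try:
        import torch
        if torch.version.hip is not None:
            return True
    except ImportError:
        pass
    # rocminfo on PATH is another common signal on AMD machines.
    return shutil.which("rocminfo") is not None
```

setup.py could then branch on this check to select the ROCm-specific requirements file.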