[Core][AMD] Migrate fully transparent sleep mode to ROCm platform #12695
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
Hi @youkaichao! Thank you for your amazing work in #11743. I've checked all the APIs and found that all the CuMem calls used here have equivalent HIP versions, so I tried to migrate the feature to ROCm as well. I almost succeeded (all the code compiles), but we hit a strange OOM issue. For easier reproduction, I modified the demo and pushed it here: https://github.com/HollowMan6/vllm_allocator_adaptor The error log when running it:
Force-pushed from 15e1489 to 261ee50
Hi @HollowMan6, just a quick data point as I was testing this PR. For V0, inference seems to work, and the `from vllm.device_allocator.cumem import CuMemAllocator` import error I was facing on main is fixed, but for V1 I get this error: Not sure if you're aware of this, so I thought I'd quickly leave a comment. Let me know if you need more info to repro. EDIT:
This usually means the error happens for that call. When I implemented the PR, the most difficult part was making sure the calls to the CUDA driver API succeed. I don't know if simply replacing some functions is enough to migrate these API calls. We need ROCm experts to confirm the API usage, as these are quite low-level (and error-prone).
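For context, the driver-level APIs in question map roughly one-to-one. Below is a minimal sketch of the kind of CUDA-to-HIP compatibility mapping such a migration relies on; the names are the public CUDA/HIP symbols, but the exact header shipped in this PR may organize this differently.

```cpp
// Hedged sketch: CUDA -> HIP mapping for the VMM driver APIs used by the
// allocator. Illustrative only; the PR's actual compatibility header may differ.
#ifdef USE_ROCM
  #include <hip/hip_runtime_api.h>
  #define CUmemGenericAllocationHandle  hipMemGenericAllocationHandle_t
  #define CUmemAllocationProp           hipMemAllocationProp
  #define CUmemAccessDesc               hipMemAccessDesc
  #define cuMemCreate                   hipMemCreate
  #define cuMemRelease                  hipMemRelease
  #define cuMemAddressReserve           hipMemAddressReserve
  #define cuMemAddressFree              hipMemAddressFree
  #define cuMemMap                      hipMemMap
  #define cuMemUnmap                    hipMemUnmap
  #define cuMemSetAccess                hipMemSetAccess
  #define cuMemGetAllocationGranularity hipMemGetAllocationGranularity
#else
  #include <cuda.h>
#endif
```

One caveat with a pure `#define` mapping: the CUDA entry points take `CUdeviceptr` where the HIP ones take `void*`, so pointer types need care at the call sites.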
Hi @mreso! Thank you for your feedback! Are you sure sleep/wake works without the strange OOM error on AMD with this PR? It looks like you failed to link to the ROCm library, so sleep/wake shouldn't work here. Remove
Oh really? It sounds like there's some broken logic in checking whether you are using NVIDIA or AMD, since in the original codebase CuMemAllocator is disabled for AMD GPUs. I can't reproduce this on my side, though, so maybe you want to file a separate issue about it. Regarding the V1 error: neither this PR nor #11743 modified anything there, so it should be a separate issue, too.
Okay, it turns out to be a hardware limitation on my side: I'm using an MI250x, and it looks like it doesn't support virtual memory management, as confirmed by https://github.com/ROCm/hip-tests/blob/84a460d96bafb00615969304cc5eaddc3b20bc3d/catch/unit/memory/hip_vmm_common.hh#L27-L37 Maybe this can work on more recent AMD hardware, such as MI300+? Unfortunately, I don't have access to such hardware, so I can't help with this PR any further. I will set this PR as ready, since directly mapping those CUDA calls seems to be okay; we also have some test cases here for reference: https://github.com/ROCm/hip-tests/blob/84a460d96bafb00615969304cc5eaddc3b20bc3d/catch/unit/virtualMemoryManagement/hipMemSetGetAccess.cc#L213-L256 Feel free to take this PR over, @youkaichao, or anyone who is interested and has those GPUs. I have set this PR to allow edits and access to secrets by maintainers.
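For anyone hitting the same OOM: the support check in the linked hip_vmm_common.hh boils down to a single device-attribute query. A small standalone probe (compile with hipcc):

```cpp
// Probe whether the current device supports HIP virtual memory management,
// mirroring the check in ROCm's hip-tests (hip_vmm_common.hh).
#include <hip/hip_runtime_api.h>
#include <cstdio>

int main() {
  int supported = 0;
  hipError_t err = hipDeviceGetAttribute(
      &supported, hipDeviceAttributeVirtualMemoryManagementSupported,
      /*deviceId=*/0);
  if (err != hipSuccess) {
    std::printf("hipDeviceGetAttribute failed: %s\n", hipGetErrorString(err));
    return 1;
  }
  std::printf("VMM supported: %s\n", supported ? "yes" : "no");
  return supported ? 0 : 1;
}
```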
@hongxiayang are you able to help with this?
Thanks for your contribution, @HollowMan6. @DarkLight1337, I will check this when I have more bandwidth.
Thanks @HollowMan6, you're right, that would be an env issue, and I was not testing sleep/wake explicitly, so it would most likely crash if I did.
Interesting that this does not work in my case. This line is causing an exception on my end, and I assume that if CUDA is not present it's supposed to swallow the ModuleNotFoundError and disable the CuMemAllocator that way, right? But in my case an AssertionError is thrown here, which is not caught. If I raise a ModuleNotFoundError instead, it works. Is there another mechanism I am missing that should disable CuMemAllocator?
What hardware are you using? If it's MI300 or above, then it likely supports virtual memory management, and the sleep/wake mode will not crash in that OOM way. You are welcome to test this out and let us know what you find!
In main, |
That's it, thanks a lot! It was an unclean build env. There is no trace of CUDA on the machines, but it must have created the file anyway, so the ModuleNotFoundError was not raised... After removing all files and rebuilding, it works. I am on MI300s and will give it a try.
Gave it a quick try using this test script: It seems like sleep did not free any memory, and after wakeup I got an OOM error: This is my hardware: Let me know if the test script is incorrect for this, or if I can try anything else.
Thank you @mreso! It looks like MI300 does support virtual memory management, but it gives an OOM error at a different line and then ends up with a segfault; maybe the HIP APIs need to be called differently than the CUDA ones. Unfortunately, I can't help with this PR any further as I don't have an MI300 on hand, but thank you anyway, @mreso!
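For whoever picks this up, the sequence sleep/wake performs at the driver level is roughly the following. This is a hedged sketch that assumes the HIP VMM calls behave like their CUDA counterparts (which is exactly what may not hold here); the real allocator code is structured differently.

```cpp
// Sketch of the offload/restore cycle: sleep unmaps and releases the
// physical backing while keeping the virtual address reservation alive;
// wake creates fresh physical memory and maps it at the same address.
#include <hip/hip_runtime_api.h>

hipError_t sleep_region(void* ptr, size_t size,
                        hipMemGenericAllocationHandle_t handle) {
  hipError_t err = hipMemUnmap(ptr, size);  // detach physical pages
  if (err != hipSuccess) return err;
  return hipMemRelease(handle);             // free physical memory only
}

hipError_t wake_region(void* ptr, size_t size, int device,
                       hipMemGenericAllocationHandle_t* handle) {
  hipMemAllocationProp prop = {};
  prop.type = hipMemAllocationTypePinned;
  prop.location.type = hipMemLocationTypeDevice;
  prop.location.id = device;
  hipError_t err = hipMemCreate(handle, size, &prop, 0);
  if (err != hipSuccess) return err;
  err = hipMemMap(ptr, size, /*offset=*/0, *handle, 0);
  if (err != hipSuccess) { hipMemRelease(*handle); return err; }
  hipMemAccessDesc desc = {};
  desc.location = prop.location;
  desc.flags = hipMemAccessFlagsProtReadWrite;
  return hipMemSetAccess(ptr, size, &desc, 1);
}
```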
I can also confirm that the current solution does not work on MI300 (on v0.7.2).
By the way, feel free to tell me what I can help with on this PR.
Force-pushed from f0cc987 to d1948f1
This pull request has merge conflicts that must be resolved before it can be merged.
@HollowMan6 @hongxiayang I can't seem to compile due to a missing linker dependency: This compilation error is resolved through:
HIP-TESTs of VirtualMemoryManagement: I am using the HIP and ROCm from the vLLM docker image to build the tests. I am getting two failures when running the VirtualMemoryManagement tests. HIP-TEST setup:
Code Review
This pull request migrates the fully transparent sleep mode to the ROCm platform, enabling it for AMD GPUs. The changes are extensive, touching CMake build files, C++ allocator code, Python wrappers, and configuration files to support ROCm alongside CUDA. The core logic introduces chunked memory allocation for ROCm as a workaround for platform limitations. While the overall approach is sound, I've identified critical resource leak issues in the error handling paths of the new C++ code for ROCm. Specifically, if memory creation or mapping fails, previously allocated resources are not cleaned up, which will lead to GPU memory leaks. I've provided suggestions to fix these issues.
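To make the review's point concrete, here is a hedged sketch of the rollback pattern it is asking for in the chunked path. Function and variable names are illustrative, not the ones in csrc/cumem_allocator.cpp.

```cpp
// Sketch of the rollback pattern: if creating or mapping chunk i fails,
// unmap and release chunks 0..i-1 before returning, so no GPU memory leaks.
#include <hip/hip_runtime_api.h>
#include <vector>

hipError_t map_chunks(char* base, size_t chunk, size_t n, int device,
                      std::vector<hipMemGenericAllocationHandle_t>& handles) {
  hipMemAllocationProp prop = {};
  prop.type = hipMemAllocationTypePinned;
  prop.location.type = hipMemLocationTypeDevice;
  prop.location.id = device;

  for (size_t i = 0; i < n; ++i) {
    hipMemGenericAllocationHandle_t h;
    hipError_t err = hipMemCreate(&h, chunk, &prop, 0);
    if (err == hipSuccess) {
      err = hipMemMap(base + i * chunk, chunk, /*offset=*/0, h, 0);
      if (err != hipSuccess) hipMemRelease(h);  // created but never mapped
    }
    if (err != hipSuccess) {
      // Roll back everything mapped so far.
      for (size_t j = 0; j < i; ++j) {
        hipMemUnmap(base + j * chunk, chunk);
        hipMemRelease(handles[j]);
      }
      handles.clear();
      return err;
    }
    handles.push_back(h);
  }
  return hipSuccess;
}
```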
💡 Codex Review
Here are some automated review suggestions for this pull request.
Code Review
This pull request successfully migrates the fully transparent sleep mode to the ROCm platform, which is a significant feature enhancement. The changes span across CMake build scripts, C++ extensions, and Python code, and are generally well-implemented. However, I've identified a few critical issues in the C++ memory allocator (csrc/cumem_allocator.cpp) concerning resource leaks and incorrect memory management API usage. These issues could lead to memory corruption or crashes and should be addressed before merging.
Code Review
This pull request successfully migrates the fully transparent sleep mode to the ROCm platform, which is a significant feature enhancement. The implementation is well-structured, utilizing a compatibility header for CUDA/HIP APIs and conditional compilation for platform-specific logic. The changes across CMake, Python, and C++ are consistent and well-executed. However, I've identified a critical issue in the C++ extension concerning the lack of validation for Python objects passed from the Python layer. This could lead to process crashes if malformed data is provided. I've included a detailed comment with a code suggestion to address this vulnerability. Overall, this is a great contribution, and with the suggested fix, it will be a robust addition to the project.
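A hedged sketch of the kind of defensive validation the review means, for a (ptr, size) tuple crossing from Python into the extension; the tuple layout here is hypothetical, not the PR's actual wire format.

```cpp
// Sketch: defensively validate a (ptr, size) tuple coming from Python
// instead of assuming its shape; a malformed object raises a Python
// exception rather than crashing the process.
#include <Python.h>

static int unpack_region(PyObject* obj, unsigned long long* ptr,
                         unsigned long long* size) {
  if (!PyTuple_Check(obj) || PyTuple_GET_SIZE(obj) != 2) {
    PyErr_SetString(PyExc_TypeError, "expected a (ptr, size) tuple");
    return -1;
  }
  *ptr = PyLong_AsUnsignedLongLong(PyTuple_GET_ITEM(obj, 0));
  if (PyErr_Occurred()) return -1;  // item 0 was not an integer
  *size = PyLong_AsUnsignedLongLong(PyTuple_GET_ITEM(obj, 1));
  if (PyErr_Occurred()) return -1;  // item 1 was not an integer
  return 0;
}
```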
Code Review
This pull request successfully migrates the fully transparent sleep mode to the ROCm platform for AMD GPUs, which is a significant feature enhancement. The changes involve conditional compilation for CUDA and ROCm, a new compatibility header for HIP APIs, and modifications to the build system. The core logic for memory management has been adapted to handle memory in chunks for ROCm, including robust error handling in most places. However, I've identified a few critical areas in the C++ extension where error handling for Python C-API calls is missing, which could lead to process crashes. Addressing these will make the implementation much more robust.
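For illustration, the pattern the review is asking for: check every fallible Python C-API call before using its result, and release what was already created on failure. The function name here is hypothetical.

```cpp
// Sketch: every Python C-API call that can fail returns NULL and must be
// checked before use; otherwise a later call dereferences NULL and the
// whole process crashes.
#include <Python.h>

static PyObject* make_handle_tuple(unsigned long long ptr,
                                   unsigned long long size) {
  PyObject* py_ptr = PyLong_FromUnsignedLongLong(ptr);
  if (!py_ptr) return NULL;  // propagate the Python error
  PyObject* py_size = PyLong_FromUnsignedLongLong(size);
  if (!py_size) { Py_DECREF(py_ptr); return NULL; }
  PyObject* tup = PyTuple_Pack(2, py_ptr, py_size);
  Py_DECREF(py_ptr);   // PyTuple_Pack took its own references
  Py_DECREF(py_size);
  return tup;          // NULL on failure, with the error already set
}
```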
Codex Review: Didn't find any major issues. Breezy!
I addressed all the potential memory-leak issues above with the help of the LLM reviews; now both sleep level 2 and level 1 look okay when testing, both with the default settings and with increased ones. I think this PR is now ready for merge; please let me know if you have more concerns here @tjtanaa @kliuae @gshtras @mgoin
@HollowMan6 Yeah, we hope an upcoming ROCm release can fix this so that the VMM API can allocate memory without chunking, like on NVIDIA. Theoretically, that should be the fastest.
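Until then, any chunk size (or an eventual single unchunked allocation) has to be a multiple of the granularity the driver reports. A hedged sketch of that query, assuming hipMemGetAllocationGranularity mirrors its CUDA counterpart:

```cpp
// Sketch: query the minimum allocation granularity that any chunk size
// (or a single unchunked allocation) must be a multiple of.
#include <hip/hip_runtime_api.h>

size_t min_granularity(int device) {
  hipMemAllocationProp prop = {};
  prop.type = hipMemAllocationTypePinned;
  prop.location.type = hipMemLocationTypeDevice;
  prop.location.id = device;
  size_t granularity = 0;
  (void)hipMemGetAllocationGranularity(&granularity, &prop,
                                       hipMemAllocationGranularityMinimum);
  return granularity;
}
```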
@HollowMan6 do you mind if we help update the documentation on your branch directly?
That's okay, feel free to do so, and not only for the documentation; also feel free to change the code if you think it's necessary.
Documentation preview: https://vllm--12695.org.readthedocs.build/en/12695/
LGTM
@mgoin @youkaichao PTAL
FIX #10714 for AMD GPUs
Related to #11743