
Conversation

@YangWang92
Contributor

Add AMD performance tuning documentation including:

  1. vLLM patch for sleep mode on AMD GPUs
  2. Workarounds for CUDA graph capture issues in ROCm

@CLAassistant

CLAassistant commented Apr 24, 2025

CLA assistant check
All committers have signed the CLA.

Collaborator

@yushengsu-thu left a comment


@eric-haibin-lin and @PeterSH6
Could you please check the PR? It looks good to me.
@YangWang92 added a tutorial that guides AMD users on enabling the newer vLLM (v0.8) and resolving its sleep mode issue, which makes VeRL training more efficient. This PR addresses the question you raised in my previous PR.

Collaborator

@eric-haibin-lin left a comment


Thanks for the PR. Is the vllm PR 12695 expected to be merged in the future?

@YangWang92
Contributor Author

Thanks for the PR. Is the vllm PR 12695 expected to be merged in the future?

Yes, it will: vllm-project/vllm#12695 (comment). They plan to merge the pull request alongside the ROCm 6.4 release. I have noted this in the document: for now, you can build vLLM from source using the community-patched code in that pull request; once the patch is merged into the vLLM main branch, you can install vLLM directly from the latest release.
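
For reference, here is a minimal sketch of building vLLM from the patched PR source. The local branch name `sleep-mode-amd` is arbitrary, and the exact ROCm build steps or flags on your system may differ; consult the vLLM ROCm installation docs for details.

```bash
# Fetch and build the community-patched vLLM from vllm-project/vllm#12695.
# Assumes a ROCm-ready environment; extra build flags may be needed on some systems.
git clone https://github.com/vllm-project/vllm.git
cd vllm
git fetch origin pull/12695/head:sleep-mode-amd   # fetch the PR branch into a local branch
git checkout sleep-mode-amd
pip install -e .                                  # build and install from source
```

Once the patch lands in the vLLM main branch, this step becomes unnecessary and a sufficiently new vLLM release installed the usual way should include the fix.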

@eric-haibin-lin merged commit 5bd1ce3 into volcengine:main Apr 24, 2025
3 checks passed
@YangWang92 deleted the wy/amd-patch branch April 25, 2025 02:01
ScottCTD pushed a commit to ScottCTD/verl that referenced this pull request May 5, 2025