-
-
Notifications
You must be signed in to change notification settings - Fork 11.6k
[ROCm][Quantization][Kernel] Using HIP FP8 header #12593
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…tom one. The header exists since ROCm6.2 Signed-off-by: Gregory Shtrasberg <[email protected]>
Signed-off-by: Gregory Shtrasberg <[email protected]>
Signed-off-by: Gregory Shtrasberg <[email protected]>
|
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
🚀 |
|
This looks like a good change to me. We love to see lots of deleted code :). Thanks for the contribution. |
CMakeLists.txt
Outdated
| # | ||
| if(VLLM_GPU_LANG STREQUAL "HIP") | ||
| # | ||
| # Setting up debug flags for pleasant debug experience. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: typo, also is there a reason -ggdb3 is used specifically? And I thought -O0 was there by default already
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ggdb3 produces more debug information, it doesn't hurt to have it all
By default the current build will add -O, so appending this will turn it to "-O -O0" with the latter overriding the former
CMakeLists.txt
Outdated
|
|
||
|
|
||
| # | ||
| # Suppress the noisy warning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: could you add a more detailed comment along the lines of "Both CUDA and HIP device calls return an error code but they are marked [[noreturn]] only in HIP. This warning can be reenabled once we handle error codes from all device calls".
| */ | ||
|
|
||
| // fp8 -> half |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason these were moved? I don't feel strongly either way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The conversions are now better grouped by type than before, also certain things may be reordered due to internal dependencies
Signed-off-by: Gregory Shtrasberg <[email protected]>
ProExpertProg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing the comments!
Head branch was pushed to by a user without write access
|
I am rerunning the CI, not sure why it blew up. |
|
PR needs to be merged with main to pass CI (apologies, we had some issues with our S3 buckets) |
|
Retrying last test for merge |
|
The failure is unrelated to this PR, merging. |
…ined Related to vllm-project#12593 In older versions of "hip/hip_fp8.h" header (before ROCm/clr@353f15a), both `HIP_FP8_TYPE_OCP` and `HIP_FP8_TYPE_FNUZ` are not defined, and the code implementation is in FNUZ, so it's better to write it in this way, such that we don't check if `HIP_FP8_TYPE_FNUZ` is there or not, and we just use FP8 FNUZ when `HIP_FP8_TYPE_OCP` is 0 or undefined. Signed-off-by: Hollow Man <[email protected]>
Signed-off-by: Louis Ulmer <[email protected]>
Using the official hip fp8 header supplied with ROCm instead of a custom one. The header exists since ROCm6.2
It abstracts away the usage of either hardware instructions, or software emulation for fp8 conversions, as well as both NANOO and OCP types