-
-
Notifications
You must be signed in to change notification settings - Fork 11.7k
[Kernel] Refactor Cutlass c3x #10049
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Kernel] Refactor Cutlass c3x #10049
Conversation
|
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
🚀 |
|
@tlrmchlsmth @ProExpertProg @LucasWilkinson PTAL. Thanks! |
|
LGTM (just FYI may conflict with #9855) |
tlrmchlsmth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
7dbe3b3 to
4f44aac
Compare
|
This pull request has merge conflicts that must be resolved before it can be |
4f44aac to
16879db
Compare
|
@tlrmchlsmth re-requesting review as the PR is now rebased. |
tlrmchlsmth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! (Likely wait for #10995)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually do you need to turn off clang-format here? The reason for turning it off is that CUTLASS headers need to be included in a specific order but it looks like that's not the case following the refactor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using clang-format in this block, turns,
#include <cudaTypedefs.h>
#if defined CUDA_VERSION && CUDA_VERSION >= 12000
#include "scaled_mm_c3x_sm90_fp8_dispatch.cuh"
#include "scaled_mm_c3x_sm90_int8_dispatch.cuh"
#include "cutlass_extensions/epilogue/scaled_mm_epilogues_c3x.hpp"
using namespace vllm;
into
#include <cudaTypedefs.h>
#if defined CUDA_VERSION && CUDA_VERSION >= 12000
#include "scaled_mm_c3x_sm90_fp8_dispatch.cuh"
#include "scaled_mm_c3x_sm90_int8_dispatch.cuh"
#include "cutlass_extensions/epilogue/scaled_mm_epilogues_c3x.hpp"
using namespace vllm;
the #if seems trigger inconsistent indenting. I switched off clang-format in this block to avoid that.
I moved the original clang-format toggle to scaled_mm_c3x.cuh. https://github.com/vllm-project/vllm/blob/dad8e4760f9d24faadfc076a13ed7d130211ea2d/csrc/quantization/cutlass_w8a8/scaled_mm_c3x.cuh#L3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had second thoughts about this and removed the clang-format block. It is probably better to stick to the convention.
55d9927 to
e5f324b
Compare
Signed-off-by: Varun Sundar Rabindranath <[email protected]>
Signed-off-by: Varun Sundar Rabindranath <[email protected]>
e5f324b to
e033b41
Compare
Signed-off-by: Varun Sundar Rabindranath <[email protected]> Co-authored-by: Varun Sundar Rabindranath <[email protected]>
Refactor Cutlass c3x kernels for better maintainability and easier experimentation.