[Kernel] Refactor Cutlass c3x #10049

varun-sundar-rabindranath · 2024-11-05T20:22:28Z

Refactor Cutlass c3x kernels for better maintainability and easier experimentation.

Break scaled_mm_c3x.cu into:
- scaled_mm_c3x.cuh: all the base CUTLASS c3x code (cutlass_3x_gemm and cutlass_gemm_caller).
- scaled_mm_c3x_sm90_fp8_dispatch.cuh: all fp8 kernels, along with the GEMM-shape-based dispatch function.
- scaled_mm_c3x_sm90_int8_dispatch.cuh: all int8 kernels, along with the GEMM-shape-based dispatch function.
- scaled_mm_c3x.cu: the interfaces expected by scaled_mm_entry.cu.
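The following minimal, self-contained C++ sketch shows the layering this split implies. The `.cu` entry point only routes on input dtype, and each `*_dispatch.cuh` header picks a kernel configuration from the GEMM shape (here, a power-of-two bucket of M). All names (`scaled_mm_sm90`, `fp8_dispatch`, `int8_dispatch`, the config labels) and all thresholds are hypothetical. The real headers select CUTLASS kernel instantiations rather than returning strings.

```cpp
#include <algorithm>
#include <cassert>  // assert() for the usage checks
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins: the real dispatch headers pick cutlass_3x_gemm
// instantiations launched via cutlass_gemm_caller; here a "kernel" is a label.
enum class InputDType { kFp8, kInt8 };

// Smallest power of two >= m (m == 0 maps to 1). Assumes m <= 2^31.
inline uint32_t next_pow_2(uint32_t m) {
  uint32_t p = 1;
  while (p < m) p <<= 1;
  return p;
}

// Would live in scaled_mm_c3x_sm90_fp8_dispatch.cuh: choose a config from
// the GEMM's M dimension. Thresholds are illustrative, not vLLM's values.
inline std::string fp8_dispatch(uint32_t m) {
  const uint32_t mp2 = std::max<uint32_t>(next_pow_2(m), 64);
  if (mp2 <= 64) return "fp8_m64";
  if (mp2 <= 128) return "fp8_m128";
  return "fp8_default";
}

// Would live in scaled_mm_c3x_sm90_int8_dispatch.cuh (same idea, own buckets).
inline std::string int8_dispatch(uint32_t m) {
  const uint32_t mp2 = std::max<uint32_t>(next_pow_2(m), 32);
  if (mp2 <= 32) return "int8_m32";
  if (mp2 <= 64) return "int8_m64";
  return "int8_default";
}

// Would live in scaled_mm_c3x.cu: route by input dtype only and leave the
// shape heuristics to the per-dtype dispatch headers.
inline std::string scaled_mm_sm90(InputDType dtype, uint32_t m) {
  switch (dtype) {
    case InputDType::kFp8:
      return fp8_dispatch(m);
    case InputDType::kInt8:
      return int8_dispatch(m);
  }
  throw std::invalid_argument("unsupported input dtype");
}
```

For example, `scaled_mm_sm90(InputDType::kFp8, 100)` falls in the 128 bucket and returns `"fp8_m128"`. Keeping each dtype's heuristics in its own header presumably lets a new tile configuration be tried by editing one dispatch file without touching the entry points, which matches the "easier experimentation" goal stated above.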

github-actions · 2024-11-05T20:22:41Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

- Add the ready label to the PR.
- Enable auto-merge.

🚀

varun-sundar-rabindranath · 2024-11-12T03:29:34Z

@tlrmchlsmth @ProExpertProg @LucasWilkinson PTAL. Thanks!

LucasWilkinson · 2024-11-12T21:20:30Z

LGTM (just FYI may conflict with #9855)

tlrmchlsmth

LGTM

csrc/quantization/cutlass_w8a8/scaled_mm_c3x.cuh

mergify · 2024-11-18T20:01:01Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @varun-sundar-rabindranath.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

varun-sundar-rabindranath · 2024-12-16T18:22:55Z

@tlrmchlsmth re-requesting review as the PR is now rebased.

csrc/quantization/cutlass_w8a8/scaled_mm_c3x.cu

tlrmchlsmth

Looks good! (Likely wait for #10995)

tlrmchlsmth · 2024-12-16T19:08:48Z

csrc/quantization/cutlass_w8a8/scaled_mm_c3x.cu

Actually, do you need to turn off clang-format here? The reason for turning it off is that the CUTLASS headers need to be included in a specific order, but it looks like that's no longer the case after the refactor.

Running clang-format on this block turns

#include <cudaTypedefs.h>
#if defined CUDA_VERSION && CUDA_VERSION >= 12000
#include "scaled_mm_c3x_sm90_fp8_dispatch.cuh"
#include "scaled_mm_c3x_sm90_int8_dispatch.cuh"
#include "cutlass_extensions/epilogue/scaled_mm_epilogues_c3x.hpp"
using namespace vllm;

into

#include <cudaTypedefs.h>
#if defined CUDA_VERSION && CUDA_VERSION >= 12000
#include "scaled_mm_c3x_sm90_fp8_dispatch.cuh"
#include "scaled_mm_c3x_sm90_int8_dispatch.cuh"
#include "cutlass_extensions/epilogue/scaled_mm_epilogues_c3x.hpp"
using namespace vllm;

The #if seems to trigger inconsistent indentation, so I switched off clang-format in this block to avoid that.
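Here is a minimal sketch of the kind of guarded block under discussion. It assumes the only goal is to keep clang-format from re-indenting the lines between the markers or reordering them through include sorting. `// clang-format off` and `// clang-format on` are standard clang-format directives. The macro `VLLM_EXAMPLE_HAS_C3X` and the function `c3x_kernels_compiled` are hypothetical. `<cudaTypedefs.h>` and the dispatch headers are left out so the snippet compiles without CUDA, which means `CUDA_VERSION` is undefined here.

```cpp
// In scaled_mm_c3x.cu, <cudaTypedefs.h> is included first so that
// CUDA_VERSION is defined; it is omitted here, so the #else branch is taken.

// clang-format off
#if defined CUDA_VERSION && CUDA_VERSION >= 12000
  // The sm90 fp8/int8 dispatch headers would be included here.
  #define VLLM_EXAMPLE_HAS_C3X 1  // hypothetical macro
#else
  #define VLLM_EXAMPLE_HAS_C3X 0
#endif
// clang-format on

// Reports whether the CUDA 12+ (CUTLASS 3.x, sm90) path was compiled in.
constexpr bool c3x_kernels_compiled() { return VLLM_EXAMPLE_HAS_C3X != 0; }
```

The off/on pair only changes how clang-format treats the enclosed lines; the preprocessor logic is the same with or without it.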

I moved the original clang-format toggle to scaled_mm_c3x.cuh. https://github.com/vllm-project/vllm/blob/dad8e4760f9d24faadfc076a13ed7d130211ea2d/csrc/quantization/cutlass_w8a8/scaled_mm_c3x.cuh#L3

I had second thoughts about this and removed the clang-format block; it is probably better to stick to the convention.

Signed-off-by: Varun Sundar Rabindranath <[email protected]>

Signed-off-by: Varun Sundar Rabindranath <[email protected]> Co-authored-by: Varun Sundar Rabindranath <[email protected]>

varun-sundar-rabindranath marked this pull request as draft November 5, 2024 20:35

varun-sundar-rabindranath marked this pull request as ready for review November 12, 2024 03:27

tlrmchlsmth approved these changes Nov 12, 2024


tlrmchlsmth added the ready label ("ONLY add when PR is ready to merge/full CI is needed") Nov 12, 2024

ProExpertProg reviewed Nov 12, 2024


csrc/quantization/cutlass_w8a8/scaled_mm_c3x.cuh (outdated, resolved)

varun-sundar-rabindranath force-pushed the varun/cutlass-c3x-refactor branch 2 times, most recently from 7dbe3b3 to 4f44aac, November 15, 2024 14:35

mergify bot added the needs-rebase label Nov 18, 2024

varun-sundar-rabindranath force-pushed the varun/cutlass-c3x-refactor branch from 4f44aac to 16879db, December 16, 2024 18:03

mergify bot removed the needs-rebase label Dec 16, 2024

varun-sundar-rabindranath requested a review from tlrmchlsmth December 16, 2024 18:22

tlrmchlsmth reviewed Dec 16, 2024


csrc/quantization/cutlass_w8a8/scaled_mm_c3x.cu (outdated, resolved)

tlrmchlsmth reviewed Dec 16, 2024


varun-sundar-rabindranath force-pushed the varun/cutlass-c3x-refactor branch 2 times, most recently from 55d9927 to e5f324b, December 18, 2024 16:13

Varun Sundar Rabindranath added 2 commits December 19, 2024 03:28

Refactor cutlass-c3x

4068c8e

Signed-off-by: Varun Sundar Rabindranath <[email protected]>

fix common header

e033b41

Signed-off-by: Varun Sundar Rabindranath <[email protected]>

varun-sundar-rabindranath force-pushed the varun/cutlass-c3x-refactor branch from e5f324b to e033b41, December 19, 2024 03:29

tlrmchlsmth enabled auto-merge (squash) December 19, 2024 03:31

tlrmchlsmth merged commit 8936316 into vllm-project:main Dec 19, 2024
75 checks passed

mzusman pushed a commit to mzusman/vllm that referenced this pull request Mar 12, 2025

[Kernel] Refactor Cutlass c3x (vllm-project#10049)

bb789f5

Signed-off-by: Varun Sundar Rabindranath <[email protected]> Co-authored-by: Varun Sundar Rabindranath <[email protected]>


Review-thread comments: tlrmchlsmth (Dec 16, 2024); varun-sundar-rabindranath (Dec 16, 2024); varun-sundar-rabindranath (Dec 17, 2024).


4 participants
