[NVIDIA] Enable Thor and Spark with CUDA 13 #23469
Signed-off-by: johnnynunez <[email protected]>
Code Review
This pull request aims to enable support for CUDA 13, which involves updating CMake build configurations and CUDA kernel code. While most changes correctly adapt to the new CUDA version by replacing deprecated CUB APIs, I've identified two critical issues. There is a logical error in CMakeLists.txt with a duplicated if/elseif condition, making a code path unreachable. Additionally, there is a syntax error in csrc/quantization/fp8/common.cu due to misplaced parentheses that will cause a compilation failure. Addressing these issues is essential for the correctness and functionality of the build.
|
Looks like it failed to build the docker https://buildkite.com/vllm/fastcheck/builds/37234/steps/canvas?jid=0198d8e8-fcf7-438a-a2cc-1a4afa09eb24#0198d8e8-fcf7-438a-a2cc-1a4afa09eb24/127-5758 |
Oh shit! The tag doesn't exist yet. |
Signed-off-by: zjy0516 <[email protected]> Signed-off-by: johnnynunez <[email protected]>
Signed-off-by: Benji Beck <[email protected]> Signed-off-by: johnnynunez <[email protected]>
…llm-project#23477) Signed-off-by: 22quinn <[email protected]> Signed-off-by: youkaichao <[email protected]> Co-authored-by: Eric Marcus <[email protected]> Co-authored-by: youkaichao <[email protected]> Signed-off-by: johnnynunez <[email protected]>
Signed-off-by: Benji Beck <[email protected]> Signed-off-by: johnnynunez <[email protected]>
Signed-off-by: czhu-cohere <[email protected]> Signed-off-by: johnnynunez <[email protected]>
…ing (vllm-project#23305) Signed-off-by: rongfu.leng <[email protected]> Signed-off-by: johnnynunez <[email protected]>
Signed-off-by: teekenl <[email protected]> Signed-off-by: johnnynunez <[email protected]>
Signed-off-by: 汪志鹏 <[email protected]> Signed-off-by: johnnynunez <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]> Signed-off-by: johnnynunez <[email protected]>
|
Concretely, which SM versions are we adding? Seems like 10.3a and 11.0a? |
These are missing in vLLM main. With CUDA 13:
$ nvcc --list-gpu-arch
compute_75
compute_80
compute_86
compute_87
compute_88
compute_89
compute_90
compute_100
compute_110
compute_103
compute_120
compute_121
As you can see, Thor was changed from 10.1 (nvgpu) to 11.0 (OpenRM). |
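For readers mapping the names above to SM versions such as 11.0 or 12.1, here is a minimal sketch in plain C++. It assumes the usual convention that the last digit of a `compute_NN` name is the minor version and the remaining digits are the major version. The function name `parse_compute_arch` is hypothetical and is not part of vLLM or the CUDA toolkit.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper: splits an nvcc virtual-arch name such as
// "compute_110" into {major, minor}. Assumes the last digit is the
// minor version and everything before it is the major version.
std::optional<std::pair<int, int>> parse_compute_arch(const std::string& name) {
  const std::string prefix = "compute_";
  // Reject anything that does not start with "compute_".
  if (name.rfind(prefix, 0) != 0) return std::nullopt;

  const std::string digits = name.substr(prefix.size());
  // Need at least one major digit and one minor digit.
  if (digits.size() < 2) return std::nullopt;
  for (char c : digits) {
    if (c < '0' || c > '9') return std::nullopt;
  }

  const int major = std::stoi(digits.substr(0, digits.size() - 1));
  const int minor = digits.back() - '0';
  return std::make_pair(major, minor);
}
```

With this convention, `compute_110` maps to 11.0 (Thor under CUDA 13) and `compute_121` maps to 12.1 (Spark), matching the versions discussed in this thread.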
|
I don't think we should be building 11.0 Thor by default as I don't know a vLLM use case there. I can see 12.1 Spark being used with vLLM, but this is again blowing up our wheel size. We will probably pursue a path where popular data center GPUs |
We can remove it from the defaults, or try enabling it only with CUDA 13, which is working well: |
|
I’m surprised these really interesting changes weren’t accepted. My apologies, @johnnynunez , I didn’t realize you’d already suggested a lot of them. |
No worries, we are here to help. In the recent PR I suggested moving to the Blackwell family, but I don't know how to maintain support for <12.9 without adding a lot of "ifs". I have vLLM running with CUDA 13.0 and CUTLASS 4.2.0, but I suggested the changes for the general public. |
|
Would you mind if I incorporated your changes into the CMakeLists.txt file, along with some new modifications? I think the best approach would be to define a transformation function – someone suggested placing it in cuda_compat.h – and then update the cub:: calls as needed. |
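A rough sketch of the idea described above: pick the reduction functor types once, behind a version check, so call sites never name a toolkit-specific type. This is plain host C++ so it can be compiled anywhere. The macro `SKETCH_TOOLKIT_MAJOR`, the `sketch` namespace, and the `AddOp`/`MaxOp` names are all hypothetical stand-ins, not the actual contents of cuda_compat.h and not real CUB types.

```cpp
#include <functional>
#include <numeric>
#include <vector>

// HYPOTHETICAL version macro standing in for a real toolkit/CUB version check.
#ifndef SKETCH_TOOLKIT_MAJOR
#define SKETCH_TOOLKIT_MAJOR 13
#endif

namespace sketch {

#if SKETCH_TOOLKIT_MAJOR >= 13
// Newer toolkit: use a standard-style transparent functor for addition.
using AddOp = std::plus<>;
#else
// Older toolkit: stand-in for a legacy, library-specific functor type.
struct AddOp {
  template <class T>
  constexpr T operator()(const T& a, const T& b) const { return a + b; }
};
#endif

// Maximum functor with the same call shape in both configurations.
struct MaxOp {
  template <class T>
  constexpr T operator()(const T& a, const T& b) const { return a < b ? b : a; }
};

// Call sites depend only on the aliases above, never on the version check.
template <class Op>
float reduce(const std::vector<float>& values, float init, Op op) {
  return std::accumulate(values.begin(), values.end(), init, op);
}

}  // namespace sketch
```

A call such as `sketch::reduce(values, 0.0f, sketch::AddOp{})` then compiles unchanged in either configuration, which is what makes it possible to update the cub:: calls in one pass.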
Feel free to use it! Also see this: #24673
This fixes some errors in CUDA 13 compilation and enables Thor and Spark.
CUTLASS v4.2.0 enables Thor and Spark support.