Conversation

@johnnynunez (Contributor) commented Aug 23, 2025

This fixes some errors in CUDA 13 compilation and enables Thor and Spark.
CUTLASS v4.2.0 enables Thor and Spark support.

Signed-off-by: johnnynunez <[email protected]>
@gemini-code-assist bot left a comment

Code Review

This pull request aims to enable support for CUDA 13, which involves updating CMake build configurations and CUDA kernel code. While most changes correctly adapt to the new CUDA version by replacing deprecated CUB APIs, I've identified two critical issues. There is a logical error in CMakeLists.txt with a duplicated if/elseif condition, making a code path unreachable. Additionally, there is a syntax error in csrc/quantization/fp8/common.cu due to misplaced parentheses that will cause a compilation failure. Addressing these issues is essential for the correctness and functionality of the build.
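The "duplicated if/elseif condition" bug class the review flags typically looks like the following. This is a hypothetical sketch, not the actual `CMakeLists.txt` content; the variable and arch names are illustrative only:

```cmake
# Hypothetical sketch of the bug class described above: the second branch
# repeats the first condition exactly, so its body can never execute.
if(CUDA_VERSION VERSION_GREATER_EQUAL 13.0)
  list(APPEND CUDA_ARCHS "11.0a" "12.1a")
elseif(CUDA_VERSION VERSION_GREATER_EQUAL 13.0)  # BUG: same condition, unreachable
  list(APPEND CUDA_ARCHS "10.1a")
endif()

# Fixed version: the second condition must be distinct so the branch is reachable.
if(CUDA_VERSION VERSION_GREATER_EQUAL 13.0)
  list(APPEND CUDA_ARCHS "11.0a" "12.1a")
elseif(CUDA_VERSION VERSION_GREATER_EQUAL 12.8)
  list(APPEND CUDA_ARCHS "10.1a")
endif()
```

Because CMake evaluates `elseif` branches in order and stops at the first true condition, a repeated condition silently shadows everything in its branch.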

Signed-off-by: johnnynunez <[email protected]>
@mgoin (Member) commented Aug 23, 2025

@johnnynunez (Contributor, Author)

Signed-off-by: johnnynunez <[email protected]>
Signed-off-by: johnnynunez <[email protected]>
ZJY0516 and others added 9 commits August 24, 2025 20:19
…llm-project#23477)

Signed-off-by: 22quinn <[email protected]>
Signed-off-by: youkaichao <[email protected]>
Co-authored-by: Eric Marcus <[email protected]>
Co-authored-by: youkaichao <[email protected]>
Signed-off-by: johnnynunez <[email protected]>
Signed-off-by: czhu-cohere <[email protected]>
Signed-off-by: johnnynunez <[email protected]>
Signed-off-by: 汪志鹏 <[email protected]>
Signed-off-by: johnnynunez <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: johnnynunez <[email protected]>
@mergify bot added labels on Aug 24, 2025: documentation, frontend, multi-modality (#4194), new-model, performance, v1
@simon-mo (Collaborator)

Concretely, which SM versions are we adding? Seems like 10.3a and 11.0a?

@johnnynunez (Contributor, Author) commented Aug 24, 2025

> Concretely, which SM versions are we adding? Seems like 10.3a and 11.0a?

Missing in main vLLM:
10.3 (GB300)
11.0 (Thor)
12.1 (Spark)

CUDA 13:

```
$ nvcc --list-gpu-arch
compute_75
compute_80
compute_86
compute_87
compute_88
compute_89
compute_90
compute_100
compute_110
compute_103
compute_120
compute_121
```

You can see that Thor, which was 10.1 (nvgpu), was changed to 11.0 (OpenRM).

@simon-mo (Collaborator)

I don't think we should build 11.0 Thor by default, as I don't know of a vLLM use case there. I can see 12.1 Spark being used with vLLM, but this again blows up our wheel size.

We will probably pursue a path where the popular data center GPUs (8.0, 8.9, 9.0, 10.0, 10.3, 12.0) are built by default and distributed on PyPI, while others are available only on wheels.vllm.ai.
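For source builds, that narrower set can already be selected today by pinning the arch list before compiling. This is a sketch under the assumption that vLLM's build honors `TORCH_CUDA_ARCH_LIST` (it reads it via PyTorch's CMake integration); the list shown is simply the default set proposed above, not an official recommendation:

```shell
# Restrict compilation to the proposed default data-center set,
# keeping the resulting wheel/binary size down.
export TORCH_CUDA_ARCH_LIST="8.0;8.9;9.0;10.0;10.3;12.0"
```

Anything outside this list (e.g. Thor 11.0 or Spark 12.1) would then need an explicit opt-in at build time rather than inflating every published wheel.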

@johnnynunez (Contributor, Author)

> I don't think we should be building 11.0 Thor by default as I don't know a vLLM use case there. I can see 12.1 Spark being used with vLLM, but this is again blowing up our wheel size.
>
> We will probably pursue a path where popular data center GPUs 8.0, 8.9, 9.0, 10.0, 10.3, 12.0 is built by default and distributed on PyPI while others are available only in wheels.vllm.ai

We can remove them from the defaults, or try this only with CUDA 13, which is working well:
https://github.com/pytorch/pytorch/pulls?q=is%3Apr+size+is%3Aclosed

@DrStone1971 (Contributor)

I’m surprised these really interesting changes weren’t accepted. My apologies, @johnnynunez , I didn’t realize you’d already suggested a lot of them.

@johnnynunez (Contributor, Author) commented Sep 13, 2025

> I’m surprised these really interesting changes weren’t accepted. My apologies, @johnnynunez , I didn’t realize you’d already suggested a lot of them.

No worries, we are here to help. I suggested in the recent PR moving to the Blackwell family, but I don't know how to maintain support for CUDA <12.9 without a lot of "ifs".

I have vLLM working with CUDA 13.0 and CUTLASS 4.2.0, but I'm suggesting these changes for the general public.

@DrStone1971 (Contributor) commented Sep 13, 2025

Would you mind if I incorporated your changes into the CMakeLists.txt file, along with some new modifications? I think the best approach would be to define a transformation function – someone suggested placing it in cuda_compat.h – and then update the cub:: calls as needed.

@johnnynunez (Contributor, Author)

> Would you mind if I incorporated your changes into the CMakeLists.txt file, along with some new modifications? I think the best approach would be to define a transformation function – someone suggested placing it in cuda_compat.h – and then update the cub:: calls as needed.

Feel free, use it! Also see this: #24673

Labels: ci/build, documentation, frontend, multi-modality (#4194), new-model, performance, v1