Conversation

@johnnynunez (Contributor) commented Aug 23, 2025

This fixes some errors in CUDA 13 compilation and enables Thor and Spark.
CUTLASS v4.2.0 enables Thor and Spark support.

Signed-off-by: johnnynunez <[email protected]>
@gemini-code-assist bot left a comment

Code Review

This pull request aims to enable support for CUDA 13, which involves updating CMake build configurations and CUDA kernel code. While most changes correctly adapt to the new CUDA version by replacing deprecated CUB APIs, I've identified two critical issues. There is a logical error in CMakeLists.txt with a duplicated if/elseif condition, making a code path unreachable. Additionally, there is a syntax error in csrc/quantization/fp8/common.cu due to misplaced parentheses that will cause a compilation failure. Addressing these issues is essential for the correctness and functionality of the build.
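The "duplicated if/elseif condition" bug class the review flags typically looks like the following. This is a hypothetical sketch, not the actual `CMakeLists.txt` content; the variable and arch names are illustrative only:

```cmake
# Hypothetical sketch of the bug class described above: the second branch
# repeats the first condition exactly, so its body can never execute.
if(CUDA_VERSION VERSION_GREATER_EQUAL 13.0)
  list(APPEND CUDA_ARCHS "11.0a" "12.1a")
elseif(CUDA_VERSION VERSION_GREATER_EQUAL 13.0)  # BUG: same condition, unreachable
  list(APPEND CUDA_ARCHS "10.1a")
endif()

# Fixed version: the second condition must be distinct so the branch is reachable.
if(CUDA_VERSION VERSION_GREATER_EQUAL 13.0)
  list(APPEND CUDA_ARCHS "11.0a" "12.1a")
elseif(CUDA_VERSION VERSION_GREATER_EQUAL 12.8)
  list(APPEND CUDA_ARCHS "10.1a")
endif()
```

Because CMake evaluates `elseif` branches in order and stops at the first true condition, a repeated condition silently shadows everything in its branch.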

Signed-off-by: johnnynunez <[email protected]>
@mgoin (Member) commented Aug 23, 2025

@johnnynunez (Contributor, Author)

Signed-off-by: johnnynunez <[email protected]>
Signed-off-by: johnnynunez <[email protected]>
ZJY0516 and others added 9 commits August 24, 2025 20:19
…llm-project#23477)

Signed-off-by: 22quinn <[email protected]>
Signed-off-by: youkaichao <[email protected]>
Co-authored-by: Eric Marcus <[email protected]>
Co-authored-by: youkaichao <[email protected]>
Signed-off-by: johnnynunez <[email protected]>
Signed-off-by: czhu-cohere <[email protected]>
Signed-off-by: johnnynunez <[email protected]>
Signed-off-by: 汪志鹏 <[email protected]>
Signed-off-by: johnnynunez <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: johnnynunez <[email protected]>
@mergify bot added labels on Aug 24, 2025: documentation, frontend, multi-modality (#4194), new-model, performance, v1
@simon-mo (Collaborator)

Concretely, which SM versions are we adding? Seems like 10.3a and 11.0a?

@johnnynunez (Contributor, Author) commented Aug 24, 2025

> Concretely, which SM versions are we adding? Seems like 10.3a and 11.0a?

Missing in main vLLM:
10.3 (GB300)
11.0 (Thor)
12.1 (Spark)

CUDA 13:

```
$ nvcc --list-gpu-arch
compute_75
compute_80
compute_86
compute_87
compute_88
compute_89
compute_90
compute_100
compute_110
compute_103
compute_120
compute_121
```

You can see that Thor, which was 10.1 (nvgpu), was changed to 11.0 (OpenRM).

@simon-mo (Collaborator)

I don't think we should build 11.0 Thor by default, as I don't know of a vLLM use case there. I can see 12.1 Spark being used with vLLM, but this again blows up our wheel size.

We will probably pursue a path where the popular data center GPUs (8.0, 8.9, 9.0, 10.0, 10.3, 12.0) are built by default and distributed on PyPI, while others are available only on wheels.vllm.ai.
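For source builds, that narrower set can already be selected today by pinning the arch list before compiling. This is a sketch under the assumption that vLLM's build honors `TORCH_CUDA_ARCH_LIST` (it reads it via PyTorch's CMake integration); the list shown is simply the default set proposed above, not an official recommendation:

```shell
# Restrict compilation to the proposed default data-center set,
# keeping the resulting wheel/binary size down.
export TORCH_CUDA_ARCH_LIST="8.0;8.9;9.0;10.0;10.3;12.0"
```

Anything outside this list (e.g. Thor 11.0 or Spark 12.1) would then need an explicit opt-in at build time rather than inflating every published wheel.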

@johnnynunez (Contributor, Author)

> I don't think we should be building 11.0 Thor by default as I don't know a vLLM use case there. I can see 12.1 Spark being used with vLLM, but this is again blowing up our wheel size.
>
> We will probably pursue a path where popular data center GPUs 8.0, 8.9, 9.0, 10.0, 10.3, 12.0 is built by default and distributed on PyPI while others are available only in wheels.vllm.ai

We can remove them from the defaults, or try this only with CUDA 13, which is working well:
https://github.com/pytorch/pytorch/pulls?q=is%3Apr+size+is%3Aclosed

@DrStone1971 (Contributor)

I’m surprised these really interesting changes weren’t accepted. My apologies, @johnnynunez , I didn’t realize you’d already suggested a lot of them.

@johnnynunez (Contributor, Author) commented Sep 13, 2025

> I’m surprised these really interesting changes weren’t accepted. My apologies, @johnnynunez , I didn’t realize you’d already suggested a lot of them.

No worries, we are here to help. I suggested in the recent PR moving to the Blackwell family, but I don't know how to maintain support for CUDA <12.9 without a lot of "ifs".

I have vLLM working with CUDA 13.0 and CUTLASS 4.2.0, but I'm suggesting these changes for the general public.

@DrStone1971 (Contributor) commented Sep 13, 2025

Would you mind if I incorporated your changes into the CMakeLists.txt file, along with some new modifications? I think the best approach would be to define a transformation function – someone suggested placing it in cuda_compat.h – and then update the cub:: calls as needed.

@johnnynunez (Contributor, Author)

> Would you mind if I incorporated your changes into the CMakeLists.txt file, along with some new modifications? I think the best approach would be to define a transformation function – someone suggested placing it in cuda_compat.h – and then update the cub:: calls as needed.

Feel free, use it! Also see this: #24673

Labels: ci/build, documentation, frontend, multi-modality (#4194), new-model, performance, v1