Conversation

johnnynunez (Contributor) commented Sep 11, 2025

gemini-code-assist (bot) left a comment

Code Review

This pull request updates the CMake configuration to support the new NVIDIA Blackwell architecture family, aligning with CUDA 12.9+ features. The changes introduce new architecture codes and update the minimum required CUDA version for Blackwell-specific kernels. While this is a necessary update, I've identified a critical issue with how the new architecture suffixes are handled, which will likely cause build failures. Additionally, there's a potential regression for users on CUDA 12.8 that should be addressed.
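The suffix-handling concern flagged above can be illustrated with a small sketch in Python (the helper names here are hypothetical; the real logic is CMake in vLLM's cmake/utils.cmake): code that only recognizes the pre-Blackwell "a" (arch-conditional) suffix will fail to strip the new CUDA 12.9+ "f" (family-conditional) suffix, so family codes never match their base architectures.

```python
# Sketch of the pitfall: logic that only knows the "a" (arch-conditional)
# suffix silently mishandles the newer "f" (family-conditional) suffix.
# Helper names are illustrative, not vLLM's actual API.

def strip_suffix_naive(arch: str) -> str:
    # Pre-Blackwell logic: only "a" was a valid suffix.
    return arch[:-1] if arch.endswith("a") else arch

def strip_suffix_fixed(arch: str) -> str:
    # CUDA 12.9+ adds family-conditional "f" codes (e.g. "12.0f").
    return arch[:-1] if arch[-1] in ("a", "f") else arch

print(strip_suffix_naive("12.0f"))  # 12.0f -- suffix not stripped, base arch never matches
print(strip_suffix_fixed("12.0f"))  # 12.0
```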

pytorch-bot commented Sep 13, 2025

No ciflow labels are configured for this repo.
For information on how to enable CIFlow bot see this wiki

@pytorch-bot pytorch-bot bot removed the ci/build label Sep 13, 2025
@mergify mergify bot added the ci/build label Sep 13, 2025

johnnynunez changed the title from "Blackwell Family" to "[NVIDIA] Blackwell Family" Sep 13, 2025
hmellor (Member) commented Sep 15, 2025

Thanks for the PR! Could you please add some more information to the PR description about what this enables, for example:

  • Which new hardware does this enable?
  • What effect does it have on the wheel size?

johnnynunez (Contributor, Author) commented Sep 15, 2025

> Thanks for the PR! Could you please add some more information to the PR description about what this enables, for example:
>
>   • Which new hardware does this enable?
>   • What effect does it have on the wheel size?

This correctly enables:
  • Spark
  • Thor
  • GB300
without increasing binary size.

mgoin (Member) commented Sep 15, 2025

@johnnynunez do you want to update cutlass separately? 4.2 tag hasn't been made yet

[2025-09-13T21:11:55Z] #25 13.59 CMake Error at cutlass-subbuild/cutlass-populate-prefix/tmp/cutlass-populate-gitclone.cmake:61 (message):
[2025-09-13T21:11:55Z] #25 13.59   Failed to checkout tag: 'v4.2.0'

johnnynunez (Contributor, Author) commented Sep 15, 2025

> @johnnynunez do you want to update cutlass separately? 4.2 tag hasn't been made yet
>
> [2025-09-13T21:11:55Z] #25 13.59 CMake Error at cutlass-subbuild/cutlass-populate-prefix/tmp/cutlass-populate-gitclone.cmake:61 (message):
> [2025-09-13T21:11:55Z] #25 13.59   Failed to checkout tag: 'v4.2.0'

Hello, CUTLASS v4.2.0 is out today!

NVIDIA/cutlass@6a35b4d

johnnynunez (Contributor, Author) commented

cc @Aidyn-A could you review this PR?

Aidyn-A (Contributor) left a comment

I have left some comments. I would like you to test the build, make sure it is 100% successful, and double-test all the kernels on all machines. I doubt that those CUTLASS kernels need extra arch flags, as CUTLASS kernels tend to be arch-specific (e.g. a CUTLASS kernel written for sm_100 can be unusable on sm_120 or sm_110). Please keep in mind that not all arch-conditional instructions can be replaced with family-conditional ones.

Additionally, I have not dug into it, but I am pretty sure the function cuda_archs_loose_intersection needs to be modified for family-conditional flags like it does for arch-conditional ones:

vllm/cmake/utils.cmake

Lines 315 to 325 in f4cd80f

set(_CUDA_ARCHS)
foreach(_arch ${_SRC_CUDA_ARCHS})
  if(_arch MATCHES "\\a$")
    list(REMOVE_ITEM _SRC_CUDA_ARCHS "${_arch}")
    string(REPLACE "a" "" _base "${_arch}")
    if ("${_base}" IN_LIST _TGT_CUDA_ARCHS)
      list(REMOVE_ITEM _TGT_CUDA_ARCHS "${_base}")
      list(APPEND _CUDA_ARCHS "${_arch}")
    endif()
  endif()
endforeach()

Lastly, the regular arch flags and their family-conditional counterparts conflict with each other. For example:

nvcc code.cu -gencode arch=compute_120,code=sm_120 -gencode arch=compute_120f,code=sm_120f

will end up failing:

nvcc fatal   : The same GPU code (`sm_120`) generated for non family-specific and family-specific GPU arch

That being said, it is important to be very precise about which flags to apply. I would modify cuda_archs_loose_intersection to exclude the basic arch flag if a family-conditional one is being passed.
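A minimal model of the behavior being asked for, sketched in Python (this is not vLLM's actual CMake; the function and the "f" handling are illustrative): when a conditional arch such as "12.0f" is selected, the plain base arch "12.0" must be dropped from the target set, otherwise nvcc rejects the duplicate sm_120 code object as shown above.

```python
# Illustrative model of cuda_archs_loose_intersection with both "a"
# (arch-conditional) and "f" (family-conditional) suffixes. Not the real
# CMake implementation -- just the selection rule it would need.

def loose_intersection(src_archs, tgt_archs):
    tgt, result = list(tgt_archs), []
    for arch in src_archs:
        if arch[-1] in ("a", "f"):      # arch- or family-conditional flag
            base = arch[:-1]
            if base in tgt:
                tgt.remove(base)        # exclude the conflicting base flag
                result.append(arch)
    # plain archs that survive in both lists
    result += [a for a in src_archs if a[-1] not in ("a", "f") and a in tgt]
    return result

print(loose_intersection(["9.0a", "12.0f"], ["9.0", "12.0", "8.6"]))
# ['9.0a', '12.0f'] -- plain "12.0" is excluded, avoiding the nvcc conflict
```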

@khluu khluu removed the ready ONLY add when PR is ready to merge/full CI is needed label Sep 29, 2025
DrStone1971 (Contributor) commented

https://github.com/vllm-project/vllm/pull/24673/files#diff-c1cdf2ae7a3604efb26f96752f519b9d86fe38fd05dacca78301c522fb20d819R256-R262

Does cutlass_moe_mm_sm100 work on SM120?

This code is strange:

#if defined CUDA_VERSION
  if (cuda_device_capability >= 100) {
    return CUDA_VERSION >= 12080;
  }
  if (cuda_device_capability >= 90) {
    return CUDA_VERSION >= 12030;
  }
#endif

I need to analyze and test this.

@jasl do you have simple code for testing whether a capability is active and runs?
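The gating logic quoted above can be rendered in Python to make the branch behavior easy to check (the function name is hypothetical; versions are encoded as in cuda.h, e.g. 12080 for CUDA 12.8):

```python
# Python rendering of the C++ CUDA_VERSION gate quoted above.
# Function name is illustrative, not vLLM's actual API.

def kernel_supported(cuda_version: int, cuda_device_capability: int) -> bool:
    if cuda_device_capability >= 100:
        return cuda_version >= 12080
    if cuda_device_capability >= 90:
        return cuda_version >= 12030
    return False

# Note that ">= 100" has no upper bound, so SM120 (capability 120) also
# takes the first branch -- exactly the behavior questioned above.
print(kernel_supported(12080, 120))  # True, even though the kernel is labeled _sm100
```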

jasl (Contributor) commented Sep 29, 2025

> https://github.com/vllm-project/vllm/pull/24673/files#diff-c1cdf2ae7a3604efb26f96752f519b9d86fe38fd05dacca78301c522fb20d819R256-R262

@DrStone1971

My testing machine is an x86 with RTX Pro 6000 (SM120)

I'm not sure this kernel labeled _sm100 will work on SM120.
Looking at the CMakeLists https://github.com/vllm-project/vllm/blob/main/CMakeLists.txt#L678-L696, it is only built for SM100.
So I guess we need an upper-bound check here.

And cutlass_moe_mm_sm100 causes trouble for me when running vllm; I got an ImportError: /home/jasl/.venv/lib/python3.12/site-packages/vllm/_C.abi3.so: undefined symbol: _Z20cutlass_moe_mm_sm100RN2at6TensorERKS0_S3_S3_S3_S3_S3_S3_S3_S3_bb

I have a patch that works for me jasl@b52d720
(I'm not sure this is correct)

Because SM120 doesn't include the SM100 feature set, it's difficult to disable the compilation for non-SM100 platforms.
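The upper-bound check being suggested can be sketched as follows (hypothetical helper, not the actual patch in jasl@b52d720; the exact family bound of 110 is an assumption based on Aidyn-A's note that sm_100 kernels can be unusable on sm_110 and sm_120):

```python
# Sketch of a dispatch guard that gates the SM100 CUTLASS MoE kernel to
# the 10.x data-center Blackwell family only, so SM120 never tries to
# resolve the missing cutlass_moe_mm_sm100 symbol. Illustrative only.

def can_dispatch_moe_mm_sm100(capability: int) -> bool:
    # SM100/SM103 only; SM120 (consumer Blackwell) does not implement
    # the SM100 feature set, and SM110 is assumed excluded as well.
    return 100 <= capability < 110

print(can_dispatch_moe_mm_sm100(100))  # True
print(can_dispatch_moe_mm_sm100(120))  # False -- caller falls back or raises cleanly
```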

jasl (Contributor) commented Sep 29, 2025

[screenshot: vLLM running under CUDA 13 on an RTX Pro 6000]

With my additional patch, vllm@CUDA 13 is working on my RTX Pro 6000

I'm testing on my Thor now.

DrStone1971 (Contributor) commented

Is a check needed with CUDA 13 and SM_120?

I use a physical machine; is a particular environment needed for testing (software versions or pip packages)?

DrStone71

johnnynunez (Contributor, Author) commented

> Is a check needed with CUDA 13 and SM_120?
>
> I use a physical machine; is a particular environment needed for testing (software versions or pip packages)?
>
> DrStone71

Test MoE, gpt-oss, etc.

@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 1, 2025
@vllm-bot vllm-bot merged commit 5234dc7 into vllm-project:main Oct 1, 2025
83 of 85 checks passed
pdasigi pushed a commit to pdasigi/vllm that referenced this pull request Oct 2, 2025
Signed-off-by: Johnny <[email protected]>
Signed-off-by: johnnynunez <[email protected]>
Signed-off-by: Johnny <[email protected]>
Signed-off-by: Salvatore Cena <[email protected]>
Co-authored-by: Aidyn-A <[email protected]>
Co-authored-by: Salvatore Cena <[email protected]>
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
tomeras91 pushed a commit to tomeras91/vllm that referenced this pull request Oct 6, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025

# moe_data.cu is used by all CUTLASS MoE kernels.
cuda_archs_loose_intersection(CUTLASS_MOE_DATA_ARCHS "9.0a;10.0a" "${CUDA_ARCHS}")
if(${CMAKE_CUDA_COMPILER_VERSION} VERSION_GREATER_EQUAL 13.0)
A Member commented on this diff:

why is it 13.0 rather than 12.9?

lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
Labels: ci/build, ready (ONLY add when PR is ready to merge/full CI is needed)