@johnnynunez johnnynunez commented Sep 23, 2025

The errors in flash_api_sparse.cpp and hopper/flash_api.cpp are C++11 narrowing diagnostics (warnings that strict builds treat as errors). They are raised when at::cuda::CUDAGuard is brace-initialized with a non-constant char that must then be converted to c10::DeviceIndex (signed char).
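
A minimal, self-contained sketch of the narrowing issue and one way to avoid it. It uses hypothetical stand-ins (DeviceIndex, DeviceGuard, make_guard) rather than the real PyTorch types, and the PR's exact change is not shown in this thread. Converting char to signed char inside braces only counts as narrowing when plain char is unsigned, as it is on aarch64 Linux; that is presumably (an inference, not stated in the PR) why it surfaced on the Arm-based Thor build. Casting directly to the guard's index type removes the implicit char to signed char conversion:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for c10::DeviceIndex, an 8-bit signed integer.
using DeviceIndex = std::int8_t;

// Hypothetical stand-in for at::cuda::CUDAGuard; it only records the index.
struct DeviceGuard {
  explicit DeviceGuard(DeviceIndex idx) : index(idx) {}
  DeviceIndex index;
};

// Pattern that fails when plain char is unsigned (e.g. aarch64 Linux):
//   DeviceGuard guard{(char)device};  // unsigned char -> signed char narrows in {}
//
// Fix sketch: convert explicitly to the guard's own index type, so the
// braced initializer receives a DeviceIndex and no implicit narrowing remains.
DeviceGuard make_guard(int device) {
  // Catch indices that would silently wrap when squeezed into 8 bits.
  assert(device >= 0 && device <= std::numeric_limits<DeviceIndex>::max());
  return DeviceGuard{static_cast<DeviceIndex>(device)};
}
```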

Previously, truncating integer division made num_consumer_warpgroups_per_cluster 0 whenever params.num_consumers (e.g., 32) was less than NumThreadsPerWarpGroup (128), which led to a compile failure during barrier initialization. It now uses round-up (ceiling) division, so the value is at least 1 for any positive num_consumers.
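
The arithmetic difference can be sketched with C++11 constexpr helpers. The names warpgroups_truncating, warpgroups_round_up, and ceil_div are hypothetical, and the constant mirrors the value quoted above; this illustrates the round-up idea, not the PR's actual diff:

```cpp
// Hypothetical mirror of the constant named in the PR description.
constexpr int NumThreadsPerWarpGroup = 128;

// Truncating division rounds toward zero, so 32 / 128 == 0.
constexpr int warpgroups_truncating(int num_consumers) {
  return num_consumers / NumThreadsPerWarpGroup;
}

// Round-up (ceiling) division for non-negative a and positive b: (a + b - 1) / b.
constexpr int ceil_div(int a, int b) { return (a + b - 1) / b; }

constexpr int warpgroups_round_up(int num_consumers) {
  return ceil_div(num_consumers, NumThreadsPerWarpGroup);
}

// Checked at compile time.
static_assert(warpgroups_truncating(32) == 0, "truncation drops the partial warpgroup");
static_assert(warpgroups_round_up(32) == 1, "round-up keeps at least one warpgroup");
static_assert(warpgroups_round_up(256) == 2, "exact multiples are unchanged");
```

Round-up division still yields 0 when num_consumers is 0, so the minimum of 1 holds only for positive inputs.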

Add Thor support

@johnnynunez
Author

waiting: Dao-AILab#1904 (comment)

@johnnynunez johnnynunez force-pushed the main branch 2 times, most recently from 23a6d73 to 6ed2576 on September 24, 2025 08:32
@johnnynunez
Author

@ProExpertProg ready to merge

Signed-off-by: johnnynunez <[email protected]>
@ProExpertProg

CC @LucasWilkinson

@LucasWilkinson
Collaborator

If anyone has time, could they test this with torch 2.8? If not, I will test it tomorrow before landing 👍

@johnnynunez
Author

It is running here now: vllm-project/vllm#25695

@LucasWilkinson
Collaborator

@johnnynunez
Author

Running a longer build here: https://buildkite.com/vllm/ci/builds/33143#0199a333-a3a5-4a3b-a2b9-4682b3d60b1b

It seems it was stopped for an external reason.

@LucasWilkinson
Collaborator

Went ahead and built it locally; looks good 👍

@LucasWilkinson LucasWilkinson merged commit 45b5dac into vllm-project:main Oct 2, 2025
1 check passed
3 participants