[BUILD] FIX CUDA 13 #95
Conversation
waiting: Dao-AILab#1904 (comment)
Force-pushed from 23a6d73 to 6ed2576
@ProExpertProg ready to merge
Signed-off-by: johnnynunez <[email protected]>
If anyone has time, can they test this with torch 2.8? If not, I will test it tomorrow before landing 👍
It is running here now: vllm-project/vllm#25695
Running a longer build here: https://buildkite.com/vllm/ci/builds/33143#0199a333-a3a5-4a3b-a2b9-4682b3d60b1b |
It seems it was stopped for an external reason.
Went ahead and built it locally; looks good 👍 |
The errors in flash_api_sparse.cpp and hopper/flash_api.cpp are C++11 narrowing warnings (treated as errors in strict builds) that occur when at::cuda::CUDAGuard is initialized with a non-constant char being narrowed to c10::DeviceIndex (a signed char).
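A minimal sketch of the pattern in question, not the exact diff; the includes mirror the ones already used in flash_api.cpp, and the function and tensor names here are illustrative only:

```cpp
#include <ATen/cuda/CUDAContext.h>
#include <c10/cuda/CUDAGuard.h>

void guard_on_input_device(const at::Tensor& q) {
    // Braced initialization with a non-constant char narrows to
    // c10::DeviceIndex (a signed char), which strict -Werror builds reject:
    //   at::cuda::CUDAGuard device_guard{(char)q.get_device()};

    // Casting directly to the guard's index type avoids the narrowing
    // diagnostic without changing behavior.
    at::cuda::CUDAGuard device_guard{
        static_cast<c10::DeviceIndex>(q.get_device())};
}
```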
Previously, integer division caused num_consumer_warpgroups_per_cluster to be 0 when params.num_consumers (e.g., 32) was less than NumThreadsPerWarpGroup (128), leading to a compiler failure during barrier initialization. This was changed to round-up division to ensure a minimum value of 1.
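A sketch of the rounding change under the names used in the description (params.num_consumers, NumThreadsPerWarpGroup); the surrounding kernel traits are omitted and the Params struct below is a stand-in:

```cpp
constexpr int NumThreadsPerWarpGroup = 128;

struct Params { int num_consumers; };  // illustrative stand-in for the real params struct

int consumer_warpgroups_per_cluster(const Params& params) {
    // Before: truncating division yields 0 when num_consumers < 128
    // (e.g., 32 / 128 == 0), which broke barrier initialization.
    //   return params.num_consumers / NumThreadsPerWarpGroup;

    // After: round-up division yields at least 1 for any positive num_consumers.
    return (params.num_consumers + NumThreadsPerWarpGroup - 1) /
           NumThreadsPerWarpGroup;
}
```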
Add Thor support