CUDA: update compilation flags for improved performance #1099
royshil wants to merge 2 commits into ggml-org:master
src/ggml-cuda/CMakeLists.txt
  set(CUDA_CXX_FLAGS "")

- set(CUDA_FLAGS -use_fast_math)
+ set(CUDA_FLAGS -use_fast_math --threads=0 --split-compile=0)
Would it be better to create a new CMake option, e.g. CUDA_COMPILE_THREADS?
Yeah, that's an easy fix. Let me do that.
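The suggestion above could look roughly like the following. This is only a sketch under stated assumptions: the variable name CUDA_COMPILE_THREADS is taken from the review comment and is not an existing ggml option, and the default of 0 mirrors the value used in the diff.

```cmake
# Hypothetical cache variable (name taken from the review suggestion, not an existing ggml option).
# Per the nvcc documentation, --threads=0 uses as many threads as the machine has CPUs.
set(CUDA_COMPILE_THREADS "0" CACHE STRING
    "Value passed to nvcc --threads; leave empty to omit the flag")

set(CUDA_FLAGS -use_fast_math)

# Only append the flag when the user has not cleared the option.
if (NOT CUDA_COMPILE_THREADS STREQUAL "")
    list(APPEND CUDA_FLAGS --threads=${CUDA_COMPILE_THREADS})
endif()
```

With an option like this, a user could pass -DCUDA_COMPILE_THREADS=4 at configure time to cap nvcc's internal parallelism, or -DCUDA_COMPILE_THREADS= to drop the flag entirely when the build tool already runs many jobs in parallel.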
|
What is the advantage vs. specifying the number of threads via CMake? For example, this is the command that I use locally: cmake -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON .. && time cmake --build . -j 32 -- --quiet |
@JohannesGaessler this is for the |
|
BTW, using this option I was able to cut from-scratch compile time by 50%, from ~4 hours to ~2 hours.
|
CMake |
@JohannesGaessler on Windows it does not. |
|
How are you testing? |
Even at 2 hours, that's much higher than expected, even when building for all the supported architectures. Can you share more details about the setup that you are using to build? Hardware, MSVC and CUDA toolkit versions, and anything else that you think may be relevant. |
|
@slaren if you look at recent Windows cuBLAS builds for whisper.cpp, e.g. https://github.com/ggerganov/whisper.cpp/actions/runs/13115822916/job/36589762164, you'll notice they take roughly 4 hours to complete.
|
Um yeah, the whisper CI does not even use |
|
@slaren so i just tried cmake |
|
Yes absolutely, if it improves performance we should add it. But it may also cause thread contention if used together with |
|
Here are my results overall
So we can conclude either |
This adds CUDA nvcc compile parallelization to speed up compilation of .cu files (which takes >3 hours today).
Setting --threads=0 lets the system determine how many cores it can use for parallelization.
Per the NVIDIA documentation: https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#threads-number-t