Releases: woct0rdho/triton-windows
v3.2.0-windows.post13
TinyCC is bundled in the wheels, so you don't need to install MSVC to use Triton. Packages that directly call `triton.jit`, such as SageAttention, will just work.
You still need to install a C++ compiler if you use `torch.compile` targeting CPU. This may happen when you use nodes like 'CompileModel' in ComfyUI. Triton does not affect how PyTorch configures the C++ compiler in this case.
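For example, a minimal `triton.jit` kernel like the one below should compile and run with only the bundled TinyCC. This is just a sketch of a smoke test (not part of the release), assuming a CUDA-enabled PyTorch install:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the tensors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
assert torch.allclose(out, x + y)  # JIT compilation succeeded without MSVC
```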
tcc
TinyCC 0.9.27 is downloaded from https://download.savannah.gnu.org/releases/tinycc/tcc-0.9.27-win64-bin.zip
TinyCC 0.9.28rc is built from https://github.com/TinyCC/tinycc. It implements `stdalign.h`, which is needed since Triton 3.5; see triton-lang#7987.
The def files used by Triton are generated by:

```
tcc -impdef C:\Windows\System32\nvcuda.dll -o lib\cuda.def
tcc -impdef path\to\python3.dll -o lib\python3.def
tcc -impdef path\to\python39.dll -o lib\python39.def
...
```

For reference, the versions of these DLLs are:

| DLL | Version |
| --- | --- |
| nvcuda.dll | 32.0.15.7270 |
| python3.dll | 3.9.13150.1013 |
| python39.dll | 3.9.13150.1013 |
| python310.dll | 3.10.11150.1013 |
| python311.dll | 3.11.9150.1013 |
| python312.dll | 3.12.9150.1013 |
| python313.dll | 3.13.2150.1013 |
| python314.dll | 3.14.123.1013 |
| python3t.dll | 3.13.2150.1013 |
| python313t.dll | 3.13.2150.1013 |
| python314t.dll | 3.14.123.1013 |
The pip package tinycc was not used because these def files also need to be bundled.
v3.2.0-windows.post12
Let the environment variables TRITON_LIBCUDA_PATH and CUDA_PATH take higher precedence than the bundled CUDA
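A sketch of how that precedence can be used, assuming the variables are read when Triton is imported; the paths below are hypothetical stand-ins for a local CUDA install:

```python
import os

# Point Triton at a system CUDA toolkit instead of the bundled one
# (hypothetical install path; adjust to your machine).
os.environ["CUDA_PATH"] = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8"
# Directory containing nvcuda.dll, if it is not picked up automatically.
os.environ["TRITON_LIBCUDA_PATH"] = r"C:\Windows\System32"

import triton  # the variables above now take precedence over the bundled CUDA
```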
v3.2.0-windows.post11
- Since the release post11, the wheels are published to https://pypi.org/project/triton-windows/, and no longer to GitHub. You can simply install the wheel using `pip install -U triton-windows` (a quick sanity check is sketched below)
- A minimal toolchain of CUDA is bundled in the wheels, so you don't need to manually install it. (You still need to manually install MSVC, Windows SDK, and vcredist)
- The wheels are linked against the LLVM from oaitriton.blob.core.windows.net, built by https://github.com/triton-lang/triton/blob/main/.github/workflows/llvm-build.yml, to better align with the official Triton
- The JIT-compiled C binaries (`cuda_utils.pyd`, `__triton_launcher.pyd`) are linked against the Python stable ABI, so there should be fewer errors like `DLL load failed while importing cuda_utils` when switching the Python version
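After installing from PyPI, a quick check along these lines (a sketch, not an official test) confirms the wheel imports and that a CUDA device is visible for Triton kernels to target:

```python
import torch
import triton

print(triton.__version__)         # e.g. 3.2.0
print(torch.cuda.is_available())  # Triton kernels need a CUDA device at runtime
```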
v3.2.0-windows.post10
For conda, support `pytorch-gpu` installed from the conda-forge channel and `cuda-toolkit` installed from the nvidia channel. Starting from PyTorch 2.6, PyTorch is no longer released in the pytorch channel
v3.2.0-windows.post9
Following the official Triton, I release wheels for Python 3.9 to 3.13.
v3.1.0-windows.post9
- Fix PTX ISA version for CUDA 12.8
- Fix int64 overflow in make_launcher
v3.1.0-windows.post8
Support CUDA from pip
v3.1.0-windows.post7
Fix build.py
v3.1.0-windows.post6
- Avoid creating files like `main.obj` in the working dir
- Refactor `windows_utils.py` to more thoroughly find MSVC and CUDA
Edit: Do not use this release, as there was a silly mistake