Releases: woct0rdho/triton-windows
v3.2.0-windows.post13
TinyCC is bundled in the wheels, so you don't need to install MSVC to use Triton. Packages that directly call `triton.jit`, such as SageAttention, will just work.
You still need to install a C++ compiler if you use `torch.compile` targeting CPU. This may happen when you use nodes like 'CompileModel' in ComfyUI. Triton does not affect how PyTorch configures the C++ compiler in this case.
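For example, a minimal `triton.jit` kernel like the one below should compile and run with only the bundled TinyCC. This is just a sketch of a smoke test (not part of the release), assuming a CUDA-enabled PyTorch install:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the tensors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
assert torch.allclose(out, x + y)  # JIT compilation succeeded without MSVC
```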
tcc
TinyCC 0.9.27 is downloaded from https://download.savannah.gnu.org/releases/tinycc/tcc-0.9.27-win64-bin.zip
TinyCC 0.9.28rc is built from https://github.com/TinyCC/tinycc. It implements `stdalign.h`, which is needed since Triton 3.5; see triton-lang#7987.
The def files used by Triton are generated by:

```
tcc -impdef C:\Windows\System32\nvcuda.dll -o lib\cuda.def
tcc -impdef path\to\python3.dll -o lib\python3.def
tcc -impdef path\to\python39.dll -o lib\python39.def
...
```

For reference, the versions of these DLLs are:

| DLL | Version |
| --- | --- |
| nvcuda.dll | 32.0.15.7270 |
| python3.dll | 3.9.13150.1013 |
| python39.dll | 3.9.13150.1013 |
| python310.dll | 3.10.11150.1013 |
| python311.dll | 3.11.9150.1013 |
| python312.dll | 3.12.9150.1013 |
| python313.dll | 3.13.2150.1013 |
| python314.dll | 3.14.123.1013 |
| python3t.dll | 3.13.2150.1013 |
| python313t.dll | 3.13.2150.1013 |
| python314t.dll | 3.14.123.1013 |
The pip package tinycc was not used because these def files also need to be bundled.
v3.2.0-windows.post12
Let the environment variables TRITON_LIBCUDA_PATH and CUDA_PATH take higher precedence than the bundled CUDA
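A sketch of how that precedence can be used, assuming the variables are read when Triton is imported; the paths below are hypothetical stand-ins for a local CUDA install:

```python
import os

# Point Triton at a system CUDA toolkit instead of the bundled one
# (hypothetical install path; adjust to your machine).
os.environ["CUDA_PATH"] = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8"
# Directory containing nvcuda.dll, if it is not picked up automatically.
os.environ["TRITON_LIBCUDA_PATH"] = r"C:\Windows\System32"

import triton  # the variables above now take precedence over the bundled CUDA
```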
v3.2.0-windows.post11
- Since the release post11, the wheels are published to https://pypi.org/project/triton-windows/, and no longer to GitHub. You can simply install the wheel using `pip install -U triton-windows` (a quick sanity check is sketched below)
- A minimal toolchain of CUDA is bundled in the wheels, so you don't need to manually install it. (You still need to manually install MSVC, Windows SDK, and vcredist)
- The wheels are linked against the LLVM from oaitriton.blob.core.windows.net, built by https://github.com/triton-lang/triton/blob/main/.github/workflows/llvm-build.yml, to better align with the official Triton
- The JIT-compiled C binaries (`cuda_utils.pyd`, `__triton_launcher.pyd`) are linked against the Python stable ABI, so there should be fewer errors like `DLL load failed while importing cuda_utils` when switching the Python version
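After installing from PyPI, a quick check along these lines (a sketch, not an official test) confirms the wheel imports and that a CUDA device is visible for Triton kernels to target:

```python
import torch
import triton

print(triton.__version__)         # e.g. 3.2.0
print(torch.cuda.is_available())  # Triton kernels need a CUDA device at runtime
```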
v3.2.0-windows.post10
For conda, support `pytorch-gpu` installed from the conda-forge channel and `cuda-toolkit` installed from the nvidia channel. Starting from PyTorch 2.6, PyTorch is no longer released in the pytorch channel
v3.2.0-windows.post9
Following the official Triton, I release wheels for Python 3.9 to 3.13.
v3.1.0-windows.post9
- Fix PTX ISA version for CUDA 12.8
- Fix int64 overflow in make_launcher
v3.1.0-windows.post8
Support CUDA from pip
v3.1.0-windows.post7
Fix build.py
v3.1.0-windows.post6
- Avoid creating files like `main.obj` in the working dir
- Refactor `windows_utils.py` to more thoroughly find MSVC and CUDA
Edit: Do not use this release, as there was a silly mistake