Releases: woct0rdho/triton-windows
v3.5.0-windows.post21
- Find the Windows SDK even with some unusual installation layouts, see #131
- The patch to support fp8 on RTX 30xx is merged, see #140
- There is a known issue that shows Exception Code: 0x80000003 in registerImplicitTypeID. It's a bug in either LLVM or Triton, and we need more tests to find a minimal reproducer and fix it, see thu-ml/SageAttention#270
Note again that Triton 3.5 only works with PyTorch >= 2.9.
To install Triton 3.5 and prevent a future Triton release from breaking your installed PyTorch, limit the version to < 3.6:
pip install -U "triton-windows<3.6"
triton-windows 3.2.0.post21 is also released, which supports fp8 on RTX 20xx and works with PyTorch 2.6.
triton-windows 3.3.1.post21 and 3.4.0.post21 are also released, which support PyTorch 2.7 and 2.8 respectively.
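For a quick check that the installed wheel can actually JIT-compile kernels against your PyTorch, here is a minimal sketch (the standard Triton vector-add example; it assumes a CUDA GPU and nothing specific to triton-windows):
```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one block of BLOCK_SIZE elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(1024, device="cuda")
y = torch.randn(1024, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 128),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=128)
print("ok:", torch.allclose(out, x + y))
```
If this prints "ok: True", the Triton JIT and the CUDA toolchain bundled with the wheel are working.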
v3.4.0-windows.post20
Note again that Triton 3.4 only works with PyTorch >= 2.8.
To install Triton 3.4 and prevent a future Triton release from breaking your installed PyTorch, limit the version to < 3.5:
pip install -U "triton-windows<3.5"
v3.3.1-windows.post19
This is identical to triton-windows 3.3.0.post19; I only bumped the version number to match the official one.
The only difference between the official triton 3.3.0 and 3.3.1 is triton-lang#6771, which affects RTX 50xx GPUs. This patch has already been included since triton-windows 3.3.0.post14.
v3.3.0-windows.post19
- Fix JIT compilation using Clang
Note again that Triton 3.3 only works with PyTorch >= 2.7, and Triton 3.2 only works with PyTorch >= 2.6.
To install Triton 3.3 and prevent a future Triton release from breaking your installed PyTorch, limit the version to < 3.4:
pip install -U "triton-windows<3.4"
v3.2.0-windows.post18
- Find MSVC and the Windows SDK from environment variables set by Launch-VsDevShell.ps1 or VsDevCmd.bat, see #106
- Print cc_cmd for debugging when compilation fails
empty
Here are some empty wheels named triton. If your build system complains that some package requires triton rather than triton-windows, you can add one of these empty wheels to it, and also add triton-windows.
You may use transient-package to create such packages.
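As a rough sketch (using only the standard library, nothing specific to transient-package), you can check which installed distributions currently satisfy the triton and triton-windows names:
```python
from importlib import metadata

# Shows whether a placeholder "triton" wheel and the real "triton-windows"
# wheel are both present in the current environment.
for dist_name in ("triton", "triton-windows"):
    try:
        print(dist_name, "->", metadata.version(dist_name))
    except metadata.PackageNotFoundError:
        print(dist_name, "-> not installed")
```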
v3.2.0-windows.post17
Fix a failure when multiple processes create __triton_launcher.pyd in parallel, see intel/intel-xpu-backend-for-triton#3270. Now torch.compile autotune should work in general.
Note that ComfyUI enables cudaMalloc by default, but cudaMalloc does not work with CUDA graphs. Also, many models and nodes in ComfyUI are not compatible with CUDA graphs. You may use mode='max-autotune-no-cudagraphs' and see if it gives a speedup, as in the sketch below.
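A minimal sketch of trying torch.compile without CUDA graphs (the model and shapes below are placeholders, not taken from ComfyUI):
```python
import torch

# Toy model standing in for whatever you actually want to compile.
model = torch.nn.Sequential(
    torch.nn.Linear(256, 256),
    torch.nn.GELU(),
    torch.nn.Linear(256, 256),
).cuda()

# Autotune kernels but skip CUDA graph capture.
compiled = torch.compile(model, mode="max-autotune-no-cudagraphs")

x = torch.randn(32, 256, device="cuda")
with torch.no_grad():
    out = compiled(x)  # the first call triggers compilation and autotuning
print(out.shape)
```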
v3.2.0-windows.post16
Ensure temp files are closed when calling ptxas in parallel. I still need to investigate some bugs in PyTorch to make torch.compile autotune fully work, see unslothai/unsloth#1999
v3.2.0-windows.post15
Define Py_LIMITED_API and exclude newer Python C APIs that cannot be compiled by TinyCC, see #92
v3.3.0-windows.post14
Fix getMMAVersionSafe for RTX 50xx (sm120), see #83 (comment)