Releases: woct0rdho/triton-windows

v3.5.0-windows.post21

15 Oct 08:57

  • Find the Windows SDK in some non-standard installations, see #131
  • The patch to support fp8 on RTX 30xx is merged, see #140
  • There is a known issue that shows Exception Code: 0x80000003 in registerImplicitTypeID. It's a bug in either LLVM or Triton, and we need more tests to find a minimal reproducer and fix it, see thu-ml/SageAttention#270

Note again that Triton 3.5 only works with PyTorch >= 2.9.

To install Triton 3.5 and keep a future Triton release from breaking your installed PyTorch, limit the version to < 3.6:

pip install -U "triton-windows<3.6"

triton-windows 3.2.0.post21 is also released, which supports fp8 on RTX 20xx and PyTorch 2.6.

triton-windows 3.3.1.post21 and 3.4.0.post21 are also released, which support PyTorch 2.7 and 2.8 respectively.
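
The version pairings in these notes can be turned into a small lookup. The following is a minimal sketch, not part of triton-windows: the table `TRITON_FOR_TORCH` and the function `triton_requirement` are hypothetical names, and the one-to-one pairing of PyTorch and Triton minor versions is an assumption read off the releases listed here (2.6 with 3.2, 2.7 with 3.3, 2.8 with 3.4, 2.9 with 3.5).

```python
# PyTorch (major, minor) -> Triton (major, minor), as paired in these release notes.
# Assumption: each PyTorch minor version is matched with exactly one Triton minor series.
TRITON_FOR_TORCH = {
    (2, 6): (3, 2),
    (2, 7): (3, 3),
    (2, 8): (3, 4),
    (2, 9): (3, 5),
}


def triton_requirement(torch_version: str) -> str:
    """Return a pip requirement string that caps triton-windows for this PyTorch."""
    # Drop a local build tag such as "+cu128", then keep only major and minor.
    base = torch_version.split("+")[0]
    major, minor = (int(part) for part in base.split(".")[:2])
    try:
        t_major, t_minor = TRITON_FOR_TORCH[(major, minor)]
    except KeyError:
        raise ValueError(f"no pairing listed for PyTorch {major}.{minor}") from None
    # Cap below the next Triton minor so a newer series is not pulled in.
    return f"triton-windows<{t_major}.{t_minor + 1}"


print(triton_requirement("2.9.0+cu128"))
```

You could pass the result to pip, for example `pip install -U "triton-windows<3.6"` for PyTorch 2.9. PyTorch versions outside the table raise `ValueError` rather than guessing a pairing.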

v3.4.0-windows.post20

31 Jul 02:22

Note again that Triton 3.4 only works with PyTorch >= 2.8.

To install Triton 3.4 and keep a future Triton release from breaking your installed PyTorch, limit the version to < 3.5:

pip install -U "triton-windows<3.5"

v3.3.1-windows.post19

30 May 15:17

This is identical to triton-windows 3.3.0.post19, but I bumped the version number to match the official one.

The only difference between the official triton 3.3.0 and 3.3.1 is triton-lang#6771, which affects RTX 50xx GPUs. This patch has been included since triton-windows 3.3.0.post14.

v3.3.0-windows.post19

24 Apr 07:04

  • Fix JIT compilation using Clang

Note again that Triton 3.3 only works with PyTorch >= 2.7, and Triton 3.2 only works with PyTorch >= 2.6.

To install Triton 3.3 and keep a future Triton release from breaking your installed PyTorch, limit the version to < 3.4:

pip install -U "triton-windows<3.4"

v3.2.0-windows.post18

18 Apr 02:45

  • Find MSVC and Windows SDK from environment variables set by Launch-VsDevShell.ps1 or VsDevCmd.bat, see #106
  • Print cc_cmd for debugging when compilation fails

empty

25 Mar 13:41
b689470

Pre-release

These are empty wheels named triton. If your build system reports that some package requires triton rather than triton-windows, add one of these wheels to the build system, and add triton-windows as well.

You may use transient-package to create such packages.

v3.2.0-windows.post17

20 Mar 12:26

Fix a failure when multiple processes create __triton_launcher.pyd in parallel, see intel/intel-xpu-backend-for-triton#3270. Now torch.compile autotune will work in general.

Note that ComfyUI enables cudaMalloc by default, but cudaMalloc does not work with CUDA graphs. Also, many models and nodes in ComfyUI are not compatible with CUDA graphs. You may use mode='max-autotune-no-cudagraphs' and see whether it gives a speedup.
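
As a rough illustration of the mode mentioned above, the sketch below wraps `torch.compile` with `mode="max-autotune-no-cudagraphs"`. The helper name `compile_without_cudagraphs` is hypothetical, and the snippet assumes PyTorch and a matching triton-windows are installed. How to pass a compiled module into ComfyUI is not covered here.

```python
# Mode string taken from the release note: autotuning without CUDA graphs.
COMPILE_MODE = "max-autotune-no-cudagraphs"


def compile_without_cudagraphs(module):
    """Compile a torch.nn.Module (or a function) with autotune but no CUDA graphs."""
    # Imported lazily so this file can be imported on machines without PyTorch.
    import torch

    return torch.compile(module, mode=COMPILE_MODE)
```

For example, `compiled = compile_without_cudagraphs(model)` followed by `compiled(x)` triggers compilation on the first call. Whether this is faster than the uncompiled model depends on the model and GPU, so it is worth timing both.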

v3.2.0-windows.post16

20 Mar 05:55

Ensure temp files are closed when calling ptxas in parallel. I still need to investigate some bugs in PyTorch to make torch.compile autotune fully work, see unslothai/unsloth#1999
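
The general pattern behind closing temp files before another program reads them can be sketched as follows. This is a generic illustration, not the actual triton-windows patch: `run_on_temp_file` and `ECHO_CMD` are hypothetical names, and a small Python subprocess stands in for ptxas. On Windows, a file still held open by `NamedTemporaryFile` generally cannot be opened by a second process, so the sketch writes, closes, and only then launches the child.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for an external tool such as ptxas: prints the content of the file given as argument.
ECHO_CMD = [sys.executable, "-c", "import sys; print(open(sys.argv[1]).read(), end='')"]


def run_on_temp_file(text: str, cmd_prefix: list) -> str:
    """Write text to a temp file, close it, then run cmd_prefix + [path] on it."""
    # delete=False keeps the file on disk after close(), so the child can open it.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write(text)
        path = f.name
    # The file handle is closed here, before the subprocess starts.
    try:
        result = subprocess.run(
            cmd_prefix + [path], capture_output=True, text=True, check=True
        )
        return result.stdout
    finally:
        # Clean up manually because delete=False disables automatic removal.
        os.remove(path)


print(run_on_temp_file("hello", ECHO_CMD))
```

Because each call gets its own uniquely named file, several processes can run this in parallel without sharing a path.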

v3.2.0-windows.post15

16 Mar 15:35

Define Py_LIMITED_API and exclude new Python C API that cannot be compiled by TinyCC, see #92

v3.3.0-windows.post14

15 Mar 12:38

Pre-release

Fix getMMAVersionSafe for RTX 50xx (sm120), see #83 (comment)