Fix matrix.h: set_slice3x3 -> set_slice_3x3#1784

Merged
hwu36 merged 1 commit into NVIDIA:main from lucifer1004:patch-1 on Sep 6, 2024

Conversation

@lucifer1004
Contributor

Fix #1783

@hwu36 hwu36 merged commit 82f5075 into NVIDIA:main Sep 6, 2024
@lucifer1004 lucifer1004 deleted the patch-1 branch September 6, 2024 03:32
hgl71964 pushed a commit to hgl71964/cutlass that referenced this pull request Feb 21, 2025
JackWolfard added a commit to JackWolfard/xla that referenced this pull request Feb 26, 2025
Clang 19 is stricter than Clang 18 about missing members for template
instantiations. CUTLASS has already fixed this in
NVIDIA/cutlass#1784, but the XLA fork of CUTLASS is out of date and needs
to be patched in order to build GPU kernels with Clang >= 19.

To test this change, run the following locally:

```sh
$ docker run --rm -it \
    --name xla \
    -v $PWD:/xla \
    -w /tmp \
    silkeh/clang:19
$ wget https://github.com/bazelbuild/bazelisk/releases/download/v1.25.0/bazelisk-amd64.deb
$ dpkg -i bazelisk-amd64.deb
$ cd /xla
$ ./configure.py \
    --backend=CUDA \
    --cuda_compiler=CLANG \
    --cuda_compute_capabilities=9.0
$ bazel \
    build \
    --repo_env=HERMETIC_PYTHON_VERSION=3.11 \
    --repo_env=HERMETIC_CUDA_VERSION="12.8.0" \
    --repo_env=HERMETIC_CUDNN_VERSION="9.7.1" \
    --test_output=all \
    --spawn_strategy=sandboxed \
    //xla/service/gpu/kernels:cutlass_gemm_kernel_f32xf32_to_f32
```
JackWolfard added a commit to JackWolfard/xla that referenced this pull request Feb 26, 2025
Clang 19 is stricter than Clang 18 about missing members for template
instantiations. CUTLASS has already fixed this in NVIDIA/cutlass#1784,
but the XLA fork of CUTLASS is out of date and needs to be patched in
order to build GPU kernels with Clang >= 19.

To test this change, run the following locally:

```sh
$ docker run --rm -it \
    --name xla \
    -v $PWD:/xla \
    -w /tmp \
    silkeh/clang:19
$ wget https://github.com/bazelbuild/bazelisk/releases/download/v1.25.0/bazelisk-amd64.deb
$ dpkg -i bazelisk-amd64.deb
$ cd /xla
$ ./configure.py \
    --backend=CUDA \
    --cuda_compiler=CLANG \
    --cuda_compute_capabilities=9.0
$ bazel \
    build \
    --repo_env=HERMETIC_PYTHON_VERSION=3.11 \
    --repo_env=HERMETIC_CUDA_VERSION="12.8.0" \
    --repo_env=HERMETIC_CUDNN_VERSION="9.7.1" \
    --test_output=all \
    --spawn_strategy=sandboxed \
    //xla/service/gpu/kernels:cutlass_gemm_kernel_f32xf32_to_f32
```
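The kind of error described in the commit message above can be illustrated with a minimal sketch. Only the member name `set_slice_3x3` comes from this pull request's title; the `Mat4` type, its storage layout, and `rotation_z` are hypothetical stand-ins and are not the CUTLASS implementation. The point is that a misspelled member call inside a class template may go unnoticed by a lenient compiler until the function is instantiated, while a stricter compiler can reject it as soon as the template is parsed.

```cpp
#include <array>
#include <cmath>

// Hypothetical stand-in for a 4x4 matrix type; NOT the CUTLASS class.
template <typename T>
struct Mat4 {
  std::array<T, 16> data{};  // row-major storage, zero-initialized

  T& at(int r, int c) { return data[r * 4 + c]; }
  T const& at(int r, int c) const { return data[r * 4 + c]; }

  // Overwrites the upper-left 3x3 block with a row-major 3x3 array.
  void set_slice_3x3(std::array<T, 9> const& m) {
    for (int r = 0; r < 3; ++r)
      for (int c = 0; c < 3; ++c)
        at(r, c) = m[r * 3 + c];
  }

  // Builds a homogeneous rotation about the Z axis.
  static Mat4 rotation_z(T theta) {
    T c = std::cos(theta);
    T s = std::sin(theta);
    Mat4 m;
    m.at(3, 3) = T(1);
    // Misspelling this call as m.set_slice3x3(...) would name a member that
    // does not exist. Mat4 has no dependent base classes, so a strict
    // compiler may reject the typo even if rotation_z is never instantiated.
    m.set_slice_3x3({c, -s, T(0), s, c, T(0), T(0), T(0), T(1)});
    return m;
  }
};
```

Calling `Mat4<double>::rotation_z(0.0)` in this sketch yields the identity matrix, which is a quick way to check that the correctly spelled call fills the 3x3 block.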
andralex pushed a commit to andralex/cutlass that referenced this pull request Jun 14, 2025
Albresky pushed a commit to Albresky/cutlass that referenced this pull request Oct 11, 2025

Development

Successfully merging this pull request may close these issues.

[BUG] Simple matrix rotation could not compile
