Fix matrix.h: set_slice3x3 -> set_slice_3x3 by lucifer1004 · Pull Request #1784 · NVIDIA/cutlass

lucifer1004 · 2024-09-05T03:25:02Z

Clang 19 is stricter than Clang 18 about missing members in template instantiations. CUTLASS has already fixed this in NVIDIA/cutlass#1784, but the XLA fork of CUTLASS is out of date and needs to be patched in order to build GPU kernels with Clang 19 or later. To test this change, run the following locally:

```sh
$ docker run --rm -it \
    --name xla \
    -v $PWD:/xla \
    -w /tmp \
    silkeh/clang:19
$ wget https://github.com/bazelbuild/bazelisk/releases/download/v1.25.0/bazelisk-amd64.deb
$ dpkg -i bazelisk-amd64.deb
$ cd /xla
$ ./configure.py \
    --backend=CUDA \
    --cuda_compiler=CLANG \
    --cuda_compute_capabilities=9.0
$ bazel \
    build \
    --repo_env=HERMETIC_PYTHON_VERSION=3.11 \
    --repo_env=HERMETIC_CUDA_VERSION="12.8.0" \
    --repo_env=HERMETIC_CUDNN_VERSION="9.7.1" \
    --test_output=all \
    --spawn_strategy=sandboxed \
    //xla/service/gpu/kernels:cutlass_gemm_kernel_f32xf32_to_f32
```
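The fix itself is a rename: uses of the misspelled `set_slice3x3` in `matrix.h` become `set_slice_3x3`. To check whether a vendored or forked CUTLASS tree (such as the out-of-date XLA copy mentioned in the description) still predates it, searching its headers for the old spelling is usually enough. The helper below is a hypothetical sketch, not part of CUTLASS or XLA; it assumes a `grep` that supports `--include` (GNU or BSD) and only scans `*.h` files.

```sh
# Hypothetical helper (not part of CUTLASS/XLA): report whether a source tree
# still references set_slice3x3, the name this PR renames to set_slice_3x3.
# Returns 0 if clean, 1 if the old name is found, 2 on a bad directory or grep error.
check_cutlass_typo() {
  dir="$1"
  if [ ! -d "$dir" ]; then
    echo "error: '$dir' is not a directory" >&2
    return 2
  fi
  status=0
  # -r: recurse, -l: print only the names of matching files; headers only.
  grep -rl --include='*.h' 'set_slice3x3' "$dir" || status=$?
  case "$status" in
    0) echo "stale: set_slice3x3 found; see NVIDIA/cutlass#1784" >&2; return 1 ;;
    1) echo "ok: no set_slice3x3 in $dir"; return 0 ;;
    *) echo "error: grep failed on '$dir'" >&2; return 2 ;;
  esac
}
```

For example, `check_cutlass_typo path/to/cutlass/include` (the path is a placeholder) lists any header that still uses the old name and returns 1. The `|| status=$?` pattern keeps the function usable from scripts running under `set -e`, where a bare `grep` with no matches would otherwise abort the script.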

set_slice3x3 -> set_slice_3x3

871cc2c

hwu36 approved these changes Sep 6, 2024

hwu36 merged commit 82f5075 into NVIDIA:main Sep 6, 2024

lucifer1004 deleted the patch-1 branch September 6, 2024 03:32

lucifer1004 mentioned this pull request Sep 6, 2024

[QST] cutlass fails during tensorflow assembly #1603

Closed

hgl71964 pushed a commit to hgl71964/cutlass that referenced this pull request Feb 21, 2025

set_slice3x3 -> set_slice_3x3 (NVIDIA#1784)

af21b72

JackWolfard mentioned this pull request Feb 26, 2025

[xla:gpu] Patch CUTLASS to fix build on Clang > 19 openxla/xla#23117

Closed

pearu mentioned this pull request Mar 27, 2025

JAX build fails when using clang>=19 jax-ml/jax#27091

Closed

andralex pushed a commit to andralex/cutlass that referenced this pull request Jun 14, 2025

set_slice3x3 -> set_slice_3x3 (NVIDIA#1784)

91501eb

Albresky pushed a commit to Albresky/cutlass that referenced this pull request Oct 11, 2025

set_slice3x3 -> set_slice_3x3 (NVIDIA#1784)

2672269

julcst mentioned this pull request Dec 16, 2025

Outdated Cutlass leads to compile error NVlabs/tiny-cuda-nn#530

Open

GriffinYoung mentioned this pull request Mar 2, 2026

When using USE_DEEPSPEED_EVO_ATTENTION: Error in prediction: data must be finite, check for nan or inf IntelliGen-AI/IntelliFold#13

Open

hwu36 merged 1 commit into NVIDIA:main from lucifer1004:patch-1

lucifer1004 commented Sep 5, 2024

2 participants