metal: Fix dimension constraint violation in matmul2d descriptor by lathrys-at · Pull Request #21048 · ggml-org/llama.cpp

lathrys-at · 2026-03-26T22:31:19Z

Updates the Metal tensor API test probes to fix the dimension constraint violation in the matmul2d descriptor (at least one of M or N must be a multiple of 16).

Overview

An investigation detailed in #16634 (comment) showed that the test probes for the Metal tensor API fail to compile on macOS 26.4, which causes tensor support in the Metal backend to be disabled erroneously. The cause is a change in Apple's APIs between the time #16634 was tested and merged by @ggerganov and today: they now require that at least one of the dimensions M and N be a multiple of 16.

Notably, the actual kernels already respect this constraint (they compile successfully today); only these test probes violate it.
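To make the constraint concrete, here is a minimal host-side C++ sketch of the same check that the static_assert in the compiler error performs. `ProbeDescriptor` and `satisfies_mn_constraint` are hypothetical names introduced only for illustration; they are not part of MetalPerformancePrimitives or ggml, and the struct models only the `m` and `n` fields that appear in the error message.

```cpp
// Hypothetical stand-in for the two descriptor fields named in the
// static_assert. This is NOT the real mpp::tensor_ops::matmul2d_descriptor.
struct ProbeDescriptor {
    int m;
    int n;
};

// Mirrors the condition quoted in the compiler error:
// (descriptor.m % 16) == 0 || (descriptor.n % 16) == 0
constexpr bool satisfies_mn_constraint(ProbeDescriptor d) {
    return (d.m % 16) == 0 || (d.n % 16) == 0;
}

// The 8x8 shape reported in the error fails the check.
static_assert(!satisfies_mn_constraint({8, 8}),
              "8x8: neither M nor N is a multiple of 16");

// Illustrative shapes that pass: only one dimension needs to be a multiple of 16.
static_assert(satisfies_mn_constraint({16, 8}), "M is a multiple of 16");
static_assert(satisfies_mn_constraint({8, 32}), "N is a multiple of 16");
```

The shapes in the passing cases are examples chosen for this sketch and are not necessarily the values used in the fix.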

I verified this by turning on verbose logging in the calls to ggml_metal_library_init_from_source() and observing the following output when llama-server starts up. Note ggml_metal_device_init: has tensor = false in the output, which is unexpected on a device with an M5 chip.

Metal compilation error

$ ./build/bin/llama-server --version                                            
ggml_metal_device_init: testing tensor API for f16 support
ggml_metal_library_init_from_source: error compiling source: Error Domain=MTLLibraryErrorDomain Code=3 "In file included from program_source:4:
In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10:
In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:389:
/System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3765:5: error: static_assert failed due to requirement '(mpp::tensor_ops::matmul2d_descriptor{8, 8, -1, false, false, false, 0}.m % 16) == 0 || (mpp::tensor_ops::matmul2d_descriptor{8, 8, -1, false, false, false, 0}.n % 16) == 0' "At least one of M or N must be a multiple of 16"
    static_assert((descriptor.m % 16) == 0 || (descriptor.n % 16) == 0, "At least one of M or N must be a multiple of 16");
    ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:415:24: note: in instantiation of function template specialization 'mpp::tensor_ops::__mutmul2d_detail::__run<{8, 8, -1, false, false, false, 0}, metal::execution_simdgroups<4>, metal::tensor<device half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_handle, metal::tensor_offset>, metal::tensor<device half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_handle, metal::tensor_offset>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{8, 8, -1, false, false, false, 0}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, half, half, float, int>>>' requested here
    __mutmul2d_detail::__run<Descriptor, Scope, LeftOperandType,
                       ^
program_source:26:8: note: in instantiation of function template specialization 'mpp::tensor_ops::matmul2d<{8, 8, -1, false, false, false, 0}, metal::execution_simdgroups<4>>::run<metal::tensor<device half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_handle, metal::tensor_offset>, metal::tensor<device half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_handle, metal::tensor_offset>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{8, 8, -1, false, false, false, 0}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, half, half, float, int>>, void>' requested here
    mm.run(sB, sA, cT); 
       ^
" UserInfo={NSLocalizedDescription=In file included from program_source:4:
In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10:
In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:389:
/System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3765:5: error: static_assert failed due to requirement '(mpp::tensor_ops::matmul2d_descriptor{8, 8, -1, false, false, false, 0}.m % 16) == 0 || (mpp::tensor_ops::matmul2d_descriptor{8, 8, -1, false, false, false, 0}.n % 16) == 0' "At least one of M or N must be a multiple of 16"
    static_assert((descriptor.m % 16) == 0 || (descriptor.n % 16) == 0, "At least one of M or N must be a multiple of 16");
    ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:415:24: note: in instantiation of function template specialization 'mpp::tensor_ops::__mutmul2d_detail::__run<{8, 8, -1, false, false, false, 0}, metal::execution_simdgroups<4>, metal::tensor<device half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_handle, metal::tensor_offset>, metal::tensor<device half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_handle, metal::tensor_offset>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{8, 8, -1, false, false, false, 0}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, half, half, float, int>>>' requested here
    __mutmul2d_detail::__run<Descriptor, Scope, LeftOperandType,
                       ^
program_source:26:8: note: in instantiation of function template specialization 'mpp::tensor_ops::matmul2d<{8, 8, -1, false, false, false, 0}, metal::execution_simdgroups<4>>::run<metal::tensor<device half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_handle, metal::tensor_offset>, metal::tensor<device half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_handle, metal::tensor_offset>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{8, 8, -1, false, false, false, 0}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, half, half, float, int>>, void>' requested here
    mm.run(sB, sA, cT); 
       ^
}
ggml_metal_library_init_from_source: failed to create Metal library from source
ggml_metal_device_init: - the tensor API is not supported in this environment - disabling
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.014 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name:   MTL0
ggml_metal_device_init: GPU family: MTLGPUFamilyApple10  (1010)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4  (5002)
ggml_metal_device_init: simdgroup reduction   = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: has tensor            = false
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  = 19069.67 MB
version: 1 (0fac87b)
built with AppleClang 21.0.0.21000099 for Darwin arm64

With the change in this branch the error no longer reproduces and the following output is observed:

Successful initialization of metal backend

$ ./build/bin/llama-server --version                                            
ggml_metal_device_init: testing tensor API for f16 support
ggml_metal_library_init_from_source: compiled in 0.105 sec
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'dummy_kernel', name = 'dummy_kernel'
ggml_metal_library_compile_pipeline: loaded dummy_kernel                                  0x104346e00 | th_max = 1024 | th_width =   32
ggml_metal_device_init: testing tensor API for bfloat support
ggml_metal_library_init_from_source: compiled in 0.070 sec
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'dummy_kernel', name = 'dummy_kernel'
ggml_metal_library_compile_pipeline: loaded dummy_kernel                                  0x104374500 | th_max = 1024 | th_width =   32
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 5.792 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name:   MTL0
ggml_metal_device_init: GPU family: MTLGPUFamilyApple10  (1010)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4  (5002)
ggml_metal_device_init: simdgroup reduction   = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: has tensor            = true
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  = 19069.67 MB
version: 1 (0fac87b)
built with AppleClang 21.0.0.21000099 for Darwin arm64
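
The difference between the two runs comes down to the `has tensor` line. As a rough sketch, captured startup output could be checked for that line with a small C++17 helper like the one below. `TensorFlag` and `find_tensor_flag` are hypothetical names for this example and do not exist in ggml or llama.cpp; the helper assumes the log line keeps the `ggml_metal_device_init: has tensor ... = <value>` shape shown above.

```cpp
#include <string_view>

// Outcome of looking for the "has tensor" line in captured startup output.
enum class TensorFlag { NotFound, Disabled, Enabled };

// Hypothetical helper: finds the first "has tensor" line and reads the
// value after '='. Assumes the line format shown in the logs above.
constexpr TensorFlag find_tensor_flag(std::string_view log) {
    constexpr std::string_view key = "ggml_metal_device_init: has tensor";
    const auto pos = log.find(key);
    if (pos == std::string_view::npos) {
        return TensorFlag::NotFound;
    }

    // Keep only the remainder of the matching line.
    std::string_view rest = log.substr(pos + key.size());
    rest = rest.substr(0, rest.find('\n'));

    const auto eq = rest.find('=');
    if (eq == std::string_view::npos) {
        return TensorFlag::NotFound;
    }

    // Trim padding around the value.
    std::string_view value = rest.substr(eq + 1);
    while (!value.empty() && value.front() == ' ') {
        value.remove_prefix(1);
    }
    while (!value.empty() && (value.back() == ' ' || value.back() == '\r')) {
        value.remove_suffix(1);
    }

    if (value == "true")  return TensorFlag::Enabled;
    if (value == "false") return TensorFlag::Disabled;
    return TensorFlag::NotFound;
}
```

Feeding it the first log above would report the flag as disabled, and the second as enabled.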

Additional information

Tests Performed

Verified the expected output above on an M5 device with the latest release of Apple's APIs on macOS 26.4
test-backend-ops: OK

Requirements

I have read and agree with the contributing guidelines: YES
AI usage disclosure: NO

Developer-Ecosystem-Engineering · 2026-03-27T00:43:21Z

A fix for this issue was also included in #20962

lathrys-at · 2026-03-27T00:48:43Z

> A fix for this issue was also included in #20962

Excellent. I'll let the owners decide what to do with this then.

ggerganov · 2026-03-27T07:04:09Z

Thank you for helping with this. I'll merge this for now.

@Developer-Ecosystem-Engineering I saw your PR. I'm looking forward to playing with it soon. My M5 Max should arrive in about 2-3 weeks.

Fix dimension constraint violation in matmul2d descriptor

a307582

Updates Metal tensor API test probe to fix the dimension constraint violation in the matmul2d descriptor (at least one value must be a multiple of 16).

lathrys-at requested a review from a team as a code owner March 26, 2026 22:31

lathrys-at mentioned this pull request Mar 26, 2026

metal : initial Metal4 tensor API support #16634

Merged

4 tasks

github-actions bot added the ggml and Apple Metal labels Mar 26, 2026

loci-dev mentioned this pull request Mar 27, 2026

UPSTREAM PR #21048: metal: Fix dimension constraint violation in matmul2d descriptor auroralabs-loci/llama.cpp#1304

Open

ggerganov approved these changes Mar 27, 2026

ggerganov merged commit 9bcb4ef into ggml-org:master Mar 27, 2026
1 check passed

ggerganov merged 1 commit into ggml-org:master from lathrys-at:lathrys/fix-matmul2d-descriptor-constraint-violation
