UPSTREAM PR #16634: metal : initial Metal4 tensor API support by DajanaV · Pull Request #1 · auroralabs-loci/llama.cpp

DajanaV · 2025-10-28T11:51:28Z

Rework matrix-matrix multiplication
Use Tensor API when available
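
The "use when available" behaviour, together with the later commit that adds an environment variable for disabling the tensor API, can be pictured as a small gating function. This is only a sketch and is not taken from the PR diff: the variable name `EXAMPLE_METAL_TENSOR_DISABLE`, the helper names, and the "tensor"/"legacy" labels are hypothetical and used purely for illustration.

```cpp
#include <cstdlib>

// Hypothetical environment variable name (assumption, not from the PR).
static const char * k_disable_env = "EXAMPLE_METAL_TENSOR_DISABLE";

// Decide whether the tensor-API code path should be used.
// device_has_tensor_api: result of a device capability probe (assumed to exist).
// env_value: value of the disable variable, or nullptr when it is unset.
bool use_tensor_api(bool device_has_tensor_api, const char * env_value) {
    if (!device_has_tensor_api) {
        return false; // nothing to enable on unsupported hardware
    }
    // In this sketch, any non-empty value acts as an explicit opt-out.
    if (env_value != nullptr && env_value[0] != '\0') {
        return false;
    }
    return true;
}

// Pick a label for the matrix-multiplication path; "legacy" stands in for
// whatever kernel is used when the tensor API is not selected.
const char * select_mul_mm_path(bool device_has_tensor_api) {
    const char * env_value = std::getenv(k_disable_env);
    return use_tensor_api(device_has_tensor_api, env_value) ? "tensor" : "legacy";
}
```

Keeping the decision in a pure function that takes the environment value as a parameter makes the opt-out easy to test without touching the process environment.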

TODOs

Update mul_mm_id kernel
Test on M5 (looking for volunteers to test as I won't have hardware anytime soon)
How to handle missing bfloat tensor API? (see the comment on ggml-org/llama.cpp#16634)

commit 912ed2cd9339d1b2875d98744ca5b51fa62e581e Author: samuel <[email protected]> Date: Sun Dec 7 23:00:29 2025 -0300 speculative (feat): implement recursive MTP drafting for GLM-4.5
commit bdf72d9552e3da64ffc85f175664713388752914 Author: samuel <[email protected]> Date: Sat Dec 6 16:10:16 2025 -0300 sampling (feat): optimize speculative drafting with fast-path selection
commit a91980a8f3475a6bbac0a64d8be06dd4b613020e Author: samuel <[email protected]> Date: Sat Dec 6 15:18:19 2025 -0300 mtp (chore): clean old code
commit 6de0ecf55db8567db4faa99b0152b72c9e854548 Author: samuel <[email protected]> Date: Sat Dec 6 14:40:13 2025 -0300 mtp (feat): add mtp arg
commit ea77394183b8e6c368af969b8274039a54b11486 Author: samuel <[email protected]> Date: Sat Dec 6 13:47:54 2025 -0300 mtp-graph (fix): move llama_get_logits_ith outside the loop
commit 15dff208958fb66802f20ec53ce5fcaff133edb7 Merge: 171346c74 cae85fe53 Author: samuel <[email protected]> Date: Thu Oct 16 13:44:41 2025 -0300 Merge branch 'glm4-mtp-batch' of https://github.com/SamuelOliveirads/llama.cpp into glm4-mtp-graph-cache
commit cae85fe531876762ee02524fc4c3f6c5e7824c63 Author: samuel <[email protected]> Date: Thu Oct 16 13:42:31 2025 -0300 mtp-batch(fix): avoid logits for mtp kv cache operations
commit 171346c742c310bbcfbd786b61250638ccf8b44d Author: samuel <[email protected]> Date: Sun Oct 12 16:33:01 2025 -0300 mtp-graph(feat): Reactivate graph reuse only for main model path
commit 0127c6beeb384ec3abbc18b22dbe830f22fcf4b4 Author: samuel <[email protected]> Date: Sat Oct 11 22:20:54 2025 -0300 mtp-batch(chore): Remove final MTP debug logs and dead code
commit 4bcc9e261ef57ee4cfaa65d06bcd0fcdeacf7797 Author: samuel <[email protected]> Date: Sat Oct 11 18:51:22 2025 -0300 mtp-batch(fix): Correctly advance cache head and add MTP documentation
commit b4cbe030ac25056717763b812d1dd89681c08522 Author: samuel <[email protected]> Date: Sat Oct 11 18:37:40 2025 -0300 mtp-batch(chore): Fix logit flags for speculative sampling and remove debug logs
commit a99709d0c1401d0b447dce1bd0101fb56390f50e Author: samuel <[email protected]> Date: Fri Oct 10 17:24:34 2025 -0300 mtp-batch(refactor): Extract decode context and MTP input logic into helper methods
commit 913af8f48d2dab1d9e907cf6c48c921a229a295c Author: samuel <[email protected]> Date: Fri Oct 10 16:44:28 2025 -0300 mtp-batch(refactor): Replace MTP boolean flags with an explicit operation enum
commit 6f74ba38070d62d37bc0fb71ce9871e1a4ffabcc Author: samuel <[email protected]> Date: Thu Oct 9 22:27:18 2025 -0300 mtp-batch (fix): prevent mtp draft from polluting the cache
commit 5e1d719beffccf8c22784c24b52ff6f5ab56b9ff Author: samuel <[email protected]> Date: Thu Oct 9 15:21:23 2025 -0300 mtp-batch (feat): Create and manage sinfo for MTP
commit febd8235d27fe9174ee4b54ea7a10e630939fee0 Author: samuel <[email protected]> Date: Sun Oct 5 14:43:40 2025 -0300 mtp-batch (wip): fix how to warmup kv cache for MTP
commit 67c6c069e0a5496adfd7d8aa6ca7514db5a6f437 Author: samuel <[email protected]> Date: Sat Sep 27 19:42:32 2025 -0300 mtp-batch (wip): Isolate MTP graph to prevent host embedding buffer corruption
commit 75dc25e6fe781c1b65038d69390fb778d760e3a1 Author: samuel <[email protected]> Date: Sat Sep 27 17:17:00 2025 -0300 mtp-batch (wip): organize batch for mtp cache
commit 3da7e7f3309dbb576538850c92c1cbf8fdc6d6ee Author: samuel <[email protected]> Date: Tue Sep 23 22:45:11 2025 -0300 mtp-batch (fix): warm mtp cache for small batch size
commit df64508b937784112168aa099644b60fef015f05 Author: samuel <[email protected]> Date: Sun Sep 21 21:55:41 2025 -0300 mtp-batch (wip): merge glm graphs
commit 042eb8a829876ed175320df9c8133bcea0c40460 Author: samuel <[email protected]> Date: Sun Sep 21 21:29:00 2025 -0300 mtp-batch (wip): merge mtp and model graph
commit 1318b2de82716710b9853e07bd640443a5a025bb Author: samuel <[email protected]> Date: Sun Sep 14 10:22:59 2025 -0300 mtp-batch (wip): move mtp execution to batch format
commit c6237c71ffd4485df1c35829c380b63e472fc5dd Merge: 9fab53e43 8742ce0e3 Author: Aaron Lee <[email protected]> Date: Sat Sep 13 02:57:01 2025 -0400 Merge pull request #1 from SamuelOliveirads/glm4-moe-mtp feat: implemented sampling for MTP
commit 8742ce0e39823eeb101bb5b6099ff4ca7be10c6e Author: samuel <[email protected]> Date: Sat Sep 6 00:21:18 2025 -0300 feat: apply logits + greedy sampler
commit 5a5bce85777041d841393b4396e28f8e3065bb10 Author: samuel <[email protected]> Date: Wed Sep 3 17:56:14 2025 -0300 fix: add sample acceptance
commit 07670a22c63b1fa335d6ec1c4a1e4255a920848c Author: samuel <[email protected]> Date: Wed Sep 3 13:25:21 2025 -0300 feat: implemented sampling for MTP
commit 9fab53e4388c20aef497efd82e86dcb99ca58064 Author: Aaron Lee <[email protected]> Date: Tue Sep 2 17:14:09 2025 -0400 fixed mtp kv cache update step in cases where prompt size > n_batch and n_ubatch
commit 98bc0c6bf223f425f4ecea14f13fc46101f1b44a Author: Aaron Lee <[email protected]> Date: Tue Aug 26 01:26:51 2025 -0400 replace standard sampler with greedy sampler for mtp draft
commit 471e026327cca9f6f58aeefe32129a6cb9390f4f Author: Aaron Lee <[email protected]> Date: Tue Aug 19 23:10:56 2025 -0400 fixed vram leak
commit d72f9d5691054958cd1b139f228e5e588d3974cf Author: Aaron Lee <[email protected]> Date: Tue Aug 19 01:50:34 2025 -0400 kludge-y kv cache management of mtp layer
commit 382135aa3619294ab8bf87b0de4b1255ab7942f0 Author: Aaron Lee <[email protected]> Date: Sun Aug 17 21:54:45 2025 -0400 fixed mtp kv cache update sequencing after prompt processing
commit 6870f9790c1bb1d0254241267b1a6c8a7fc82830 Author: Aaron Lee <[email protected]> Date: Sun Aug 17 04:59:36 2025 -0400 added proper KV cache management for MTP layers and slightly refactored
commit 6e9bafc7a738b4c99f9440c0ec461e08cf6ce702 Author: Aaron Lee <[email protected]> Date: Fri Aug 15 23:13:56 2025 -0400 failed attempt to implement MTP; outputs tokens but KV cache management is unreasonable
commit cf0f7c0448c2c1736588673114558e5829db7879 Author: Aaron Lee <[email protected]> Date: Wed Aug 13 02:21:17 2025 -0400 broad thrust of the mtp implementation
commit 03231da69eec20677e25e2307d4fe31ac2ede034 Author: Aaron Lee <[email protected]> Date: Tue Aug 12 01:03:59 2025 -0400 add model member function to build mtp graph, to be called from speculative.cpp
commit 1f477b375504aa557ed21066aa6783b11781a179 Author: Aaron Lee <[email protected]> Date: Mon Aug 11 20:54:45 2025 -0400 make nextn weights loadable without a crash
commit e434f87cc739a1901931d88e33f777170a4e18e7 Author: Aaron Lee <[email protected]> Date: Mon Aug 11 01:21:47 2025 -0400 some work towards building mtp layer graph
commit db60623e7926fb151b3cc63f029929122cac342a Author: Aaron Lee <[email protected]> Date: Sun Aug 10 23:52:54 2025 -0400 added getter for nextn layer count and server slot has_mtp property

ggerganov added 10 commits October 28, 2025 13:02

metal : rework mat-mat multiplication

83a7499

metal : initial Metal4 support

5e09948

cont

4d1783a

metal : detect tensor support

ac4d564

cont : better ifdefs

b99e72d

metal : support tensors in mul_mm_id

49c1ac0

metal : add env for disabling tensor API

e6373fc

tests : restore

2b6e352

metal : remove unused constants

7984d57

metal : fix check for bfloat tensor support

f2927f4

DajanaV force-pushed the main branch 4 times, most recently from 1983956 to 326a60a Compare October 29, 2025 12:13

DajanaV added the dev-stale Stale dev environment — dashboard not accessible label Oct 30, 2025

DajanaV deleted the branch main October 30, 2025 15:25

DajanaV closed this Oct 30, 2025

DajanaV deleted the upstream-PR16634-branch_ggml-org-gg/metal-mul-mm-rework branch October 30, 2025 15:26

DajanaV mentioned this pull request Nov 18, 2025

UPSTREAM PR #17342: Throughput improvement for small batch sizes #248

Open

loci-dev pushed a commit that referenced this pull request Nov 30, 2025

Merge pull request #1 from bluebread/sf/deepseek-ocr

578c8d7

mtmd: fix vision model processing

loci-dev pushed a commit that referenced this pull request Dec 11, 2025

Merge pull request #1 from JohannesGaessler/gpu-sampling-hip

56720f8

HIP/MUSA: fix build for backend sampling

loci-review bot mentioned this pull request Jan 21, 2026

UPSTREAM PR #18957: common, server : use the same User-Agent by default #978

Open

loci-review bot mentioned this pull request Feb 5, 2026

UPSTREAM PR #18675: Autoparser - complete refactoring of parser architecture #845

Open

loci-dev pushed a commit that referenced this pull request Mar 4, 2026

Merge pull request #1 from wine99/remove_static_variables

56e89f8

Remove unused structs

loci-dev mentioned this pull request Mar 21, 2026

UPSTREAM PR #17342: Throughput improvement for small batch sizes #1279

Open

DajanaV wants to merge 10 commits into main from upstream-PR16634-branch_ggml-org-gg/metal-mul-mm-rework