UPSTREAM PR #16490: graph : reuse SSM graphs by DajanaV · Pull Request #133 · auroralabs-loci/llama.cpp

DajanaV · 2025-11-08T12:45:32Z

I'm not sure whether there is a reason not to enable graph reuse for recurrent graphs (Mamba, hybrids, SSM, etc.). I ran a few tests and it seems to work, giving some modest performance improvements. cc @gabe-l-hart @compilade

Without graph reuse

make -j && LLAMA_GRAPH_REUSE_DISABLE=1 ./bin/llama-bench -m ../models/mamba-130m/ggml-model-f16.gguf -m ../models/granite-4-h-tiny/ggml-model-q8_0.gguf -m ../models/ai21-jamba-mini-1.7/ggml-model-q8_0.gguf -m ../models/liquidai-lfm2-2.6b/ggml-model-q4_k.gguf -fa 1 -t 1 -n 32

| model | size | params | backend | ngl | threads | fa | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: | ---: |
| mamba 0.1B F16 | 256.96 MiB | 129.14 M | Metal | 99 | 1 | 1 | pp512 | 8415.73 ± 46.47 |
| mamba 0.1B F16 | 256.96 MiB | 129.14 M | Metal | 99 | 1 | 1 | tg32 | 322.74 ± 0.64 |
| granitehybrid ?B Q8_0 | 6.88 GiB | 6.94 B | Metal | 99 | 1 | 1 | pp512 | 2119.36 ± 3.31 |
| granitehybrid ?B Q8_0 | 6.88 GiB | 6.94 B | Metal | 99 | 1 | 1 | tg32 | 77.17 ± 0.11 |
| jamba ?B Q8_0 | 51.05 GiB | 51.57 B | Metal | 99 | 1 | 1 | pp512 | 603.47 ± 1.83 |
| jamba ?B Q8_0 | 51.05 GiB | 51.57 B | Metal | 99 | 1 | 1 | tg32 | 42.35 ± 0.02 |
| lfm2 2.6B Q4_K - Medium | 1.45 GiB | 2.57 B | Metal | 99 | 1 | 1 | pp512 | 2923.41 ± 3.20 |
| lfm2 2.6B Q4_K - Medium | 1.45 GiB | 2.57 B | Metal | 99 | 1 | 1 | tg32 | 169.83 ± 0.67 |

build: `638e2c2` (6725)

With graph reuse

make -j && ./bin/llama-bench -m ../models/mamba-130m/ggml-model-f16.gguf -m ../models/granite-4-h-tiny/ggml-model-q8_0.gguf -m ../models/ai21-jamba-mini-1.7/ggml-model-q8_0.gguf -m ../models/liquidai-lfm2-2.6b/ggml-model-q4_k.gguf -fa 1 -t 1 -n 32

| model | size | params | backend | ngl | threads | fa | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: | ---: |
| mamba 0.1B F16 | 256.96 MiB | 129.14 M | Metal | 99 | 1 | 1 | pp512 | 8453.65 ± 20.10 |
| mamba 0.1B F16 | 256.96 MiB | 129.14 M | Metal | 99 | 1 | 1 | tg32 | 348.83 ± 1.67 |
| granitehybrid ?B Q8_0 | 6.88 GiB | 6.94 B | Metal | 99 | 1 | 1 | pp512 | 2126.12 ± 1.90 |
| granitehybrid ?B Q8_0 | 6.88 GiB | 6.94 B | Metal | 99 | 1 | 1 | tg32 | 82.26 ± 0.13 |
| jamba ?B Q8_0 | 51.05 GiB | 51.57 B | Metal | 99 | 1 | 1 | pp512 | 604.56 ± 2.08 |
| jamba ?B Q8_0 | 51.05 GiB | 51.57 B | Metal | 99 | 1 | 1 | tg32 | 43.22 ± 0.02 |
| lfm2 2.6B Q4_K - Medium | 1.45 GiB | 2.57 B | Metal | 99 | 1 | 1 | pp512 | 2928.31 ± 1.78 |
| lfm2 2.6B Q4_K - Medium | 1.45 GiB | 2.57 B | Metal | 99 | 1 | 1 | tg32 | 179.18 ± 0.47 |

build: `638e2c2` (6725)
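
To put a number on "modest", the relative change per model and test can be worked out from the two tables. The snippet below is only a sketch: it copies the mean t/s values from the benchmark output above, ignores the reported ± intervals, and the names `RESULTS`, `speedup_pct` and `summarize` are made up for illustration (they are not part of llama.cpp or llama-bench).

```python
# Mean tokens/s copied from the two llama-bench tables in this PR:
# (without graph reuse, with graph reuse). The ± intervals are ignored here.
RESULTS = {
    ("mamba 0.1B F16", "pp512"): (8415.73, 8453.65),
    ("mamba 0.1B F16", "tg32"): (322.74, 348.83),
    ("granitehybrid ?B Q8_0", "pp512"): (2119.36, 2126.12),
    ("granitehybrid ?B Q8_0", "tg32"): (77.17, 82.26),
    ("jamba ?B Q8_0", "pp512"): (603.47, 604.56),
    ("jamba ?B Q8_0", "tg32"): (42.35, 43.22),
    ("lfm2 2.6B Q4_K - Medium", "pp512"): (2923.41, 2928.31),
    ("lfm2 2.6B Q4_K - Medium", "tg32"): (169.83, 179.18),
}


def speedup_pct(before: float, after: float) -> float:
    """Relative throughput change in percent (positive means faster)."""
    return (after / before - 1.0) * 100.0


def summarize(results: dict) -> dict:
    """Map each (model, test) key to its percentage change."""
    return {key: speedup_pct(b, a) for key, (b, a) in results.items()}


if __name__ == "__main__":
    for (model, test), pct in summarize(RESULTS).items():
        print(f"{model:26s} {test:6s} {pct:+.2f}%")
```

On these numbers the tg32 gains fall between roughly 2% and 8%, while every pp512 change stays under 0.5%.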

ggerganov added 5 commits November 8, 2025 13:53

graph : reuse hybrid graphs

22fd5bd

graph : reuse recurrent graphs

cc23af9

graph : fix reuse check for recurrent inputs

b1cf2eb

memory : move the recurrent state into the memory context

b1865c9

Revert "memory : move the recurrent state into the memory context"

df46214

This reverts commit 00f115f.

DajanaV had a problem deploying to PROD__AL_DEMO with GitHub Actions on November 8, 2025 12:45 (status: Failure)

DajanaV force-pushed the main branch 24 times, most recently from 98e1e20 to 2791104, on November 11, 2025 09:10

DajanaV force-pushed the main branch 19 times, most recently from 24733fb to 4b4bb7c, on November 13, 2025 12:15

DajanaV closed this Nov 13, 2025

loci-review bot mentioned this pull request Jan 24, 2026

UPSTREAM PR #18471: Add self‑speculative decoding (no draft model required) #750

Open

DajanaV wants to merge 5 commits into main from upstream-PR16490-branch_ggml-org-gg/graph-mamba-reuse
