Fix KV cache for split mode graph with layers left on CPU#1506

Merged
ikawrakow merged 1 commit into main from ik/sm_graph_partial_offload on Mar 25, 2026
Conversation

@ikawrakow
Owner

When split mode graph is used and the attention tensors of some layers are left on the CPU, the main branch leaves the KV cache for those layers uninitialized, which leads to a crash during compute graph construction. This PR fixes the bug.

I had never run a dense model with split mode graph without offloading all layers to the GPU. I came across this bug while testing the auto-fit functionality from #1504 with a dense model that does not fit in VRAM.

@ikawrakow ikawrakow merged commit dd75fd0 into main Mar 25, 2026
@magikRUKKOLA

Unable to run smol-IQ2_KS GLM5 with full GPU offload after this pull was merged.

/opt/ubergarm/GLM-5-GGUF/smol-IQ2_KS/run-ik_llama.cpp.sh
(gdb) bt full
#0  0x00007ffff7d97918 in llm_build_context::build_deepseek2() () from /opt/ik_llama.cpp/ik_llama.cpp/build/src/libllama.so
No symbol table info available.
#1  0x00007ffff7db17c0 in llm_build_context::llama_build_graph(llama_context&, llama_batch const&, bool) ()
   from /opt/ik_llama.cpp/ik_llama.cpp/build/src/libllama.so
No symbol table info available.
#2  0x00007ffff7ca51d5 in llama_init_from_model () from /opt/ik_llama.cpp/ik_llama.cpp/build/src/libllama.so
No symbol table info available.
#3  0x00005555555c1903 in main ()
No symbol table info available.
