
server : support multi-modal context checkpoints and prompt caching#1398

Merged
ikawrakow merged 2 commits into main from fcp/mtmd_cache on Mar 13, 2026
Conversation

@firecoperana
Collaborator

Fix #1383

Previously, server tokens could not be copied if the cache contained an image. This PR ports the cache-copy support from mainline, enabling checkpoints and prompt caching for multi-modal and recurrent models.
Context shift also works for mtmd when the model itself supports it; unfortunately Qwen3 VL does not, which is also the case in mainline, and recurrent models are not supported either.
The criteria for slot save and restore are also loosened: if the model is loaded with an mmproj file but no image has been processed, the slot can still be saved and restored, which should work fine for very long system prompts.
The maximum number of checkpoints is increased to 32, along with other small bug fixes.
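For context, slot save and restore is driven over the server's HTTP API; the sketch below assumes the mainline llama.cpp conventions (the `--slot-save-path` flag and the `/slots/{id}?action=save|restore` endpoints), and the model/mmproj file names are placeholders:

```shell
# Start the server with a directory where slot states may be written.
# With this PR, a model loaded with an mmproj can still save/restore
# its slot as long as no image has been processed yet.
./llama-server -m model.gguf --mmproj mmproj.gguf --slot-save-path ./slot_cache/

# Save the KV cache of slot 0 (e.g. after processing a long system prompt)
curl -X POST "http://localhost:8080/slots/0?action=save" \
  -H "Content-Type: application/json" \
  -d '{"filename": "slot0.bin"}'

# Restore it later so the long system prompt is not re-processed
curl -X POST "http://localhost:8080/slots/0?action=restore" \
  -H "Content-Type: application/json" \
  -d '{"filename": "slot0.bin"}'
```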

@MrHills-rs

Would it be impossible to add slot save and recovery with images? Long conversations can often contain images, especially in agentic use cases / web search. We can remove them before save, but the model would lose potentially important contextual information.

@firecoperana
Collaborator Author

Let's leave that to a future PR. There is no existing function to do that right now.

firecoperana added 2 commits March 12, 2026 13:06
- do not create checkpoint right after image processing
- improve mtmd check for slot ops
- fix context shift
- do not abort if template parse failed
@ikawrakow ikawrakow merged commit 433531d into main Mar 13, 2026
Nexesenex added a commit to Nexesenex/ik_llama.cpp.nxs that referenced this pull request Mar 15, 2026
Nexesenex added a commit to Nexesenex/ik_llama.cpp.nxs that referenced this pull request Mar 16, 2026
Nexesenex added a commit to Nexesenex/ik_llama.cpp.nxs that referenced this pull request Mar 16, 2026
Nexesenex added a commit to Nexesenex/ik_llama.cpp.nxs that referenced this pull request Mar 16, 2026
Nexesenex added a commit to Nexesenex/ik_llama.cpp.nxs that referenced this pull request Mar 17, 2026
Nexesenex added a commit to Nexesenex/ik_llama.cpp.nxs that referenced this pull request Mar 17, 2026
Nexesenex added a commit to Nexesenex/ik_llama.cpp.nxs that referenced this pull request Mar 17, 2026
Nexesenex added a commit to Nexesenex/ik_llama.cpp.nxs that referenced this pull request Mar 18, 2026
Nexesenex added a commit to Nexesenex/ik_llama.cpp.nxs that referenced this pull request Mar 24, 2026
@firecoperana firecoperana deleted the fcp/mtmd_cache branch March 27, 2026 17:44


Development

Successfully merging this pull request may close these issues.

Bug: Qwen 3.5 context cache issue.

3 participants