server : support multi-modal context checkpoints and prompt caching #1398
Merged
Would it be possible to add slot save and recovery with images? Long conversations often contain images, especially in agentic use cases and web search. We can remove them before saving, but the model would then lose potentially important context.
Collaborator, Author
Let's leave that to a future PR. There is no existing function to do that right now.
added 2 commits March 12, 2026 13:06
do not create checkpoint right after image processing; improve mtmd check for slot ops; fix context shift; do not abort if template parse failed
ikawrakow approved these changes Mar 13, 2026
Nexesenex added 9 commits to Nexesenex/ik_llama.cpp.nxs that referenced this pull request between Mar 15, 2026 and Mar 24, 2026, each with the message:
…aching (ikawrakow#1398)" This reverts commit 433531d.
Fix #1383
Previously, server tokens could not be copied if the cache contained an image. This PR ports the support for copying the server cache from mainline. This enables checkpoints and the prompt cache for recurrent multi-modal models.
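To illustrate why a token cache that holds images needs explicit copy support, here is a minimal sketch. It assumes the cache pairs a flat token list with separately owned image chunks; `MediaChunk`, `TokenCache`, and `clone` are hypothetical names for illustration only and are not the server's actual types or the code ported from mainline.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for one processed image held in the cache.
struct MediaChunk {
    std::string id;        // identifier of the image (illustrative)
    size_t      n_tokens;  // number of positions the image occupies
};

// Hypothetical token cache: text tokens plus uniquely owned image chunks.
// Because of the unique_ptr members, it is move-only unless a deep copy is provided.
struct TokenCache {
    std::vector<int32_t> tokens;                          // -1 marks an image position
    std::map<size_t, std::unique_ptr<MediaChunk>> media;  // start index -> image chunk

    TokenCache() = default;
    TokenCache(TokenCache &&) = default;
    TokenCache & operator=(TokenCache &&) = default;

    // Deep copy: duplicate every image chunk so the copy owns its own data.
    TokenCache clone() const {
        TokenCache out;
        out.tokens = tokens;
        for (const auto & [idx, chunk] : media) {
            out.media.emplace(idx, std::make_unique<MediaChunk>(*chunk));
        }
        return out;
    }
};
```

With a deep copy like this available, a checkpoint or prompt-cache entry can keep its own snapshot of a prompt that includes images, independent of the live slot.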
Context shift also works for mtmd if the model itself is supported. Unfortunately, qwen3 VL is not supported, which is also the case in mainline. Recurrent models are not supported either.
Loosen the criteria for slot save and recovery. If the model is loaded with an mmproj file but no image has been processed, the slot can still be saved and restored, which should work fine for a very long system prompt.
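The loosened criteria can be sketched as a small predicate. This is an illustration under stated assumptions, not the PR's code: `SlotState` and both functions are hypothetical names, and the "strict" rule is only an assumed reading of the earlier behavior.

```cpp
#include <cstddef>

// Hypothetical summary of a slot's state; field names are illustrative
// and are not taken from the server source.
struct SlotState {
    bool   mmproj_loaded;   // model was started with an mmproj file
    size_t n_image_chunks;  // images currently held in the slot's cached prompt
};

// Assumed earlier rule: loading an mmproj file alone blocks save/restore.
inline bool can_save_slot_strict(const SlotState & s) {
    return !s.mmproj_loaded;
}

// Loosened rule described above: only processed images block save/restore.
inline bool can_save_slot_loosened(const SlotState & s) {
    return s.n_image_chunks == 0;
}
```

Under this sketch, a slot holding a long text-only system prompt on a multi-modal model passes the loosened check but would have failed the strict one.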
Increase the checkpoint limit to 32, plus other small bug fixes.