Server: Handle context shift better to reduce prompt processing time #973

Merged: ikawrakow merged 2 commits into main from fcp/context_shift_fix on Nov 19, 2025

@firecoperana (Collaborator) commented Nov 17, 2025

This PR brings back the context shift that was removed in #954 and has also been removed in mainline. Fixes #960.
Add a --context-shift argument (auto|on|off|0|1) to enable or disable the context shift feature. The default is on.
When the prompt is longer than the context size, keep a sliding window of the prompt and shift the KV cache accordingly, so the whole prompt does not have to be reprocessed.
Make context shift compatible with the prompt caching feature. When printing cache usage, add the discarded token count to track the total prompt processed.
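
For readers unfamiliar with the idea, here is a minimal sketch of the sliding-window bookkeeping on the token list only. It is not the code from this PR: `ShiftResult`, `shift_window`, and the policy of discarding exactly the overflow are assumptions made for illustration, and the matching KV cache update that the server performs is not modeled.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch; not a type from the server code.
struct ShiftResult {
    std::vector<int> tokens;   // tokens left in the window after the shift
    std::size_t n_discarded;   // how many tokens were dropped
};

// Keep the first n_keep tokens, then drop the oldest tokens that follow them
// until the sequence fits into n_ctx. Assumption: exactly the overflow is
// discarded; a real server may discard a larger chunk at once.
ShiftResult shift_window(const std::vector<int>& tokens,
                         std::size_t n_ctx,
                         std::size_t n_keep) {
    ShiftResult result{tokens, 0};

    // Nothing to do if the prompt already fits, or if the protected prefix
    // alone would fill the context (no room left to shift).
    if (tokens.size() <= n_ctx || n_keep >= n_ctx) {
        return result;
    }

    const std::size_t n_discard = tokens.size() - n_ctx;
    const auto first = result.tokens.begin() + static_cast<std::ptrdiff_t>(n_keep);
    const auto last  = first + static_cast<std::ptrdiff_t>(n_discard);

    // Remove the discarded range; later tokens move down by n_discard,
    // which is the position shift the KV cache would also need.
    result.tokens.erase(first, last);
    result.n_discarded = n_discard;
    return result;
}
```

With 8 tokens, `n_ctx = 6`, and `n_keep = 2`, the sketch drops the third and fourth tokens and reports 2 discarded tokens. Adding that count to the number of tokens still in the window gives the total prompt length processed, which is the kind of accounting the last line of the description refers to.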

Add context-shift args

Add back ga_n in context shift
@ikawrakow (Owner)

Are you ready to merge, or still making changes?

@firecoperana (Collaborator, Author)

Ready to merge if you think it's good.

@ikawrakow merged commit 2cbfd04 into main on Nov 19, 2025

@fernandaspets

ah amazing thanks!

@firecoperana deleted the fcp/context_shift_fix branch on December 12, 2025 16:35

CamNoob pushed a commit to CamNoob/ik_llama.cpp that referenced this pull request on Feb 27, 2026:

The bug was likely introduced in PR ikawrakow#973 when the similarity calculation
was changed from LCP to token-level similarity, but sim_best was still
initialized to 0 instead of -1.0f.

When slot_prompt_similarity threshold was set high (e.g., 0.8) and no slot
met the threshold, sim_best stayed at 0, causing ret to remain nullptr.
This led to the system getting stuck without selecting any slot.

This fix:
- Changed sim_best initialization from 0 to -1.0f
- Added best_slot variable to track the best slot found during similarity search
- Only set ret = best_slot after the loop completes
- Removed redundant ret == nullptr check

This ensures that even when no slot meets the slot_prompt_similarity threshold,
the system still identifies the best available slot and falls back to LRU correctly.

Related: PR ikawrakow#973 (Server: Handle context shift better), PR ikawrakow#1285 (Fix slot prompt updating)
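
The selection flow described in the commit message above can be sketched roughly as follows. This is one possible reading, not the actual patch: the `Slot` struct, its fields, and `select_slot` are hypothetical names, and the similarity is assumed to be precomputed as a value in [0, 1].

```cpp
#include <cstdint>
#include <vector>

// Hypothetical slot record for this sketch; not the server's slot type.
struct Slot {
    int id;
    float similarity;          // assumed precomputed similarity to the new prompt
    std::int64_t t_last_used;  // smaller value means used longer ago
    bool available;
};

// Pick the available slot whose similarity is highest and above the threshold;
// if none qualifies, fall back to the least recently used available slot.
const Slot* select_slot(const std::vector<Slot>& slots, float threshold) {
    const Slot* best_slot = nullptr;
    float sim_best = -1.0f;  // initialized below any valid similarity, as in the commit message

    for (const Slot& slot : slots) {
        if (!slot.available) {
            continue;
        }
        if (slot.similarity > sim_best && slot.similarity > threshold) {
            sim_best = slot.similarity;
            best_slot = &slot;  // tracked separately, returned only after the loop
        }
    }
    if (best_slot != nullptr) {
        return best_slot;
    }

    // LRU fallback: choose the available slot with the oldest last-use time.
    const Slot* lru = nullptr;
    for (const Slot& slot : slots) {
        if (slot.available && (lru == nullptr || slot.t_last_used < lru->t_last_used)) {
            lru = &slot;
        }
    }
    return lru;  // nullptr only if no slot is available at all
}
```

In this sketch, a threshold of 0.8 with slots at similarity 0.5 and 0.9 selects the second slot, while a threshold of 0.95 matches neither and returns the least recently used one.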

Development

Successfully merging this pull request may close these issues.

Bug: Context-Shifting fails to trigger, once context is exceeded