server: reset counter related to kill-switch on client error #20513

Merged
ggerganov merged 3 commits into ggml-org:master from SoftwareRenderer:master on Mar 13, 2026

@SoftwareRenderer
Contributor

This avoids inadvertently triggering a server kill switch.

If the client sends a request that exceeds the configured context size, an appropriate HTTP 400 response is provided and no tokens are generated.

However, because no tokens are generated, update_slots() increments n_empty_consecutive. If the client sends 4 such requests in a row, the server terminates.

This change resets the counter when the client's request exceeds --ctx-size. While looking at this, I noticed that neighboring code has similar patterns involving the same ERROR_TYPE_EXCEED_CONTEXT_SIZE, but I'm not familiar with those other conditions.
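
The mechanism described above can be illustrated with a small self-contained model. This is a sketch under stated assumptions, not the llama.cpp implementation: `kill_switch_model` and `handle_request()` are hypothetical names, the threshold of 4 is taken from the description above, and resetting the counter after a successful generation is assumed.

```cpp
#include <cassert>

// Toy model, not llama.cpp code. The field names follow the PR text; the
// threshold of 4 and the reset-on-success rule are assumptions.
struct kill_switch_model {
    int  n_ctx               = 10;  // stands in for --ctx-size
    int  n_empty_consecutive = 0;   // iterations in a row that generated no tokens
    int  max_empty           = 4;   // assumed: the 4th empty iteration trips the switch
    bool reset_on_client_error;     // true models the change proposed in this PR

    explicit kill_switch_model(bool reset) : reset_on_client_error(reset) {}

    // Returns the HTTP status for a prompt of n_prompt tokens,
    // or -1 once the kill switch has fired.
    int handle_request(int n_prompt) {
        assert(n_prompt >= 0);
        int status      = 200;
        int n_generated = 1;        // pretend a valid request yields one token

        if (n_prompt > n_ctx) {
            status      = 400;      // request rejected, nothing is generated
            n_generated = 0;
            if (reset_on_client_error) {
                n_empty_consecutive = 0;  // the client caused this, so start counting again
            }
        }

        if (n_generated > 0) {
            n_empty_consecutive = 0;
        } else if (++n_empty_consecutive >= max_empty) {
            return -1;              // the real server terminates at this point
        }
        return status;
    }
};
```

In this model, four calls to `handle_request(50)` on `kill_switch_model(false)` return 400, 400, 400 and then -1, while the same calls on `kill_switch_model(true)` all return 400.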

Steps to reproduce issue:

  1. Start llama-server with a tiny context, e.g. --ctx-size 10
  2. Send text that exceeds the context size 4 times in a row

@SoftwareRenderer SoftwareRenderer changed the title from "server: reset kill-switch on client error" to "server: reset counter related to kill-switch on client error" on Mar 13, 2026
Member

@ggerganov ggerganov left a comment

It would be better to move this to launch_slot_with_task() - right before returning true.

@SoftwareRenderer
Contributor Author

> It would be better to move this to launch_slot_with_task() - right before returning true.

Done. I was worried that it might interfere with the debugging efforts in #20277.

After looking at that issue more closely, and if I understand it correctly, the loop there occurs within the same task and slot. Resetting the counter won't affect that case, because the reset only happens when a new task is launched in a slot.
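
The reasoning in this exchange can be checked with a toy model of the suggested placement. It is only a sketch: `launch_reset_model`, `toy_task`, `run()` and the `n_stalled` field are hypothetical, the threshold of 4 comes from the PR description, and the actual control flow in llama.cpp is more involved. The counter is reset in `launch_slot_with_task()` right before it returns true, so consecutive oversized requests no longer accumulate, while a task that stalls repeatedly within a single launch still reaches the threshold.

```cpp
#include <cassert>

// Hypothetical task description used only by this model.
struct toy_task {
    int n_prompt;   // prompt length in tokens
    int n_stalled;  // empty update_slots() iterations this task causes before it produces tokens
};

// Toy model, not llama.cpp code; names follow the PR discussion.
struct launch_reset_model {
    int  n_ctx               = 10;  // stands in for --ctx-size
    int  n_empty_consecutive = 0;
    int  max_empty           = 4;   // assumed threshold
    bool killed              = false;

    bool launch_slot_with_task(const toy_task & task) {
        if (task.n_prompt < 0) {
            return false;           // placeholder for launch failures
        }
        n_empty_consecutive = 0;    // reset right before returning true
        return true;
    }

    void update_slots(bool produced_tokens) {
        if (produced_tokens) {
            n_empty_consecutive = 0;
        } else if (++n_empty_consecutive >= max_empty) {
            killed = true;          // the real server terminates here
        }
    }

    void run(const toy_task & task) {
        if (!launch_slot_with_task(task)) {
            return;
        }
        if (task.n_prompt > n_ctx) {
            update_slots(false);    // HTTP 400 path: one empty iteration
            return;
        }
        for (int i = 0; i < task.n_stalled && !killed; ++i) {
            update_slots(false);    // stall inside the same task and slot
        }
        if (!killed) {
            update_slots(true);
        }
    }
};
```

With this model, any number of `run({50, 0})` calls leaves `killed` false, whereas a single `run({5, 4})` sets it, which matches the argument that the reset does not mask a loop confined to one task.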

@ggerganov ggerganov merged commit d7ba99c into ggml-org:master Mar 13, 2026
1 check passed
Ethan-a2 pushed a commit to Ethan-a2/llama.cpp that referenced this pull request Mar 20, 2026
…g#20513)

* server: reset kill-switch on client error

This avoids triggering a server kill switch.

If the client sends a request that exceeds the configured context size, an appropriate HTTP 400 response is provided and no tokens are generated.

However since no tokens are generated, update_slots() increments n_empty_consecutive. If the client sends 3 such messages in a row, the server terminates.

* moved counter reset as per recommendation

* cont : minor

---------

Co-authored-by: Georgi Gerganov <[email protected]>