server: reset counter related to kill-switch on client error by SoftwareRenderer · Pull Request #20513 · ggml-org/llama.cpp

SoftwareRenderer · 2026-03-13T13:35:47Z

This avoids inadvertently triggering a server kill switch.

If the client sends a request that exceeds the configured context size, an appropriate HTTP 400 response is provided and no tokens are generated.

However, since no tokens are generated, update_slots() increments n_empty_consecutive. If the client sends 4 such requests in a row, the server terminates.
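A minimal, hypothetical C++ model of this failure mode may make the mechanism concrete. It is not llama.cpp's server code: ToyServer, handle_request(), oversized_requests_kill_server() and kMaxEmptyConsecutive are illustrative names. The threshold of 3 is an assumption chosen so that the 4th consecutive rejected request trips the switch, matching the description. The model also assumes the counter is cleared whenever a batch does contain tokens, which is what the name n_empty_consecutive suggests.

```cpp
// Hypothetical toy model of the bug; NOT llama.cpp's server implementation.
struct ToyServer {
    // Assumed threshold: the switch trips once the counter exceeds 3,
    // i.e. on the 4th consecutive empty update.
    static constexpr int kMaxEmptyConsecutive = 3;

    int  n_empty_consecutive = 0;
    bool terminated          = false; // stands in for the real server aborting

    // Returns the HTTP status the client would see.
    int handle_request(int n_prompt_tokens, int n_ctx) {
        int status            = 200;
        int n_tokens_in_batch = n_prompt_tokens;
        if (n_prompt_tokens > n_ctx) {
            status            = 400; // prompt exceeds the context size
            n_tokens_in_batch = 0;   // so nothing is decoded
        }
        update_slots(n_tokens_in_batch);
        return status;
    }

    // An update with an empty batch bumps the counter; a non-empty one
    // clears it (assumed from the name n_empty_consecutive).
    void update_slots(int n_tokens_in_batch) {
        if (n_tokens_in_batch == 0) {
            if (++n_empty_consecutive > kMaxEmptyConsecutive) {
                terminated = true;
            }
        } else {
            n_empty_consecutive = 0;
        }
    }
};

// Reproduction: send n over-long prompts in a row against a 10-token context.
bool oversized_requests_kill_server(int n) {
    ToyServer server;
    for (int i = 0; i < n; ++i) {
        server.handle_request(/*n_prompt_tokens=*/20, /*n_ctx=*/10);
    }
    return server.terminated;
}
```

In this model, oversized_requests_kill_server(3) returns false and oversized_requests_kill_server(4) returns true: every request is an ordinary client error, yet four of them in a row are treated as a server stall.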

This change resets the counter when the client's request exceeds --ctx-size. While looking at this, I noticed that neighboring code has similar patterns involving the same ERROR_TYPE_EXCEED_CONTEXT_SIZE; however, I'm not familiar with those other conditions.
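The first version of the change (resetting the counter on the error path) can be sketched with the same kind of toy model. Again, this is a hypothetical illustration, not the actual patch: ToyServerResetOnError, killed_after_oversized_requests() and the threshold are assumptions. Because the reset happens before the empty update is counted, each rejected request leaves the counter at 1, so it never reaches the threshold.

```cpp
// Hypothetical sketch of resetting the counter on the error path.
// NOT llama.cpp's code; names and threshold are illustrative assumptions.
struct ToyServerResetOnError {
    static constexpr int kMaxEmptyConsecutive = 3; // assumed threshold

    int  n_empty_consecutive = 0;
    bool terminated          = false;

    int handle_request(int n_prompt_tokens, int n_ctx) {
        int status            = 200;
        int n_tokens_in_batch = n_prompt_tokens;
        if (n_prompt_tokens > n_ctx) {
            status              = 400; // exceed-context-size error path
            n_tokens_in_batch   = 0;   // nothing is decoded
            n_empty_consecutive = 0;   // a client error is not a server stall
        }
        update_slots(n_tokens_in_batch);
        return status;
    }

    void update_slots(int n_tokens_in_batch) {
        if (n_tokens_in_batch == 0) {
            if (++n_empty_consecutive > kMaxEmptyConsecutive) {
                terminated = true;
            }
        } else {
            n_empty_consecutive = 0;
        }
    }
};

// Any number of over-long prompts in a row no longer trips the switch.
bool killed_after_oversized_requests(int n) {
    ToyServerResetOnError server;
    for (int i = 0; i < n; ++i) {
        server.handle_request(/*n_prompt_tokens=*/20, /*n_ctx=*/10);
    }
    return server.terminated;
}
```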

Steps to reproduce the issue:

1. Start llama-server with a tiny context, e.g. --ctx-size 10.
2. Send text that exceeds the context size 4 times in a row.


ggerganov

It would be better to move this to launch_slot_with_task() - right before returning true.

SoftwareRenderer · 2026-03-13T17:37:03Z

> It would be better to move this to launch_slot_with_task() - right before returning true.

Done. I was worried that it might interfere with the debugging efforts in #20277.

After looking at that issue more closely, and if I understand it correctly, the loop occurs within the same task and slot. Resetting the counter therefore won't affect it, because the reset happens only when a new task is launched on a slot.
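The reasoning about the merged approach can be checked with a hypothetical toy model. Only the name launch_slot_with_task() comes from this discussion; its body here, plus ToySlot, ToyServer, the `stuck` flag and the threshold of 3, are illustrative assumptions. The model also assumes the over-long prompt is detected after the task has been launched, so the launch still returns true and resets the counter. Under those assumptions, repeated rejections never accumulate, while a single task that keeps producing empty batches (the kind of loop discussed in #20277) still trips the switch, because no new launch resets the counter.

```cpp
// Hypothetical toy model of the merged approach; NOT llama.cpp's code.
struct ToySlot {
    int  n_prompt_tokens = 0;
    bool active          = false;
};

struct ToyServer {
    static constexpr int kMaxEmptyConsecutive = 3; // assumed threshold

    int     n_ctx               = 10;
    int     n_empty_consecutive = 0;
    bool    terminated          = false;
    ToySlot slot;

    // Launching a new task resets the counter right before returning true.
    bool launch_slot_with_task(int n_prompt_tokens) {
        slot.n_prompt_tokens = n_prompt_tokens;
        slot.active          = true;
        n_empty_consecutive  = 0;
        return true;
    }

    // `stuck` simulates a task that stays active but never adds tokens.
    void update_slots(bool stuck) {
        int n_tokens_in_batch = 0;
        if (slot.active) {
            if (slot.n_prompt_tokens > n_ctx) {
                slot.active = false; // toy stand-in for: HTTP 400, slot released
            } else if (!stuck) {
                n_tokens_in_batch = slot.n_prompt_tokens;
                slot.active       = false; // toy: the task finishes in one step
            }
        }
        if (n_tokens_in_batch == 0) {
            if (++n_empty_consecutive > kMaxEmptyConsecutive) {
                terminated = true;
            }
        } else {
            n_empty_consecutive = 0;
        }
    }
};

// Many rejected requests: every launch resets, every rejection adds one.
bool killed_by_oversized_requests(int n) {
    ToyServer server;
    for (int i = 0; i < n; ++i) {
        server.launch_slot_with_task(/*n_prompt_tokens=*/20);
        server.update_slots(/*stuck=*/false);
    }
    return server.terminated;
}

// One task that never progresses: no new launch, so no reset.
bool killed_by_stuck_task(int n_iterations) {
    ToyServer server;
    server.launch_slot_with_task(/*n_prompt_tokens=*/5);
    for (int i = 0; i < n_iterations; ++i) {
        server.update_slots(/*stuck=*/true);
    }
    return server.terminated;
}
```

Placing the reset at task launch rather than on a specific error path means any future error that ends a task early is covered too, while the kill switch keeps its meaning of "one task made no progress for several iterations".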

…g#20513)

* server: reset kill-switch on client error

  This avoids triggering a server kill switch. If the client sends a request that exceeds the configured context size, an appropriate HTTP 400 response is provided and no tokens are generated. However since no tokens are generated, update_slots() increments n_empty_consecutive. If the client sends 3 such messages in a row, the server terminates.

* moved counter reset as per recommendation

* cont : minor

---------

Co-authored-by: Georgi Gerganov <[email protected]>

SoftwareRenderer requested review from ggerganov and ngxson as code owners March 13, 2026 13:35

SoftwareRenderer changed the title ~~server: reset kill-switch on client error~~ server: reset counter related to kill-switch on client error Mar 13, 2026

github-actions bot added examples server labels Mar 13, 2026

ggerganov reviewed Mar 13, 2026


moved counter reset as per recommendation

a3050a5

cont : minor

1794a7b

ggerganov approved these changes Mar 13, 2026


ggerganov merged commit d7ba99c into ggml-org:master Mar 13, 2026
1 check passed


ggerganov merged 3 commits into ggml-org:master from SoftwareRenderer:master

2 participants

Conversation

SoftwareRenderer commented Mar 13, 2026

Uh oh!

ggerganov left a comment

Choose a reason for hiding this comment

Uh oh!

SoftwareRenderer commented Mar 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants