UPSTREAM PR #18228: server: add auto-sleep after N seconds of idle #640

Closed
loci-dev wants to merge 11 commits into main from upstream-PR18228-branch_ngxson-xsn/server_sleep


@loci-dev

Mirrored from ggml-org/llama.cpp#18228

Sleeping on Idle

The server supports an automatic sleep mode that activates after a specified period of inactivity (no incoming tasks). This feature, introduced in PR #18228, can be enabled using the --sleep-idle-seconds command-line argument. It works seamlessly in both single-model and multi-model configurations.

When the server enters sleep mode, the model and its associated memory (including the KV cache) are unloaded from RAM to conserve resources. Any new incoming task will automatically trigger the model to reload.

Note that requests to the following endpoints are not counted as incoming tasks. They neither trigger a model reload nor reset the idle timer:

  • GET /health
  • GET /props
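The exemption rule can be pictured as a small predicate over the request method and path. This is only a sketch: `counts_as_task` is a hypothetical helper written for illustration, not a function from the PR, and it assumes the exemption applies to exactly the two method/path pairs listed above.

```cpp
#include <string>

// Hypothetical helper (not from the PR): returns true when a request should
// count as an incoming task, i.e. wake the model and reset the idle timer.
inline bool counts_as_task(const std::string & method, const std::string & path) {
    // Assumption: only the two GET endpoints listed above are exempt.
    if (method == "GET" && (path == "/health" || path == "/props")) {
        return false;
    }
    return true;
}
```

With this shape, a monitoring probe that polls `GET /health` every few seconds would never keep the model loaded, while any other request would.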

Implementation

The implementation of this feature consists of 3 main parts:

  • server_queue sleeping state
  • server_context sleeping state
  • server_res_generator hook

The main loop inside server_queue doubles as a watchdog timer, which avoids spawning a dedicated thread just for the watchdog. Once the idle timeout elapses with no incoming task, it signals server_context to unload the model.
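One way to get a watchdog without an extra thread is to put the idle timeout on the same condition-variable wait the loop already uses to wait for tasks. The sketch below illustrates that idea only; `idle_queue`, `on_sleep`, and every member name are hypothetical and are not the actual server_queue code.

```cpp
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical, simplified stand-in for server_queue (illustrative names only).
class idle_queue {
public:
    using task = std::function<void()>;

    idle_queue(std::chrono::milliseconds idle_timeout, std::function<void()> on_sleep)
        : idle_timeout_(idle_timeout), on_sleep_(std::move(on_sleep)) {}

    void post(task t) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(std::move(t));
        }
        cv_.notify_one();
    }

    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            running_ = false;
        }
        cv_.notify_one();
    }

    bool sleeping() {
        std::lock_guard<std::mutex> lock(mutex_);
        return sleeping_;
    }

    // Main loop: the wait for the next task is also the idle watchdog.
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        auto pending = [this] { return !tasks_.empty() || !running_; };
        while (running_) {
            bool has_work = true;
            if (sleeping_) {
                cv_.wait(lock, pending);  // already asleep: wait with no timeout
            } else {
                has_work = cv_.wait_for(lock, idle_timeout_, pending);
            }
            if (!has_work) {
                // Timed out with an empty queue: ask the owner to unload.
                sleeping_ = true;
                lock.unlock();
                on_sleep_();  // called without holding the lock
                lock.lock();
                continue;
            }
            if (!running_) {
                break;
            }
            task t = std::move(tasks_.front());
            tasks_.pop_front();
            sleeping_ = false;  // the reload step is omitted in this sketch
            lock.unlock();
            t();
            lock.lock();
        }
    }

private:
    std::chrono::milliseconds idle_timeout_;
    std::function<void()> on_sleep_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<task> tasks_;
    bool running_ = true;
    bool sleeping_ = false;
};
```

Because the timeout restarts on every pass through the loop, each processed task implicitly resets the idle timer, so no separate timestamp bookkeeping is needed in this version.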

server_res_generator hooks into every incoming request and asks server_queue to resume if it is in the sleeping state. Note that some requests, such as /health, bypass this check; they can only access read-only data of server_context.

When asked to resume, server_queue signals server_context to reload the model, then unblocks server_res_generator so it can proceed with the rest of the request.
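The resume handshake described above can be sketched with one mutex and two condition variables: the request side asks for a resume and blocks, and the loop side reloads and then releases it. Everything here is a hypothetical illustration (`sleep_gate`, `wait_until_awake`, `serve_resume`, and `demo_resume` are invented names), not the PR's code.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for the sleeping state shared by the queue loop and
// the request hook (illustrative names only).
class sleep_gate {
public:
    explicit sleep_gate(std::function<void()> reload) : reload_(std::move(reload)) {}

    // Queue-loop side: mark the model as unloaded.
    void enter_sleep() {
        std::lock_guard<std::mutex> lock(mutex_);
        sleeping_ = true;
    }

    // Request side: returns at once when awake; otherwise asks the loop to
    // resume and blocks until the reload has finished.
    void wait_until_awake() {
        std::unique_lock<std::mutex> lock(mutex_);
        if (!sleeping_) {
            return;
        }
        resume_requested_ = true;
        cv_loop_.notify_one();
        cv_request_.wait(lock, [this] { return !sleeping_; });
    }

    // Queue-loop side: wait for a resume request, reload, release the waiters.
    void serve_resume() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_loop_.wait(lock, [this] { return resume_requested_; });
        lock.unlock();
        reload_();  // potentially slow, so done without holding the lock
        lock.lock();
        sleeping_ = false;
        resume_requested_ = false;
        lock.unlock();
        cv_request_.notify_all();
    }

private:
    std::function<void()> reload_;
    std::mutex mutex_;
    std::condition_variable cv_loop_;
    std::condition_variable cv_request_;
    bool sleeping_ = false;
    bool resume_requested_ = false;
};

// Usage sketch: one thread plays the queue loop, the caller plays a request.
// Returns true if the reload had finished by the time the request unblocked.
inline bool demo_resume() {
    bool model_loaded = false;
    sleep_gate gate([&] { model_loaded = true; });
    gate.enter_sleep();
    std::thread loop([&] { gate.serve_resume(); });
    gate.wait_until_awake();
    bool loaded_when_unblocked = model_loaded;
    loop.join();
    return loaded_when_unblocked;
}
```

Clearing the sleeping flag only after the reload returns is what guarantees that a blocked request never proceeds against an unloaded model.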

@loci-dev force-pushed the main branch 12 times, most recently from 26a6f0f to cf53bc9, on December 22, 2025 14:09
@DajanaV closed this on Dec 22, 2025
@DajanaV deleted the upstream-PR18228-branch_ngxson-xsn/server_sleep branch on December 22, 2025 14:34