UPSTREAM PR #18228: server: add auto-sleep after N seconds of idle by loci-dev · Pull Request #640 · auroralabs-loci/llama.cpp

loci-dev · 2025-12-20T15:35:12Z

Sleeping on Idle

The server supports an automatic sleep mode that activates after a specified period of inactivity (no incoming tasks). This feature, introduced in PR #18228, is enabled with the --sleep-idle-seconds command-line argument and works in both single-model and multi-model configurations.

When the server enters sleep mode, the model and its associated memory (including the KV cache) are unloaded from RAM to conserve resources. Any new incoming task will automatically trigger the model to reload.

Note that requests to the following endpoints are not counted as incoming tasks. They do not trigger a model reload and do not reset the idle timer:

GET /health
GET /props
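
As a rough illustration of the exemption rule above, the check could look like the sketch below. The function name counts_as_incoming_task and its signature are hypothetical and not taken from the PR; the only facts used are the two exempt endpoints listed above. Exact path matching is an assumption of this sketch.

```cpp
#include <string>

// Hypothetical helper (name and signature are illustrative, not from the PR).
// Returns false for the two endpoints the description lists as exempt,
// i.e. requests that neither wake the model nor reset the idle timer.
bool counts_as_incoming_task(const std::string & method, const std::string & path) {
    // Assumption: routes are compared by exact path.
    const bool exempt = method == "GET" && (path == "/health" || path == "/props");
    return !exempt;
}
```

Under these assumptions, counts_as_incoming_task("GET", "/health") is false, while a request to any other route is true and would wake the server.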

Implementation

The implementation of this feature consists of 3 main parts:

server_queue sleeping state
server_context sleeping state
server_res_generator hook

The main loop inside server_queue doubles as a watchdog timer, which avoids spawning a dedicated thread just for the watchdog. Once the idle timeout elapses, it signals server_context to unload the model.

server_res_generator hooks into every incoming request and asks server_queue to resume if it is in the sleeping state. Note that some requests, such as /health, bypass this check (they can only access read-only data of server_context).

When asked to resume, server_queue signals server_context to reload the models, then unblocks server_res_generator so it can proceed with the rest of the request.
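
The watchdog idea can be sketched with a condition variable that waits with a timeout, so the queue's own loop detects idleness without an extra thread. This is a minimal sketch under stated assumptions, not the PR's code: the class idle_queue, its members, and the on_sleep/on_wake callbacks (standing in for unloading and reloading the model in server_context) are all hypothetical. It also simplifies the flow by reloading inside the loop right before the first task runs, rather than blocking a separate server_res_generator.

```cpp
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical, simplified stand-in for the queue described above.
class idle_queue {
public:
    idle_queue(std::chrono::milliseconds idle_timeout,
               std::function<void()> on_sleep,   // assumed: unloads the model
               std::function<void()> on_wake)    // assumed: reloads the model
        : idle_timeout_(idle_timeout),
          on_sleep_(std::move(on_sleep)),
          on_wake_(std::move(on_wake)) {}

    // Enqueue a task and wake the main loop.
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(std::move(task));
        }
        cv_.notify_one();
    }

    // Ask the main loop to exit.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            running_ = false;
        }
        cv_.notify_one();
    }

    bool is_sleeping() {
        std::lock_guard<std::mutex> lock(mutex_);
        return sleeping_;
    }

    // Main loop: processes tasks and doubles as the idle watchdog.
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (running_) {
            if (tasks_.empty()) {
                auto ready = [this] { return !tasks_.empty() || !running_; };
                if (sleeping_) {
                    // Already asleep: wait indefinitely for work or shutdown.
                    cv_.wait(lock, ready);
                } else if (!cv_.wait_for(lock, idle_timeout_, ready)) {
                    // Timed out with no work: enter the sleeping state.
                    sleeping_ = true;
                    lock.unlock();
                    on_sleep_();
                    lock.lock();
                }
                continue;
            }
            std::function<void()> task = std::move(tasks_.front());
            tasks_.pop_front();
            const bool need_wake = sleeping_;
            sleeping_ = false;
            lock.unlock();
            if (need_wake) {
                on_wake_(); // resume before the task touches the model
            }
            task();
            lock.lock();
        }
    }

private:
    std::chrono::milliseconds idle_timeout_;
    std::function<void()> on_sleep_;
    std::function<void()> on_wake_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> tasks_;
    bool running_ = true;
    bool sleeping_ = false;
};
```

The callbacks are invoked with the mutex released so that a slow unload or reload does not block callers of post(). A real implementation would also need to guard any state that exempt endpoints read while the model is unloaded, which is the kind of data race the later commits in this PR address.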

ngxson added 5 commits December 20, 2025 15:21

implement sleeping at queue level

e1d7b43

implement server-context suspend

197e578

add test

db3b78d

add docs

aea8f8c

optimization: add fast path

44a5a26

loci-dev had a problem deploying to PROD__AL_DEMO December 20, 2025 15:35 — with GitHub Actions Error

ngxson added 3 commits December 20, 2025 19:09

make sure to free llama_init

e6ab62c

nits

937b064

fix use-after-free

105e2f3

loci-dev had a problem deploying to PROD__AL_DEMO December 20, 2025 18:39 — with GitHub Actions Failure

ngxson added 3 commits December 20, 2025 20:02

allow /models to be accessed during sleeping, fix use-after-free

fd09f88

don't allow accessing /models during sleep, it is not thread-safe

0bb9bc4

fix data race on accessing props and model_meta

d850082

loci-dev had a problem deploying to PROD__AL_DEMO December 20, 2025 19:33 — with GitHub Actions Failure

loci-dev force-pushed the main branch 12 times, most recently from 26a6f0f to cf53bc9 Compare December 22, 2025 14:09

DajanaV closed this Dec 22, 2025

DajanaV deleted the upstream-PR18228-branch_ngxson-xsn/server_sleep branch December 22, 2025 14:34

loci-dev wants to merge 11 commits into main from upstream-PR18228-branch_ngxson-xsn/server_sleep


3 participants
