Description
Describe the bug
Repro:
- model A is swapped in and starts streaming
- we cancel it midway
- we attempt to call model B in the same process group
- deadlock
The issue seems to be in ProcessGroup.ProxyRequest:
- it takes pg.Lock()
- this line actually panics on cancel from the client:
pg.processes[modelID].ProxyRequest(writer, request)
The panic is triggered by p.reverseProxy.ServeHTTP(w, r) inside it: once the response is being streamed, the proxy has no other way to report the error, so it panics.
- the stack unwinds on panic and our pg.Lock is never released
- next calls to ProxyRequest deadlock because they try to lock a mutex which was never released
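The sequence above can be reproduced in isolation. This is only a minimal sketch, not llama-swap code: `group`, `proxyLeaky`, and `leaksLock` are hypothetical names, and the `recover` in the caller stands in for the recovery that Go's HTTP server performs around a handler, which is why the process survives the panic while the mutex stays locked.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// group is a hypothetical stand-in for ProcessGroup: only the mutex matters here.
type group struct {
	sync.Mutex
}

// proxyLeaky mirrors the buggy shape: Lock, call something that may panic,
// then Unlock explicitly. The Unlock is skipped when upstream panics.
func (g *group) proxyLeaky(upstream func()) {
	g.Lock()
	upstream()
	g.Unlock()
}

// leaksLock runs proxyLeaky with a panicking upstream, recovers the panic
// in the caller, and reports whether the mutex was left locked.
func leaksLock() bool {
	g := &group{}
	func() {
		defer func() { _ = recover() }()
		g.proxyLeaky(func() { panic(http.ErrAbortHandler) })
	}()
	// TryLock (Go 1.18+) fails if the mutex is still held.
	if g.TryLock() {
		g.Unlock()
		return false
	}
	return true
}

func main() {
	fmt.Println("lock leaked after panic:", leaksLock())
}
```

A real caller would use Lock rather than TryLock at this point and block forever, which matches the hang seen on the next request.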
The fix seems to be pretty simple: use defer.
if pg.lastUsedProcess != modelID {
	// ensure unlock even if ProxyRequest panics
	defer pg.Unlock() // <-- add here

	// is there something already running?
	if pg.lastUsedProcess != "" {
		pg.processes[pg.lastUsedProcess].Stop()
	}

	// wait for the request to the new model to be fully handled
	// and prevent race conditions see issue #277
	pg.processes[modelID].ProxyRequest(writer, request)
	pg.lastUsedProcess = modelID

	// short circuit and exit
	// pg.Unlock() <-- remove this
	return nil
}
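To check that a deferred unlock behaves as intended under a panic, here is a small self-contained sketch. It is an illustration under assumptions, not the project's implementation: `group`, `proxySafe`, `afterPanic`, and the `lastUsed` field are hypothetical stand-ins for ProcessGroup, the swap branch of ProxyRequest, and pg.lastUsedProcess.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// group is a hypothetical stand-in for ProcessGroup.
type group struct {
	sync.Mutex
	lastUsed string
}

// proxySafe takes the lock and defers the unlock, so the mutex is released
// both on normal return and while a panic unwinds the stack.
func (g *group) proxySafe(modelID string, upstream func()) {
	g.Lock()
	defer g.Unlock()
	upstream()
	g.lastUsed = modelID // skipped if upstream panics
}

// afterPanic runs proxySafe with a panicking upstream, recovers in the
// caller, and reports whether the lock is free and what lastUsed holds.
func afterPanic() (released bool, lastUsed string) {
	g := &group{}
	func() {
		defer func() { _ = recover() }()
		g.proxySafe("model-b", func() { panic(http.ErrAbortHandler) })
	}()
	released = g.TryLock() // TryLock requires Go 1.18+
	if released {
		lastUsed = g.lastUsed
		g.Unlock()
	}
	return released, lastUsed
}

func main() {
	released, lastUsed := afterPanic()
	fmt.Printf("released=%v lastUsed=%q\n", released, lastUsed)
}
```

The lock is released, but the assignment placed after the upstream call does not run when the call panics, so `lastUsed` keeps its previous value. Whether that matters for the bookkeeping of pg.lastUsedProcess is something this sketch does not settle.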
I suspect the problem was introduced by the recent switch to the standard library's reverse proxy, which panics when the request is cancelled. In Go's source code (net/http/httputil/reverseproxy.go):
// Since we're streaming the response, if we run into an error all we can do
// is abort the request. Issue 23643: ReverseProxy should use ErrAbortHandler
// on read error while copying body.
if !shouldPanicOnCopyError(req) {
	p.logf("suppressing panic for copyResponse error in test; copy error: %v", err)
	return
}
panic(http.ErrAbortHandler)
After the quick fix above, the issue disappeared on my server.
Expected behaviour
Llama-swap doesn't hang.
Operating system and version
Ubuntu 24.04.3 LTS
My Configuration
It's pretty large, the relevant bit is this:
groups:
  "main":
    swap: true
    exclusive: false
    members:
      - "qwen3-32b-q4-32k"
      - "qwen2.5-vl-32b-q4"
Upstream Logs
When the deadlock happens, nothing appears in the logs.