-
Notifications
You must be signed in to change notification settings - Fork 117
proxy: add panic recovery to Process.ProxyRequest #363
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Switching to use httputil.ReverseProxy in #342 introduced a possible panic if a client disconnects while streaming the body. Since llama-swap does not use http.Server the recover() is not automatically there. - introduce a recover() in Process.ProxyRequest to recover and log the event - add TestProcess_ReverseProxyPanicIsHandled to reproduce and test the fix fixes: #362
WalkthroughA defer-recover block is added to Changes
Sequence DiagramsequenceDiagram
participant Client
participant ProxyRequest
participant ReverseProxy
participant Recover
Client->>ProxyRequest: Request (with disconnect)
ProxyRequest->>ReverseProxy: ServeHTTP(w, r)
Note over ReverseProxy: Client disconnects during stream
ReverseProxy->>Recover: panic(http.ErrAbortHandler)
Recover->>Recover: Defer-recover catches panic
Note over Recover: Log panic, continue execution
Recover-->>Client: Graceful error response / nil
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related PRs
Suggested labels
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
proxy/process.go (1)
502-513: Constrain recovery to client aborts; add stack trace; keep timing log via deferCurrent code swallows all panics and logs at Info, which can hide real bugs. Recommend:
- Only treat http.ErrAbortHandler as a benign client disconnect.
- For other panics, log at Error with a stack trace (optionally re‑panic in dev).
- Move the request timing log into a defer so it runs even when a panic is recovered.
Apply:
--- a/proxy/process.go +++ b/proxy/process.go @@ import ( "context" "errors" "fmt" "net" "net/http" "net/http/httputil" "net/url" "os/exec" "strings" "sync" "sync/atomic" "syscall" "time" + "runtime/debug" @@ func (p *Process) ProxyRequest(w http.ResponseWriter, r *http.Request) { - requestBeginTime := time.Now() + requestBeginTime := time.Now() var startDuration time.Duration + // Always emit timing/logs, even on recovered panic. + defer func() { + totalTime := time.Since(requestBeginTime) + p.proxyLogger.Debugf("<%s> request %s - start: %v, total: %v", + p.ID, r.RequestURI, startDuration, totalTime) + }() @@ - // recover from http.ErrAbortHandler panics that can occur when the client - // disconnects before the response is sent + // Recover from http.ErrAbortHandler panics (client disconnect after headers). defer func() { - if r := recover(); r != nil { - if r == http.ErrAbortHandler { - p.proxyLogger.Infof("<%s> recovered from client disconnection during streaming", p.ID) - } else { - p.proxyLogger.Infof("<%s> recovered from panic: %v", p.ID, r) - } - } + if rec := recover(); rec != nil { + if rec == http.ErrAbortHandler { + // This is expected on client aborts; keep noise low. + p.proxyLogger.Debugf("<%s> client disconnected during streaming", p.ID) + return + } + // Unexpected panic: log with stack for triage. + p.proxyLogger.Errorf("<%s> recovered unexpected panic: %v\n%s", p.ID, rec, debug.Stack()) + // Optional: re‑panic in non‑prod builds to fail fast. + // panic(rec) + } }() @@ - totalTime := time.Since(requestBeginTime) - p.proxyLogger.Debugf("<%s> request %s - start: %v, total: %v", - p.ID, r.RequestURI, startDuration, totalTime) + // moved to defer aboveproxy/process_test.go (2)
498-545: Good reproduction; add a post-assertion to prove the process remains usableAfter the recovered panic, issue another ProxyRequest and assert 200/Ready to catch latent state corruption.
Example:
@@ func TestProcess_ReverseProxyPanicIsHandled(t *testing.T) { - assert.Equal(t, StateReady, process.CurrentState()) + assert.Equal(t, StateReady, process.CurrentState()) + + // Ensure subsequent requests still work + okReq := httptest.NewRequest("GET", "/test", nil) + okW := httptest.NewRecorder() + process.ProxyRequest(okW, okReq) + assert.Equal(t, http.StatusOK, okW.Code)
498-545: Optional: add a companion test for unexpected panicsIf you adopt constrained recovery (only swallow http.ErrAbortHandler), add a test that panics with a different sentinel and verifies it doesn't get swallowed (the test’s defer/recover should catch it).
Sketch:
func TestProcess_UnexpectedPanicBubbles(t *testing.T) { defer func() { if r := recover(); r == nil { t.Fatalf("expected unexpected panic to bubble") } }() process := NewProcess("panic-test", 5, getTestSimpleResponderConfig("x"), debugLogger, debugLogger) defer process.Stop() _ = process.start() w := &panicOnWriteResponseWriter{ResponseRecorder: httptest.NewRecorder(), shouldPanic: true} // flip to custom sentinel w.headerWritten = true defer func() { w.shouldPanic = false }() origWrite := w.Write w.Write = func(b []byte) (int, error) { panic("boom") } req := httptest.NewRequest("GET", "/test", nil) process.ProxyRequest(w, req) }Also applies to: 547-567
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
proxy/process.go(1 hunks)proxy/process_test.go(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go
📄 CodeRabbit inference engine (CLAUDE.md)
Fix all staticcheck-reported issues in Go code
Files:
proxy/process_test.goproxy/process.go
🧬 Code graph analysis (1)
proxy/process_test.go (1)
proxy/process.go (2)
NewProcess(82-126)StateReady(27-27)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: run-tests
- GitHub Check: run-tests
| // TestProcess_ReverseProxyPanicIsHandled tests that panics from | ||
| // httputil.ReverseProxy in Process.ProxyRequest(w, r) do not bubble up and are | ||
| // handled appropriately. | ||
| // | ||
| // httputil.ReverseProxy will panic with http.ErrAbortHandler when it has sent headers | ||
| // can't copy the body. This can be caused by a client disconnecting before the full | ||
| // response is sent from some reason. | ||
| // | ||
| // bug: https://github.com/mostlygeek/llama-swap/issues/362 | ||
| // see: https://github.com/golang/go/issues/23643 (where panic was added to httputil.ReverseProxy) | ||
| func TestProcess_ReverseProxyPanicIsHandled(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Confirm linked deadlock fix (ProcessGroup defer Unlock) is present
PR objectives mention deferring pg.Unlock() in ProcessGroup.ProxyRequest to avoid deadlocks on panic. This file doesn’t show that change. Please verify it exists.
Run:
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Locate ProcessGroup.ProxyRequest and lock/unlock usage..."
# Show candidate files
fd -t f -e go | xargs -I{} rg -nH -C3 'type\s+ProcessGroup\b|ProxyRequest\s*\(' {} || true
echo "Check for defer Unlock in ProcessGroup.ProxyRequest:"
rg -nP --type go '(?s)func\s*\(\s*\*?ProcessGroup\s*\)\s*ProxyRequest\s*\(.*\)\s*\{.*' -n | cut -d: -f1 | sort -u | while read -r f; do
echo "== $f ==";
rg -n -C3 'func\s*\(\s*\*?ProcessGroup\s*\)\s*ProxyRequest\s*\(' "$f";
rg -n -C2 'Lock\(|Unlock\(|defer\s+.*Unlock\(' "$f";
doneLength of output: 11505
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Full ProcessGroup.ProxyRequest method ==="
sed -n '57,85p' proxy/processgroup.goLength of output: 796
Add defer pg.Unlock() immediately after pg.Lock() in ProcessGroup.ProxyRequest to prevent deadlock on panic
The ProxyRequest call at line 73 in proxy/processgroup.go can panic (as mentioned in the test comment). Currently, the unlock is deferred only via direct calls (lines 80, 82), which will not execute if a panic occurs during the ProxyRequest call. This leaves the mutex locked indefinitely, causing a deadlock. Add defer pg.Unlock() right after the pg.Lock() call on line 63.
🤖 Prompt for AI Agents
In proxy/processgroup.go around lines 63 to 82, the function locks the
ProcessGroup with pg.Lock() but only calls pg.Unlock() in code paths that won't
run if ProxyRequest panics, causing a deadlock; add defer pg.Unlock()
immediately after pg.Lock() (line 63) so the mutex is always released on panic,
and remove or omit the later explicit pg.Unlock() calls in that function to
avoid double-unlocking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is OK because Process.ProxyRequest(w,r) will recover() from any panics. The condition that created the original deadlock should not be able to happen anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mostlygeek, understood! You're absolutely right—with the recover() now in place within Process.ProxyRequest, any panic (including http.ErrAbortHandler from httputil.ReverseProxy) will be caught there and won't propagate up to ProcessGroup.ProxyRequest. This means the mutex in ProcessGroup.ProxyRequest will be released normally via the existing pg.Unlock() calls, preventing the deadlock scenario described in issue #362.
Thank you for the clarification!
Switching to use httputil.ReverseProxy in #342 introduced a possible panic if a client disconnects while streaming the body. Since llama-swap does not use http.Server the recover() is not automatically there.
fixes: #362
Summary by CodeRabbit
Bug Fixes
Tests