@mostlygeek
Owner

@mostlygeek mostlygeek commented Oct 26, 2025

Switching to httputil.ReverseProxy in #342 introduced a possible panic when a client disconnects while the body is streaming. Because llama-swap does not use http.Server, the recover() that http.Server provides is not in place automatically.

  • introduce a recover() in Process.ProxyRequest to recover and log the event
  • add TestProcess_ReverseProxyPanicIsHandled to reproduce and test the fix

fixes: #362

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced service stability with improved panic recovery during streaming operations, preventing crashes when clients disconnect unexpectedly and ensuring continuous operation.
  • Tests

    • Added comprehensive test coverage to validate panic recovery mechanisms during streaming operations and confirm service resilience under disconnection scenarios.

@coderabbitai
coderabbitai bot commented Oct 26, 2025

Walkthrough

A defer-recover block is added to Process.ProxyRequest to handle panics from httputil.ReverseProxy during client disconnects, specifically catching http.ErrAbortHandler. A test validates that panics are properly recovered and do not propagate.

Changes

Cohort / File(s) | Summary
Panic recovery in reverse proxy handling (proxy/process.go) | Adds a defer-recover block in Process.ProxyRequest to catch panics thrown by httputil.ReverseProxy (e.g., http.ErrAbortHandler on client disconnect). Specialized logging for aborts; generic logging for other panics.
Test coverage for panic recovery (proxy/process_test.go) | Adds TestProcess_ReverseProxyPanicIsHandled test that verifies panics from the reverse proxy are recovered and do not crash the process. Includes a helper panicOnWriteResponseWriter that simulates client disconnect by panicking on write.

Sequence Diagram

sequenceDiagram
    participant Client
    participant ProxyRequest
    participant ReverseProxy
    participant Recover

    Client->>ProxyRequest: Request (with disconnect)
    ProxyRequest->>ReverseProxy: ServeHTTP(w, r)
    Note over ReverseProxy: Client disconnects during stream
    ReverseProxy->>Recover: panic(http.ErrAbortHandler)
    Recover->>Recover: Defer-recover catches panic
    Note over Recover: Log panic, continue execution
    Recover-->>Client: Graceful error response / nil

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Key areas requiring attention:
    • Verify defer placement correctly handles all code paths and timing relative to start logic and reverse proxy invocation
    • Confirm panic logging distinguishes between http.ErrAbortHandler (client disconnect) and other panics appropriately
    • Validate test helper panicOnWriteResponseWriter accurately simulates the panic scenario and that the state assertion (Ready) is correct after panic recovery
    • Check that the recover does not mask other issues or allow invalid process state transitions

Suggested labels

enhancement

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check | ✅ Passed | The pull request title "proxy: add panic recovery to Process.ProxyRequest" directly and specifically describes the main change in the changeset. The title clearly indicates that panic recovery functionality is being added to the Process.ProxyRequest method, which matches the core implementation changes in proxy/process.go. The title is concise, specific, and avoids vague terminology, making it easy for reviewers scanning history to understand the primary objective.
Linked Issues Check | ✅ Passed | The pull request addresses the core issue from #362 by adding panic recovery to Process.ProxyRequest to handle panics from httputil.ReverseProxy when clients disconnect during streaming. The PR prevents the panic from escaping, which resolves the root cause that leads to deadlocks when locks are held at higher levels (like in ProcessGroup). The added test verifies that panics are recovered and the process remains in a valid state, ensuring the fix prevents the cascading failure described in issue #362.
Out of Scope Changes Check | ✅ Passed | All changes in the pull request are directly within scope of issue #362. The modifications to proxy/process.go add the necessary panic recovery mechanism to handle httputil.ReverseProxy panics, and the additions to proxy/process_test.go provide test coverage to verify the fix works correctly. No extraneous changes, refactorings, or unrelated modifications are present in the changeset.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
proxy/process.go (1)

502-513: Constrain recovery to client aborts; add stack trace; keep timing log via defer

Current code swallows all panics and logs at Info, which can hide real bugs. Recommend:

  • Only treat http.ErrAbortHandler as a benign client disconnect.
  • For other panics, log at Error with a stack trace (optionally re‑panic in dev).
  • Move the request timing log into a defer so it runs even when a panic is recovered.

Apply:

--- a/proxy/process.go
+++ b/proxy/process.go
@@
 import (
 	"context"
 	"errors"
 	"fmt"
 	"net"
 	"net/http"
 	"net/http/httputil"
 	"net/url"
 	"os/exec"
 	"strings"
 	"sync"
 	"sync/atomic"
 	"syscall"
 	"time"
+	"runtime/debug"
@@
 func (p *Process) ProxyRequest(w http.ResponseWriter, r *http.Request) {
 	requestBeginTime := time.Now()
 	var startDuration time.Duration
+	// Always emit timing/logs, even on recovered panic.
+	defer func() {
+		totalTime := time.Since(requestBeginTime)
+		p.proxyLogger.Debugf("<%s> request %s - start: %v, total: %v",
+			p.ID, r.RequestURI, startDuration, totalTime)
+	}()
@@
-	// recover from http.ErrAbortHandler panics that can occur when the client
-	// disconnects before the response is sent
+	// Recover from http.ErrAbortHandler panics (client disconnect after headers).
 	defer func() {
-		if r := recover(); r != nil {
-			if r == http.ErrAbortHandler {
-				p.proxyLogger.Infof("<%s> recovered from client disconnection during streaming", p.ID)
-			} else {
-				p.proxyLogger.Infof("<%s> recovered from panic: %v", p.ID, r)
-			}
-		}
+		if rec := recover(); rec != nil {
+			if rec == http.ErrAbortHandler {
+				// This is expected on client aborts; keep noise low.
+				p.proxyLogger.Debugf("<%s> client disconnected during streaming", p.ID)
+				return
+			}
+			// Unexpected panic: log with stack for triage.
+			p.proxyLogger.Errorf("<%s> recovered unexpected panic: %v\n%s", p.ID, rec, debug.Stack())
+			// Optional: re‑panic in non‑prod builds to fail fast.
+			// panic(rec)
+		}
 	}()
@@
-	totalTime := time.Since(requestBeginTime)
-	p.proxyLogger.Debugf("<%s> request %s - start: %v, total: %v",
-		p.ID, r.RequestURI, startDuration, totalTime)
+	// moved to defer above
proxy/process_test.go (2)

498-545: Good reproduction; add a post-assertion to prove the process remains usable

After the recovered panic, issue another ProxyRequest and assert 200/Ready to catch latent state corruption.

Example:

@@ func TestProcess_ReverseProxyPanicIsHandled(t *testing.T) {
 	assert.Equal(t, StateReady, process.CurrentState())
+
+	// Ensure subsequent requests still work
+	okReq := httptest.NewRequest("GET", "/test", nil)
+	okW := httptest.NewRecorder()
+	process.ProxyRequest(okW, okReq)
+	assert.Equal(t, http.StatusOK, okW.Code)

498-545: Optional: add a companion test for unexpected panics

If you adopt constrained recovery (only swallow http.ErrAbortHandler), add a test that panics with a different sentinel and verifies it doesn't get swallowed (the test’s defer/recover should catch it).

Sketch:

// boomResponseWriter is a hypothetical helper for this sketch: it panics with
// a value other than http.ErrAbortHandler on the first body write.
type boomResponseWriter struct {
	*httptest.ResponseRecorder
}

func (w *boomResponseWriter) Write(b []byte) (int, error) {
	panic("boom")
}

func TestProcess_UnexpectedPanicBubbles(t *testing.T) {
	// The test's own recover should see the panic if ProxyRequest re-panics.
	defer func() {
		if r := recover(); r == nil {
			t.Fatalf("expected unexpected panic to bubble")
		}
	}()
	process := NewProcess("panic-test", 5, getTestSimpleResponderConfig("x"), debugLogger, debugLogger)
	defer process.Stop()
	_ = process.start()
	w := &boomResponseWriter{ResponseRecorder: httptest.NewRecorder()}
	req := httptest.NewRequest("GET", "/test", nil)
	process.ProxyRequest(w, req)
}

Also applies to: 547-567

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e250e71 and 026ab8d.

📒 Files selected for processing (2)
  • proxy/process.go (1 hunks)
  • proxy/process_test.go (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go

📄 CodeRabbit inference engine (CLAUDE.md)

Fix all staticcheck-reported issues in Go code

Files:

  • proxy/process_test.go
  • proxy/process.go
🧬 Code graph analysis (1)
proxy/process_test.go (1)
proxy/process.go (2)
  • NewProcess (82-126)
  • StateReady (27-27)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: run-tests
  • GitHub Check: run-tests

Comment on lines +498 to +508
// TestProcess_ReverseProxyPanicIsHandled tests that panics from
// httputil.ReverseProxy in Process.ProxyRequest(w, r) do not bubble up and are
// handled appropriately.
//
// httputil.ReverseProxy will panic with http.ErrAbortHandler when it has sent headers
// but can't copy the body. This can be caused by a client disconnecting before the full
// response is sent for some reason.
//
// bug: https://github.com/mostlygeek/llama-swap/issues/362
// see: https://github.com/golang/go/issues/23643 (where panic was added to httputil.ReverseProxy)
func TestProcess_ReverseProxyPanicIsHandled(t *testing.T) {

@coderabbitai coderabbitai bot Oct 26, 2025

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Confirm linked deadlock fix (ProcessGroup defer Unlock) is present

PR objectives mention deferring pg.Unlock() in ProcessGroup.ProxyRequest to avoid deadlocks on panic. This file doesn’t show that change. Please verify it exists.

Run:


🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "Locate ProcessGroup.ProxyRequest and lock/unlock usage..."
# Show candidate files
fd -t f -e go | xargs -I{} rg -nH -C3 'type\s+ProcessGroup\b|ProxyRequest\s*\(' {} || true
echo "Check for defer Unlock in ProcessGroup.ProxyRequest:"
rg -nP --type go '(?s)func\s*\(\s*\*?ProcessGroup\s*\)\s*ProxyRequest\s*\(.*\)\s*\{.*' -n | cut -d: -f1 | sort -u | while read -r f; do
  echo "== $f ==";
  rg -n -C3 'func\s*\(\s*\*?ProcessGroup\s*\)\s*ProxyRequest\s*\(' "$f";
  rg -n -C2 'Lock\(|Unlock\(|defer\s+.*Unlock\(' "$f";
done

Length of output: 11505


🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "=== Full ProcessGroup.ProxyRequest method ==="
sed -n '57,85p' proxy/processgroup.go

Length of output: 796


Add defer pg.Unlock() immediately after pg.Lock() in ProcessGroup.ProxyRequest to prevent deadlock on panic

The ProxyRequest call at line 73 in proxy/processgroup.go can panic (as mentioned in the test comment). Currently, the mutex is released only by direct pg.Unlock() calls (lines 80, 82), which will not execute if a panic occurs during the ProxyRequest call. This leaves the mutex locked indefinitely, causing a deadlock. Add defer pg.Unlock() right after the pg.Lock() call on line 63.

🤖 Prompt for AI Agents
In proxy/processgroup.go around lines 63 to 82, the function locks the
ProcessGroup with pg.Lock() but only calls pg.Unlock() in code paths that won't
run if ProxyRequest panics, causing a deadlock; add defer pg.Unlock()
immediately after pg.Lock() (line 63) so the mutex is always released on panic,
and remove or omit the later explicit pg.Unlock() calls in that function to
avoid double-unlocking.
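
To make the concern concrete, the sketch below contrasts an explicit Unlock with a deferred one when the guarded work panics. The `group` type and helper names are hypothetical and are not taken from proxy/processgroup.go. It uses sync.Mutex.TryLock (Go 1.18+) only to observe whether the mutex was left held.

```go
package main

import (
	"fmt"
	"sync"
)

type group struct{ mu sync.Mutex }

// explicitUnlock releases the mutex with an ordinary call placed after the work.
func (g *group) explicitUnlock(work func()) {
	g.mu.Lock()
	work()
	g.mu.Unlock() // never reached if work panics
}

// deferredUnlock releases the mutex even when work panics.
func (g *group) deferredUnlock(work func()) {
	g.mu.Lock()
	defer g.mu.Unlock()
	work()
}

// stillLocked runs call with panicking work, swallows the panic, and reports
// whether the mutex is still held afterwards.
func stillLocked(g *group, call func(func())) bool {
	func() {
		defer func() { _ = recover() }()
		call(func() { panic("boom") })
	}()
	if g.mu.TryLock() {
		g.mu.Unlock()
		return false
	}
	return true
}

func main() {
	a, b := &group{}, &group{}
	fmt.Println("explicit unlock leaves mutex held:", stillLocked(a, a.explicitUnlock))
	fmt.Println("deferred unlock leaves mutex held:", stillLocked(b, b.deferredUnlock))
}
```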

Owner Author

This is OK because Process.ProxyRequest(w,r) will recover() from any panics. The condition that created the original deadlock should not be able to happen anymore.

@mostlygeek, understood! You're absolutely right—with the recover() now in place within Process.ProxyRequest, any panic (including http.ErrAbortHandler from httputil.ReverseProxy) will be caught there and won't propagate up to ProcessGroup.ProxyRequest. This means the mutex in ProcessGroup.ProxyRequest will be released normally via the existing pg.Unlock() calls, preventing the deadlock scenario described in issue #362.

Thank you for the clarification!

@mostlygeek mostlygeek merged commit f852689 into main Oct 26, 2025
3 checks passed
@mostlygeek mostlygeek deleted the issue-362 branch October 26, 2025 03:40
Development

Successfully merging this pull request may close these issues.

Deadlock when cancelling a streaming response

2 participants