-
Notifications
You must be signed in to change notification settings - Fork 117
Fix race conditions in proxy.Process #349
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- TestProxy_UnloadAfterTTL - TestProcess_HTTPRequestsHaveTimeToFinish
- TestProcess_ShutdownInterruptsHealthCheck - TestProcess_ExitInterruptsHealthCheck - TestProcess_EnvironmentSetCorrectly
…ngHeader This race condition only appeared in the test and not part of the production code. The test was running the HTTP handler in a goroutine and then immediately reading from httptest.ResponseRecorder while the handler was still potentially writing to it. The fix was to remove the time.Sleep and use channels to signal progress
|
Caution Review failedThe pull request is closed. WalkthroughAdds concurrency safety to process lifecycle and request tracking: introduces mutexes, atomic in-flight counters, and accessor methods; guards command cancellation and wait channels; updates TTL unload, start/stop/wait flows, and ProxyRequest to use new synchronization. Also updates a test to avoid concurrent recorder access and enables race detector in Makefile. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Client
participant Proxy as ProxyRequest
participant Proc as Process
participant TTL as TTL Unloader
Note over Proxy,Proc: Request tracking and TTL check use new sync primitives
Client->>Proxy: HTTP request
Proxy->>Proc: inFlightRequestsCount.Add(1)
Proxy->>Proc: Serve request
Proc-->>Proxy: Response
Proxy->>Proc: setLastRequestHandled(now)
Proxy->>Proc: inFlightRequestsCount.Add(-1)
TTL->>Proc: Periodic TTL check
alt inFlightRequestsCount > 0
TTL-->>TTL: Skip unload
else TTL expired and no inflight
TTL->>Proc: Initiate unload sequence
end
sequenceDiagram
autonumber
participant Controller
participant Proc as Process
participant Up as UpstreamCmd
Note over Proc: State transitions and cmd lifecycle guarded by mutexes
Controller->>Proc: swapState(StateStarting)
Proc->>Proc: atomic waitStarting increment
Proc->>Up: start command (access under cmdMutex)
alt start succeeds
Proc->>Proc: forceState(StateRunning)
else start fails
Proc->>Proc: forceState(StateStopped)
end
Controller->>Proc: stopCommand()
Proc->>Proc: read/cancel upstream and close cmdWaitChan (under cmdMutex)
Proc->>Up: cancel if non-nil
Proc->>Proc: forceState(StateStopped/StateShutdown)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
📜 Recent review detailsConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
proxy/process.go (2)
314-337: TTL unload loop has two issues: potential leak and premature exit; also zero-time triggers immediate unload
- time.Tick leaks (SA1015). Use time.NewTicker.
- The loop returns if state != Ready; since the goroutine starts before Ready, it can exit before TTL ever runs.
- lastRequestHandled starts at zero; time.Since(zero) is large, so it may unload immediately on first tick if no requests have completed.
Fix by initializing lastRequestHandled when enabling TTL, waiting until Ready, and using a ticker.
As per coding guidelines (staticcheck)
- go func() { - maxDuration := time.Duration(p.config.UnloadAfter) * time.Second - - for range time.Tick(time.Second) { - if p.CurrentState() != StateReady { - return - } - - // skip the TTL check if there are inflight requests - if p.inFlightRequestsCount.Load() != 0 { - continue - } - - if time.Since(p.getLastRequestHandled()) > maxDuration { - p.proxyLogger.Infof("<%s> Unloading model, TTL of %ds reached", p.ID, p.config.UnloadAfter) - p.Stop() - return - } - } - }() + // Initialize to now so we don't unload immediately before first request completes. + p.setLastRequestHandled(time.Now()) + go func() { + maxDuration := time.Duration(p.config.UnloadAfter) * time.Second + ticker := time.NewTicker(time.Second) + defer ticker.Stop() + for range ticker.C { + st := p.CurrentState() + // Exit when process stops/shuts down; otherwise wait until Ready + if st == StateStopped || st == StateShutdown { + return + } + if st != StateReady { + continue + } + // Skip while requests are in-flight + if p.inFlightRequestsCount.Load() != 0 { + continue + } + if !p.getLastRequestHandled().IsZero() && time.Since(p.getLastRequestHandled()) > maxDuration { + p.proxyLogger.Infof("<%s> Unloading model, TTL of %ds reached", p.ID, p.config.UnloadAfter) + p.Stop() + return + } + } + }()
347-357: Stop() waits before transitioning to Stopping → can hang under loadStop() waits for in-flight requests while state remains Ready, allowing new requests to arrive and preventing the WaitGroup from draining. Transition to Stopping first to reject new requests, then wait and stop.
func (p *Process) Stop() { - if !isValidTransition(p.CurrentState(), StateStopping) { - return - } - - // wait for any inflight requests before proceeding - p.proxyLogger.Debugf("<%s> Stop(): Waiting for inflight requests to complete", p.ID) - p.inFlightRequests.Wait() - p.StopImmediately() + // Transition first to reject new requests, then wait and stop. + if curState, err := p.swapState(StateReady, StateStopping); err != nil { + p.proxyLogger.Infof("<%s> Stop(): Ready -> Stopping failed: %v (current: %v)", p.ID, err, curState) + return + } + p.proxyLogger.Debugf("<%s> Stop(): Waiting for inflight requests to complete", p.ID) + p.inFlightRequests.Wait() + p.stopCommand() }
🧹 Nitpick comments (2)
proxy/proxymanager_test.go (1)
1078-1099: Avoid unnecessary fixed 100ms wait; select on handler completion firstCurrent flow always waits for ctx timeout even if handler returns early. Use a select to prefer done, falling back to ctx.Done(), then ensure the goroutine has exited before asserting.
- // Wait for either the handler to complete or context to timeout - <-ctx.Done() - - // At this point, the handler has either finished or been cancelled - // Wait for the goroutine to fully exit before reading - <-done + // Prefer handler completion; otherwise wait for context timeout, then ensure exit + select { + case <-done: + case <-ctx.Done(): + <-done + }proxy/process.go (1)
379-387: Shutdown should mark Stopping to block new requests before killing upstreamCurrently Shutdown does not transition to Stopping, so ProxyRequest can still accept new requests until the process actually exits. Mark Stopping first (for Ready/Starting), then stop and force Shutdown.
func (p *Process) Shutdown() { - if !isValidTransition(p.CurrentState(), StateStopping) { - return - } - - p.stopCommand() - // just force it to this state since there is no recovery from shutdown - p.forceState(StateShutdown) + cur := p.CurrentState() + switch cur { + case StateReady, StateStarting: + if _, err := p.swapState(cur, StateStopping); err != nil { + p.proxyLogger.Infof("<%s> Shutdown(): %s -> Stopping failed: %v", p.ID, cur, err) + } + default: + if !isValidTransition(cur, StateStopping) { + return + } + } + p.stopCommand() + p.forceState(StateShutdown) // terminal state }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
proxy/process.go(14 hunks)proxy/proxymanager_test.go(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go
📄 CodeRabbit inference engine (CLAUDE.md)
Fix all staticcheck-reported issues in Go code
Files:
proxy/proxymanager_test.goproxy/process.go
Fixes #348
Summary by CodeRabbit
Bug Fixes
Refactor
Tests
Chores