Skip to content

fix(mcp): resolve TOCTOU race condition and resource leak#999

Merged
yinwm merged 1 commit intosipeed:mainfrom
yinwm:fix/mcp-race-condition-and-resource-leak
Mar 3, 2026
Merged

fix(mcp): resolve TOCTOU race condition and resource leak#999
yinwm merged 1 commit intosipeed:mainfrom
yinwm:fix/mcp-race-condition-and-resource-leak

Conversation

@yinwm
Copy link
Collaborator

@yinwm yinwm commented Mar 2, 2026

Summary

  • Use atomic.Bool for closed flag to prevent TOCTOU race between CallTool and Close operations
  • Add double-check pattern in CallTool for thread-safe closed state detection
  • Use atomic Swap in Close to ensure no new calls can start after closed flag is set
  • Move MCP manager cleanup defer before initialization to handle partial initialization failures
  • Update tests to use atomic.Bool operations

Problem

TOCTOU Race Condition

Original code had a race window between checking closed flag and adding to WaitGroup:

CallTool:  check closed=false → release lock → [not yet Add(1)]
Close:                      acquire lock → closed=true → Wait() returns immediately!
                                                       → starts closing connections while CallTool hasn't started

Resource Leak

If LoadFromMCPConfig partially succeeded then failed, MCP connections were not cleaned up because defer mcpManager.Close() was only called on success path.

Solution

  1. atomic.Bool + Swap: Close() uses Swap(true) which atomically sets closed and returns previous value
  2. Double-check in CallTool: Check closed before lock (fast path) and after lock (TOCTOU prevention)
  3. Defer before init: Move cleanup defer before initialization to handle all exit paths

Test plan

  • make check passes
  • Existing tests updated for atomic.Bool
  • Race condition fix verified through code review

🤖 Generated with Claude Code

- Use atomic.Bool for closed flag to prevent TOCTOU race between
  CallTool and Close operations
- Add double-check pattern in CallTool for thread-safe closed state
- Use atomic Swap in Close to ensure no new calls can start after
  closed flag is set
- Move MCP manager cleanup defer before initialization to handle
  partial initialization failures
- Update tests to use atomic.Bool operations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+github.com/gdamore/tcell/v2 v2.13.8
+github.com/rivo/tview v0.42.0

These two dependencies have changed from indirect to direct dependencies, but there is no code referencing them in the PR. This could be a side effect of go mod tidy, or there may be other uncommitted local code that introduced these dependencies.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's go mod tidy effect

@sipeed-bot sipeed-bot bot added type: bug Something isn't working domain: agent labels Mar 3, 2026
Copy link

@nikolasdehor nikolasdehor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. This is a solid fix for a real TOCTOU race condition in the MCP manager.

What was broken: The original code checked m.closed (a plain bool) under a read-lock in CallTool, then released the lock, then called wg.Add(1). Meanwhile Close() could set closed=true and call wg.Wait() which would return immediately because Add(1) had not happened yet. This means Close() would start tearing down connections while a CallTool was about to execute.

What this fixes:

  1. atomic.Bool with Swap(true) in Close() -- atomic set + get-previous-value in one operation, no lock needed
  2. Double-check pattern in CallTool -- fast-path check before lock, TOCTOU-safe check after lock, with wg.Add(1) inside the critical section
  3. defer mcpManager.Close() moved before LoadFromMCPConfig() -- handles partial initialization failures

The resource leak fix (defer before init) is an important secondary fix. If LoadFromMCPConfig connects 3 servers successfully then fails on the 4th, those 3 connections would previously leak.

Clean, correct, well-documented.

@yinwm yinwm merged commit de2ccb5 into sipeed:main Mar 3, 2026
2 checks passed
hyperwd pushed a commit to hyperwd/picoclaw that referenced this pull request Mar 5, 2026
…esource-leak

fix(mcp): resolve TOCTOU race condition and resource leak
Pluckypan pushed a commit to Pluckypan/picoclaw that referenced this pull request Mar 6, 2026
…esource-leak

fix(mcp): resolve TOCTOU race condition and resource leak
@yinwm yinwm deleted the fix/mcp-race-condition-and-resource-leak branch March 18, 2026 10:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

domain: agent type: bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants