
RFC: Fix race condition in ~/.claude.json concurrent writes #29077

Closed
4RH1T3CT0R7 wants to merge 1 commit into anthropics:main from 4RH1T3CT0R7:fix/atomic-claude-json-write



@4RH1T3CT0R7

Summary

Comprehensive root cause analysis and fix plan for the race condition that corrupts ~/.claude.json when multiple Claude Code sessions run concurrently. This affects 30+ open issues and has been reported since June 2025.

Key Findings

  • Decompiled v2.1.59 cli.js to identify 3 critical fallback paths that bypass existing atomic write and file locking protections
  • The Fe() atomic write function falls back to non-atomic writeFileSync (with O_TRUNC) when renameSync fails on Windows
  • The W8() / saveGlobalConfig function falls back to unlocked writes when proper-lockfile lock acquisition fails
  • The s16() / read function has no retry on JSON parse errors from partial reads
  • 74 call sites for W8() create high write contention
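
To make the second finding concrete, the sketch below replays a truncating write in two explicit steps and reads the file in the gap between them. It does not reproduce the decompiled `Fe()` code; `readDuringTruncatingWrite`, the temp directory, and the `counter` field are hypothetical names chosen for illustration. It only relies on the documented behavior that opening a file with the `'w'` flag truncates it before any new bytes are written.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: performs a truncating write in two explicit steps and
// lets a "reader" run in the gap between truncation and the actual write.
function readDuringTruncatingWrite(file, newContents) {
  // The 'w' flag truncates the file on open, so it is empty from here on.
  const fd = fs.openSync(file, 'w');
  let observed;
  try {
    // A concurrent session reading at this moment sees zero bytes.
    observed = fs.readFileSync(file, 'utf8');
    fs.writeSync(fd, newContents);
  } finally {
    fs.closeSync(fd);
  }
  return observed;
}

// Illustrative stand-in for ~/.claude.json (not the real file or schema).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'claude-json-race-'));
const file = path.join(dir, 'config.json');
fs.writeFileSync(file, JSON.stringify({ counter: 1 }));

const seen = readDuringTruncatingWrite(file, JSON.stringify({ counter: 2 }));

// Parsing what the reader saw fails the same way a partial read would.
let parseError = null;
try {
  JSON.parse(seen);
} catch (err) {
  parseError = err;
}
```

In a real race the reader is another process and the window is much shorter, but the outcome is the same: the reader gets an empty or partial document and `JSON.parse` throws.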

Proposed Fixes (6 total)

  1. Remove non-atomic fallback in Fe() — retry atomic write instead
  2. Remove lockless fallback in W8()/nw() — retry lock with critical/non-critical distinction
  3. Add retry-with-backoff to s16() read — handles partial reads gracefully
  4. Debounce/coalesce writes — reduces write frequency by 10-100x
  5. Separate high-frequency data to ~/.claude/session-state.json
  6. Cascade-breaking logic — prevents exponential corruption detection feedback loop (2.1.59 regression: corruption detection cascade amplifies .claude.json corruption from Task tool subagents #28923)
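
As a rough illustration of Fix 1 and Fix 3, the sketch below writes through a temp file plus rename and retries the rename instead of falling back to a truncating write, then reads with retry on parse errors. This is not the patch from FIX_PLAN.md: `writeJsonAtomic`, `readJsonWithRetry`, `sleepSync`, the retry counts, the delays, and the list of error codes treated as transient are all assumptions made for this example.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const crypto = require('node:crypto');

// Blocks the current thread for `ms` milliseconds (synchronous sleep).
function sleepSync(ms) {
  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}

// Hypothetical Fix 1: write a temp file in the same directory, then rename it
// over the target. A failed rename is retried; there is no truncating fallback.
function writeJsonAtomic(target, value, { retries = 5, baseDelayMs = 10 } = {}) {
  const tmp = `${target}.${crypto.randomBytes(8).toString('hex')}.tmp`;
  // 'wx' fails if the path already exists; 0o600 requests owner-only access.
  fs.writeFileSync(tmp, JSON.stringify(value, null, 2), { flag: 'wx', mode: 0o600 });
  try {
    for (let attempt = 0; ; attempt++) {
      try {
        fs.renameSync(tmp, target);
        return;
      } catch (err) {
        // Assumed set of transient codes (e.g. the target is briefly held open).
        const transient = ['EPERM', 'EACCES', 'EBUSY'].includes(err.code);
        if (!transient || attempt >= retries) throw err;
        sleepSync(baseDelayMs * 2 ** attempt); // exponential backoff
      }
    }
  } catch (err) {
    fs.rmSync(tmp, { force: true }); // clean up; the old target stays intact
    throw err;
  }
}

// Hypothetical Fix 3: retry when the contents fail to parse, on the assumption
// that a parse error may be a partial read rather than real corruption.
function readJsonWithRetry(target, { retries = 5, baseDelayMs = 10 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return JSON.parse(fs.readFileSync(target, 'utf8'));
    } catch (err) {
      if (!(err instanceof SyntaxError) || attempt >= retries) throw err;
      sleepSync(baseDelayMs * 2 ** attempt);
    }
  }
}

// Usage against an illustrative stand-in for ~/.claude.json.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'claude-json-atomic-'));
const target = path.join(dir, 'config.json');
writeJsonAtomic(target, { counter: 1 });
writeJsonAtomic(target, { counter: 2 });
const loaded = readJsonWithRetry(target);
```

The sketch omits `fsync`, locking, and the critical/non-critical distinction from Fix 2; it only shows that a failed rename can surface as an error while the previous file contents remain readable.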

Security Review Incorporated

  • 2 Critical: Predictable temp file names (CWE-377), TOCTOU in permission preservation (CWE-367)
  • 3 High: DoS via retry blocking, stale lock theft, symlink credential exfiltration (CWE-59)
  • 3 Medium: Silent auth write loss, debounce persistence gap, cleanup race

All findings addressed in the fix proposals.
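
The kind of hardening these findings call for might look like the sketch below: an unpredictable temp file name created exclusively (CWE-377), a symlink check on the target (CWE-59), and permissions copied onto an open descriptor rather than a path (CWE-367). The helper names `assertNotSymlink` and `openPrivateTemp` are hypothetical and are not taken from FIX_PLAN.md. Note that the `lstat` check is itself a check-then-use step, so it narrows the window rather than closing it.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const crypto = require('node:crypto');

// Returns the lstat result for `target`, or null if it does not exist.
// Throws if `target` is a symbolic link.
function assertNotSymlink(target) {
  let stat;
  try {
    stat = fs.lstatSync(target); // lstat inspects the link itself
  } catch (err) {
    if (err.code === 'ENOENT') return null;
    throw err;
  }
  if (stat.isSymbolicLink()) {
    throw new Error(`refusing to write through symlink: ${target}`);
  }
  return stat;
}

// Creates a temp file next to `target` with a random name. The 'wx' flag makes
// the open fail if the path already exists; 0o600 requests owner-only access.
function openPrivateTemp(target) {
  const suffix = crypto.randomBytes(16).toString('hex');
  const tmp = path.join(path.dirname(target), `.${path.basename(target)}.${suffix}.tmp`);
  const fd = fs.openSync(tmp, 'wx', 0o600);
  return { fd, tmp };
}

// Usage against illustrative files in a scratch directory.
// Creating symlinks may require elevated privileges on Windows.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'claude-json-sec-'));
const real = path.join(dir, 'config.json');
const link = path.join(dir, 'link.json');
fs.writeFileSync(real, '{}', { mode: 0o600 });
fs.symlinkSync(real, link);

const realStat = assertNotSymlink(real);
const first = openPrivateTemp(real);
const second = openPrivateTemp(real);
// Copy permissions onto the open descriptor instead of re-resolving a path.
fs.fchmodSync(first.fd, realStat.mode & 0o777);
fs.closeSync(first.fd);
fs.closeSync(second.fd);
```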

Documents

  • ANALYSIS.md — Full root cause analysis with decompiled code, race condition vectors, timeline diagrams, and impact assessment
  • FIX_PLAN.md — Detailed fix proposals with pseudocode patches, security hardening, edge cases, testing strategy, and future direction (SQLite)

Related Issues

Fixes: #28809, #28813, #28824, #28837, #28861, #28888, #28922, #28923, #28965, #28988, #29003, #29032, #29036, #27983

Additional: #28992, #28966, #28842, #29010, #28898, #29004, #29008, #28829, #28847, #15079, #13287, #24130, #27941, #27902

Historical (closed without resolution): #2593, #2810, #3117, #7243, #7273, #15608, #18998, #26717

Test plan

  • Review ANALYSIS.md for accuracy of decompiled code analysis
  • Review FIX_PLAN.md for completeness of fix proposals
  • Validate security hardening recommendations
  • Confirm implementation order (Fix 3 → Fix 1 → Fix 2 → Fix 6 → Fix 4 → Fix 5)
  • Identify any missing edge cases or race vectors

Add comprehensive root cause analysis and fix plan for the race condition
that corrupts ~/.claude.json when multiple Claude Code sessions run
concurrently (30+ open issues).

ANALYSIS.md: Decompiled code analysis of v2.1.59 showing 3 critical
fallback paths that bypass atomic write and file locking protections.
Documents 7 race condition vectors with severity ratings.

FIX_PLAN.md: 6 proposed fixes with pseudocode patches, security review
findings (2 critical, 3 high), edge cases, testing strategy, and
future direction (SQLite migration).

Related issues: anthropics#28809 anthropics#28813 anthropics#28824 anthropics#28837 anthropics#28861 anthropics#28888 anthropics#28922
anthropics#28923 anthropics#28965 anthropics#28988 anthropics#29003 anthropics#29032 anthropics#29036 anthropics#27983
@stevenpetryk

See #28847


Development

Successfully merging this pull request may close these issues.

.claude.json becomes corrupted (Unexpected EOF) during tool use — non-atomic config writes
