fix: Fixes race conditions and missing error handling that cause subagents to hang indefinitely when errors occur. #18389

Draft
huyusong10 wants to merge 2 commits into anomalyco:dev from huyusong10:fix/subagent-error-handling

@huyusong10 huyusong10 commented Mar 20, 2026

Issue for this PR

Closes #18378

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

Fixes race conditions and missing error handling that cause subagents to hang indefinitely when errors occur.

Changes Made

1. prompt.ts - session/prompt.ts

  • Add state existence check before adding callbacks to prevent race conditions
  • Reject promise immediately if session state is not found
  • Prevents "hanging" when concurrent operations delete/recreate session state
if (!abort) {
  return new Promise<MessageV2.WithParts>((resolve, reject) => {
    const current = state()
    const sessionState = current[sessionID]
    // The state may have been deleted by a concurrent cancel or dispose
    if (!sessionState) {
      reject(new DOMException("Session state not found", "AbortError"))
      return
    }
    // State still exists, so it is safe to queue this waiter
    sessionState.callbacks.push({ resolve, reject })
  })
}
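
The check above can be tried in isolation with a minimal sketch. Everything named here (`SessionState`, `sessions`, `waitForSession`, and the `AbortError` class standing in for `DOMException`) is hypothetical and only mirrors the shape of the snippet; it is not the code in session/prompt.ts.

```typescript
// Hypothetical stand-ins for the real session types; names are illustrative only.
type Callback<T> = { resolve: (value: T) => void; reject: (reason: Error) => void }
type SessionState<T> = { callbacks: Callback<T>[] }

// Plain Error subclass used here instead of DOMException to keep the sketch portable.
class AbortError extends Error {
  constructor(message: string) {
    super(message)
    this.name = "AbortError"
  }
}

// In-memory registry keyed by session ID (assumption: the real state() exposes something similar).
const sessions = new Map<string, SessionState<string>>()

// Queue a waiter on an existing session, or reject at once if its state is gone.
function waitForSession(sessionID: string): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    const sessionState = sessions.get(sessionID)
    if (!sessionState) {
      reject(new AbortError("Session state not found"))
      return
    }
    sessionState.callbacks.push({ resolve, reject })
  })
}
```

Without the existence check, a caller arriving after the state was deleted would either throw inside the executor or push onto an object nobody will ever drain, and the second case is the one that looks like a hang.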

2. task.ts - tool/task.ts

  • Add try-catch around SessionPrompt.prompt() call
  • Ensure errors are properly propagated to parent agent
  • Provide meaningful error messages for debugging
let result: MessageV2.WithParts
try {
  result = await SessionPrompt.prompt({ /* ... */ })
} catch (error) {
  // Surface the failure to the parent agent with a readable message
  throw new Error(`Subagent execution failed: ${error instanceof Error ? error.message : String(error)}`)
}
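
As a rough illustration of the same pattern outside tool/task.ts, the wrapper below adds the prefix to any failing async call. `runSubagent` is a hypothetical helper written for this sketch, not a function in the codebase.

```typescript
// Hypothetical helper: runs an async subagent call and rethrows failures with context.
async function runSubagent<T>(run: () => Promise<T>): Promise<T> {
  try {
    return await run()
  } catch (error) {
    // Non-Error rejections (strings, objects) are stringified so the message is never empty.
    const reason = error instanceof Error ? error.message : String(error)
    throw new Error(`Subagent execution failed: ${reason}`)
  }
}

// Parent side: the failure arrives as a rejected promise instead of a task that never settles.
runSubagent(() => Promise.reject(new Error("model timeout"))).catch((error: Error) => {
  console.error(error.message)
})
```

The wrapper only helps if the inner promise actually settles; a promise that is never resolved or rejected still hangs, which is why the state check and the callback rejection matter as well.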

Problem Solved
Before: The parent agent shows the subagent as "running" forever, even after the subagent has crashed or been cancelled.
After: The parent agent receives an error and can handle the failure gracefully.

How did you verify your code works?

  • Ran bun run typecheck - all checks passed
  • Code review confirms state check prevents race condition
  • Error handling path ensures parent receives notification

Screenshots / recordings

Not applicable (backend error handling fix)

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

When a session is cancelled (e.g., due to an error or abort), the
callbacks array containing pending Promise resolve/reject handlers
was not being processed. This caused callers waiting for the session
to complete to hang indefinitely.

This fix ensures all pending callbacks are rejected with an AbortError
before the session state is cleaned up, both in the cancel() function
and in the state dispose handler.
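
The cancel path described above might look roughly like the following. This is a sketch under assumptions: `pending`, `Waiter`, and this `cancel()` are hypothetical stand-ins, and the real cancel() and dispose handler do more than this.

```typescript
// Hypothetical stand-ins; the real session state holds more than a callbacks array.
type Waiter = { resolve: (value: string) => void; reject: (reason: Error) => void }
const pending = new Map<string, { callbacks: Waiter[] }>()

// Reject every waiter with an AbortError-style error, then drop the session state.
// Returns the number of waiters that were rejected.
function cancel(sessionID: string): number {
  const entry = pending.get(sessionID)
  if (!entry) return 0
  const error = new Error("Session cancelled")
  error.name = "AbortError"
  // Empty the array while taking the waiters so none can be rejected twice.
  const waiters = entry.callbacks.splice(0)
  for (const waiter of waiters) waiter.reject(error)
  // Clean up only after all waiters have been notified.
  pending.delete(sessionID)
  return waiters.length
}
```

Calling `cancel()` a second time for the same session returns 0, so a dispose handler that runs after an explicit cancel would not notify anyone twice.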

- Add try-catch in task.ts to catch and propagate subagent execution errors
- Check session state existence in loop() before adding callbacks
- Prevent race conditions where callbacks are added to deleted session state
- Ensure parent agent receives proper error notification when subagent fails

This fixes issues where subagents appear to be running but are actually
crashed or cancelled, causing parent agents to wait indefinitely.
@github-actions
Contributor

Hey! Your PR title Fix/subagent error handling doesn't follow conventional commit format.

Please update it to start with one of:

  • feat: or feat(scope): new feature
  • fix: or fix(scope): bug fix
  • docs: or docs(scope): documentation changes
  • chore: or chore(scope): maintenance tasks
  • refactor: or refactor(scope): code refactoring
  • test: or test(scope): adding or updating tests

Where scope is the package name (e.g., app, desktop, opencode).

See CONTRIBUTING.md for details.

@github-actions
Contributor

The following comment was made by an LLM, it may be inaccurate:

Based on the search results, I found the following potentially related PRs:

Potentially Related PRs

  1. #13321 - "fix: robust subagent completion propagation"

  2. #13422 - "fix(opencode): propagate subagent errors to parent session"

  3. #13974 - "fix(run): prevent subagent question tool hang in non-interactive mode"

  4. #17921 - "fix(acp): handle question.asked event to prevent hanging"

  5. #18280 - "fix: improve plugin system robustness — agent/command resolution, async errors, hook timing, two-phase init"

Most relevant: PR #13422 appears to be the most directly related, as it specifically addresses propagating subagent errors to the parent session, which is the core issue being fixed in PR #18389.

@huyusong10 huyusong10 changed the title from "Fix/subagent error handling" to "Fix/Fixes race conditions and missing error handling that cause subagents to hang indefinitely when errors occur." Mar 20, 2026
@huyusong10 huyusong10 changed the title from "Fix/Fixes race conditions and missing error handling that cause subagents to hang indefinitely when errors occur." to "fix/Fixes race conditions and missing error handling that cause subagents to hang indefinitely when errors occur." Mar 20, 2026
@huyusong10 huyusong10 changed the title from "fix/Fixes race conditions and missing error handling that cause subagents to hang indefinitely when errors occur." to "fix: Fixes race conditions and missing error handling that cause subagents to hang indefinitely when errors occur." Mar 20, 2026
@huyusong10 huyusong10 closed this Mar 20, 2026
@huyusong10 huyusong10 reopened this Mar 20, 2026
@huyusong10 huyusong10 marked this pull request as draft March 20, 2026 12:57
Development

Successfully merging this pull request may close these issues.

Subagent tasks hang indefinitely in high-concurrency environment