Handle local agent connection recovery#2853

Merged
wwwillchen merged 8 commits into dyad-sh:main from wwwillchen:local-agent-retry-codex-2
Mar 2, 2026

Conversation

@wwwillchen
Collaborator

Handle connection-drop and retry behavior in local agent IPC handling.
Add an end-to-end scenario that verifies recovery after a temporary local agent disconnect.
Align test fixtures and snapshots for local-agent reconnection behavior.

@wwwillchen wwwillchen requested a review from a team March 2, 2026 21:35
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the resilience of the local agent by introducing automatic retry mechanisms for transient connection failures during AI stream processing. The changes ensure that agent conversations can seamlessly recover from network interruptions, preserving ongoing work and providing a more robust user experience. This improvement is validated through new comprehensive end-to-end tests and supported by an updated fake LLM server capable of simulating these failure conditions.

Highlights

  • Local Agent Connection Recovery: Implemented robust retry and resume logic within the local agent handler to automatically recover from transient connection drops during AI stream processing, ensuring continuity of agent operations.
  • Tool Event Replay on Retry: Ensured that tool calls and their results emitted before a connection drop are correctly replayed and re-injected into the message history upon retry, preventing loss of context and redundant execution.
  • End-to-End Testing for Resilience: Added a new end-to-end test scenario that simulates a local agent connection drop and verifies the system's ability to recover, complete the task, and maintain UI integrity without displaying errors.
  • Enhanced Fake LLM Server for Testing: Updated the fake LLM server to support configurable connection drop simulations, allowing specific turns and attempts to be designated for connection termination to thoroughly test recovery mechanisms.
Changelog
  • e2e-tests/fixtures/engine/local-agent/connection-drop.ts
    • Added a new fixture to define a scenario where the local agent connection drops on the first attempt of turn 1, simulating a realistic network interruption.
  • e2e-tests/local_agent_connection_retry.spec.ts
    • Added a new E2E test case to verify the local agent's ability to recover from a simulated connection drop, checking for correct file creation and UI state.
  • e2e-tests/snapshots/local_agent_connection_retry.spec.ts_after-connection-retry.txt
    • Added a new snapshot file capturing the expected file system state after a successful connection recovery in the E2E test.
  • e2e-tests/snapshots/local_agent_connection_retry.spec.ts_local-agent---recovers-from-connection-drop-1.aria.yml
    • Added a new ARIA snapshot for the chat UI, confirming the correct display of messages and tool cards after connection recovery.
  • src/__tests__/local_agent_handler.test.ts
    • Added a unit test to verify that the stream retries and resumes correctly after a transient termination, appending new output.
    • Added a unit test to confirm that previously emitted tool calls and results are replayed in the message history when a stream is retried after termination.
  • src/pro/main/ipc/handlers/local_agent/local_agent_handler.ts
    • Introduced constants MAX_TERMINATED_STREAM_RETRIES, STREAM_RETRY_BASE_DELAY_MS, and STREAM_CONTINUE_MESSAGE for retry configuration and user instruction.
    • Defined RetryReplayEvent type to capture tool call and tool result events for replaying during stream retries.
    • Refactored handleLocalAgentStream to include a while loop for stream retries, managing terminatedRetryCount and needsContinuationInstruction.
    • Modified stream processing to capture streamErrorFromIteration and streamErrorFromCallback for comprehensive error handling.
    • Implemented shouldRetryTerminatedStreamError to determine if a stream error warrants a retry based on error type, retry count, and abort status.
    • Added maybeCaptureRetryReplayEvent to record tool calls and results that need to be replayed if a stream terminates.
    • Implemented maybeAppendRetryReplayForRetry to construct and append replay messages (partial response, tool calls, tool results) to the message history for subsequent retry attempts.
    • Added utility functions buildTerminatedRetryContinuationInstruction, unwrapStreamError, getErrorMessage, isTerminatedStreamError, delay, and toToolResultOutput to support the retry logic.
    • Updated the access to prevStepMessages to use optional chaining (?.response?.messages) for safer property access.
  • testing/fake-llm-server/localAgentHandler.ts
    • Added connectionAttempts map to track retry attempts for connection drop simulation.
    • Implemented logic to check dropConnectionByTurn and dropConnectionOnAttempts from the fixture to simulate connection drops by destroying the response socket.
  • testing/fake-llm-server/localAgentTypes.ts
    • Added dropConnectionOnAttempts property to LocalAgentFixture to specify which attempts should trigger a connection drop.
    • Added dropConnectionByTurn property to LocalAgentFixture for per-turn connection drop configuration, including turnIndex and attempts.
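The two new fixture fields described above can be sketched together as follows. This is a hypothetical illustration: only dropConnectionOnAttempts and dropConnectionByTurn (with turnIndex and attempts) come from this changelog; the surrounding fixture shape and the predicate are assumptions.

```typescript
// Hypothetical illustration of the new LocalAgentFixture fields.
// Only `dropConnectionOnAttempts` and `dropConnectionByTurn` come from
// this PR; the surrounding shape is assumed for the sketch.
interface DropConnectionByTurn {
  turnIndex: number; // which conversation turn to interrupt
  attempts: number[]; // 1-based attempts at that turn that should drop
}

interface LocalAgentFixtureSketch {
  trigger: string; // assumed field
  dropConnectionOnAttempts?: number[]; // top-level attempt list
  dropConnectionByTurn?: DropConnectionByTurn[]; // per-turn configuration
}

// Drop the connection on the first attempt of turn 1; the retry succeeds.
const connectionDropFixture: LocalAgentFixtureSketch = {
  trigger: "simulate a connection drop",
  dropConnectionByTurn: [{ turnIndex: 1, attempts: [1] }],
};

// A simple predicate a fake server might apply per attempt: a per-turn
// rule wins; otherwise fall back to the top-level attempt list.
function shouldDrop(
  fixture: LocalAgentFixtureSketch,
  turnIndex: number,
  attempt: number,
): boolean {
  const rule = fixture.dropConnectionByTurn?.find(
    (r) => r.turnIndex === turnIndex,
  );
  if (rule) return rule.attempts.includes(attempt);
  return fixture.dropConnectionOnAttempts?.includes(attempt) ?? false;
}
```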
Activity
  • No human activity (comments, reviews) has been recorded on this pull request yet.

@wwwillchen
Collaborator Author

@BugBot run

Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request introduces a connection recovery mechanism for the local agent, adding retry logic with backoff for transient stream terminations, validated by unit and end-to-end tests. A security audit confirms the changes introduce no new vulnerabilities: the implementation correctly uses AbortController, retry limits, and backoff delays while preserving conversation-history integrity. My review focuses on making the error-detection logic more robust and correcting the handling of tool results during retry to ensure compatibility with the AI SDK; I've also included a minor refactoring suggestion to improve code clarity.

Comment on lines +1205 to +1219
function isTerminatedStreamError(error: unknown): boolean {
  const normalized = unwrapStreamError(error);
  const message = getErrorMessage(normalized).toLowerCase();
  if (message.includes("typeerror: terminated") || message === "terminated") {
    return true;
  }
  const cause =
    isRecord(normalized) && "cause" in normalized
      ? normalized.cause
      : undefined;
  if (cause) {
    return isTerminatedStreamError(cause);
  }
  return false;
}

high

The current implementation of isTerminatedStreamError is brittle as it primarily checks for the specific string "typeerror: terminated". This is tailored to the mocked error in the unit tests but is unlikely to cover the range of real-world network errors that can occur when a stream is terminated (e.g., ECONNRESET).

To make the retry logic more robust and effective in production, this check should be expanded to include common network error codes (like ECONNRESET) and more generic error messages (like "fetch failed").
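A sketch of what the broadened check might look like. The error codes and message fragments below are common Node.js/undici network-failure signatures chosen for illustration, not a verified or exhaustive list:

```typescript
// Sketch: broaden detection beyond the exact "terminated" string.
// The code and fragment lists are illustrative assumptions, not exhaustive.
const RETRYABLE_CODES = new Set([
  "ECONNRESET",
  "EPIPE",
  "ETIMEDOUT",
  "UND_ERR_SOCKET",
]);
const RETRYABLE_FRAGMENTS = ["terminated", "fetch failed", "socket hang up"];

function isTransientNetworkError(error: unknown): boolean {
  if (error == null || typeof error !== "object") return false;
  const err = error as { code?: unknown; message?: unknown; cause?: unknown };
  // Prefer a typed error code when one is present.
  if (typeof err.code === "string" && RETRYABLE_CODES.has(err.code)) {
    return true;
  }
  const message =
    typeof err.message === "string" ? err.message.toLowerCase() : "";
  if (RETRYABLE_FRAGMENTS.some((f) => message.includes(f))) return true;
  // Fall back to inspecting the error's cause chain.
  return err.cause != null && isTransientNetworkError(err.cause);
}
```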

Comment on lines +1357 to +1366
function toToolResultOutput(value: unknown): { type: "text"; value: string } {
  if (typeof value === "string") {
    return { type: "text", value };
  }
  try {
    return { type: "text", value: JSON.stringify(value) };
  } catch {
    return { type: "text", value: String(value) };
  }
}

high

The toToolResultOutput function incorrectly wraps the tool output in an object { type: 'text', value: '...' }. This structure is then assigned to the output field of a tool-result message part. The output field of a tool-result should contain the raw, JSON-serializable result of the tool execution, not an object that mimics a content part.

This incorrect nesting will likely be misinterpreted by the AI SDK when it serializes the message for the model provider, potentially causing errors. The function should pass through the raw tool output, as the AI SDK handles the final serialization. Note that the corresponding unit test assertion in src/__tests__/local_agent_handler.test.ts will also need to be updated to reflect this change.

function toToolResultOutput(value: unknown): unknown {
  // The AI SDK will handle serialization of the output, so we should pass the raw value through.
  return value;
}

Comment on lines +957 to +966
maybeAppendRetryReplayForRetry({
  partialResponse: fullResponse.slice(
    passAttemptResponseStartLength,
  ),
  retryReplayEvents,
  currentMessageHistoryRef: currentMessageHistory,
  accumulatedAiMessagesRef: accumulatedAiMessages,
  onCurrentMessageHistoryUpdate: (next) =>
    (currentMessageHistory = next),
});

medium

The function maybeAppendRetryReplayForRetry modifies currentMessageHistory via a callback, onCurrentMessageHistoryUpdate. This pattern of passing a setter can make the data flow harder to trace and maintain.

For better clarity, consider refactoring maybeAppendRetryReplayForRetry to be a pure function that returns the new message history. The caller can then explicitly re-assign currentMessageHistory. This would make the state update more explicit and easier to follow.
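The suggested pure-function shape might look like this minimal sketch (the message type is simplified; names beyond those quoted above are assumptions):

```typescript
// Sketch: return the new history instead of mutating via a setter callback.
type ModelMessage = {
  role: "system" | "user" | "assistant" | "tool";
  content: unknown;
};

function appendRetryReplay(
  history: ModelMessage[],
  replayMessages: ModelMessage[],
): ModelMessage[] {
  // Pure: leaves the input untouched and returns a new array when there
  // is something to append, or the original array when there is not.
  return replayMessages.length > 0 ? [...history, ...replayMessages] : history;
}

// The caller then re-assigns explicitly, making the state update traceable:
// currentMessageHistory = appendRetryReplay(currentMessageHistory, replay);
```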


@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6ca24a3719

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +958 to +959
partialResponse: fullResponse.slice(
passAttemptResponseStartLength,

P1 Badge Replay only model output when constructing retry history

The retry branch builds partialResponse from fullResponse, but fullResponse includes synthetic XML added via onXmlComplete (tool-call/result markup and other UI tags), not just model text. If a disconnect happens after tool XML is appended, retry history injects that XML back as assistant text before the next streamText() call, so the model sees renderer-specific markup and duplicated tool transcript data. This can derail continuation and produce repeated or inconsistent tool behavior; retry context should be derived from streamed model text only.

Useful? React with 👍 / 👎.

Comment on lines +1314 to +1316
if (event.type === "tool-call") {
  replayMessages.push({
    role: "assistant",

P1 Badge Replay only completed tool exchanges after termination

This retry replay loop re-adds every captured tool-call even when no matching tool-result was captured before the stream dropped. In the common failure window between a tool-call event and its tool-result, the next attempt receives an orphaned tool call in history, which can cause provider-side message validation failures or make the model re-run a state-changing tool. Retry replay should only include tool calls that have corresponding results.

Useful? React with 👍 / 👎.
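One way to implement the pairing this comment asks for — a sketch assuming the captured events carry a toolCallId, which is not confirmed by the excerpt above:

```typescript
// Sketch: keep only tool-call events whose toolCallId has a captured
// result, so orphaned calls are never replayed into the retry history.
type ReplayEvent =
  | { type: "tool-call"; toolCallId: string; toolName: string; input: unknown }
  | { type: "tool-result"; toolCallId: string; output: unknown };

function completedExchangesOnly(events: ReplayEvent[]): ReplayEvent[] {
  const resultIds = new Set(
    events.filter((e) => e.type === "tool-result").map((e) => e.toolCallId),
  );
  // Tool-calls with no result before the drop are excluded entirely.
  return events.filter(
    (e) => e.type === "tool-result" || resultIds.has(e.toolCallId),
  );
}
```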

Contributor

@devin-ai-integration bot left a comment


Devin Review found 1 potential issue.

View 7 additional findings in Devin Review.

Open in Devin Review

cubic-dev-ai[bot]

This comment was marked as resolved.

@greptile-apps
Contributor

greptile-apps bot commented Mar 2, 2026

Greptile Summary

This PR adds connection-drop resilience to the local agent IPC stream handler: when a TypeError: terminated (TCP teardown mid-stream) is detected, the handler replays any already-emitted tool-call/tool-result events into the message history, appends a system continuation instruction, and re-opens the stream (up to MAX_TERMINATED_STREAM_RETRIES = 2 times with exponential backoff). It also adds two E2E test scenarios for the two main drop windows (during a text turn, and mid-tool-call stream), two unit tests, and extends the fake-server fixture DSL with dropConnectionAfterToolCallByTurn.

Key observations:

  • Fake server turn-index bug: countToolResultRounds in testing/fake-llm-server/localAgentHandler.ts finds the last user message as its baseline. In a retry, the continuation instruction is the last user message, so replayed tool results (which appear before it) are not counted — turnIndex resets to 0. For the connection-drop fixture (drop after a completed tool turn), this causes the fake server to re-serve turn 0 on retry, executing write_file a second time. The write is idempotent so no data is lost, but toHaveCount(1) for edit-card assertions could become flaky.
  • isTerminatedStreamError relies on exact error-message strings ("typeerror: terminated") which are Node.js/V8 specific and could silently stop retrying if formatting changes across environments or SDK versions.
  • Duplicated retry-decision blocks: the same ~20-line retry handling sequence is copy-pasted for streamErrorFromIteration and streamResult.response finalization errors; extracting a shared helper would reduce maintenance surface.
  • Confusing variable names in maybeAppendRetryReplayForRetry: toolCallsWithResult is populated from tool-result events and toolResultsWithCall from tool-call events — the names are semantically inverted relative to the event type that populates them.
  • Good incidental fix: optional chaining was added for steps[...].response?.messages (line ~803), preventing a potential crash when step data is incomplete.
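A sketch of the baseline fix implied by the first observation above — skipping the continuation instruction when locating the last real user message, so replayed tool rounds are still counted. The message shape and the STREAM_CONTINUE_MESSAGE text are assumptions:

```typescript
// Sketch: when computing the turn index, ignore a trailing retry
// continuation instruction appended as a user message.
type Msg = { role: string; content: string };
const STREAM_CONTINUE_MESSAGE = "Continue from where you left off."; // assumed text

function lastRealUserIndex(messages: Msg[]): number {
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (m.role === "user" && m.content !== STREAM_CONTINUE_MESSAGE) return i;
  }
  return -1;
}

function countToolResultRoundsFixed(messages: Msg[]): number {
  const baseline = lastRealUserIndex(messages);
  // Count tool rounds after the real user message, including replayed ones,
  // so a retry does not reset turnIndex to 0.
  return messages.slice(baseline + 1).filter((m) => m.role === "tool").length;
}
```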

Confidence Score: 3/5

  • Mostly safe to merge — the production retry logic is sound, but the fake-server turn-index issue could make the new E2E tests flaky, and error detection relies on brittle string matching.
  • The core retry mechanism (replay events, continuation instruction, exponential backoff) is well-designed and the unit tests cover the happy-path retry scenarios. However, the fake server's countToolResultRounds does not account for the continuation instruction being appended as the last user message, which can cause the server to re-serve an earlier turn (and duplicate tool execution) in the connection-drop E2E scenario. Additionally, isTerminatedStreamError uses Node.js-specific message strings that could stop retrying silently if the error format changes, and the duplicated retry blocks increase future maintenance risk.
  • testing/fake-llm-server/localAgentHandler.ts (countToolResultRounds baseline logic) and src/pro/main/ipc/handlers/local_agent/local_agent_handler.ts (isTerminatedStreamError and duplicated retry blocks).

Important Files Changed

Filename Overview
src/pro/main/ipc/handlers/local_agent/local_agent_handler.ts Core retry logic added; contains duplicated retry blocks and confusingly-named variables in maybeAppendRetryReplayForRetry, plus brittle error-string detection in isTerminatedStreamError. Also includes a good bug-fix for optional chaining on steps[...].response?.messages.
testing/fake-llm-server/localAgentHandler.ts countToolResultRounds uses the last user message as its baseline, but the retry path appends a continuation instruction as the last user message — causing the function to return turnIndex=0 regardless of replayed tool rounds, potentially triggering duplicate tool execution in E2E tests.
testing/fake-llm-server/localAgentTypes.ts New dropConnectionAfterToolCallByTurn field added to LocalAgentFixture type; well-documented with JSDoc, straightforward addition.
testing/fake-llm-server/chatCompletionHandler.ts Correctly extends the synthetic-message detection to also skip the new retry continuation instruction when searching for the original fixture trigger; minimal, correct change.
e2e-tests/local_agent_connection_retry.spec.ts Good coverage of both drop-before-finish and drop-after-tool-call scenarios; positional ordering assertions are a nice touch, though the toHaveCount(1) assertion for edit cards may be fragile if duplicate tool execution occurs (see fake server issue).
src/__tests__/local_agent_handler.test.ts Two well-structured unit tests added for text-only and tool-replay retry scenarios; verifies continuation instruction injection and replayed tool history on second attempt.
e2e-tests/fixtures/engine/local-agent/connection-drop.ts Fixture correctly uses dropConnectionByTurn to drop turn 1 (post-tool text turn), which is a realistic mid-stream interruption scenario.
e2e-tests/fixtures/engine/local-agent/connection-drop-after-tool-call.ts Fixture for drop-after-tool-call scenario is correctly configured with dropConnectionAfterToolCallByTurn; tests the "orphaned tool call" window.

Sequence Diagram

sequenceDiagram
    participant Client as Electron Client
    participant Handler as local_agent_handler
    participant SDK as AI SDK (streamText)
    participant LLM as Local Agent LLM

    Client->>Handler: handleLocalAgentStream()
    loop Retry while loop (max 2 retries)
        Handler->>SDK: streamText(attemptMessages)
        SDK->>LLM: POST /v1/chat/completions
        LLM-->>SDK: SSE stream (text-delta, tool-call, tool-result)
        SDK-->>Handler: fullStream events (captured in retryReplayEvents)
        
        alt Stream terminates normally
            SDK-->>Handler: streamResult.response resolved
            Handler->>Handler: Accumulate responseMessages
            Handler->>Client: sendResponseChunk()
            Note over Handler: break out of retry loop
        else Stream terminates with TypeError("terminated")
            Note over Handler: isTerminatedStreamError() → true
            Handler->>Handler: maybeAppendRetryReplayForRetry()<br/>(inject replayed tool exchanges into currentMessageHistory)
            Handler->>Handler: append continuation instruction<br/>(needsContinuationInstruction = true)
            Handler->>Handler: delay(400ms * retryCount)
            Note over Handler: continue → retry loop
            Handler->>SDK: streamText([...replay, continuationInstruction])
            SDK->>LLM: POST /v1/chat/completions (with context)
            LLM-->>SDK: SSE stream (resumed)
            SDK-->>Handler: fullStream events
            SDK-->>Handler: streamResult.response resolved
            Note over Handler: break
        else Retries exhausted (retryCount >= MAX_TERMINATED_STREAM_RETRIES)
            Handler->>Client: throw error (caught upstream)
        end
    end
    Handler->>Client: chat:response:end

Last reviewed commit: 47623be

@github-actions bot added the needs-human:review-issue label ("ai agent flagged an issue that requires human review") Mar 2, 2026
@github-actions
Contributor

github-actions bot commented Mar 2, 2026

🔍 Dyadbot Code Review Summary

Verdict: ✅ YES - Ready to merge

Reviewed by 3 independent agents: Correctness Expert, Code Health Expert, UX Wizard.

Issues Summary

Severity File Issue
🟡 MEDIUM src/.../local_agent_handler.ts Retry-on-terminated logic duplicated in two catch branches
🟡 MEDIUM src/.../local_agent_handler.ts isTerminatedStreamError uses fragile string matching without documenting source
🟢 Low Priority Notes (4 items)
  • baseMessageHistoryCount not updated after retry replay - src/.../local_agent_handler.ts — If mid-turn compaction triggers during a retry iteration, the stale baseMessageHistoryCount could cause incorrect message slicing of in-flight messages.
  • Recursive cause traversal has no depth limit - src/.../local_agent_handler.ts:~1205 — isTerminatedStreamError recursively walks error.cause with no max depth. A circular cause chain (unlikely but possible) would stack overflow.
  • Whitespace-only partial responses silently discarded - src/.../local_agent_handler.ts — maybeAppendRetryReplayForRetry uses partialResponse.trim() to decide whether to replay text. Whitespace-only partial output is dropped from model context but remains in fullResponse, creating a minor mismatch.
  • dropConnectionOnAttempts defined but never exercised - testing/fake-llm-server/localAgentTypes.ts — The simpler top-level field exists in the type and handler fallback logic but no fixture uses it. Consider removing it or adding a fixture.
🚫 Dropped False Positives (8 items)
  • passProducedChatText not reset between retry attempts — Dropped: Tracking text production across all attempts is correct behavior. The flag indicates whether the pass produced any text at all, regardless of which attempt emitted it.
  • Partial text from dropped connection remains in chat — Dropped: This is intentional design. STREAM_CONTINUE_MESSAGE instructs the model to continue from where it left off, so the partial text is preserved and the continuation seamlessly follows.
  • onError callback timing: streamErrorFromCallback may not be set when checked — Dropped: The ?? operator correctly handles both paths — either the iterator throws (caught as streamErrorFromIteration) or the callback fires first. The fallback covers both orderings.
  • Module-level toolStreamingEntries not cleared on retry — Dropped: The AI SDK generates unique UUIDs per tool call, so stale entries from a failed attempt won't collide with new ones. The orphaned entries are a theoretical memory concern but not a corruption risk.
  • connectionAttempts Map grows unboundedly — Dropped: E2E test servers are short-lived per test run, and session IDs are unique per test. Not a practical concern.
  • No user-visible indication of retry occurring — Dropped: The retry delay is 400–800ms, well within normal streaming latency. Adding a UI indicator for sub-second pauses would be over-engineering.
  • Raw TypeError surfaces when retries exhausted — Dropped: Error propagation via throw streamError is correct. The error is handled by the caller's existing error handling path, same as before this PR.
  • Retry counter resets per pass — Dropped: Per-pass retry semantics are reasonable — each pass is a separate conversation turn with independent network conditions.

Generated by Dyadbot multi-agent code review

Comment on lines +985 to +1011
if (
  shouldRetryTerminatedStreamError({
    error: err,
    retryCount: terminatedRetryCount,
    aborted: abortController.signal.aborted,
  })
) {
  maybeAppendRetryReplayForRetry({
    partialResponse: fullResponse.slice(
      passAttemptResponseStartLength,
    ),
    retryReplayEvents,
    currentMessageHistoryRef: currentMessageHistory,
    accumulatedAiMessagesRef: accumulatedAiMessages,
    onCurrentMessageHistoryUpdate: (next) =>
      (currentMessageHistory = next),
  });
  terminatedRetryCount += 1;
  needsContinuationInstruction = true;
  const retryDelayMs =
    STREAM_RETRY_BASE_DELAY_MS * terminatedRetryCount;
  logger.warn(
    `Transient stream termination while finalizing response for chat ${req.chatId}; retrying pass (${terminatedRetryCount}/${MAX_TERMINATED_STREAM_RETRIES}) after ${retryDelayMs}ms`,
  );
  await delay(retryDelayMs);
  continue;
}

🟡 MEDIUM | duplication

Retry-on-terminated logic duplicated in two catch branches

This retry block (lines 985–1011) is nearly identical to the one at lines 948–975 — the only difference is the log message string. Any future change to the retry logic (delay formula, counter increment, telemetry, etc.) must be applied in both places, making it easy to introduce inconsistencies.

💡 Suggestion: Extract a shared helper, e.g. async function performTerminatedRetry(logContext: string), that encapsulates maybeAppendRetryReplayForRetry, counter increment, delay, and the continue signal. Call it from both branches.
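The extraction might look like this sketch. The constant values match those named in this PR, but the helper's exact signature (a mutable state object plus a boolean "should continue" result) is an assumption:

```typescript
// Sketch: one helper for the duplicated retry sequence. Returns true when
// the caller should `continue` the retry loop, false when it should rethrow.
const MAX_TERMINATED_STREAM_RETRIES = 2;
const STREAM_RETRY_BASE_DELAY_MS = 400;

interface RetryState {
  terminatedRetryCount: number;
  needsContinuationInstruction: boolean;
}

async function performTerminatedRetry(
  state: RetryState,
  logContext: string,
  appendReplay: () => void, // wraps maybeAppendRetryReplayForRetry at the call site
): Promise<boolean> {
  if (state.terminatedRetryCount >= MAX_TERMINATED_STREAM_RETRIES) return false;
  appendReplay();
  state.terminatedRetryCount += 1;
  state.needsContinuationInstruction = true;
  const retryDelayMs = STREAM_RETRY_BASE_DELAY_MS * state.terminatedRetryCount;
  console.warn(
    `${logContext}; retrying pass (${state.terminatedRetryCount}/${MAX_TERMINATED_STREAM_RETRIES}) after ${retryDelayMs}ms`,
  );
  await new Promise((resolve) => setTimeout(resolve, retryDelayMs));
  return true;
}
```

Both catch branches would then reduce to `if (await performTerminatedRetry(state, "...", replay)) continue; throw err;`, with only the log context differing.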

Comment on lines +1205 to +1219
function isTerminatedStreamError(error: unknown): boolean {
  const normalized = unwrapStreamError(error);
  const message = getErrorMessage(normalized).toLowerCase();
  if (message.includes("typeerror: terminated") || message === "terminated") {
    return true;
  }
  const cause =
    isRecord(normalized) && "cause" in normalized
      ? normalized.cause
      : undefined;
  if (cause) {
    return isTerminatedStreamError(cause);
  }
  return false;
}

🟡 MEDIUM | fragile-detection

isTerminatedStreamError relies on fragile string matching without documenting the source

This function detects "terminated" errors by matching lowercased error message strings ("typeerror: terminated", "terminated"). This is a fragile heuristic that will silently stop working if the underlying HTTP library (undici? node-fetch? Node.js built-in?) changes its error message wording. There's no comment explaining:

  • Which library emits TypeError: terminated and under what condition
  • Why a typed error check (e.g., error.code or error subclass) can't be used instead

The recursive cause traversal (with no depth limit) adds to the opacity.

💡 Suggestion: Add a comment documenting which library emits this error and why string matching is necessary. If possible, also check error.code or the error constructor as a more robust detection mechanism.
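A sketch combining both suggestions — a documenting comment plus a bounded cause traversal. The attribution to undici is itself an assumption that should be verified against the Node.js version in use:

```typescript
// Sketch: same string-based detection, but with the suspected source
// documented and the cause traversal bounded against circular chains.
// ASSUMPTION to verify: undici (Node's built-in fetch) throws
// `TypeError: terminated` when the socket closes mid-body, and exposes
// no stable error code for this case, which is why string matching is used.
const MAX_CAUSE_DEPTH = 5;

function isTerminatedStreamErrorBounded(error: unknown, depth = 0): boolean {
  if (depth > MAX_CAUSE_DEPTH || error == null || typeof error !== "object") {
    return false;
  }
  const err = error as { message?: unknown; cause?: unknown };
  const message =
    typeof err.message === "string" ? err.message.toLowerCase() : "";
  if (message.includes("terminated")) return true;
  return isTerminatedStreamErrorBounded(err.cause, depth + 1);
}
```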

cursor[bot]

This comment was marked as resolved.

`Transient stream termination for chat ${req.chatId}; retrying pass (${terminatedRetryCount}/${MAX_TERMINATED_STREAM_RETRIES}) after ${retryDelayMs}ms`,
);
await delay(retryDelayMs);
continue;

Duplicated retry-and-continue blocks risk inconsistent future fixes

Low Severity

The shouldRetryTerminatedStreamError → maybeAppendRetryReplayForRetry → increment → delay → continue sequence is copy-pasted nearly identically across two error-handling paths (stream iteration error and response finalization error). Only the error variable name and log message differ. If the retry logic ever needs updating, both blocks must be changed in lockstep, creating a maintenance risk and inconsistency hazard.

Additional Locations (1)


@wwwillchen
Collaborator Author

@BugBot run


@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1583cc151c


Comment on lines +445 to +447
const attemptKey = `${sessionId}-${passIndex}-${turnIndex}`;
const currentAttempt = (connectionAttempts.get(attemptKey) || 0) + 1;
connectionAttempts.set(attemptKey, currentAttempt);

P2 Badge Reset connection-drop attempt state per fixture session

The global connectionAttempts counter is incremented for each ${sessionId}-${passIndex}-${turnIndex} key but never cleared, and sessionId is derived from the first user message content. When the same fixture prompt is reused in the same fake-server process (for example with Playwright --repeat-each or multiple tests that send the same trigger), later runs start at attempt 2+ so rules like attempts: [1] no longer drop the connection, making the reconnection fixtures non-deterministic and potentially masking regressions.

Useful? React with 👍 / 👎.
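One possible mitigation, sketched under the assumption that the handler can tell when a conversation is starting fresh (e.g., the request contains no prior assistant messages); the reset heuristic is illustrative, not part of this PR:

```typescript
// Sketch: reset a session's attempt counters when a conversation starts
// over, so rules like `attempts: [1]` fire again under --repeat-each.
const connectionAttempts = new Map<string, number>();

function noteAttempt(
  sessionId: string,
  passIndex: number,
  turnIndex: number,
  isFreshConversation: boolean, // assumed: detectable from request shape
): number {
  if (isFreshConversation) {
    // Forget counters from earlier runs of the same fixture prompt.
    for (const key of [...connectionAttempts.keys()]) {
      if (key.startsWith(`${sessionId}-`)) connectionAttempts.delete(key);
    }
  }
  const attemptKey = `${sessionId}-${passIndex}-${turnIndex}`;
  const currentAttempt = (connectionAttempts.get(attemptKey) ?? 0) + 1;
  connectionAttempts.set(attemptKey, currentAttempt);
  return currentAttempt;
}
```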

Contributor

@cubic-dev-ai bot left a comment


2 issues found across 7 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="e2e-tests/local_agent_connection_retry.spec.ts">

<violation number="1" location="e2e-tests/local_agent_connection_retry.spec.ts:44">
P2: The conversation-order check is layout-dependent (`boundingBox().y`) and can become flaky across environments; assert DOM/message sequence instead of pixel coordinates.</violation>
</file>

<file name="testing/fake-llm-server/localAgentHandler.ts">

<violation number="1" location="testing/fake-llm-server/localAgentHandler.ts:480">
P2: The global `connectionAttempts` map is never cleared between test sessions. Since `sessionId` is derived from user message content, reusing the same fixture prompt within the same fake-server process (e.g., via Playwright `--repeat-each` or multiple tests sharing a trigger) causes the attempt counter to start at 2+ on subsequent runs. This means `attempts: [1]` rules won't fire, making connection-drop fixtures non-deterministic and potentially masking regressions. Consider clearing the relevant keys when a new session/fixture starts, or resetting the map between test runs.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment on lines +44 to +52
const introY = (await introText.boundingBox())?.y;
const editCardY = (await recoveredEditCard.boundingBox())?.y;
const completionY = (await completionText.boundingBox())?.y;
expect(introY).toBeDefined();
expect(editCardY).toBeDefined();
expect(completionY).toBeDefined();
expect(introY!).toBeLessThan(editCardY!);
expect(editCardY!).toBeLessThan(completionY!);


@cubic-dev-ai cubic-dev-ai bot Mar 2, 2026


P2: The conversation-order check is layout-dependent (boundingBox().y) and can become flaky across environments; assert DOM/message sequence instead of pixel coordinates.

Suggested change
const introY = (await introText.boundingBox())?.y;
const editCardY = (await recoveredEditCard.boundingBox())?.y;
const completionY = (await completionText.boundingBox())?.y;
expect(introY).toBeDefined();
expect(editCardY).toBeDefined();
expect(completionY).toBeDefined();
expect(introY!).toBeLessThan(editCardY!);
expect(editCardY!).toBeLessThan(completionY!);
await expect(recoveredEditCard).toBeVisible();
const [introEl, editEl, completionEl] = await Promise.all([
introText.elementHandle(),
recoveredEditCard.elementHandle(),
completionText.elementHandle(),
]);
expect(introEl).toBeTruthy();
expect(editEl).toBeTruthy();
expect(completionEl).toBeTruthy();
const isInOrder = await po.page.evaluate(
([intro, edit, completion]) =>
!!intro &&
!!edit &&
!!completion &&
!!(intro.compareDocumentPosition(edit) & Node.DOCUMENT_POSITION_FOLLOWING) &&
!!(edit.compareDocumentPosition(completion) & Node.DOCUMENT_POSITION_FOLLOWING),
[introEl, editEl, completionEl],
);
expect(isInOrder).toBe(true);

turnScopedDropAfterToolCallAttempts &&
turnScopedDropAfterToolCallAttempts.length > 0
? (() => {
const attemptKey = `${sessionId}-${passIndex}-${turnIndex}-after-tool-call`;

@cubic-dev-ai cubic-dev-ai bot Mar 2, 2026


P2: The global connectionAttempts map is never cleared between test sessions. Since sessionId is derived from user message content, reusing the same fixture prompt within the same fake-server process (e.g., via Playwright --repeat-each or multiple tests sharing a trigger) causes the attempt counter to start at 2+ on subsequent runs. This means attempts: [1] rules won't fire, making connection-drop fixtures non-deterministic and potentially masking regressions. Consider clearing the relevant keys when a new session/fixture starts, or resetting the map between test runs.


@github-actions

github-actions bot commented Mar 2, 2026

🎭 Playwright Test Results

❌ Some tests failed

| OS | Passed | Failed | Flaky | Skipped |
| --- | --- | --- | --- | --- |
| 🍎 macOS | 236 | 2 | 6 | 6 |

Summary: 236 passed, 2 failed, 6 flaky, 6 skipped

Failed Tests

🍎 macOS

  • context_manage.spec.ts > manage context - smart context
    • Error: expect(string).toMatchSnapshot(expected) failed
  • template-create-nextjs.spec.ts > create next.js app
    • Error: expect(locator).toMatchAriaSnapshot(expected) failed

📋 Re-run Failing Tests (macOS)

Copy and paste to re-run all failing spec files locally:

npm run e2e \
  e2e-tests/context_manage.spec.ts \
  e2e-tests/template-create-nextjs.spec.ts

⚠️ Flaky Tests

🍎 macOS

  • engine.spec.ts > send message to engine - openai gpt-5 (passed after 1 retry)
  • hmr_path.spec.ts > HMR after navigating back to root should stay on root (passed after 1 retry)
  • partial_response.spec.ts > partial message is resumed (passed after 1 retry)
  • select_component.spec.ts > select component next.js (passed after 1 retry)
  • setup_flow.spec.ts > Setup Flow > setup banner shows correct state when node.js is installed (passed after 1 retry)
  • setup.spec.ts > setup ai provider (passed after 1 retry)

📊 View full report

@github-actions

github-actions bot commented Mar 2, 2026

🔍 Dyadbot Code Review Summary

Verdict: ✅ YES - Ready to merge

Reviewed by 3 independent agents: Correctness Expert, Code Health Expert, UX Wizard.

Issues Summary

| Severity | File | Issue |
| --- | --- | --- |
| 🟡 MEDIUM | local_agent_handler.ts:1306 | Swapped variable names in maybeAppendRetryReplayForRetry |
| 🟡 MEDIUM | local_agent_handler.ts:1194 | getErrorMessage duplicates existing utility in src/lib/errors.ts |
| 🟡 MEDIUM | local_agent_handler.ts:623 | Inner retry while loop lacks high-level documentation |
| 🟡 MEDIUM | local_agent_handler.ts:947 | Synthetic </think> tag not captured in retry replay events |
| 🟡 MEDIUM | local_agent_handler.ts:976 | No visual feedback to user during retry delay |
| 🟡 MEDIUM | local_agent_handler.ts:983 | Raw TypeError: terminated shown to user after retries exhausted |
🟢 Low Priority Notes (7 items)
  • Retry delay doesn't respect abort signal (local_agent_handler.ts:980) — 400–800ms delay blocks even if user cancels
  • Orphaned toolStreamingEntries on mid-input termination (local_agent_handler.ts) — module-level Map leaks entries when stream dies between tool-input-start and tool-input-end
  • steps type annotation is overly loose (local_agent_handler.ts:616) — Uses Array<unknown> instead of the SDK's step type
  • dropConnectionOnAttempts defined but never used by any fixture (localAgentTypes.ts:56) — Dead infrastructure; only per-turn variants are used
  • STREAM_CONTINUE_MESSAGE lacks "why" comment (local_agent_handler.ts:84) — Specific wording matters (prevents duplicate text) but isn't documented
  • Continuation instruction relies on model compliance (local_agent_handler.ts:84) — Model may repeat text; no programmatic dedup
  • Only 2 retries with short total window (~1.2s) (local_agent_handler.ts:82) — May be insufficient for genuine intermittent connectivity issues
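The abort-signal note above can be addressed with a signal-aware delay; this is a sketch of one approach, not the handler's actual `delay()` helper:

```typescript
// Resolve after `ms`, or immediately once the signal aborts, so a retry
// backoff never outlives a user cancellation.
function abortableDelay(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise((resolve) => {
    if (signal?.aborted) return resolve();
    const timer = setTimeout(() => {
      signal?.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    const onAbort = () => {
      clearTimeout(timer);
      resolve();
    };
    signal?.addEventListener("abort", onAbort, { once: true });
  });
}
```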
🚫 Dropped False Positives (3 items)
  • Partial text from failed stream persists in user-visible response — Dropped: This is the intended design. The unit test explicitly validates that fullResponse contains both partial and recovered text. The continuation instruction handles deduplication.
  • streamErrorFromCallback may be stale from previous retry iteration — Dropped: The variable is declared fresh per loop iteration and onError fires synchronously during stream processing, so timing conflicts are not realistic.
  • onError no longer sends error to frontend for non-retryable errors — Dropped: Non-retryable errors are thrown and caught by the outer handler which does send chat:response:error. The path where onError fires but iteration succeeds is unrealistic with the AI SDK.

Generated by Dyadbot multi-agent code review

Comment on lines +1306 to +1316
const toolCallsWithResult = new Set<string>();
const toolResultsWithCall = new Set<string>();

for (const event of retryReplayEvents) {
if (event.type === "tool-call") {
toolResultsWithCall.add(event.toolCallId);
continue;
}
if (event.type === "tool-result") {
toolCallsWithResult.add(event.toolCallId);
}

🟡 MEDIUM | naming

Set variable names are swapped relative to their content

toolCallsWithResult is populated from tool-result events (line 1315), and toolResultsWithCall is populated from tool-call events (line 1311). The names are the opposite of what they store. The intersection logic for completedToolExchangeIds happens to produce the correct result, but the swapped names make this code confusing and error-prone for future maintenance.

💡 Suggestion: Rename to seenToolCallIds and seenToolResultIds to directly describe what each set holds.
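A sketch of the rename with the population logic matching the names; the event shape is simplified from the handler's replay-event type:

```typescript
type ReplayEvent =
  | { type: "tool-call"; toolCallId: string }
  | { type: "tool-result"; toolCallId: string }
  | { type: "text"; text: string };

function completedToolExchangeIds(events: ReplayEvent[]): Set<string> {
  const seenToolCallIds = new Set<string>();
  const seenToolResultIds = new Set<string>();
  for (const event of events) {
    if (event.type === "tool-call") seenToolCallIds.add(event.toolCallId);
    else if (event.type === "tool-result") seenToolResultIds.add(event.toolCallId);
  }
  // A tool exchange is complete only when both the call and its result
  // were captured before the connection dropped.
  return new Set([...seenToolCallIds].filter((id) => seenToolResultIds.has(id)));
}
```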

return error;
}

function getErrorMessage(error: unknown): string {

🟡 MEDIUM | duplication

getErrorMessage duplicates existing utility in src/lib/errors.ts

A getErrorMessage function already exists at src/lib/errors.ts. This PR adds a second, slightly different implementation. Having two getErrorMessage functions with different behavior is a maintenance hazard.

💡 Suggestion: Extend the existing getErrorMessage in src/lib/errors.ts to handle the additional cases, or rename this one to something distinct like formatStreamErrorMessage.

let terminatedRetryCount = 0;
let needsContinuationInstruction = false;

while (!abortController.signal.aborted) {

🟡 MEDIUM | missing-why-comment

Inner retry while loop lacks high-level documentation

This inner while loop is the core of this PR — it implements the entire retry-on-terminated-stream protocol. A block comment explaining the high-level contract would help future readers understand: (1) what conditions cause a retry, (2) what state is preserved vs reset between attempts, and (3) how replay events feed into the next attempt's message history.

💡 Suggestion: Add a comment like: "Retry loop: if the stream terminates with a transient error, captured text/tool events are replayed into message history, a continuation instruction is appended, and the stream is re-opened. Breaks on success, abort, or non-retryable error."
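A stripped-down sketch of that contract, with the suggested comment in place; `processStream` and `isTransient` are hypothetical stand-ins for the real stream pass and retryability check:

```typescript
const MAX_TERMINATED_STREAM_RETRIES = 2;

async function runWithRetries(
  processStream: (attempt: number) => Promise<string>,
  isTransient: (error: unknown) => boolean,
): Promise<string> {
  let terminatedRetryCount = 0;
  // Retry loop: if the stream terminates with a transient error, the real
  // handler replays captured text/tool events into message history and
  // appends a continuation instruction before re-opening the stream.
  // Exits on success; rethrows on a non-retryable error or exhausted cap.
  while (true) {
    try {
      return await processStream(terminatedRetryCount);
    } catch (error) {
      if (
        !isTransient(error) ||
        terminatedRetryCount >= MAX_TERMINATED_STREAM_RETRIES
      ) {
        throw error;
      }
      terminatedRetryCount += 1;
    }
  }
}
```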

Comment on lines +947 to +949
const closingThinkBlock = "</think>\n";
fullResponse += closingThinkBlock;
await updateResponseInDb(placeholderMessageId, fullResponse);

🟡 MEDIUM | data-integrity

Synthetic </think> tag not captured in retry replay events

When a stream terminates mid-thinking-block, this code appends a synthetic </think>\n to fullResponse and persists it to DB. However, maybeCaptureRetryReplayText is only called from the text-delta handler, so this synthetic closing tag is NOT included in the retry replay events. This means the replay text fed into message history on retry is inconsistent with what's shown to the user (user sees closed thinking block, but message history has an unclosed one).
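One way to keep the two in sync is to route the synthetic tag through the same capture path; this sketch uses a simplified `maybeCaptureRetryReplayText` and a hypothetical `closeUnterminatedThinkBlock` wrapper:

```typescript
// Simplified stand-ins for the handler's retry-replay machinery.
const retryReplayText: string[] = [];

function maybeCaptureRetryReplayText(text: string): void {
  retryReplayText.push(text);
}

// Hypothetical wrapper: append the synthetic closing tag AND record it in
// the replay buffer, so the retried message history matches what was
// persisted to the DB and shown to the user.
function closeUnterminatedThinkBlock(fullResponse: string): string {
  const closingThinkBlock = "</think>\n";
  maybeCaptureRetryReplayText(closingThinkBlock);
  return fullResponse + closingThinkBlock;
}
```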

Comment on lines +976 to +983
STREAM_RETRY_BASE_DELAY_MS * terminatedRetryCount;
logger.warn(
`Transient stream termination for chat ${req.chatId}; retrying pass (${terminatedRetryCount}/${MAX_TERMINATED_STREAM_RETRIES}) after ${retryDelayMs}ms`,
);
await delay(retryDelayMs);
continue;
}
throw streamError;

🟡 MEDIUM | UX

No visual feedback during retry; raw error after retries exhausted

Two UX concerns here:

  1. During retry: The user experiences a 400–800ms pause with no visual indication that recovery is happening. The stream stops mid-response, waits, then resumes. Consider a transient "Reconnecting…" indicator.

  2. After retries exhausted: The throw streamError surfaces as a raw TypeError: terminated to the user, which is not actionable. After silently retrying multiple times, showing a user-friendly message like "The connection to your local agent was lost and could not be recovered" would be much better.

if (turnScopedDropAttempts && turnScopedDropAttempts.length > 0) {
const attemptKey = `${sessionId}-${passIndex}-${turnIndex}`;
const currentAttempt = (connectionAttempts.get(attemptKey) || 0) + 1;
connectionAttempts.set(attemptKey, currentAttempt);

Retry drop counters leak across conversations

Medium Severity

connectionAttempts is keyed by sessionId, but sessionId is derived from the first user message content. Separate chats that start with the same fixture trigger share the same key, so attempt counts bleed across runs. In localAgentHandler.ts, later conversations may skip configured first-attempt drops unexpectedly.


// Check if we should simulate a connection drop for this attempt
const turnScopedDropAttempts =
fixture.dropConnectionByTurn?.find((rule) => rule.turnIndex === turnIndex)
?.attempts ?? fixture.dropConnectionOnAttempts;

Global drop rule applies to every turn

Medium Severity

dropConnectionOnAttempts is documented for the first turn, but localAgentHandler.ts falls back to it for any turnIndex without a per-turn rule. This makes later turns drop unexpectedly, so fixture behavior diverges from the declared contract in localAgentTypes.ts.
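A sketch of the narrower fallback, with the rule and fixture shapes simplified from `localAgentTypes.ts`:

```typescript
interface DropRule {
  turnIndex: number;
  attempts: number[];
}
interface Fixture {
  dropConnectionByTurn?: DropRule[];
  dropConnectionOnAttempts?: number[];
}

function dropAttemptsForTurn(
  fixture: Fixture,
  turnIndex: number,
): number[] | undefined {
  const perTurn = fixture.dropConnectionByTurn?.find(
    (rule) => rule.turnIndex === turnIndex,
  )?.attempts;
  if (perTurn) return perTurn;
  // Only the first turn falls back to the global rule, matching the
  // documented contract; later turns need an explicit per-turn rule.
  return turnIndex === 0 ? fixture.dropConnectionOnAttempts : undefined;
}
```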



// Track connection attempts per session+turn for connection drop simulation.
// Key: `${sessionId}-${passIndex}-${turnIndex}`, Value: attempt count
const connectionAttempts = new Map<string, number>();

Connection attempt map never gets cleared

Low Severity

connectionAttempts accumulates per-session keys but has no cleanup path. In long-running fake-server processes, each unique sessionId/pass/turn key remains forever, causing unbounded growth and stale state retention in localAgentHandler.ts.


...currentMessageHistory,
buildTerminatedRetryContinuationInstruction(),
]
: currentMessageHistory;

Retry instruction resets fixture turn state

Medium Severity

Appending buildTerminatedRetryContinuationInstruction() as the last user message can make fixture turn detection use the synthetic retry prompt as the boundary. countToolResultRounds() then ignores prior replayed tool results, so retries can restart from turn 0 and re-run earlier fixture work instead of resuming.


}
try {
return { type: "text", value: JSON.stringify(value) };
} catch {

Undefined tool results break replay message encoding

Medium Severity

toToolResultOutput() returns JSON.stringify(value) directly for non-strings. When value is undefined, JSON.stringify returns undefined, so replayed tool-result messages get a non-string output.value. That can make the retry replay payload invalid and fail the recovery attempt.
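A sketch of a guard for that case; falling back to the string "null" for `undefined` is an assumption about the desired encoding, not the project's decision:

```typescript
function toToolResultOutput(value: unknown): { type: "text"; value: string } {
  if (typeof value === "string") return { type: "text", value };
  try {
    const serialized = JSON.stringify(value);
    // JSON.stringify returns undefined for undefined, functions, and
    // symbols; substitute a valid string so output.value stays a string.
    return { type: "text", value: serialized ?? "null" };
  } catch {
    // Circular structures etc. fall back to a best-effort string.
    return { type: "text", value: String(value) };
  }
}
```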


@wwwillchen

@BugBot run

@wwwillchen

@BugBot run

@wwwillchen wwwillchen merged commit 6f53e89 into dyad-sh:main Mar 2, 2026
7 of 8 checks passed

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 47623be5f3


Comment on lines +61 to +62
(userTextContent.includes("incomplete todo(s)") ||
userTextContent.includes("previous response stream was interrupted"))

P2: Select the most recent fixture for retry continuation

Including "previous response stream was interrupted" in this fallback means retry continuation prompts now reuse the "search earlier messages" path, but that path picks the first tc=local-agent/... trigger in the chat rather than the latest one. In a chat that has multiple fixture prompts, a retry can be routed to an older fixture, and handleLocalAgentFixture then derives turn state from the wrong script, which can replay earlier turns and re-run tool side effects instead of resuming the interrupted one.
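A sketch of scanning from the end of the chat instead; the message shape and trigger regex are simplified assumptions:

```typescript
interface ChatMessage {
  role: string;
  content: string;
}

// Walk backwards so a retry continuation resolves to the most recent
// `tc=local-agent/...` trigger rather than the first one in the chat.
function findLatestFixtureTrigger(
  messages: ChatMessage[],
): string | undefined {
  for (let i = messages.length - 1; i >= 0; i--) {
    const msg = messages[i];
    if (msg.role !== "user") continue;
    const match = msg.content.match(/tc=local-agent\/[\w-]+/);
    if (match) return match[0];
  }
  return undefined;
}
```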



@cursor cursor bot left a comment


✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is ON, but it could not run because the branch was deleted or merged before autofix could start.

if (event.type === "tool-result") {
toolCallsWithResult.add(event.toolCallId);
}
}

Swapped variable names in tool exchange tracking sets

Medium Severity

toolCallsWithResult is populated from tool-result events and toolResultsWithCall is populated from tool-call events — the opposite of what the names suggest. The final completedToolExchangeIds computation is still correct because set intersection is commutative, but the naming is misleading. A future developer trusting the variable names while modifying this logic (e.g., changing the intersection to a directional filter) could easily introduce a real bug.



@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is ON, but it could not run because the branch was deleted or merged before autofix could start.

if (event.type === "tool-result") {
toolCallsWithResult.add(event.toolCallId);
}
}

Swapped set variable names obscure retry replay logic

Low Severity

In maybeAppendRetryReplayForRetry, the sets toolCallsWithResult and toolResultsWithCall have their population logic swapped relative to their names. toolCallsWithResult is populated from tool-result events, and toolResultsWithCall is populated from tool-call events — the opposite of what the names suggest. The intersection used to compute completedToolExchangeIds is commutative, so the result is still correct. However, the naming mismatch is confusing and could mislead future maintainers into introducing a real bug when modifying this code.



@greptile-apps greptile-apps bot left a comment


12 files reviewed, 1 comment


@greptile-apps

greptile-apps bot commented Mar 2, 2026

Additional Comments (1)

testing/fake-llm-server/localAgentHandler.ts (lines 64-96)
countToolResultRounds miscalculates turn index after retry with replay

countToolResultRounds finds the last user message and counts tool-result rounds that appear after it. In the retry path, handleLocalAgentStream appends a continuation instruction as the last user message after any replay messages:

[user: fixture trigger]
[assistant: "I'll create a file…" + write_file]  ← replay
[tool:  write_file result]                         ← replay
[assistant: "Partial response…"]                  ← replay (turn 1 partial)
[user: "[System] …stream was interrupted…"]        ← continuation (LAST user msg)

Because the continuation instruction is the last user message, lastUserIndex points to it and the tool-result at the replay position is not counted. The function returns rounds = 0, so turnIndex = 0.

This causes the fake server to re-serve turn 0 (which may include tool calls) on retry, instead of the correct turn 1. For the connection-drop fixture this means write_file gets executed a second time (the file write is idempotent, so it may not surface as a test failure, but the conversation history will contain a duplicate tool exchange and potentially two edit cards in the UI).

A targeted fix would be to skip synthetic user messages (continuation instructions, todo-reminders) when searching for the baseline user message:

function countToolResultRounds(messages: any[]): number {
  // Skip synthetic messages injected by the retry/outer-loop machinery
  const isSynthetic = (msg: any) =>
    isTodoReminderMessage(msg) ||
    (typeof msg?.content === "string" &&
      msg.content.includes("previous response stream was interrupted")) ||
    (Array.isArray(msg?.content) &&
      msg.content.some(
        (p: any) =>
          p.type === "text" &&
          p.text?.includes("previous response stream was interrupted"),
      ));

  let lastUserIndex = -1;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i]?.role === "user" && !isSynthetic(messages[i])) {
      lastUserIndex = i;
      break;
    }
  }
  // ... rest unchanged
}

@github-actions

github-actions bot commented Mar 2, 2026

🔍 Dyadbot Code Review Summary

Verdict: 🤔 NOT SURE - Potential issues

Reviewed by 3 independent agents: Correctness Expert, Code Health Expert, UX Wizard.

Issues Summary

| Severity | File | Issue |
| --- | --- | --- |
| 🟡 MEDIUM | local_agent_handler.ts:1210 | Non-retryable stream errors produce degraded user-facing error messages |
| 🟡 MEDIUM | local_agent_handler.ts:1353 | Set variable names are swapped, making retry replay logic confusing |
| 🟡 MEDIUM | local_agent_handler.ts:982-1048 | Retry-and-continue logic duplicated across two error handling sites |
| 🟡 MEDIUM | local_agent_connection_retry.spec.ts:78 | Second test uses .first() instead of asserting count=1, may mask duplicate tool execution |
🟢 Low Priority Notes (7 items)
  • dropConnectionOnAttempts unused - localAgentTypes.ts:50 - Declared in type but no fixture uses it; dead test infrastructure
  • streamingPreview not reset between retries - local_agent_handler.ts:267 - Stale tool preview could briefly flicker on retry (self-correcting)
  • connectionAttempts map never cleaned up - localAgentHandler.ts:26 - Module-level map grows unbounded in long-running fake server
  • Unnecessary xmlChunk variable - local_agent_handler.ts:496 - Intermediate variable used once, adds no clarity
  • Magic string matching for retry detection - chatCompletionHandler.ts:59 - Hardcoded substring should reference shared constant
  • Second test lacks message ordering verification - local_agent_connection_retry.spec.ts:62 - Missing bounding box ordering + snapshot compared to first test
  • Local getErrorMessage shadows src/lib/errors.ts export - local_agent_handler.ts:1232 - Same function name with different behavior is confusing
🚫 Dropped False Positives (4 items)
  • No user-visible feedback during retry delay - Dropped: Max total delay is ~1.2s across all retries; users won't notice a sub-second pause during AI streaming
  • Retry delay doesn't respect abort signal - Dropped: Max delay is 800ms, negligible wait before abort is detected on next loop iteration
  • maybeCaptureRetryReplayEvent takes unknown - Dropped: Reasonable defensive TypeScript pattern; function is called from a generic stream loop
  • MAX_TERMINATED_STREAM_RETRIES = 2 is conservative - Dropped: Reasonable starting point; PR adds telemetry to determine right threshold from real-world data

Generated by Dyadbot multi-agent code review


@github-actions github-actions bot left a comment


Multi-agent review: 4 MEDIUM issues found

"Local agent stream error:",
getErrorMessage(normalizedError),
);
},

🟡 MEDIUM | error-handling

Non-retryable stream errors produce degraded user-facing error messages

The old onError callback immediately sent chat:response:error to the UI with a well-formatted AI error: ${error?.error?.message || JSON.stringify(error)} message. The new callback silently captures the error, and for non-retryable errors, it eventually throws to the outer catch (line 1210) which formats it as `Error: ${error}`.

If the unwrapped error is a plain object (e.g., from API rate limits, auth failures, server 500s), the user will see Error: [object Object] in the ChatErrorBox instead of a meaningful message.

💡 Suggestion: In the outer catch block (line 1210-1213), use the getErrorMessage() helper already defined in this file to format the error, and preserve the AI error: prefix:

error: `AI error: ${getErrorMessage(error)}`

`Transient stream termination for chat ${req.chatId}; retrying pass (${terminatedRetryCount}/${MAX_TERMINATED_STREAM_RETRIES}) after ${retryDelayMs}ms`,
);
await delay(retryDelayMs);
continue;

🟡 MEDIUM | duplication

Retry-and-continue logic duplicated across two error handling sites

The ~20-line retry block here (stream iteration errors) is nearly identical to lines 1030-1048 (response finalization errors). The only differences are the error variable name, the phase telemetry string, and the log message.

This duplication makes it easy to update one path and forget the other when retry behavior evolves.

💡 Suggestion: Extract a helper like handleTerminatedStreamRetry(error, phase) that encapsulates the shouldRetry check, replay append, counter increment, telemetry event, log, and delay. Return a boolean so the caller knows whether to continue or fall through.
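A sketch of the suggested extraction, reduced to the retry decision and backoff so both call sites can share it; telemetry and replay-append hooks are omitted, and the constants mirror the handler:

```typescript
const MAX_TERMINATED_STREAM_RETRIES = 2;
const STREAM_RETRY_BASE_DELAY_MS = 400;

interface RetryState {
  terminatedRetryCount: number;
}

// Returns whether the caller should `continue` its retry loop, and how
// long to back off before doing so (linear backoff per attempt).
function shouldRetryTerminatedStream(
  state: RetryState,
  isRetryable: boolean,
): { retry: boolean; delayMs: number } {
  if (
    !isRetryable ||
    state.terminatedRetryCount >= MAX_TERMINATED_STREAM_RETRIES
  ) {
    return { retry: false, delayMs: 0 };
  }
  state.terminatedRetryCount += 1;
  return {
    retry: true,
    delayMs: STREAM_RETRY_BASE_DELAY_MS * state.terminatedRetryCount,
  };
}
```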

if (event.type === "tool-result") {
toolCallsWithResult.add(event.toolCallId);
}
}

🟡 MEDIUM | naming

Set variable names are swapped, making retry replay logic confusing

toolCallsWithResult is populated from tool-result events, and toolResultsWithCall is populated from tool-call events — the names are backwards. The intersection logic works correctly because set intersection is commutative, but the misleading names could cause bugs if this code is modified by someone trusting the variable names.

💡 Suggestion: Rename to match what they actually track, e.g., idsWithToolCall and idsWithToolResult, or simply toolCallIds and toolResultIds.

await expect(
po.page
.getByRole("button", {
name: /recovered-after-tool-call\.ts .*src\/recovered-after-tool-call\.ts.*Edit/,

🟡 MEDIUM | test-correctness

Using .first() may mask duplicate tool execution visible to user

The first test (recovers from connection drop) correctly asserts toHaveCount(1) for the edit card. This test uses .first() without asserting the count first. If the retry logic accidentally creates duplicate edit cards (e.g., the tool runs twice), this test would still pass while the user would see a confusing duplicate.

💡 Suggestion: Add await expect(editCard).toHaveCount(1) before .first(), matching the pattern in the first test.


Labels

needs-human:review-issue ai agent flagged an issue that requires human review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant