Handle local agent connection recovery #2853
Conversation
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly enhances the resilience of the local agent by introducing automatic retry mechanisms for transient connection failures during AI stream processing. The changes ensure that agent conversations can seamlessly recover from network interruptions, preserving ongoing work and providing a more robust user experience. This improvement is validated through new comprehensive end-to-end tests and supported by an updated fake LLM server capable of simulating these failure conditions.
@BugBot run
Code Review
This pull request introduces a robust connection recovery mechanism for the local agent, enhancing network resilience with retry logic and backoff for transient stream terminations, validated by unit and end-to-end tests. A security audit confirms that these changes do not introduce any new security vulnerabilities, with the implementation correctly utilizing AbortController, retry limits, and backoff delays while maintaining conversation history integrity. My review focuses on increasing the robustness of the error detection logic and correcting the handling of tool results during the retry process to ensure compatibility with the AI SDK, and I've also included a suggestion for a minor refactoring to improve code clarity.
```ts
function isTerminatedStreamError(error: unknown): boolean {
  const normalized = unwrapStreamError(error);
  const message = getErrorMessage(normalized).toLowerCase();
  if (message.includes("typeerror: terminated") || message === "terminated") {
    return true;
  }
  const cause =
    isRecord(normalized) && "cause" in normalized
      ? normalized.cause
      : undefined;
  if (cause) {
    return isTerminatedStreamError(cause);
  }
  return false;
}
```
The current implementation of isTerminatedStreamError is brittle as it primarily checks for the specific string "typeerror: terminated". This is tailored to the mocked error in the unit tests but is unlikely to cover the range of real-world network errors that can occur when a stream is terminated (e.g., ECONNRESET).
To make the retry logic more robust and effective in production, this check should be expanded to include common network error codes (like ECONNRESET) and more generic error messages (like "fetch failed").
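As an illustration of that broadening, here is a minimal sketch; the helper name `isTransientNetworkError`, the exact code list, and the extra message patterns are assumptions, not the PR's actual implementation:

```typescript
// Illustrative broadening of the terminated-stream check. The set of codes
// and message substrings below is an assumption about which failures are
// transient; tune it to the errors actually observed in production.
const TRANSIENT_NETWORK_ERROR_CODES = new Set([
  "ECONNRESET",
  "ECONNREFUSED",
  "EPIPE",
  "ETIMEDOUT",
  "UND_ERR_SOCKET",
]);

function isTransientNetworkError(error: unknown, depth = 0): boolean {
  if (depth > 5 || error === null || error === undefined) return false;
  const err = error as { message?: unknown; code?: unknown; cause?: unknown };
  const message =
    typeof err.message === "string" ? err.message.toLowerCase() : "";
  if (
    message === "terminated" ||
    message.includes("typeerror: terminated") ||
    message.includes("fetch failed") ||
    message.includes("socket hang up")
  ) {
    return true;
  }
  if (
    typeof err.code === "string" &&
    TRANSIENT_NETWORK_ERROR_CODES.has(err.code)
  ) {
    return true;
  }
  // Walk the `cause` chain with a depth bound so a cyclic chain cannot
  // recurse forever (the PR's version traverses `cause` without a limit).
  return isTransientNetworkError(err.cause, depth + 1);
}
```

A `code` check catches errors like `ECONNRESET` that carry no recognizable message, while the message patterns still catch undici's `TypeError: terminated`.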
```ts
function toToolResultOutput(value: unknown): { type: "text"; value: string } {
  if (typeof value === "string") {
    return { type: "text", value };
  }
  try {
    return { type: "text", value: JSON.stringify(value) };
  } catch {
    return { type: "text", value: String(value) };
  }
}
```
The toToolResultOutput function incorrectly wraps the tool output in an object { type: 'text', value: '...' }. This structure is then assigned to the output field of a tool-result message part. The output field of a tool-result should contain the raw, JSON-serializable result of the tool execution, not an object that mimics a content part.
This incorrect nesting will likely be misinterpreted by the AI SDK when it serializes the message for the model provider, potentially causing errors. The function should pass through the raw tool output, as the AI SDK handles the final serialization. Note that the corresponding unit test assertion in src/__tests__/local_agent_handler.test.ts will also need to be updated to reflect this change.
```ts
function toToolResultOutput(value: unknown): unknown {
  // The AI SDK will handle serialization of the output, so we should pass the raw value through.
  return value;
}
```

```ts
maybeAppendRetryReplayForRetry({
  partialResponse: fullResponse.slice(
    passAttemptResponseStartLength,
  ),
  retryReplayEvents,
  currentMessageHistoryRef: currentMessageHistory,
  accumulatedAiMessagesRef: accumulatedAiMessages,
  onCurrentMessageHistoryUpdate: (next) =>
    (currentMessageHistory = next),
});
```
The function maybeAppendRetryReplayForRetry modifies currentMessageHistory via a callback, onCurrentMessageHistoryUpdate. This pattern of passing a setter can make the data flow harder to trace and maintain.
For better clarity, consider refactoring maybeAppendRetryReplayForRetry to be a pure function that returns the new message history. The caller can then explicitly re-assign currentMessageHistory. This would make the state update more explicit and easier to follow.
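A rough sketch of that pure-function shape; `Message` is a simplified stand-in for the handler's real message type, and `appendRetryReplay` is a hypothetical name:

```typescript
// Sketch of the pure-function refactor: return a new history instead of
// mutating the caller's state through a setter callback.
type Message = { role: "user" | "assistant" | "tool"; content: string };

function appendRetryReplay(
  history: readonly Message[],
  replayMessages: readonly Message[],
): Message[] {
  // A fresh array makes the update explicit at every call site.
  return [...history, ...replayMessages];
}

// Caller side: the re-assignment is visible, so the data flow is traceable.
let currentMessageHistory: Message[] = [{ role: "user", content: "hi" }];
currentMessageHistory = appendRetryReplay(currentMessageHistory, [
  { role: "assistant", content: "partial reply before the drop" },
]);
```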
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6ca24a3719
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```ts
partialResponse: fullResponse.slice(
  passAttemptResponseStartLength,
```
Replay only model output when constructing retry history
The retry branch builds partialResponse from fullResponse, but fullResponse includes synthetic XML added via onXmlComplete (tool-call/result markup and other UI tags), not just model text. If a disconnect happens after tool XML is appended, retry history injects that XML back as assistant text before the next streamText() call, so the model sees renderer-specific markup and duplicated tool transcript data. This can derail continuation and produce repeated or inconsistent tool behavior; retry context should be derived from streamed model text only.
```ts
if (event.type === "tool-call") {
  replayMessages.push({
    role: "assistant",
```
Replay only completed tool exchanges after termination
This retry replay loop re-adds every captured tool-call even when no matching tool-result was captured before the stream dropped. In the common failure window between a tool-call event and its tool-result, the next attempt receives an orphaned tool call in history, which can cause provider-side message validation failures or make the model re-run a state-changing tool. Retry replay should only include tool calls that have corresponding results.
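One way to express that filter, using a simplified stand-in for the SDK's `fullStream` event shape (the types and the `completedToolCallIds` name are illustrative):

```typescript
// Sketch: collect only tool calls whose matching tool-result was captured
// before the stream dropped, so retry replay never injects an orphaned call.
type ReplayEvent =
  | { type: "tool-call"; toolCallId: string }
  | { type: "tool-result"; toolCallId: string }
  | { type: "text-delta"; text: string };

function completedToolCallIds(events: ReplayEvent[]): Set<string> {
  const calls = new Set<string>();
  const results = new Set<string>();
  for (const event of events) {
    if (event.type === "tool-call") calls.add(event.toolCallId);
    else if (event.type === "tool-result") results.add(event.toolCallId);
  }
  // A tool exchange is safe to replay only when both halves were captured.
  return new Set(Array.from(calls).filter((id) => results.has(id)));
}
```

The replay loop would then skip any tool-call whose id is not in this set, so the next attempt never sees a dangling call that could trigger provider-side validation failures.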
Greptile Summary
This PR adds connection-drop resilience to the local agent IPC stream handler. Key observations:
Confidence Score: 3/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as Electron Client
    participant Handler as local_agent_handler
    participant SDK as AI SDK (streamText)
    participant LLM as Local Agent LLM
    Client->>Handler: handleLocalAgentStream()
    loop Retry while loop (max 2 retries)
        Handler->>SDK: streamText(attemptMessages)
        SDK->>LLM: POST /v1/chat/completions
        LLM-->>SDK: SSE stream (text-delta, tool-call, tool-result)
        SDK-->>Handler: fullStream events (captured in retryReplayEvents)
        alt Stream terminates normally
            SDK-->>Handler: streamResult.response resolved
            Handler->>Handler: Accumulate responseMessages
            Handler->>Client: sendResponseChunk()
            Note over Handler: break out of retry loop
        else Stream terminates with TypeError("terminated")
            Note over Handler: isTerminatedStreamError() → true
            Handler->>Handler: maybeAppendRetryReplayForRetry()<br/>(inject replayed tool exchanges into currentMessageHistory)
            Handler->>Handler: append continuation instruction<br/>(needsContinuationInstruction = true)
            Handler->>Handler: delay(400ms * retryCount)
            Note over Handler: continue → retry loop
            Handler->>SDK: streamText([...replay, continuationInstruction])
            SDK->>LLM: POST /v1/chat/completions (with context)
            LLM-->>SDK: SSE stream (resumed)
            SDK-->>Handler: fullStream events
            SDK-->>Handler: streamResult.response resolved
            Note over Handler: break
        else Retries exhausted (retryCount >= MAX_TERMINATED_STREAM_RETRIES)
            Handler->>Client: throw error (caught upstream)
        end
    end
    Handler->>Client: chat:response:end
```
Last reviewed commit: 47623be
🔍 Dyadbot Code Review Summary
Verdict: ✅ YES - Ready to merge
Reviewed by 3 independent agents: Correctness Expert, Code Health Expert, UX Wizard.
Issues Summary
🟢 Low Priority Notes (4 items)
🚫 Dropped False Positives (8 items)
Generated by Dyadbot multi-agent code review
```ts
if (
  shouldRetryTerminatedStreamError({
    error: err,
    retryCount: terminatedRetryCount,
    aborted: abortController.signal.aborted,
  })
) {
  maybeAppendRetryReplayForRetry({
    partialResponse: fullResponse.slice(
      passAttemptResponseStartLength,
    ),
    retryReplayEvents,
    currentMessageHistoryRef: currentMessageHistory,
    accumulatedAiMessagesRef: accumulatedAiMessages,
    onCurrentMessageHistoryUpdate: (next) =>
      (currentMessageHistory = next),
  });
  terminatedRetryCount += 1;
  needsContinuationInstruction = true;
  const retryDelayMs =
    STREAM_RETRY_BASE_DELAY_MS * terminatedRetryCount;
  logger.warn(
    `Transient stream termination while finalizing response for chat ${req.chatId}; retrying pass (${terminatedRetryCount}/${MAX_TERMINATED_STREAM_RETRIES}) after ${retryDelayMs}ms`,
  );
  await delay(retryDelayMs);
  continue;
}
```
🟡 MEDIUM | duplication
Retry-on-terminated logic duplicated in two catch branches
This retry block (lines 985–1011) is nearly identical to the one at lines 948–975 — the only difference is the log message string. Any future change to the retry logic (delay formula, counter increment, telemetry, etc.) must be applied in both places, making it easy to introduce inconsistencies.
💡 Suggestion: Extract a shared helper, e.g. async function performTerminatedRetry(logContext: string), that encapsulates maybeAppendRetryReplayForRetry, counter increment, delay, and the continue signal. Call it from both branches.
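A sketch of what that extraction could look like; the option names echo the PR's locals (`terminatedRetryCount`, `STREAM_RETRY_BASE_DELAY_MS`) but the exact signature is an assumption:

```typescript
// Sketch of the shared retry helper: both catch branches delegate the
// replay, counter bump, log, and backoff to one function.
const STREAM_RETRY_BASE_DELAY_MS = 400;

const delay = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function performTerminatedRetry(opts: {
  retryCount: number;
  maxRetries: number;
  logContext: string;
  appendRetryReplay: () => void;
  warn: (message: string) => void;
}): Promise<number> {
  opts.appendRetryReplay();
  const nextRetryCount = opts.retryCount + 1;
  const retryDelayMs = STREAM_RETRY_BASE_DELAY_MS * nextRetryCount;
  opts.warn(
    `${opts.logContext}; retrying pass (${nextRetryCount}/${opts.maxRetries}) after ${retryDelayMs}ms`,
  );
  await delay(retryDelayMs);
  return nextRetryCount; // the caller assigns this and `continue`s its loop
}
```

Each catch branch then reduces to `terminatedRetryCount = await performTerminatedRetry({ ... })` followed by `continue`, so the delay formula and counter logic live in exactly one place.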
```ts
function isTerminatedStreamError(error: unknown): boolean {
  const normalized = unwrapStreamError(error);
  const message = getErrorMessage(normalized).toLowerCase();
  if (message.includes("typeerror: terminated") || message === "terminated") {
    return true;
  }
  const cause =
    isRecord(normalized) && "cause" in normalized
      ? normalized.cause
      : undefined;
  if (cause) {
    return isTerminatedStreamError(cause);
  }
  return false;
}
```
🟡 MEDIUM | fragile-detection
isTerminatedStreamError relies on fragile string matching without documenting the source
This function detects "terminated" errors by matching lowercased error message strings ("typeerror: terminated", "terminated"). This is a fragile heuristic that will silently stop working if the underlying HTTP library (undici? node-fetch? Node.js built-in?) changes its error message wording. There's no comment explaining:
- Which library emits `TypeError: terminated` and under what condition
- Why a typed error check (e.g., `error.code` or an error subclass) can't be used instead
The recursive cause traversal (with no depth limit) adds to the opacity.
💡 Suggestion: Add a comment documenting which library emits this error and why string matching is necessary. If possible, also check error.code or the error constructor as a more robust detection mechanism.
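For context (hedged, since the PR doesn't document it): the `TypeError: terminated` message appears to come from undici, the fetch implementation bundled with Node.js, which raises it when a response body is cut off mid-stream; the `TypeError` itself carries no stable `code`, which is presumably why the PR resorts to string matching. A slightly more typed variant of the check could look like:

```typescript
// Sketch of a more typed detection: check the error's constructor name
// before falling back to the message substring. The undici attribution
// above is an inference, not something the PR states.
function looksLikeUndiciTermination(error: unknown): boolean {
  if (!(error instanceof Error)) return false;
  return (
    error.name === "TypeError" &&
    error.message.toLowerCase().includes("terminated")
  );
}
```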
```ts
  `Transient stream termination for chat ${req.chatId}; retrying pass (${terminatedRetryCount}/${MAX_TERMINATED_STREAM_RETRIES}) after ${retryDelayMs}ms`,
);
await delay(retryDelayMs);
continue;
```
Duplicated retry-and-continue blocks risk inconsistent future fixes
Low Severity
The shouldRetryTerminatedStreamError → maybeAppendRetryReplayForRetry → increment → delay → continue sequence is copy-pasted nearly identically across two error-handling paths (stream iteration error and response finalization error). Only the error variable name and log message differ. If the retry logic ever needs updating, both blocks must be changed in lockstep, creating a maintenance risk and inconsistency hazard.
Additional Locations (1)
@BugBot run
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1583cc151c
```ts
const attemptKey = `${sessionId}-${passIndex}-${turnIndex}`;
const currentAttempt = (connectionAttempts.get(attemptKey) || 0) + 1;
connectionAttempts.set(attemptKey, currentAttempt);
```
Reset connection-drop attempt state per fixture session
The global connectionAttempts counter is incremented for each ${sessionId}-${passIndex}-${turnIndex} key but never cleared, and sessionId is derived from the first user message content. When the same fixture prompt is reused in the same fake-server process (for example with Playwright --repeat-each or multiple tests that send the same trigger), later runs start at attempt 2+ so rules like attempts: [1] no longer drop the connection, making the reconnection fixtures non-deterministic and potentially masking regressions.
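A sketch of one possible reset, assuming the fake server gets a hook when a new fixture session starts; `nextAttempt` and `resetSessionAttempts` are hypothetical names:

```typescript
// Sketch: per-key attempt counting plus a session-scoped reset so that
// `attempts: [1]` drop rules fire deterministically on every run.
const connectionAttempts = new Map<string, number>();

function nextAttempt(attemptKey: string): number {
  const current = (connectionAttempts.get(attemptKey) || 0) + 1;
  connectionAttempts.set(attemptKey, current);
  return current;
}

// Clear every key belonging to a session when a new fixture run for that
// session begins, so stale counts from a previous run cannot leak in.
function resetSessionAttempts(sessionId: string): void {
  for (const key of Array.from(connectionAttempts.keys())) {
    if (key.startsWith(`${sessionId}-`)) {
      connectionAttempts.delete(key);
    }
  }
}
```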
2 issues found across 7 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
```
<file name="e2e-tests/local_agent_connection_retry.spec.ts">
  <violation number="1" location="e2e-tests/local_agent_connection_retry.spec.ts:44">
    P2: The conversation-order check is layout-dependent (`boundingBox().y`) and can become flaky across environments; assert DOM/message sequence instead of pixel coordinates.
  </violation>
</file>
<file name="testing/fake-llm-server/localAgentHandler.ts">
  <violation number="1" location="testing/fake-llm-server/localAgentHandler.ts:480">
    P2: The global `connectionAttempts` map is never cleared between test sessions. Since `sessionId` is derived from user message content, reusing the same fixture prompt within the same fake-server process (e.g., via Playwright `--repeat-each` or multiple tests sharing a trigger) causes the attempt counter to start at 2+ on subsequent runs. This means `attempts: [1]` rules won't fire, making connection-drop fixtures non-deterministic and potentially masking regressions. Consider clearing the relevant keys when a new session/fixture starts, or resetting the map between test runs.
  </violation>
</file>
```
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
```ts
const introY = (await introText.boundingBox())?.y;
const editCardY = (await recoveredEditCard.boundingBox())?.y;
const completionY = (await completionText.boundingBox())?.y;
expect(introY).toBeDefined();
expect(editCardY).toBeDefined();
expect(completionY).toBeDefined();
expect(introY!).toBeLessThan(editCardY!);
expect(editCardY!).toBeLessThan(completionY!);
```
P2: The conversation-order check is layout-dependent (boundingBox().y) and can become flaky across environments; assert DOM/message sequence instead of pixel coordinates.
Prompt for AI agents
```
Check if this issue is valid — if so, understand the root cause and fix it. At e2e-tests/local_agent_connection_retry.spec.ts, line 44:
<comment>The conversation-order check is layout-dependent (`boundingBox().y`) and can become flaky across environments; assert DOM/message sequence instead of pixel coordinates.</comment>
<file context>
@@ -20,16 +20,35 @@ testSkipIfWindows(
+
+      // The replayed conversation order must stay:
+      // intro assistant text -> tool edit card -> completion assistant text.
+      const introY = (await introText.boundingBox())?.y;
+      const editCardY = (await recoveredEditCard.boundingBox())?.y;
+      const completionY = (await completionText.boundingBox())?.y;
</file context>
```
Suggested change (replacing the `boundingBox()` assertions above):

```ts
await expect(recoveredEditCard).toBeVisible();
const [introEl, editEl, completionEl] = await Promise.all([
  introText.elementHandle(),
  recoveredEditCard.elementHandle(),
  completionText.elementHandle(),
]);
expect(introEl).toBeTruthy();
expect(editEl).toBeTruthy();
expect(completionEl).toBeTruthy();
const isInOrder = await po.page.evaluate(
  ([intro, edit, completion]) =>
    !!intro &&
    !!edit &&
    !!completion &&
    !!(intro.compareDocumentPosition(edit) & Node.DOCUMENT_POSITION_FOLLOWING) &&
    !!(edit.compareDocumentPosition(completion) & Node.DOCUMENT_POSITION_FOLLOWING),
  [introEl, editEl, completionEl],
);
expect(isInOrder).toBe(true);
```
```ts
turnScopedDropAfterToolCallAttempts &&
turnScopedDropAfterToolCallAttempts.length > 0
  ? (() => {
      const attemptKey = `${sessionId}-${passIndex}-${turnIndex}-after-tool-call`;
```
P2: The global connectionAttempts map is never cleared between test sessions. Since sessionId is derived from user message content, reusing the same fixture prompt within the same fake-server process (e.g., via Playwright --repeat-each or multiple tests sharing a trigger) causes the attempt counter to start at 2+ on subsequent runs. This means attempts: [1] rules won't fire, making connection-drop fixtures non-deterministic and potentially masking regressions. Consider clearing the relevant keys when a new session/fixture starts, or resetting the map between test runs.
Prompt for AI agents
```
Check if this issue is valid — if so, understand the root cause and fix it. At testing/fake-llm-server/localAgentHandler.ts, line 480:
<comment>The global `connectionAttempts` map is never cleared between test sessions. Since `sessionId` is derived from user message content, reusing the same fixture prompt within the same fake-server process (e.g., via Playwright `--repeat-each` or multiple tests sharing a trigger) causes the attempt counter to start at 2+ on subsequent runs. This means `attempts: [1]` rules won't fire, making connection-drop fixtures non-deterministic and potentially masking regressions. Consider clearing the relevant keys when a new session/fixture starts, or resetting the map between test runs.</comment>
<file context>
@@ -454,7 +473,23 @@ export async function handleLocalAgentFixture(
+        turnScopedDropAfterToolCallAttempts &&
+        turnScopedDropAfterToolCallAttempts.length > 0
+          ? (() => {
+              const attemptKey = `${sessionId}-${passIndex}-${turnIndex}-after-tool-call`;
+              const currentAttempt =
+                (connectionAttempts.get(attemptKey) || 0) + 1;
</file context>
```
🎭 Playwright Test Results
❌ Some tests failed
Summary: 236 passed, 2 failed, 6 flaky, 6 skipped
Failed Tests
🍎 macOS
📋 Re-run Failing Tests (macOS)
Copy and paste to re-run all failing spec files locally:

```
npm run e2e \
  e2e-tests/context_manage.spec.ts \
  e2e-tests/template-create-nextjs.spec.ts
```
🔍 Dyadbot Code Review Summary
Verdict: ✅ YES - Ready to merge
Reviewed by 3 independent agents: Correctness Expert, Code Health Expert, UX Wizard.
Issues Summary
🟢 Low Priority Notes (7 items)
🚫 Dropped False Positives (3 items)
Generated by Dyadbot multi-agent code review
```ts
const toolCallsWithResult = new Set<string>();
const toolResultsWithCall = new Set<string>();

for (const event of retryReplayEvents) {
  if (event.type === "tool-call") {
    toolResultsWithCall.add(event.toolCallId);
    continue;
  }
  if (event.type === "tool-result") {
    toolCallsWithResult.add(event.toolCallId);
  }
```
🟡 MEDIUM | naming
Set variable names are swapped relative to their content
toolCallsWithResult is populated from tool-result events (line 1315), and toolResultsWithCall is populated from tool-call events (line 1311). The names are the opposite of what they store. The intersection logic for completedToolExchangeIds happens to produce the correct result, but the swapped names make this code confusing and error-prone for future maintenance.
💡 Suggestion: Rename to seenToolCallIds and seenToolResultIds to directly describe what each set holds.
```ts
  return error;
}

function getErrorMessage(error: unknown): string {
```
🟡 MEDIUM | duplication
getErrorMessage duplicates existing utility in src/lib/errors.ts
A getErrorMessage function already exists at src/lib/errors.ts. This PR adds a second, slightly different implementation. Having two getErrorMessage functions with different behavior is a maintenance hazard.
💡 Suggestion: Extend the existing getErrorMessage in src/lib/errors.ts to handle the additional cases, or rename this one to something distinct like formatStreamErrorMessage.
```ts
let terminatedRetryCount = 0;
let needsContinuationInstruction = false;

while (!abortController.signal.aborted) {
```
🟡 MEDIUM | missing-why-comment
Inner retry while loop lacks high-level documentation
This inner while loop is the core of this PR — it implements the entire retry-on-terminated-stream protocol. A block comment explaining the high-level contract would help future readers understand: (1) what conditions cause a retry, (2) what state is preserved vs reset between attempts, and (3) how replay events feed into the next attempt's message history.
💡 Suggestion: Add a comment like: "Retry loop: if the stream terminates with a transient error, captured text/tool events are replayed into message history, a continuation instruction is appended, and the stream is re-opened. Breaks on success, abort, or non-retryable error."
```ts
const closingThinkBlock = "</think>\n";
fullResponse += closingThinkBlock;
await updateResponseInDb(placeholderMessageId, fullResponse);
```
🟡 MEDIUM | data-integrity
Synthetic </think> tag not captured in retry replay events
When a stream terminates mid-thinking-block, this code appends a synthetic </think>\n to fullResponse and persists it to DB. However, maybeCaptureRetryReplayText is only called from the text-delta handler, so this synthetic closing tag is NOT included in the retry replay events. This means the replay text fed into message history on retry is inconsistent with what's shown to the user (user sees closed thinking block, but message history has an unclosed one).
```ts
    STREAM_RETRY_BASE_DELAY_MS * terminatedRetryCount;
  logger.warn(
    `Transient stream termination for chat ${req.chatId}; retrying pass (${terminatedRetryCount}/${MAX_TERMINATED_STREAM_RETRIES}) after ${retryDelayMs}ms`,
  );
  await delay(retryDelayMs);
  continue;
}
throw streamError;
```
🟡 MEDIUM | UX
No visual feedback during retry; raw error after retries exhausted
Two UX concerns here:
- During retry: The user experiences a 400–800ms pause with no visual indication that recovery is happening. The stream stops mid-response, waits, then resumes. Consider a transient "Reconnecting…" indicator.
- After retries exhausted: The `throw streamError` surfaces as a raw `TypeError: terminated` to the user, which is not actionable. After silently retrying multiple times, showing a user-friendly message like "The connection to your local agent was lost and could not be recovered" would be much better.
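A sketch of the second point; the wrapper name, message text, and the inlined "terminated" check are illustrative, not the PR's code:

```typescript
// Sketch: translate the raw transient error into an actionable message
// before it reaches the user, keeping the original error attached for logs.
function toUserFacingStreamError(streamError: unknown): Error {
  const message =
    streamError instanceof Error ? streamError.message.toLowerCase() : "";
  if (message.includes("terminated")) {
    const friendly = new Error(
      "The connection to your local agent was lost and could not be recovered.",
    );
    // Preserve the original error for diagnostics without showing it.
    (friendly as Error & { cause?: unknown }).cause = streamError;
    return friendly;
  }
  return streamError instanceof Error
    ? streamError
    : new Error(String(streamError));
}
```

The handler would then `throw toUserFacingStreamError(streamError)` once retries are exhausted.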
```ts
if (turnScopedDropAttempts && turnScopedDropAttempts.length > 0) {
  const attemptKey = `${sessionId}-${passIndex}-${turnIndex}`;
  const currentAttempt = (connectionAttempts.get(attemptKey) || 0) + 1;
  connectionAttempts.set(attemptKey, currentAttempt);
```
Retry drop counters leak across conversations
Medium Severity
connectionAttempts is keyed by sessionId, but sessionId is derived from the first user message content. Separate chats that start with the same fixture trigger share the same key, so attempt counts bleed across runs. In localAgentHandler.ts, later conversations may skip configured first-attempt drops unexpectedly.
Additional Locations (1)
```ts
// Check if we should simulate a connection drop for this attempt
const turnScopedDropAttempts =
  fixture.dropConnectionByTurn?.find((rule) => rule.turnIndex === turnIndex)
    ?.attempts ?? fixture.dropConnectionOnAttempts;
```
Global drop rule applies to every turn
Medium Severity
dropConnectionOnAttempts is documented for the first turn, but localAgentHandler.ts falls back to it for any turnIndex without a per-turn rule. This makes later turns drop unexpectedly, so fixture behavior diverges from the declared contract in localAgentTypes.ts.
Additional Locations (1)
```ts
// Track connection attempts per session+turn for connection drop simulation.
// Key: `${sessionId}-${passIndex}-${turnIndex}`, Value: attempt count
const connectionAttempts = new Map<string, number>();
```
Connection attempt map never gets cleared
Low Severity
connectionAttempts accumulates per-session keys but has no cleanup path. In long-running fake-server processes, each unique sessionId/pass/turn key remains forever, causing unbounded growth and stale state retention in localAgentHandler.ts.
Additional Locations (2)
```ts
      ...currentMessageHistory,
      buildTerminatedRetryContinuationInstruction(),
    ]
  : currentMessageHistory;
```
Retry instruction resets fixture turn state
Medium Severity
Appending buildTerminatedRetryContinuationInstruction() as the last user message can make fixture turn detection use the synthetic retry prompt as the boundary. countToolResultRounds() then ignores prior replayed tool results, so retries can restart from turn 0 and re-run earlier fixture work instead of resuming.
Additional Locations (1)
```ts
}
try {
  return { type: "text", value: JSON.stringify(value) };
} catch {
```
Undefined tool results break replay message encoding
Medium Severity
toToolResultOutput() returns JSON.stringify(value) directly for non-strings. When value is undefined, JSON.stringify returns undefined, so replayed tool-result messages get a non-string output.value. That can make the retry replay payload invalid and fail the recovery attempt.
Additional Locations (1)
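A sketch of a guarded serializer addressing this; `serializeToolResult` is a hypothetical name (note that `JSON.stringify` returns `undefined`, not a string, for `undefined`, functions, and symbols):

```typescript
// Sketch: always produce a string, even for values JSON.stringify cannot
// represent (undefined, functions, symbols) or that throw (circular refs).
function serializeToolResult(value: unknown): string {
  if (typeof value === "string") return value;
  try {
    const json = JSON.stringify(value);
    return json === undefined ? String(value) : json;
  } catch {
    // e.g. circular structures
    return String(value);
  }
}
```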
@BugBot run

@BugBot run
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 47623be5f3
```ts
(userTextContent.includes("incomplete todo(s)") ||
  userTextContent.includes("previous response stream was interrupted"))
```
Select the most recent fixture for retry continuation
Including "previous response stream was interrupted" in this fallback means retry continuation prompts now reuse the "search earlier messages" path, but that path picks the first tc=local-agent/... trigger in the chat rather than the latest one. In a chat that has multiple fixture prompts, a retry can be routed to an older fixture, and handleLocalAgentFixture then derives turn state from the wrong script, which can replay earlier turns and re-run tool side effects instead of resuming the interrupted one.
```ts
  if (event.type === "tool-result") {
    toolCallsWithResult.add(event.toolCallId);
  }
}
```
Swapped variable names in tool exchange tracking sets
Medium Severity
toolCallsWithResult is populated from tool-result events and toolResultsWithCall is populated from tool-call events — the opposite of what the names suggest. The final completedToolExchangeIds computation is still correct because set intersection is commutative, but the naming is misleading. A future developer trusting the variable names while modifying this logic (e.g., changing the intersection to a directional filter) could easily introduce a real bug.
```ts
  if (event.type === "tool-result") {
    toolCallsWithResult.add(event.toolCallId);
  }
}
```
Swapped set variable names obscure retry replay logic
Low Severity
In maybeAppendRetryReplayForRetry, the sets toolCallsWithResult and toolResultsWithCall have their population logic swapped relative to their names. toolCallsWithResult is populated from tool-result events, and toolResultsWithCall is populated from tool-call events — the opposite of what the names suggest. The intersection used to compute completedToolExchangeIds is commutative, so the result is still correct. However, the naming mismatch is confusing and could mislead future maintainers into introducing a real bug when modifying this code.
Additional Comments (1)
**`countToolResultRounds` miscalculates turn index after retry with replay**

`countToolResultRounds` finds the **last** user message and counts tool-result rounds that appear after it. In the retry path, `handleLocalAgentStream` appends a continuation instruction as the last user message _after_ any replay messages:

```
[user: fixture trigger]
[assistant: "I'll create a file…" + write_file] ← replay
[tool: write_file result] ← replay
[assistant: "Partial response…"] ← replay (turn 1 partial)
[user: "[System] …stream was interrupted…"] ← continuation (LAST user msg)
```

Because the continuation instruction is the last user message, `lastUserIndex` points to it and the tool-result at the replay position is **not counted**. The function returns `rounds = 0`, so `turnIndex = 0`. This causes the fake server to re-serve turn 0 (which may include tool calls) on retry, instead of the correct turn 1. For the `connection-drop` fixture this means `write_file` gets executed a second time (the file write is idempotent, so it may not surface as a test failure, but the conversation history will contain a duplicate tool exchange and potentially two edit cards in the UI).

A targeted fix would be to skip synthetic user messages (continuation instructions, todo-reminders) when searching for the baseline user message:

```ts
function countToolResultRounds(messages: any[]): number {
  // Skip synthetic messages injected by the retry/outer-loop machinery
  const isSynthetic = (msg: any) =>
    isTodoReminderMessage(msg) ||
    (typeof msg?.content === "string" &&
      msg.content.includes("previous response stream was interrupted")) ||
    (Array.isArray(msg?.content) &&
      msg.content.some(
        (p: any) =>
          p.type === "text" &&
          p.text?.includes("previous response stream was interrupted"),
      ));

  let lastUserIndex = -1;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i]?.role === "user" && !isSynthetic(messages[i])) {
      lastUserIndex = i;
      break;
    }
  }
  // ... rest unchanged
}
```
🔍 Dyadbot Code Review Summary

Verdict: 🤔 NOT SURE - Potential issues

Reviewed by 3 independent agents: Correctness Expert, Code Health Expert, UX Wizard.

Issues Summary
🟢 Low Priority Notes (7 items)
🚫 Dropped False Positives (4 items)
Generated by Dyadbot multi-agent code review
```ts
      "Local agent stream error:",
      getErrorMessage(normalizedError),
    );
  },
```
🟡 MEDIUM | error-handling
Non-retryable stream errors produce degraded user-facing error messages
The old onError callback immediately sent chat:response:error to the UI with a well-formatted AI error: ${error?.error?.message || JSON.stringify(error)} message. The new callback silently captures the error, and for non-retryable errors, it eventually throws to the outer catch (line 1210) which formats it as `Error: ${error}`.
If the unwrapped error is a plain object (e.g., from API rate limits, auth failures, server 500s), the user will see Error: [object Object] in the ChatErrorBox instead of a meaningful message.
💡 Suggestion: In the outer catch block (line 1210-1213), use the getErrorMessage() helper already defined in this file to format the error, and preserve the AI error: prefix:
```ts
error: `AI error: ${getErrorMessage(error)}`
```

```ts
      `Transient stream termination for chat ${req.chatId}; retrying pass (${terminatedRetryCount}/${MAX_TERMINATED_STREAM_RETRIES}) after ${retryDelayMs}ms`,
    );
    await delay(retryDelayMs);
    continue;
```
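The `getErrorMessage()` approach recommended for the error path can be sketched roughly as follows. This is an illustrative stand-in, not the helper actually defined in the file, but it shows how plain-object errors avoid becoming `[object Object]`:

```typescript
// Illustrative sketch of a getErrorMessage-style helper; the PR's actual
// implementation may differ. The goal: never surface "[object Object]".
function getErrorMessage(error: unknown): string {
  if (error instanceof Error) return error.message;
  if (typeof error === "string") return error;
  try {
    // Plain objects (rate limits, auth failures, server 500s) get
    // serialized instead of falling through to String(error).
    return JSON.stringify(error);
  } catch {
    // Circular structures etc. still get a best-effort string.
    return String(error);
  }
}
```

Used in the outer catch as `error: \`AI error: ${getErrorMessage(error)}\``, every error shape yields a readable message in the ChatErrorBox.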
🟡 MEDIUM | duplication
Retry-and-continue logic duplicated across two error handling sites
The ~20-line retry block here (stream iteration errors) is nearly identical to lines 1030-1048 (response finalization errors). The only differences are the error variable name, the phase telemetry string, and the log message.
This duplication makes it easy to update one path and forget the other when retry behavior evolves.
💡 Suggestion: Extract a helper like handleTerminatedStreamRetry(error, phase) that encapsulates the shouldRetry check, replay append, counter increment, telemetry event, log, and delay. Return a boolean so the caller knows whether to continue or fall through.
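The shared decision logic of such a helper can be sketched as a pure function, which also makes it unit-testable. The names, signature, and backoff policy below are assumptions for illustration, not the PR's actual code:

```typescript
// Illustrative sketch of the suggested extraction; names, signature, and
// the surrounding retry state are assumptions, not the PR's actual code.
interface RetryState {
  terminatedRetryCount: number; // retries already attempted this chat
  maxRetries: number;           // e.g. MAX_TERMINATED_STREAM_RETRIES
  baseDelayMs: number;          // first backoff delay
}

interface RetryDecision {
  shouldRetry: boolean;
  delayMs: number;
}

// Pure decision shared by both error-handling sites (stream iteration
// errors and response finalization errors); only the phase telemetry
// string and log message would differ at the call sites.
function decideTerminatedStreamRetry(state: RetryState): RetryDecision {
  if (state.terminatedRetryCount >= state.maxRetries) {
    return { shouldRetry: false, delayMs: 0 };
  }
  // Exponential backoff: base, 2x base, 4x base, ...
  return {
    shouldRetry: true,
    delayMs: state.baseDelayMs * 2 ** state.terminatedRetryCount,
  };
}
```

Both call sites would then check `shouldRetry`, append the replay, emit their phase-specific telemetry, `await delay(delayMs)`, and `continue` — keeping the two paths from drifting apart.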
```ts
    if (event.type === "tool-result") {
      toolCallsWithResult.add(event.toolCallId);
    }
  }
```
🟡 MEDIUM | naming
Set variable names are swapped, making retry replay logic confusing
toolCallsWithResult is populated from tool-result events, and toolResultsWithCall is populated from tool-call events — the names are backwards. The intersection logic works correctly because set intersection is commutative, but the misleading names could cause bugs if this code is modified by someone trusting the variable names.
💡 Suggestion: Rename to match what they actually track, e.g., idsWithToolCall and idsWithToolResult, or simply toolCallIds and toolResultIds.
```ts
await expect(
  po.page
    .getByRole("button", {
      name: /recovered-after-tool-call\.ts .*src\/recovered-after-tool-call\.ts.*Edit/,
```
🟡 MEDIUM | test-correctness
Using .first() may mask duplicate tool execution visible to user
The first test (recovers from connection drop) correctly asserts toHaveCount(1) for the edit card. This test uses .first() without asserting the count first. If the retry logic accidentally creates duplicate edit cards (e.g., the tool runs twice), this test would still pass while the user would see a confusing duplicate.
💡 Suggestion: Add await expect(editCard).toHaveCount(1) before .first(), matching the pattern in the first test.


Handle connection-drop and retry behavior in local agent IPC handling.
Add an end-to-end scenario that verifies recovery after a temporary local agent disconnect.
Align test fixtures and snapshots for local-agent reconnection behavior.