
fix: retry transient local agent server errors #3044

Merged

wwwillchen merged 3 commits into dyad-sh:main from wwwillchen-bot:fix-local-agent-azure-retry on Mar 20, 2026

Conversation

@wwwillchen
Collaborator

@wwwillchen wwwillchen commented Mar 19, 2026

Summary

  • retry local-agent stream passes when providers emit transient server errors like Azure server_error
  • keep the existing terminated-stream continuation path and apply it to retryable provider-side failures
  • add a regression test covering the structured provider error event shape

Test plan

  • npm run fmt
  • npm run lint:fix
  • npm run ts
  • npm test

🤖 Generated with Claude Code



@wwwillchen
Collaborator Author

@BugBot run

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the resilience of the local agent's streaming capabilities by introducing a robust retry mechanism for transient server errors originating from external providers. By intelligently retrying failed stream passes and leveraging existing continuation paths, the system can gracefully recover from temporary service interruptions, leading to a more stable and reliable user experience.

Highlights

  • Retry Mechanism for Transient Errors: Implemented retry logic for local agent stream when providers encounter transient server errors, such as Azure server_error.
  • Enhanced Stream Continuation: Extended the existing terminated-stream continuation mechanism to handle newly identified retryable provider-side failures.
  • Regression Test Coverage: Added a new regression test to validate the handling of structured provider error events.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@wwwillchen
Collaborator Author

@BugBot run

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request introduces a robust retry mechanism for transient server errors encountered during the local agent stream processing. This significantly improves the reliability and resilience of the AI agent by allowing it to recover from temporary network or provider-side issues. A new test case has been added to validate this functionality, which is a good practice. The refactoring of the retry logic and the introduction of specific retryable error patterns are well-implemented.

Comment on lines +92 to +109

```ts
const RETRYABLE_STREAM_ERROR_STATUS_CODES = new Set([
  408, 429, 500, 502, 503, 504,
]);
const RETRYABLE_STREAM_ERROR_PATTERNS = [
  "server_error",
  "internal server error",
  "service unavailable",
  "bad gateway",
  "gateway timeout",
  "too many requests",
  "rate_limit",
  "overloaded",
  "timeout",
  "econnrefused",
  "enotfound",
  "econnreset",
  "epipe",
  "etimedout",
```
Contributor


medium

Consider grouping these new constants (RETRYABLE_STREAM_ERROR_STATUS_CODES, RETRYABLE_STREAM_ERROR_PATTERNS) with the other related constants (e.g., MAX_TERMINATED_STREAM_RETRIES, STREAM_RETRY_BASE_DELAY_MS, STREAM_CONTINUE_MESSAGE) for better organization and readability. This would make it easier to find all configuration-like values in one place.

```ts
const MAX_TERMINATED_STREAM_RETRIES = 3;
const STREAM_RETRY_BASE_DELAY_MS = 400;
const STREAM_CONTINUE_MESSAGE =
  "[System] Your previous response stream was interrupted by a transient network error. Continue from exactly where you left off and do not repeat text that has already been sent.";

const RETRYABLE_STREAM_ERROR_STATUS_CODES = new Set([
  408, 429, 500, 502, 503, 504,
]);
const RETRYABLE_STREAM_ERROR_PATTERNS = [
  "server_error",
  "internal server error",
  "service unavailable",
  "bad gateway",
  "gateway timeout",
  "too many requests",
  "rate_limit",
  "overloaded",
  "timeout",
  "econnrefused",
  "enotfound",
  "econnreset",
  "epipe",
  "etimedout",
];
```

@greptile-apps
Contributor

greptile-apps bot commented Mar 19, 2026

Greptile Summary

This PR extends the local agent stream retry logic to cover structured provider-side errors (e.g. Azure server_error) in addition to transport-level "terminated" errors. It adds isRetryableProviderStreamError which checks HTTP status codes and a set of error string patterns, renames shouldRetryTerminatedStreamError to shouldRetryTransientStreamError, adds a focused regression test, and documents the behaviour in rules/local-agent-tools.md.

  • isRetryableProviderStreamError correctly unwraps Azure's { type: "error", error: { type: "server_error", ... } } shape and matches it via RETRYABLE_STREAM_ERROR_PATTERNS, which the new test validates end-to-end.
  • The telemetry event name "local_agent:terminated_stream_retry" (lines 1033 and 1082) is now a misnomer — it fires for provider errors too. This will conflate two distinct failure types in any monitoring dashboard.
  • unwrapStreamError is called before reading the status code, which discards any statusCode/status on the outer error envelope. A 408 Request Timeout error wrapped as { status: 408, error: { message: "..." } } would lose its status code and fall through to string-pattern matching where no "request timeout" pattern exists, potentially silently skipping the retry.

Confidence Score: 4/5

  • Safe to merge with minor follow-up recommended for telemetry naming and the status-code unwrap edge case.
  • The core retry logic is sound and well-tested. The new isRetryableProviderStreamError function correctly handles the Azure structured error shape demonstrated in the PR description. Two non-blocking issues exist: the telemetry event name is now misleading, and unwrapStreamError called before the status-code extraction can silently drop a 408 from a wrapped envelope. Neither causes incorrect behaviour for the targeted use case, but both could become harder to debug later.
  • src/pro/main/ipc/handlers/local_agent/local_agent_handler.ts — specifically the telemetry event names at lines 1033/1082 and the status-code extraction in isRetryableProviderStreamError (lines 1357–1363).

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/pro/main/ipc/handlers/local_agent/local_agent_handler.ts | Adds isRetryableProviderStreamError to catch structured provider errors (e.g. Azure server_error) and renames shouldRetryTerminatedStreamError to shouldRetryTransientStreamError. Two minor issues: the telemetry event name "local_agent:terminated_stream_retry" is now a misnomer, and unwrapStreamError drops the outer-envelope HTTP status code, which could cause 408 errors to slip through unretried. |
| src/tests/local_agent_handler.test.ts | Adds a regression test for the structured provider error retry path. The test correctly simulates Azure's `{ type: "error", error: { type: "server_error", ... } }` shape, verifies attemptCount === 2, confirms no error events are emitted, and asserts the continuation instruction is injected into the second attempt's messages. |
| rules/local-agent-tools.md | Adds a "Stream retries" section documenting that handleLocalAgentStream retry behavior should cover both transport errors and structured provider stream errors. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Stream iteration / response.await] -->|error thrown| B{aborted?}
    B -- yes --> C[Break / log]
    B -- no --> D{shouldRetryTransientStreamError?}
    D -->|retryCount >= MAX 3| E[throw error → surface to user]
    D -->|retryCount < MAX| F{isTerminatedStreamError?}
    F -- yes --> H[Schedule retry with continuation instruction]
    F -- no --> G{isRetryableProviderStreamError?}
    G -->|statusCode in set OR ≥500| H
    G -->|error string matches pattern| H
    G -- no match --> E
    H --> I[maybeAppendRetryReplayForRetry]
    I --> J[terminatedRetryCount++]
    J --> K[needsContinuationInstruction = true]
    K --> L[delay STREAM_RETRY_BASE_DELAY_MS × retryCount]
    L --> M[continue → restart stream pass]
```
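The scheduling step at the bottom of the flowchart (linear backoff, capped at three retries) can be sketched as a standalone helper. The constant names mirror the PR, but this function is an illustration of the described behaviour, not code copied from the handler.

```typescript
// Sketch of the backoff math the flowchart describes. Constants match the
// PR's names; the helper itself is illustrative, not the handler's code.
const MAX_TERMINATED_STREAM_RETRIES = 3;
const STREAM_RETRY_BASE_DELAY_MS = 400;

// Returns the delay before the next retry, or null once retries are
// exhausted. `attemptedRetries` counts retries already made (0 on the
// first failure).
function nextRetryDelayMs(attemptedRetries: number): number | null {
  if (attemptedRetries >= MAX_TERMINATED_STREAM_RETRIES) {
    return null; // exhausted: surface the error to the user
  }
  // Linear backoff: 400ms, 800ms, 1200ms.
  return STREAM_RETRY_BASE_DELAY_MS * (attemptedRetries + 1);
}
```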
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/pro/main/ipc/handlers/local_agent/local_agent_handler.ts
Line: 1033-1039

Comment:
**Telemetry event name no longer reflects its scope**

The event name `"local_agent:terminated_stream_retry"` is now fired for both transport-level `"terminated"` errors and structured provider-side errors (e.g. Azure `server_error`). Any telemetry dashboards filtering on this event name will silently lump in provider retries under the old "terminated" bucket, making it hard to distinguish the two failure modes.

The same stale name is used again at line 1082 (the `response` await catch path).

Consider a name that covers both cases, and optionally add an `errorKind` field to differentiate:

```suggestion
              sendTelemetryEvent("local_agent:transient_stream_retry", {
                chatId: req.chatId,
                dyadRequestId,
                retryCount: terminatedRetryCount,
                error: String(streamError),
                phase: "stream_iteration",
              });
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/pro/main/ipc/handlers/local_agent/local_agent_handler.ts
Line: 1357-1363

Comment:
**`unwrapStreamError` discards the outer envelope's HTTP status code**

`unwrapStreamError` returns `error.error` when an `"error"` key is present, which strips any `statusCode` / `status` sitting on the outer wrapper. For a provider error shaped like `{ status: 408, error: { message: "Request Timeout" } }`, the `408` is lost and the code falls through to string-pattern matching — but `"request timeout"` is not in `RETRYABLE_STREAM_ERROR_PATTERNS`, so a legitimate 408 wrapped error would silently not be retried.

Consider extracting the status code from both levels before unwrapping:

```
const statusCode =
  (isRecord(error) && typeof (error as any).statusCode === "number" && (error as any).statusCode) ||
  (isRecord(error) && typeof (error as any).status === "number" && (error as any).status) ||
  // ... then fall through to the unwrapped object
```

Or alternatively, add `"request timeout"` to `RETRYABLE_STREAM_ERROR_PATTERNS` so the string-path still catches it.

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: "Remove 'timeout' err..."

Comment on lines +1366 to +1371

```ts
if (
  typeof statusCode === "number" &&
  (statusCode >= 500 || RETRYABLE_STREAM_ERROR_STATUS_CODES.has(statusCode))
) {
  return true;
}
```
Contributor


P1: >= 500 makes the explicit Set redundant and retries non-retryable codes

statusCode >= 500 catches every 5xx code including non-transient ones like 501 Not Implemented and 505 HTTP Version Not Supported, which a provider would never recover from on a retry. Because the Set already enumerates the exact 5xx codes worth retrying (500, 502, 503, 504) alongside the 4xx ones (408, 429), the >= 500 branch is both overly broad and redundant.

Consider replacing the condition with just the Set lookup:

Suggested change

```diff
 if (
   typeof statusCode === "number" &&
-  (statusCode >= 500 || RETRYABLE_STREAM_ERROR_STATUS_CODES.has(statusCode))
+  RETRYABLE_STREAM_ERROR_STATUS_CODES.has(statusCode)
 ) {
   return true;
 }
```
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/pro/main/ipc/handlers/local_agent/local_agent_handler.ts
Line: 1366-1371

Comment:
**`>= 500` makes the explicit Set redundant and retries non-retryable codes**

`statusCode >= 500` catches every 5xx code including non-transient ones like `501 Not Implemented` and `505 HTTP Version Not Supported`, which a provider would never recover from on a retry. Because the Set already enumerates the exact 5xx codes worth retrying (`500, 502, 503, 504`) alongside the 4xx ones (`408, 429`), the `>= 500` branch is both overly broad and redundant.

Consider replacing the condition with just the Set lookup:

```suggestion
  if (
    typeof statusCode === "number" &&
    RETRYABLE_STREAM_ERROR_STATUS_CODES.has(statusCode)
  ) {
    return true;
  }
```

How can I resolve this? If you propose a fix, please make it concise.

@dyad-assistant
Contributor

🔍 Dyadbot Code Review Summary

Verdict: 🤔 NOT SURE - Potential issues

Reviewed by 3 independent agents: Correctness Expert, Code Health Expert, UX Wizard.

Issues Summary

Severity File Issue
🟡 MEDIUM local_agent_handler.ts:1096 Exhausted-retries telemetry only fires for terminated errors, not new provider errors
🟡 MEDIUM local_agent_handler.ts:104 Bare timeout pattern is too broad — will match non-transient client timeouts
🟡 MEDIUM local_agent_handler.ts:92 429 rate-limit errors retried with same short delay (~400ms) as server errors
🟢 Low Priority Notes (4 items)
  • Continuation message inaccuracy - local_agent_handler.ts:89 — STREAM_CONTINUE_MESSAGE says "network error" but is now also sent for server-side provider errors
  • Redundant 5xx codes in Set - local_agent_handler.ts:92 — 500/502/503/504 in the Set are already covered by the >= 500 check; only 408 and 429 are needed
  • Stale naming - local_agent_handler.ts:87 — MAX_TERMINATED_STREAM_RETRIES / terminatedRetryCount no longer reflect the broadened retry scope
  • statusCode extraction fragility - local_agent_handler.ts:1358 — the && short-circuit yields false instead of undefined for edge cases; works but is confusing
🚫 Dropped False Positives (3 items)
  • No user feedback during retries — Dropped: Pre-existing behavior from the terminated-stream retry path, not introduced by this PR
  • Test only covers stream-iteration path — Dropped: The existing terminated-stream tests already validate the response-finalization retry pattern; adding another is nice-to-have
  • getErrorMessage fallback complexity — Dropped: Defensive fallback for edge-case error shapes is reasonable

Generated by Dyadbot multi-agent code review

```diff
 } catch (err) {
   if (
-    shouldRetryTerminatedStreamError({
+    shouldRetryTransientStreamError({
```


Exhausted-retries telemetry missing for new provider errors

Low Severity

In the response_finalization phase, the telemetry guard at line 1096 still only checks isTerminatedStreamError(err). Since shouldRetryTransientStreamError now also retries isRetryableProviderStreamError errors, when those new provider errors exhaust retries, the terminated_stream_retries_exhausted telemetry event silently won't fire. The stream_iteration phase (line 1047) correctly fires telemetry unconditionally on exhaustion, making this an inconsistency between the two paths.

Additional Locations (1)

Contributor

@dyad-assistant dyad-assistant bot left a comment


Multi-agent review: 3 issue(s) found

```ts
  "too many requests",
  "rate_limit",
  "overloaded",
  "timeout",
```
Contributor


🟡 MEDIUM | correctness

Bare timeout pattern is too broad

The substring 'timeout' will match any error whose message/code/type contains the word — including non-transient client-imposed timeouts (e.g., AbortSignal.timeout(), user-configured request timeouts, or messages like "connection timeout set too low"). This would cause up to 3 silent retries of non-retryable errors before surfacing the real failure.

The more specific 'etimedout' and 'gateway timeout' patterns already cover the network-level and HTTP 504 cases. Status code 408 is also handled by the Set.

💡 Suggestion: Remove the bare 'timeout' entry and rely on the existing specific patterns (etimedout, gateway timeout, status 408/504).
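With the bare "timeout" entry removed as suggested, the pattern match would look roughly like the following. This is a hedged sketch: it assumes the handler lower-cases the error text before substring matching, which the PR diff does not show directly.

```typescript
// Sketch of pattern matching with the bare "timeout" entry removed.
// Assumes error text is lower-cased before matching (an assumption,
// not confirmed by the visible diff).
const RETRYABLE_STREAM_ERROR_PATTERNS = [
  "server_error",
  "internal server error",
  "service unavailable",
  "bad gateway",
  "gateway timeout",
  "too many requests",
  "rate_limit",
  "overloaded",
  "econnrefused",
  "enotfound",
  "econnreset",
  "epipe",
  "etimedout",
];

function matchesRetryablePattern(errorText: string): boolean {
  const lowered = errorText.toLowerCase();
  return RETRYABLE_STREAM_ERROR_PATTERNS.some((p) => lowered.includes(p));
}
```

Network-level timeouts still match via "etimedout", while a client-configured timeout message like "connection timeout set too low" no longer triggers a retry.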

```ts
import { DEFAULT_MAX_TOOL_CALL_STEPS } from "@/constants/settings_constants";

const RETRYABLE_STREAM_ERROR_STATUS_CODES = new Set([
  408, 429, 500, 502, 503, 504,
```
Contributor


🟡 MEDIUM | user-experience

429 rate-limit errors retried too aggressively

429 (Too Many Requests) is included in RETRYABLE_STREAM_ERROR_STATUS_CODES and will be retried after only ~400ms-1200ms (linear backoff). Providers that emit 429 typically expect longer backoff (seconds to minutes) and may include a Retry-After header. Retrying in <2s will likely hit the rate limit again immediately, burning all 3 retry attempts and delaying the error the user sees by ~2.4s with no benefit.

💡 Suggestion: Either exclude 429 from automatic retry (and let the existing rate-limit error UI surface immediately), or apply a significantly longer minimum delay for 429s and respect Retry-After headers if present.
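One way to implement the second option looks like this. The helper name and the 5-second floor are assumptions for illustration, not values from the PR; a real implementation would read the Retry-After header off the provider's response object.

```typescript
// Hedged sketch of the suggested 429 handling: honour Retry-After when
// present, otherwise enforce a longer floor than the generic linear
// backoff. The 5s floor and helper name are assumptions, not PR code.
const RATE_LIMIT_MIN_DELAY_MS = 5_000;

function delayFor429(
  retryAfterHeader: string | undefined,
  baseDelayMs: number,
): number {
  const retryAfterSeconds = Number(retryAfterHeader);
  if (Number.isFinite(retryAfterSeconds) && retryAfterSeconds > 0) {
    return retryAfterSeconds * 1000; // provider told us how long to wait
  }
  return Math.max(baseDelayMs, RATE_LIMIT_MIN_DELAY_MS);
}
```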

```diff
@@ -1338,7 +1394,7 @@ function shouldRetryTerminatedStreamError(params: {
   return (
```
Contributor


🟡 MEDIUM | observability

Exhausted-retries telemetry misses new provider errors

The terminated_stream_retries_exhausted telemetry event in the response-finalization path (~line 1096) is still gated on isTerminatedStreamError(err). After this PR, retries can also be exhausted by provider-side errors (e.g., Azure server_error, 429s) matched by isRetryableProviderStreamError — but those cases silently skip telemetry.

Note: the stream-iteration exhaustion path (~line 1047) fires unconditionally, so only the response-finalization path has the gap.

💡 Suggestion: Change the condition at ~line 1096 to isTerminatedStreamError(err) || isRetryableProviderStreamError(err).

Contributor

@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 1 potential issue.

View 5 additional findings in Devin Review.


Contributor


🟡 Missing "retries exhausted" telemetry for retryable provider errors in response finalization phase

The PR expands shouldRetryTransientStreamError (line 1066) to cover both terminated errors and retryable provider errors, but the fallback telemetry guard at local_agent_handler.ts:1096 still only checks isTerminatedStreamError(err). When a retryable provider error (e.g., 500 server_error) exhausts its MAX_TERMINATED_STREAM_RETRIES retries during the response finalization phase, no "local_agent:terminated_stream_retries_exhausted" telemetry event is emitted — unlike the stream iteration error site (local_agent_handler.ts:1047) which unconditionally sends the telemetry. This creates an observability blind spot for the newly added error types.

(Refers to lines 1096-1107)


Was this helpful? React with 👍 or 👎 to provide feedback.


@cursor cursor bot left a comment


✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 2 files

Confidence score: 3/5

  • There is a concrete reliability risk in src/pro/main/ipc/handlers/local_agent/local_agent_handler.ts: using statusCode >= 500 broadens retries to all 5xx responses, including non-transient cases that are unlikely to succeed on retry.
  • Because this issue is medium severity (6/10) with high confidence (9/10), it introduces meaningful behavior risk rather than a minor housekeeping concern.
  • The retry logic may ignore the intended curated RETRYABLE_STREAM_ERRORS behavior, which could cause unnecessary retry loops and delayed failure handling for users.
  • Pay close attention to src/pro/main/ipc/handlers/local_agent/local_agent_handler.ts - ensure retry conditions exclude non-retryable 5xx statuses and align with the curated retry policy.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/pro/main/ipc/handlers/local_agent/local_agent_handler.ts">

<violation number="1" location="src/pro/main/ipc/handlers/local_agent/local_agent_handler.ts:1368">
P2: `statusCode >= 500` catches every 5xx code including non-transient ones like `501 Not Implemented` and `505 HTTP Version Not Supported`, which will never recover on retry. This makes the curated `RETRYABLE_STREAM_ERROR_STATUS_CODES` Set redundant for 5xx codes and causes unnecessary retries for permanent failures. Replace the compound condition with just the Set lookup.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.


```ts
if (
  typeof statusCode === "number" &&
  (statusCode >= 500 || RETRYABLE_STREAM_ERROR_STATUS_CODES.has(statusCode))
```
Contributor

@cubic-dev-ai cubic-dev-ai bot Mar 19, 2026


P2: statusCode >= 500 catches every 5xx code including non-transient ones like 501 Not Implemented and 505 HTTP Version Not Supported, which will never recover on retry. This makes the curated RETRYABLE_STREAM_ERROR_STATUS_CODES Set redundant for 5xx codes and causes unnecessary retries for permanent failures. Replace the compound condition with just the Set lookup.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/pro/main/ipc/handlers/local_agent/local_agent_handler.ts, line 1368:

<comment>`statusCode >= 500` catches every 5xx code including non-transient ones like `501 Not Implemented` and `505 HTTP Version Not Supported`, which will never recover on retry. This makes the curated `RETRYABLE_STREAM_ERROR_STATUS_CODES` Set redundant for 5xx codes and causes unnecessary retries for permanent failures. Replace the compound condition with just the Set lookup.</comment>

<file context>
@@ -1329,7 +1349,43 @@ function isTerminatedStreamError(error: unknown): boolean {
+
+  if (
+    typeof statusCode === "number" &&
+    (statusCode >= 500 || RETRYABLE_STREAM_ERROR_STATUS_CODES.has(statusCode))
+  ) {
+    return true;
</file context>
Suggested change

```diff
-    (statusCode >= 500 || RETRYABLE_STREAM_ERROR_STATUS_CODES.has(statusCode))
+    RETRYABLE_STREAM_ERROR_STATUS_CODES.has(statusCode)
```


@cursor cursor bot left a comment


✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

@github-actions github-actions bot added the needs-human:review-issue ai agent flagged an issue that requires human review label Mar 19, 2026
Removed 'timeout' from the list of error messages.
@wwwillchen
Collaborator Author

@BugBot run

@wwwillchen wwwillchen merged commit 60e7444 into dyad-sh:main Mar 20, 2026
6 of 9 checks passed
@dyad-assistant
Contributor

🔍 Dyadbot Code Review Summary

Verdict: ✅ YES - Ready to merge

Reviewed by 3 independent agents: Correctness Expert, Code Health Expert, UX Wizard.

Issues Summary

| Severity | File | Issue |
| --- | --- | --- |
| 🟡 MEDIUM | src/pro/main/ipc/handlers/local_agent/local_agent_handler.ts:89 | Continuation instruction semantically wrong for provider errors |
🟢 Low Priority Notes (3 items)
  • Constant naming inconsistency - src/pro/main/ipc/handlers/local_agent/local_agent_handler.ts:87 - MAX_TERMINATED_STREAM_RETRIES was not renamed to match shouldRetryTransientStreamError
  • No user feedback during retry delays - src/pro/main/ipc/handlers/local_agent/local_agent_handler.ts - Users see a silent pause (up to ~2.4s) with no indication retries are happening
  • Error message doesn't mention retries - src/pro/main/ipc/handlers/local_agent/local_agent_handler.ts - When retries exhaust, the user-facing error gives no context that automatic retries already occurred
🚫 Dropped False Positives (7 items)
  • Exhausted-retries telemetry missing for provider errors - Dropped: Already covered by existing comments from cursor[bot], dyad-assistant[bot], and devin-ai-integration[bot]
  • >= 500 makes Set redundant / catches non-transient 5xx - Dropped: Already covered by existing comments from greptile-apps[bot] and cubic-dev-ai[bot]
  • 429 rate-limit retried too aggressively - Dropped: Already covered by existing comment from dyad-assistant[bot]
  • timeout pattern too broad - Dropped: Already covered by existing comment from dyad-assistant[bot]
  • statusCode short-circuit returns false for status 0 - Dropped: HTTP status 0 is not a real scenario; typeof check downstream correctly filters it
  • Test may not cover Error-wrapped provider shape - Dropped: Speculative without evidence the AI SDK wraps differently; test covers the specific Azure shape from the PR description
  • Test throws vs yields error event - Dropped: The thrown-error path is the actual code path being exercised

Generated by Dyadbot multi-agent code review

Contributor

@dyad-assistant dyad-assistant bot left a comment


Multi-agent review: 1 issue found

```ts
const MAX_TERMINATED_STREAM_RETRIES = 3;
const STREAM_RETRY_BASE_DELAY_MS = 400;
const STREAM_CONTINUE_MESSAGE =
  "[System] Your previous response stream was interrupted by a transient network error. Continue from exactly where you left off and do not repeat text that has already been sent.";
```
Contributor


🟡 MEDIUM | user-experience / correctness

Continuation instruction semantically wrong for provider errors

STREAM_CONTINUE_MESSAGE says "interrupted by a transient network error. Continue from exactly where you left off" — but when a provider emits a structured server error (e.g. Azure server_error) before any text was streamed, there is nothing to continue from. The model receives a misleading instruction to "continue" non-existent partial output.

For the existing terminated-stream path this was appropriate (the TCP connection dropped mid-response), but provider errors can fire before any output is generated.

💡 Suggestion: Only set needsContinuationInstruction = true when fullResponse is non-empty, or use a distinct retry message for provider errors that fired before any output (e.g. a simple "Please retry the request" instead of "continue from where you left off").
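The first variant of that suggestion could look like the sketch below. STREAM_CONTINUE_MESSAGE is quoted from the PR; RETRY_FROM_SCRATCH_MESSAGE and the helper name are hypothetical wording for illustration.

```typescript
// Sketch: only ask the model to "continue" when partial output exists.
// STREAM_CONTINUE_MESSAGE matches the PR; RETRY_FROM_SCRATCH_MESSAGE is
// hypothetical wording, not from the PR.
const STREAM_CONTINUE_MESSAGE =
  "[System] Your previous response stream was interrupted by a transient network error. " +
  "Continue from exactly where you left off and do not repeat text that has already been sent.";
const RETRY_FROM_SCRATCH_MESSAGE =
  "[System] Your previous response failed with a transient provider error before any " +
  "output was produced. Please retry the request from the beginning.";

function continuationMessageFor(fullResponse: string): string {
  return fullResponse.length > 0
    ? STREAM_CONTINUE_MESSAGE
    : RETRY_FROM_SCRATCH_MESSAGE;
}
```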

@github-actions
Contributor

🎭 Playwright Test Results

❌ Some tests failed

| OS | Passed | Failed | Flaky | Skipped |
| --- | --- | --- | --- | --- |
| 🍎 macOS | 391 | 2 | 3 | 126 |
| 🪟 Windows | 396 | 7 | 8 | 126 |

Summary: 787 passed, 9 failed, 11 flaky, 252 skipped

Failed Tests

🍎 macOS

  • queued_message.spec.ts > editing queued message restores attachments and selected components
    • Error: expect(locator).toBeVisible() failed
  • queued_message.spec.ts > canceling queued message edit clears restored components
    • Error: expect(locator).toBeVisible() failed

🪟 Windows

  • concurrent_chat.spec.ts > concurrent chat
    • Error: expect(locator).toBeVisible() failed
  • edit_code.spec.ts > edit code
    • Error: expect(locator).toBeVisible() failed
  • github.spec.ts > create and sync to new repo
    • Error: expect(locator).toHaveClass(expected) failed
  • github.spec.ts > create and sync to new repo - custom branch
    • TimeoutError: locator.click: Timeout 30000ms exceeded.
  • github.spec.ts > create and sync to existing repo
    • Error: expect(locator).toMatchAriaSnapshot(expected) failed
  • github.spec.ts > create and sync to existing repo - custom branch
    • Error: expect(locator).toMatchAriaSnapshot(expected) failed
  • setup_flow.spec.ts > Setup Flow > node.js install flow
    • TimeoutError: locator.dispatchEvent: Timeout 30000ms exceeded.

📋 Re-run Failing Tests (macOS)

Copy and paste to re-run all failing spec files locally:

```sh
npm run e2e \
  e2e-tests/queued_message.spec.ts
```

⚠️ Flaky Tests

🍎 macOS

  • fix_error.spec.ts > fix error with AI (passed after 1 retry)
  • logs_server.spec.ts > system messages UI shows server logs with correct type (passed after 1 retry)
  • setup_flow.spec.ts > Setup Flow > setup banner shows correct state when node.js is installed (passed after 1 retry)

🪟 Windows

  • chat_input.spec.ts > send button disabled during pending proposal (passed after 1 retry)
  • chat_input.spec.ts > send button disabled during pending proposal - reject (passed after 1 retry)
  • chat_mode.spec.ts > chat mode selector - ask mode (passed after 1 retry)
  • chat_tabs.spec.ts > right-click context menu: Close tabs to the right (passed after 2 retries)
  • edit_code.spec.ts > edit code edits the right file (passed after 1 retry)
  • github.spec.ts > should connect to GitHub using device flow (passed after 2 retries)
  • github.spec.ts > github clear integration settings (passed after 1 retry)
  • setup_flow.spec.ts > Setup Flow > setup banner shows correct state when node.js is installed (passed after 1 retry)

📊 View full report


@cursor cursor bot left a comment


✅ Bugbot reviewed your changes and found no new issues!

1 issue from previous review remains unresolved.


Comment @cursor review or bugbot run to trigger another review on this PR


Labels

needs-human:review-issue ai agent flagged an issue that requires human review
