-
Notifications
You must be signed in to change notification settings - Fork 2.3k
fix: retry transient local agent server errors #3044
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -80,14 +80,34 @@ import { | |||||||||||||||||||||||||
| checkAndMarkForCompaction, | ||||||||||||||||||||||||||
| } from "@/ipc/handlers/compaction/compaction_handler"; | ||||||||||||||||||||||||||
| import { getPostCompactionMessages } from "@/ipc/handlers/compaction/compaction_utils"; | ||||||||||||||||||||||||||
| import { DEFAULT_MAX_TOOL_CALL_STEPS } from "@/constants/settings_constants"; | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| const logger = log.scope("local_agent_handler"); | ||||||||||||||||||||||||||
| const PLANNING_QUESTIONNAIRE_TOOL_NAME = "planning_questionnaire"; | ||||||||||||||||||||||||||
| const MAX_TERMINATED_STREAM_RETRIES = 3; | ||||||||||||||||||||||||||
| const STREAM_RETRY_BASE_DELAY_MS = 400; | ||||||||||||||||||||||||||
| const STREAM_CONTINUE_MESSAGE = | ||||||||||||||||||||||||||
| "[System] Your previous response stream was interrupted by a transient network error. Continue from exactly where you left off and do not repeat text that has already been sent."; | ||||||||||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 MEDIUM | user-experience / correctness Continuation instruction semantically wrong for provider errors
For the existing terminated-stream path this was appropriate (the TCP connection dropped mid-response), but provider errors can fire before any output is generated. 💡 Suggestion: Only set |
||||||||||||||||||||||||||
| import { DEFAULT_MAX_TOOL_CALL_STEPS } from "@/constants/settings_constants"; | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| const RETRYABLE_STREAM_ERROR_STATUS_CODES = new Set([ | ||||||||||||||||||||||||||
| 408, 429, 500, 502, 503, 504, | ||||||||||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 MEDIUM | user-experience 429 rate-limit errors retried too aggressively 429 (Too Many Requests) is included in 💡 Suggestion: Either exclude 429 from automatic retry (and let the existing rate-limit error UI surface immediately), or apply a significantly longer minimum delay for 429s and respect |
||||||||||||||||||||||||||
| ]); | ||||||||||||||||||||||||||
| const RETRYABLE_STREAM_ERROR_PATTERNS = [ | ||||||||||||||||||||||||||
| "server_error", | ||||||||||||||||||||||||||
| "internal server error", | ||||||||||||||||||||||||||
| "service unavailable", | ||||||||||||||||||||||||||
| "bad gateway", | ||||||||||||||||||||||||||
| "gateway timeout", | ||||||||||||||||||||||||||
| "too many requests", | ||||||||||||||||||||||||||
| "rate_limit", | ||||||||||||||||||||||||||
| "overloaded", | ||||||||||||||||||||||||||
| "timeout", | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
| "econnrefused", | ||||||||||||||||||||||||||
| "enotfound", | ||||||||||||||||||||||||||
| "econnreset", | ||||||||||||||||||||||||||
| "epipe", | ||||||||||||||||||||||||||
| "etimedout", | ||||||||||||||||||||||||||
|
Comment on lines
+92
to
+108
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Consider grouping these new constants ( const MAX_TERMINATED_STREAM_RETRIES = 3;
const STREAM_RETRY_BASE_DELAY_MS = 400;
const STREAM_CONTINUE_MESSAGE =
"[System] Your previous response stream was interrupted by a transient network error. Continue from exactly where you left off and do not repeat text that has already been sent.";
const RETRYABLE_STREAM_ERROR_STATUS_CODES = new Set([
408, 429, 500, 502, 503, 504,
]);
const RETRYABLE_STREAM_ERROR_PATTERNS = [
"server_error",
"internal server error",
"service unavailable",
"bad gateway",
"gateway timeout",
"too many requests",
"rate_limit",
"overloaded",
"timeout",
"econnrefused",
"enotfound",
"econnreset",
"epipe",
"etimedout",
]; |
||||||||||||||||||||||||||
| ]; | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| // ============================================================================ | ||||||||||||||||||||||||||
| // Tool Streaming State Management | ||||||||||||||||||||||||||
|
|
@@ -994,7 +1014,7 @@ export async function handleLocalAgentStream( | |||||||||||||||||||||||||
| streamErrorFromIteration ?? streamErrorFromCallback; | ||||||||||||||||||||||||||
| if (streamError) { | ||||||||||||||||||||||||||
| if ( | ||||||||||||||||||||||||||
| shouldRetryTerminatedStreamError({ | ||||||||||||||||||||||||||
| shouldRetryTransientStreamError({ | ||||||||||||||||||||||||||
| error: streamError, | ||||||||||||||||||||||||||
| retryCount: terminatedRetryCount, | ||||||||||||||||||||||||||
| aborted: abortController.signal.aborted, | ||||||||||||||||||||||||||
|
|
@@ -1043,7 +1063,7 @@ export async function handleLocalAgentStream( | |||||||||||||||||||||||||
| responseMessages = response.messages; | ||||||||||||||||||||||||||
| } catch (err) { | ||||||||||||||||||||||||||
| if ( | ||||||||||||||||||||||||||
| shouldRetryTerminatedStreamError({ | ||||||||||||||||||||||||||
| shouldRetryTransientStreamError({ | ||||||||||||||||||||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Exhausted-retries telemetry missing for new provider errorsLow Severity In the response_finalization phase, the telemetry guard at line 1096 still only checks Additional Locations (1) |
||||||||||||||||||||||||||
| error: err, | ||||||||||||||||||||||||||
| retryCount: terminatedRetryCount, | ||||||||||||||||||||||||||
| aborted: abortController.signal.aborted, | ||||||||||||||||||||||||||
|
|
@@ -1329,7 +1349,43 @@ function isTerminatedStreamError(error: unknown): boolean { | |||||||||||||||||||||||||
| return false; | ||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| function shouldRetryTerminatedStreamError(params: { | ||||||||||||||||||||||||||
| function isRetryableProviderStreamError(error: unknown): boolean { | ||||||||||||||||||||||||||
| const normalized = unwrapStreamError(error); | ||||||||||||||||||||||||||
| if (!isRecord(normalized)) { | ||||||||||||||||||||||||||
| return false; | ||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| const statusCode = | ||||||||||||||||||||||||||
| (typeof normalized.statusCode === "number" && normalized.statusCode) || | ||||||||||||||||||||||||||
| (typeof normalized.status === "number" && normalized.status) || | ||||||||||||||||||||||||||
| (isRecord(normalized.response) && | ||||||||||||||||||||||||||
| typeof normalized.response.status === "number" | ||||||||||||||||||||||||||
| ? normalized.response.status | ||||||||||||||||||||||||||
| : undefined); | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| if ( | ||||||||||||||||||||||||||
| typeof statusCode === "number" && | ||||||||||||||||||||||||||
| (statusCode >= 500 || RETRYABLE_STREAM_ERROR_STATUS_CODES.has(statusCode)) | ||||||||||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. P2: Prompt for AI agents
Suggested change
|
||||||||||||||||||||||||||
| ) { | ||||||||||||||||||||||||||
| return true; | ||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||
|
Comment on lines
+1365
to
+1370
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Consider replacing the condition with just the Set lookup:
Suggested change
Prompt To Fix With AIThis is a comment left during a code review.
Path: src/pro/main/ipc/handlers/local_agent/local_agent_handler.ts
Line: 1366-1371
Comment:
**`>= 500` makes the explicit Set redundant and retries non-retryable codes**
`statusCode >= 500` catches every 5xx code including non-transient ones like `501 Not Implemented` and `505 HTTP Version Not Supported`, which a provider would never recover from on a retry. Because the Set already enumerates the exact 5xx codes worth retrying (`500, 502, 503, 504`) alongside the 4xx ones (`408, 429`), the `>= 500` branch is both overly broad and redundant.
Consider replacing the condition with just the Set lookup:
```suggestion
if (
typeof statusCode === "number" &&
RETRYABLE_STREAM_ERROR_STATUS_CODES.has(statusCode)
) {
return true;
}
```
How can I resolve this? If you propose a fix, please make it concise. |
||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| const errorString = | ||||||||||||||||||||||||||
| [ | ||||||||||||||||||||||||||
| typeof normalized.message === "string" ? normalized.message : undefined, | ||||||||||||||||||||||||||
| typeof normalized.code === "string" ? normalized.code : undefined, | ||||||||||||||||||||||||||
| typeof normalized.type === "string" ? normalized.type : undefined, | ||||||||||||||||||||||||||
| ] | ||||||||||||||||||||||||||
| .filter(Boolean) | ||||||||||||||||||||||||||
| .join(" ") | ||||||||||||||||||||||||||
| .toLowerCase() || getErrorMessage(normalized).toLowerCase(); | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| return RETRYABLE_STREAM_ERROR_PATTERNS.some((pattern) => | ||||||||||||||||||||||||||
| errorString.includes(pattern), | ||||||||||||||||||||||||||
| ); | ||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| function shouldRetryTransientStreamError(params: { | ||||||||||||||||||||||||||
| error: unknown; | ||||||||||||||||||||||||||
| retryCount: number; | ||||||||||||||||||||||||||
| aborted: boolean; | ||||||||||||||||||||||||||
|
|
@@ -1338,7 +1394,7 @@ function shouldRetryTerminatedStreamError(params: { | |||||||||||||||||||||||||
| return ( | ||||||||||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 MEDIUM | observability Exhausted-retries telemetry misses new provider errors The Note: the stream-iteration exhaustion path (~line 1047) fires unconditionally, so only the response-finalization path has the gap. 💡 Suggestion: Change the condition at ~line 1096 to |
||||||||||||||||||||||||||
| !aborted && | ||||||||||||||||||||||||||
| retryCount < MAX_TERMINATED_STREAM_RETRIES && | ||||||||||||||||||||||||||
| isTerminatedStreamError(error) | ||||||||||||||||||||||||||
| (isTerminatedStreamError(error) || isRetryableProviderStreamError(error)) | ||||||||||||||||||||||||||
| ); | ||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||


There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟡 Missing "retries exhausted" telemetry for retryable provider errors in response finalization phase
The PR expands
shouldRetryTransientStreamError(line 1066) to cover both terminated errors and retryable provider errors, but the fallback telemetry guard atlocal_agent_handler.ts:1096still only checksisTerminatedStreamError(err). When a retryable provider error (e.g., 500 server_error) exhausts itsMAX_TERMINATED_STREAM_RETRIESretries during the response finalization phase, no"local_agent:terminated_stream_retries_exhausted"telemetry event is emitted — unlike the stream iteration error site (local_agent_handler.ts:1047) which unconditionally sends the telemetry. This creates an observability blind spot for the newly added error types.(Refers to lines 1096-1107)
Was this helpful? React with 👍 or 👎 to provide feedback.