fix: add exponential backoff retry for LLM rate limits #1273
Conversation
Adds a retry mechanism for LLM API calls to handle transient rate limits gracefully instead of failing immediately.

- Add `withLLMRetry` wrapper with exponential backoff (2s, 4s, 8s, max 60s; see the sketch below)
- Add `extractLLMErrorInfo` to detect rate limits across providers
- Wrap `createGenerateText` and `createGenerateObject` with retry logic
- Add tests for retry behavior
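A minimal sketch of that backoff schedule, assuming a 2s base that doubles per attempt and caps at 60s (the constants and helper name are illustrative, not the actual `retry.ts` code):

```ts
// Illustrative backoff schedule: 2s base, doubling per attempt, capped at 60s.
// These names are hypothetical; the actual retry.ts implementation may differ.
const BASE_DELAY_MS = 2_000;
const MAX_DELAY_MS = 60_000;

function backoffDelayMs(attempt: number): number {
  // attempt is 1-based: attempt 1 -> 2s, 2 -> 4s, 3 -> 8s, then capped at 60s
  return Math.min(BASE_DELAY_MS * 2 ** (attempt - 1), MAX_DELAY_MS);
}
```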
apps/web/utils/llms/retry.ts

```ts
const status: number | undefined =
  cause?.status ?? original?.status ?? cause?.response?.status;
const message: string = cause?.message ?? original?.message ?? "";
const errorCode: string =
  cause?.code ?? original?.code ?? cause?.error?.type ?? "";

const isRateLimit =
  status === 429 ||
  errorCode === "rate_limit_exceeded" || // OpenAI
  errorCode === "rate_limit_error" || // Anthropic
  errorCode === "RESOURCE_EXHAUSTED" || // Google
  /rate.?limit/i.test(message) ||
  /quota.?exceeded/i.test(message) ||
  /too.?many.?requests/i.test(message);

const isServerError =
  status === 500 ||
  status === 502 ||
  status === 503 ||
  status === 504 ||
  /internal.?error/i.test(message) ||
  /server.?error/i.test(message);

let retryAfterMs: number | undefined;
const headers = cause?.response?.headers;
const retryAfterHeader: string | undefined =
  headers?.["retry-after"] ?? headers?.["x-ratelimit-reset-requests"];
```
extractLLMErrorInfo only looks at status/cause.response.headers, but the ai SDK's APICallError exposes statusCode and responseHeaders (per the ai 5.0.106 docs). Because we never read the actual numeric status or retry headers that the provider surfaces, retryable becomes false for real 429/5xx responses and the new LLM retry never fires.
Prompt for AI Agents:
In apps/web/utils/llms/retry.ts around lines 154 to 180, the extractLLMErrorInfo
function only reads status and cause.response.headers, but some SDK errors expose
statusCode and responseHeaders (e.g. ai 5.x). Update this block to also read statusCode
(e.g. cause?.statusCode ?? original?.statusCode) when computing status, and to
normalize/accept response headers from multiple shapes (e.g. cause?.response?.headers,
cause?.responseHeaders, original?.responseHeaders). Ensure header keys are normalized to
lowercase before reading retry-after or x-ratelimit-reset-requests so numeric or date
retry values are parsed correctly. This will ensure 429 and Retry-After headers from
different providers are recognized and retryable is set correctly.
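For illustration, a hedged sketch of the suggested normalization — the helper names here are hypothetical, and the real `extractLLMErrorInfo` may be shaped differently:

```ts
// Hypothetical helpers sketching the suggested fix; not the actual retry.ts code.
type AnyError = Record<string, any>;

// Read the numeric status from any of the shapes the AI SDK may surface.
function readStatus(cause?: AnyError, original?: AnyError): number | undefined {
  return (
    cause?.status ??
    cause?.statusCode ??
    original?.status ??
    original?.statusCode ??
    cause?.response?.status
  );
}

// Lowercase header keys so "Retry-After" and "retry-after" both match.
function normalizeHeaders(raw: unknown): Record<string, string> {
  const out: Record<string, string> = {};
  if (raw && typeof raw === "object") {
    for (const [key, value] of Object.entries(raw as Record<string, unknown>)) {
      if (typeof value === "string") out[key.toLowerCase()] = value;
    }
  }
  return out;
}

// Accept headers from multiple error shapes and parse numeric or HTTP-date values.
function readRetryAfterMs(cause?: AnyError, original?: AnyError): number | undefined {
  const headers = normalizeHeaders(
    cause?.response?.headers ?? cause?.responseHeaders ?? original?.responseHeaders,
  );
  const value = headers["retry-after"] ?? headers["x-ratelimit-reset-requests"];
  if (!value) return undefined;
  const seconds = Number(value);
  if (!Number.isNaN(seconds)) return seconds * 1000; // numeric form: seconds
  const resetAt = Date.parse(value); // Retry-After can also be an HTTP date
  return Number.isNaN(resetAt) ? undefined : Math.max(0, resetAt - Date.now());
}
```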
Finding type: Type Inconsistency
Commit 0a7d9c7 addressed this comment. The status extraction now includes statusCode from both cause and original objects (Hunk 0), and the headers extraction now checks multiple shapes including cause?.responseHeaders and original?.responseHeaders (Hunk 1). These changes ensure that AI SDK's APICallError properties are properly recognized, allowing 429 and 5xx responses to be correctly identified as retryable.
Fixed - added statusCode and responseHeaders fallbacks to handle the AI SDK's APICallError shape.
```ts
return pRetry(operation, {
  retries: maxRetries,
  onFailedAttempt: async (error) => {
    const errorInfo = extractLLMErrorInfo(error);

    if (!errorInfo.retryable) {
      throw error;
    }
```
pRetry's default exponential backoff (minTimeout=1s) runs before the custom sleep in onFailedAttempt, effectively ignoring Retry-After hints. Also, rethrowing error wraps failures in FailedAttemptError, hiding the original HTTP error shape from callers—should we throw error.originalError instead?
Suggested change:

```diff
 return pRetry(operation, {
   retries: maxRetries,
+  minTimeout: 0,
+  maxTimeout: 0,
+  factor: 1,
   onFailedAttempt: async (error) => {
-    const errorInfo = extractLLMErrorInfo(error);
+    const failedAttemptError = error as any;
+    const originalError = failedAttemptError.originalError ?? error;
+    const errorInfo = extractLLMErrorInfo(originalError);
     if (!errorInfo.retryable) {
-      throw error;
+      throw originalError;
     }
```
Finding types: Logical Bugs, Breaking Changes
Commit 0a7d9c7 addressed this comment partially. The diff adds minTimeout: 0 to the pRetry configuration (Hunk 2, line 230), which prevents pRetry's default 1-second exponential backoff from interfering with the custom Retry-After handling. However, the second concern about unwrapping FailedAttemptError to throw originalError instead of error was not addressed in this commit.
Fixed - added minTimeout: 0 to disable p-retry's built-in delay since we use custom delay logic via sleep().
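A sketch of that combination — `minTimeout: 0` plus a custom async delay in `onFailedAttempt` — assuming p-retry awaits the callback before scheduling the next attempt; `extractRetryAfterMs` is a hypothetical stand-in for the real header parsing:

```ts
import pRetry from "p-retry";

// Hypothetical sketch, not the actual retry.ts: with minTimeout: 0, the await
// inside onFailedAttempt is the only delay between attempts, so Retry-After
// hints take precedence over the computed backoff.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));
const extractRetryAfterMs = (_error: unknown): number | undefined => undefined; // stand-in

async function withCustomDelay<T>(
  operation: () => Promise<T>,
  maxRetries = 3,
): Promise<T> {
  return pRetry(operation, {
    retries: maxRetries,
    minTimeout: 0, // disable p-retry's own 1s-base exponential delay
    onFailedAttempt: async (error) => {
      const backoffMs = Math.min(2_000 * 2 ** (error.attemptNumber - 1), 60_000);
      await sleep(extractRetryAfterMs(error) ?? backoffMs);
    },
  });
}
```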
2 issues found across 3 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="apps/web/utils/llms/retry.ts">
<violation number="1" location="apps/web/utils/llms/retry.ts:221">
P2: p-retry applies its own exponential backoff delay (default 1s base) after `onFailedAttempt` returns, causing double-delaying. Add `minTimeout: 0` to disable p-retry's built-in delay when using custom delay logic via `sleep()`.</violation>
</file>
<file name="apps/web/utils/llms/retry.test.ts">
<violation number="1" location="apps/web/utils/llms/retry.test.ts:239">
P2: Weak test assertion: `.rejects.toBeDefined()` only verifies that an error was thrown, not that it's the expected error. Use `.rejects.toThrow("Invalid API key")` or `.rejects.toThrow(authError)` to ensure the correct error propagates.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
```ts
): Promise<T> {
  const { label, maxRetries = 3 } = options;

  return pRetry(operation, {
```
P2: p-retry applies its own exponential backoff delay (default 1s base) after onFailedAttempt returns, causing double-delaying. Add minTimeout: 0 to disable p-retry's built-in delay when using custom delay logic via sleep().
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/web/utils/llms/retry.ts, line 221:
<comment>p-retry applies its own exponential backoff delay (default 1s base) after `onFailedAttempt` returns, causing double-delaying. Add `minTimeout: 0` to disable p-retry's built-in delay when using custom delay logic via `sleep()`.</comment>
<file context>

```diff
@@ -132,3 +133,115 @@ export async function withNetworkRetry<T>(
+): Promise<T> {
+  const { label, maxRetries = 3 } = options;
+
+  return pRetry(operation, {
+    retries: maxRetries,
+    onFailedAttempt: async (error) => {
```

</file context>
Suggested change:

```diff
 return pRetry(operation, {
+  minTimeout: 0,
```
✅ Addressed in 0a7d9c7
Fixed - added minTimeout: 0 to prevent double-delaying.
apps/web/utils/llms/retry.test.ts (outdated)
```ts
const authError = createError("Invalid API key", { status: 401 });
const fn = vi.fn().mockRejectedValue(authError);

await expect(withLLMRetry(fn, { label: "test" })).rejects.toBeDefined();
```
P2: Weak test assertion: .rejects.toBeDefined() only verifies that an error was thrown, not that it's the expected error. Use .rejects.toThrow("Invalid API key") or .rejects.toThrow(authError) to ensure the correct error propagates.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/web/utils/llms/retry.test.ts, line 239:
<comment>Weak test assertion: `.rejects.toBeDefined()` only verifies that an error was thrown, not that it's the expected error. Use `.rejects.toThrow("Invalid API key")` or `.rejects.toThrow(authError)` to ensure the correct error propagates.</comment>
<file context>

```diff
@@ -0,0 +1,266 @@
+  const authError = createError("Invalid API key", { status: 401 });
+  const fn = vi.fn().mockRejectedValue(authError);
+
+  await expect(withLLMRetry(fn, { label: "test" })).rejects.toBeDefined();
+  expect(fn).toHaveBeenCalledTimes(1);
+});
```

</file context>
Suggested change:

```diff
-await expect(withLLMRetry(fn, { label: "test" })).rejects.toBeDefined();
+await expect(withLLMRetry(fn, { label: "test" })).rejects.toThrow("Invalid API key");
```
✅ Addressed in 0a7d9c7
Commit 0a7d9c7 addressed this comment. The weak .rejects.toBeDefined() assertion was replaced with .rejects.toMatchObject({ error: { message: "Invalid API key" } }), which now verifies the specific error message rather than just checking that any error was thrown. This ensures the correct error propagates as requested in the comment.
Fixed - changed to .rejects.toMatchObject({ error: { message: "Invalid API key" } }) to verify the correct error is thrown (accounting for p-retry's FailedAttemptError wrapper).
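A sketch of the final assertion, assuming vitest and the wrapped-error shape described in the reply above (`createError` is a stand-in for the test file's own helper, and the import path is assumed):

```ts
import { describe, expect, it, vi } from "vitest";
import { withLLMRetry } from "./retry"; // assumed path within apps/web/utils/llms

// Stand-in for the test file's helper: an Error carrying an HTTP status.
const createError = (message: string, props: { status?: number }) =>
  Object.assign(new Error(message), props);

describe("withLLMRetry", () => {
  it("does not retry non-retryable errors", async () => {
    const authError = createError("Invalid API key", { status: 401 });
    const fn = vi.fn().mockRejectedValue(authError);

    // Match the specific message through p-retry's wrapper, not just "anything threw".
    await expect(withLLMRetry(fn, { label: "test" })).rejects.toMatchObject({
      error: { message: "Invalid API key" },
    });
    expect(fn).toHaveBeenCalledTimes(1); // no retries for a 401
  });
});
```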
Thanks for the feedback! I've saved this as a new learning to improve future reviews.
- Add `statusCode` fallback for AI SDK `APICallError` compatibility
- Add `responseHeaders` fallback for different error shapes
- Add `minTimeout: 0` to prevent p-retry double-delaying
- Strengthen test assertion for non-retryable errors
User description
Adds retry mechanism for LLM API calls to handle transient rate limits gracefully instead of failing immediately.
Fixes #1242
- `withLLMRetry` wrapper using p-retry with exponential backoff (2s → 4s → 8s, max 60s)
- `extractLLMErrorInfo` to detect rate limits across providers (OpenAI, Anthropic, Google, OpenRouter)
- Respects `Retry-After` headers when present
- Wraps `createGenerateText` and `createGenerateObject` with retry logic

Generated description
Below is a concise technical summary of the changes proposed in this PR:
```mermaid
graph LR
  createGenerateText_("createGenerateText"):::modified
  withLLMRetry_("withLLMRetry"):::added
  createGenerateObject_("createGenerateObject"):::modified
  withNetworkRetry_("withNetworkRetry"):::modified
  extractLLMErrorInfo_("extractLLMErrorInfo"):::added
  P_RETRY_("P_RETRY"):::added
  shouldRetry_("shouldRetry"):::modified
  createGenerateText_ -- "Wraps text generation with rate-limit aware retries via p-retry." --> withLLMRetry_
  createGenerateObject_ -- "Ensures object generation retries honor LLM rate limits." --> withLLMRetry_
  withLLMRetry_ -- "Invokes network-layer retry inside each LLM retry attempt." --> withNetworkRetry_
  withLLMRetry_ -- "Classifies errors to decide retryability and delay." --> extractLLMErrorInfo_
  withLLMRetry_ -- "Uses p-retry library to orchestrate exponential backoff retries." --> P_RETRY_
  createGenerateObject_ -- "Keeps validation/no-object predicate for retry decisions." --> shouldRetry_
  classDef added stroke:#15AA7A
  classDef removed stroke:#CD5270
  classDef modified stroke:#EDAC4C
  linkStyle default stroke:#CBD5E1,font-size:13px
```

Enhances the `inbox-zero-ai` module's reliability by introducing a new `withLLMRetry` wrapper, which intelligently handles transient LLM API errors like rate limits and server issues using exponential backoff and `Retry-After` headers. Integrates this robust retry logic into core LLM generation functions such as `createGenerateText` and `createGenerateObject`.
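To illustrate that integration, a hedged usage sketch — the import path, function name, and label are assumptions; only the wrapper API (`label`, `maxRetries`) comes from this PR:

```ts
// Hypothetical call site showing how a generation call might be wrapped.
import { withLLMRetry } from "@/utils/llms/retry"; // assumed path

async function generateSummary(generate: () => Promise<string>): Promise<string> {
  // Rate-limited or 5xx responses are retried with backoff;
  // other errors surface immediately without retries.
  return withLLMRetry(generate, { label: "generate-summary", maxRetries: 3 });
}
```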