fix: add exponential backoff retry for LLM rate limits #1273
Conversation
Adds a retry mechanism for LLM API calls to handle transient rate limits gracefully instead of failing immediately.

- Add `withLLMRetry` wrapper with exponential backoff (2s, 4s, 8s, max 60s; see the sketch below)
- Add `extractLLMErrorInfo` to detect rate limits across providers
- Wrap `createGenerateText` and `createGenerateObject` with retry logic
- Add tests for retry behavior
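A minimal sketch of that backoff schedule, assuming a 2s base that doubles per attempt and caps at 60s (the constants and helper name are illustrative, not the actual `retry.ts` code):

```ts
// Illustrative backoff schedule: 2s base, doubling per attempt, capped at 60s.
// These names are hypothetical; the actual retry.ts implementation may differ.
const BASE_DELAY_MS = 2_000;
const MAX_DELAY_MS = 60_000;

function backoffDelayMs(attempt: number): number {
  // attempt is 1-based: attempt 1 -> 2s, 2 -> 4s, 3 -> 8s, then capped at 60s
  return Math.min(BASE_DELAY_MS * 2 ** (attempt - 1), MAX_DELAY_MS);
}
```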
apps/web/utils/llms/retry.ts

```ts
const status: number | undefined =
  cause?.status ?? original?.status ?? cause?.response?.status;
const message: string = cause?.message ?? original?.message ?? "";
const errorCode: string =
  cause?.code ?? original?.code ?? cause?.error?.type ?? "";

const isRateLimit =
  status === 429 ||
  errorCode === "rate_limit_exceeded" || // OpenAI
  errorCode === "rate_limit_error" || // Anthropic
  errorCode === "RESOURCE_EXHAUSTED" || // Google
  /rate.?limit/i.test(message) ||
  /quota.?exceeded/i.test(message) ||
  /too.?many.?requests/i.test(message);

const isServerError =
  status === 500 ||
  status === 502 ||
  status === 503 ||
  status === 504 ||
  /internal.?error/i.test(message) ||
  /server.?error/i.test(message);

let retryAfterMs: number | undefined;
const headers = cause?.response?.headers;
const retryAfterHeader: string | undefined =
  headers?.["retry-after"] ?? headers?.["x-ratelimit-reset-requests"];
```
extractLLMErrorInfo only looks at status/cause.response.headers, but the ai SDK's APICallError exposes statusCode and responseHeaders (per the ai 5.0.106 docs). Because we never read the actual numeric status or retry headers that the provider surfaces, retryable becomes false for real 429/5xx responses and the new LLM retry never fires.
Prompt for AI Agents:
In apps/web/utils/llms/retry.ts around lines 154 to 180, the extractLLMErrorInfo
function only reads status and cause.response.headers, but some SDK errors expose
statusCode and responseHeaders (e.g. ai 5.x). Update this block to also read statusCode
(e.g. cause?.statusCode ?? original?.statusCode) when computing status, and to
normalize/accept response headers from multiple shapes (e.g. cause?.response?.headers,
cause?.responseHeaders, original?.responseHeaders). Ensure header keys are normalized to
lowercase before reading retry-after or x-ratelimit-reset-requests so numeric or date
retry values are parsed correctly. This will ensure 429 and Retry-After headers from
different providers are recognized and retryable is set correctly.
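For illustration, a hedged sketch of the suggested normalization — the helper names here are hypothetical, and the real `extractLLMErrorInfo` may be shaped differently:

```ts
// Hypothetical helpers sketching the suggested fix; not the actual retry.ts code.
type AnyError = Record<string, any>;

// Read the numeric status from any of the shapes the AI SDK may surface.
function readStatus(cause?: AnyError, original?: AnyError): number | undefined {
  return (
    cause?.status ??
    cause?.statusCode ??
    original?.status ??
    original?.statusCode ??
    cause?.response?.status
  );
}

// Lowercase header keys so "Retry-After" and "retry-after" both match.
function normalizeHeaders(raw: unknown): Record<string, string> {
  const out: Record<string, string> = {};
  if (raw && typeof raw === "object") {
    for (const [key, value] of Object.entries(raw as Record<string, unknown>)) {
      if (typeof value === "string") out[key.toLowerCase()] = value;
    }
  }
  return out;
}

// Accept headers from multiple error shapes and parse numeric or HTTP-date values.
function readRetryAfterMs(cause?: AnyError, original?: AnyError): number | undefined {
  const headers = normalizeHeaders(
    cause?.response?.headers ?? cause?.responseHeaders ?? original?.responseHeaders,
  );
  const value = headers["retry-after"] ?? headers["x-ratelimit-reset-requests"];
  if (!value) return undefined;
  const seconds = Number(value);
  if (!Number.isNaN(seconds)) return seconds * 1000; // numeric form: seconds
  const resetAt = Date.parse(value); // Retry-After can also be an HTTP date
  return Number.isNaN(resetAt) ? undefined : Math.max(0, resetAt - Date.now());
}
```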
Finding type: Type Inconsistency
Commit 0a7d9c7 addressed this comment. The status extraction now includes statusCode from both cause and original objects (Hunk 0), and the headers extraction now checks multiple shapes including cause?.responseHeaders and original?.responseHeaders (Hunk 1). These changes ensure that AI SDK's APICallError properties are properly recognized, allowing 429 and 5xx responses to be correctly identified as retryable.
Fixed - added statusCode and responseHeaders fallbacks to handle the AI SDK's APICallError shape.
```ts
return pRetry(operation, {
  retries: maxRetries,
  onFailedAttempt: async (error) => {
    const errorInfo = extractLLMErrorInfo(error);

    if (!errorInfo.retryable) {
      throw error;
    }
```
pRetry's default exponential backoff (minTimeout=1s) runs before the custom sleep in onFailedAttempt, effectively ignoring Retry-After hints. Also, rethrowing error wraps failures in FailedAttemptError, hiding the original HTTP error shape from callers—should we throw error.originalError instead?
Suggested change:

```diff
 return pRetry(operation, {
   retries: maxRetries,
+  minTimeout: 0,
+  maxTimeout: 0,
+  factor: 1,
   onFailedAttempt: async (error) => {
-    const errorInfo = extractLLMErrorInfo(error);
+    const failedAttemptError = error as any;
+    const originalError = failedAttemptError.originalError ?? error;
+    const errorInfo = extractLLMErrorInfo(originalError);
     if (!errorInfo.retryable) {
-      throw error;
+      throw originalError;
     }
```
Finding types: Logical Bugs, Breaking Changes
Commit 0a7d9c7 addressed this comment partially. The diff adds minTimeout: 0 to the pRetry configuration (Hunk 2, line 230), which prevents pRetry's default 1-second exponential backoff from interfering with the custom Retry-After handling. However, the second concern about unwrapping FailedAttemptError to throw originalError instead of error was not addressed in this commit.
Fixed - added minTimeout: 0 to disable p-retry's built-in delay since we use custom delay logic via sleep().
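A sketch of that combination — `minTimeout: 0` plus a custom async delay in `onFailedAttempt` — assuming p-retry awaits the callback before scheduling the next attempt; `extractRetryAfterMs` is a hypothetical stand-in for the real header parsing:

```ts
import pRetry from "p-retry";

// Hypothetical sketch, not the actual retry.ts: with minTimeout: 0, the await
// inside onFailedAttempt is the only delay between attempts, so Retry-After
// hints take precedence over the computed backoff.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));
const extractRetryAfterMs = (_error: unknown): number | undefined => undefined; // stand-in

async function withCustomDelay<T>(
  operation: () => Promise<T>,
  maxRetries = 3,
): Promise<T> {
  return pRetry(operation, {
    retries: maxRetries,
    minTimeout: 0, // disable p-retry's own 1s-base exponential delay
    onFailedAttempt: async (error) => {
      const backoffMs = Math.min(2_000 * 2 ** (error.attemptNumber - 1), 60_000);
      await sleep(extractRetryAfterMs(error) ?? backoffMs);
    },
  });
}
```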
2 issues found across 3 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="apps/web/utils/llms/retry.ts">
<violation number="1" location="apps/web/utils/llms/retry.ts:221">
P2: p-retry applies its own exponential backoff delay (default 1s base) after `onFailedAttempt` returns, causing double-delaying. Add `minTimeout: 0` to disable p-retry's built-in delay when using custom delay logic via `sleep()`.</violation>
</file>
<file name="apps/web/utils/llms/retry.test.ts">
<violation number="1" location="apps/web/utils/llms/retry.test.ts:239">
P2: Weak test assertion: `.rejects.toBeDefined()` only verifies that an error was thrown, not that it's the expected error. Use `.rejects.toThrow("Invalid API key")` or `.rejects.toThrow(authError)` to ensure the correct error propagates.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
```ts
): Promise<T> {
  const { label, maxRetries = 3 } = options;

  return pRetry(operation, {
```
P2: p-retry applies its own exponential backoff delay (default 1s base) after onFailedAttempt returns, causing double-delaying. Add minTimeout: 0 to disable p-retry's built-in delay when using custom delay logic via sleep().
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/web/utils/llms/retry.ts, line 221:
<comment>p-retry applies its own exponential backoff delay (default 1s base) after `onFailedAttempt` returns, causing double-delaying. Add `minTimeout: 0` to disable p-retry's built-in delay when using custom delay logic via `sleep()`.</comment>
<file context>

```diff
@@ -132,3 +133,115 @@ export async function withNetworkRetry<T>(
+): Promise<T> {
+  const { label, maxRetries = 3 } = options;
+
+  return pRetry(operation, {
+    retries: maxRetries,
+    onFailedAttempt: async (error) => {
```

</file context>
Suggested change:

```diff
 return pRetry(operation, {
+  minTimeout: 0,
```
✅ Addressed in 0a7d9c7
Fixed - added minTimeout: 0 to prevent double-delaying.
apps/web/utils/llms/retry.test.ts (outdated)
```ts
const authError = createError("Invalid API key", { status: 401 });
const fn = vi.fn().mockRejectedValue(authError);

await expect(withLLMRetry(fn, { label: "test" })).rejects.toBeDefined();
```
P2: Weak test assertion: .rejects.toBeDefined() only verifies that an error was thrown, not that it's the expected error. Use .rejects.toThrow("Invalid API key") or .rejects.toThrow(authError) to ensure the correct error propagates.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/web/utils/llms/retry.test.ts, line 239:
<comment>Weak test assertion: `.rejects.toBeDefined()` only verifies that an error was thrown, not that it's the expected error. Use `.rejects.toThrow("Invalid API key")` or `.rejects.toThrow(authError)` to ensure the correct error propagates.</comment>
<file context>

```diff
@@ -0,0 +1,266 @@
+  const authError = createError("Invalid API key", { status: 401 });
+  const fn = vi.fn().mockRejectedValue(authError);
+
+  await expect(withLLMRetry(fn, { label: "test" })).rejects.toBeDefined();
+  expect(fn).toHaveBeenCalledTimes(1);
+});
```

</file context>
Suggested change:

```diff
-await expect(withLLMRetry(fn, { label: "test" })).rejects.toBeDefined();
+await expect(withLLMRetry(fn, { label: "test" })).rejects.toThrow("Invalid API key");
```
✅ Addressed in 0a7d9c7
Commit 0a7d9c7 addressed this comment. The weak .rejects.toBeDefined() assertion was replaced with .rejects.toMatchObject({ error: { message: "Invalid API key" } }), which now verifies the specific error message rather than just checking that any error was thrown. This ensures the correct error propagates as requested in the comment.
Fixed - changed to .rejects.toMatchObject({ error: { message: "Invalid API key" } }) to verify the correct error is thrown (accounting for p-retry's FailedAttemptError wrapper).
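A sketch of the final assertion, assuming vitest and the wrapped-error shape described in the reply above (`createError` is a stand-in for the test file's own helper, and the import path is assumed):

```ts
import { describe, expect, it, vi } from "vitest";
import { withLLMRetry } from "./retry"; // assumed path within apps/web/utils/llms

// Stand-in for the test file's helper: an Error carrying an HTTP status.
const createError = (message: string, props: { status?: number }) =>
  Object.assign(new Error(message), props);

describe("withLLMRetry", () => {
  it("does not retry non-retryable errors", async () => {
    const authError = createError("Invalid API key", { status: 401 });
    const fn = vi.fn().mockRejectedValue(authError);

    // Match the specific message through p-retry's wrapper, not just "anything threw".
    await expect(withLLMRetry(fn, { label: "test" })).rejects.toMatchObject({
      error: { message: "Invalid API key" },
    });
    expect(fn).toHaveBeenCalledTimes(1); // no retries for a 401
  });
});
```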
Thanks for the feedback! I've saved this as a new learning to improve future reviews.
- Add `statusCode` fallback for AI SDK `APICallError` compatibility
- Add `responseHeaders` fallback for different error shapes
- Add `minTimeout: 0` to prevent p-retry double-delaying
- Strengthen test assertion for non-retryable errors
User description
Adds retry mechanism for LLM API calls to handle transient rate limits gracefully instead of failing immediately.
Fixes #1242
- `withLLMRetry` wrapper using p-retry with exponential backoff (2s → 4s → 8s, max 60s)
- `extractLLMErrorInfo` to detect rate limits across providers (OpenAI, Anthropic, Google, OpenRouter)
- Respects `Retry-After` headers when present
- Wraps `createGenerateText` and `createGenerateObject` with retry logic

Generated description
Below is a concise technical summary of the changes proposed in this PR:
```mermaid
graph LR
  createGenerateText_("createGenerateText"):::modified
  withLLMRetry_("withLLMRetry"):::added
  createGenerateObject_("createGenerateObject"):::modified
  withNetworkRetry_("withNetworkRetry"):::modified
  extractLLMErrorInfo_("extractLLMErrorInfo"):::added
  P_RETRY_("P_RETRY"):::added
  shouldRetry_("shouldRetry"):::modified
  createGenerateText_ -- "Wraps text generation with rate-limit aware retries via p-retry." --> withLLMRetry_
  createGenerateObject_ -- "Ensures object generation retries honor LLM rate limits." --> withLLMRetry_
  withLLMRetry_ -- "Invokes network-layer retry inside each LLM retry attempt." --> withNetworkRetry_
  withLLMRetry_ -- "Classifies errors to decide retryability and delay." --> extractLLMErrorInfo_
  withLLMRetry_ -- "Uses p-retry library to orchestrate exponential backoff retries." --> P_RETRY_
  createGenerateObject_ -- "Keeps validation/no-object predicate for retry decisions." --> shouldRetry_
  classDef added stroke:#15AA7A
  classDef removed stroke:#CD5270
  classDef modified stroke:#EDAC4C
  linkStyle default stroke:#CBD5E1,font-size:13px
```

Enhances the `inbox-zero-ai` module's reliability by introducing a new `withLLMRetry` wrapper, which intelligently handles transient LLM API errors like rate limits and server issues using exponential backoff and `Retry-After` headers. Integrates this robust retry logic into core LLM generation functions such as `createGenerateText` and `createGenerateObject`.
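To illustrate that integration, a hedged usage sketch — the import path, function name, and label are assumptions; only the wrapper API (`label`, `maxRetries`) comes from this PR:

```ts
// Hypothetical call site showing how a generation call might be wrapped.
import { withLLMRetry } from "@/utils/llms/retry"; // assumed path

async function generateSummary(generate: () => Promise<string>): Promise<string> {
  // Rate-limited or 5xx responses are retried with backoff;
  // other errors surface immediately without retries.
  return withLLMRetry(generate, { label: "generate-summary", maxRetries: 3 });
}
```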