Conversation

@elie222 (Owner) commented Jan 14, 2026

User description

Adds a retry mechanism for LLM API calls to handle transient rate limits gracefully instead of failing immediately; a sketch of the wrapper's overall shape follows the list below.

Fixes #1242

  • Add withLLMRetry wrapper using p-retry with exponential backoff (2s → 4s → 8s, max 60s)
  • Add extractLLMErrorInfo to detect rate limits across providers (OpenAI, Anthropic, Google, OpenRouter)
  • Honor Retry-After headers when present
  • Wrap createGenerateText and createGenerateObject with retry logic
  • Add unit tests for retry behavior
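
For illustration, a minimal sketch of what such a wrapper can look like, assuming p-retry's `onFailedAttempt` hook; the names, defaults, and logging here are illustrative rather than the exact code in this PR, and `extractLLMErrorInfo` is only declared (it is discussed later in the thread):

```ts
import pRetry from "p-retry";

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

interface LLMErrorInfo {
  retryable: boolean;
  retryAfterMs?: number; // parsed from a Retry-After header when present
}

// Declared only to keep this sketch self-contained; the classifier is sketched below.
declare function extractLLMErrorInfo(error: unknown): LLMErrorInfo;

export async function withLLMRetry<T>(
  operation: () => Promise<T>,
  options: { label: string; maxRetries?: number },
): Promise<T> {
  const { label, maxRetries = 3 } = options;

  return pRetry(operation, {
    retries: maxRetries,
    minTimeout: 0, // disable p-retry's built-in backoff; delay is handled below
    onFailedAttempt: async (error) => {
      const info = extractLLMErrorInfo(error);
      if (!info.retryable) {
        throw error; // throwing inside onFailedAttempt aborts further retries
      }
      // Exponential backoff 2s -> 4s -> 8s, capped at 60s; Retry-After wins when present.
      const backoffMs = 2000 * 2 ** (error.attemptNumber - 1);
      const delayMs = Math.min(info.retryAfterMs ?? backoffMs, 60_000);
      console.warn(`${label}: retryable LLM error, waiting ${delayMs}ms before retry`);
      await sleep(delayMs);
    },
  });
}
```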

Generated description

Below is a concise technical summary of the changes proposed in this PR:

```mermaid
graph LR
createGenerateText_("createGenerateText"):::modified
withLLMRetry_("withLLMRetry"):::added
createGenerateObject_("createGenerateObject"):::modified
withNetworkRetry_("withNetworkRetry"):::modified
extractLLMErrorInfo_("extractLLMErrorInfo"):::added
P_RETRY_("P_RETRY"):::added
shouldRetry_("shouldRetry"):::modified
createGenerateText_ -- "Wraps text generation with rate-limit aware retries via p-retry." --> withLLMRetry_
createGenerateObject_ -- "Ensures object generation retries honor LLM rate limits." --> withLLMRetry_
withLLMRetry_ -- "Invokes network-layer retry inside each LLM retry attempt." --> withNetworkRetry_
withLLMRetry_ -- "Classifies errors to decide retryability and delay." --> extractLLMErrorInfo_
withLLMRetry_ -- "Uses p-retry library to orchestrate exponential backoff retries." --> P_RETRY_
createGenerateObject_ -- "Keeps validation/no-object predicate for retry decisions." --> shouldRetry_
classDef added stroke:#15AA7A
classDef removed stroke:#CD5270
classDef modified stroke:#EDAC4C
linkStyle default stroke:#CBD5E1,font-size:13px
```

Enhances the inbox-zero-ai module's reliability by introducing a new withLLMRetry wrapper, which intelligently handles transient LLM API errors like rate limits and server issues using exponential backoff and Retry-After headers. Integrates this robust retry logic into core LLM generation functions such as createGenerateText and createGenerateObject.
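
Illustratively, the integration amounts to funneling each generation call through the wrapper; the fragment below is a hypothetical shape (the repo's real signatures differ), with `withLLMRetry` declared as sketched above:

```ts
import { generateText } from "ai";

declare function withLLMRetry<T>(
  operation: () => Promise<T>,
  options: { label: string; maxRetries?: number },
): Promise<T>;

// Hypothetical integration point: every text-generation call goes through
// withLLMRetry so rate limits and transient server errors are retried uniformly.
function createGenerateText(label: string) {
  return (options: Parameters<typeof generateText>[0]) =>
    withLLMRetry(() => generateText(options), { label });
}
```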


Adds retry mechanism for LLM API calls to handle transient rate limits
gracefully instead of failing immediately.

- Add withLLMRetry wrapper with exponential backoff (2s, 4s, 8s, max 60s)
- Add extractLLMErrorInfo to detect rate limits across providers
- Wrap createGenerateText and createGenerateObject with retry logic
- Add tests for retry behavior
@vercel bot commented Jan 14, 2026

The latest updates on your projects.

| Project | Deployment | Review | Updated (UTC) |
| --- | --- | --- | --- |
| inbox-zero | Ready | Ready (Preview) | Jan 14, 2026 0:48am |

Comment on lines 154 to 180
```ts
const status: number | undefined =
  cause?.status ?? original?.status ?? cause?.response?.status;
const message: string = cause?.message ?? original?.message ?? "";
const errorCode: string =
  cause?.code ?? original?.code ?? cause?.error?.type ?? "";

const isRateLimit =
  status === 429 ||
  errorCode === "rate_limit_exceeded" || // OpenAI
  errorCode === "rate_limit_error" || // Anthropic
  errorCode === "RESOURCE_EXHAUSTED" || // Google
  /rate.?limit/i.test(message) ||
  /quota.?exceeded/i.test(message) ||
  /too.?many.?requests/i.test(message);

const isServerError =
  status === 500 ||
  status === 502 ||
  status === 503 ||
  status === 504 ||
  /internal.?error/i.test(message) ||
  /server.?error/i.test(message);

let retryAfterMs: number | undefined;
const headers = cause?.response?.headers;
const retryAfterHeader: string | undefined =
  headers?.["retry-after"] ?? headers?.["x-ratelimit-reset-requests"];
```
Contributor commented:

extractLLMErrorInfo only looks at status/cause.response.headers, but the ai SDK's APICallError exposes statusCode and responseHeaders (per the ai 5.0.106 docs). Because we never read the actual numeric status or retry headers that the provider surfaces, retryable becomes false for real 429/5xx responses and the new LLM retry never fires.

Prompt for AI Agents:

In apps/web/utils/llms/retry.ts around lines 154 to 180, the extractLLMErrorInfo
function only reads status and cause.response.headers, but some SDK errors expose
statusCode and responseHeaders (e.g. ai 5.x). Update this block to also read statusCode
(e.g. cause?.statusCode ?? original?.statusCode) when computing status, and to
normalize/accept response headers from multiple shapes (e.g. cause?.response?.headers,
cause?.responseHeaders, original?.responseHeaders). Ensure header keys are normalized to
lowercase before reading retry-after or x-ratelimit-reset-requests so numeric or date
retry values are parsed correctly. This will ensure 429 and Retry-After headers from
different providers are recognized and retryable is set correctly.
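
For concreteness, a hedged sketch of the adjusted extraction (property names follow the prompt above; `cause` and `original` are the unwrapped error objects from the reviewed snippet, declared here only so the sketch stands alone):

```ts
declare const cause: any;
declare const original: any;

// Accept both fetch-style `status` and the AI SDK APICallError's `statusCode`.
const status: number | undefined =
  cause?.status ??
  cause?.statusCode ??
  original?.status ??
  original?.statusCode ??
  cause?.response?.status;

// Normalize header keys to lowercase across the shapes different SDKs expose.
function normalizeHeaders(raw: unknown): Record<string, string> {
  if (!raw) return {};
  const entries =
    raw instanceof Headers
      ? [...raw.entries()]
      : Object.entries(raw as Record<string, unknown>);
  return Object.fromEntries(
    entries.map(([key, value]) => [key.toLowerCase(), String(value)]),
  );
}

const headers = normalizeHeaders(
  cause?.response?.headers ?? cause?.responseHeaders ?? original?.responseHeaders,
);

// Retry-After may be delta-seconds or an HTTP date; handle both.
const retryAfterHeader: string | undefined =
  headers["retry-after"] ?? headers["x-ratelimit-reset-requests"];
let retryAfterMs: number | undefined;
if (retryAfterHeader) {
  const seconds = Number(retryAfterHeader);
  retryAfterMs = Number.isFinite(seconds)
    ? seconds * 1000
    : Math.max(0, new Date(retryAfterHeader).getTime() - Date.now());
}
```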

Finding type: Type Inconsistency

Contributor commented:

Commit 0a7d9c7 addressed this comment. The status extraction now includes statusCode from both cause and original objects (Hunk 0), and the headers extraction now checks multiple shapes including cause?.responseHeaders and original?.responseHeaders (Hunk 1). These changes ensure that AI SDK's APICallError properties are properly recognized, allowing 429 and 5xx responses to be correctly identified as retryable.

Owner (author) commented:

Fixed - added statusCode and responseHeaders fallbacks to handle the AI SDK's APICallError shape.

Comment on lines 221 to 228
```ts
return pRetry(operation, {
  retries: maxRetries,
  onFailedAttempt: async (error) => {
    const errorInfo = extractLLMErrorInfo(error);

    if (!errorInfo.retryable) {
      throw error;
    }
```
Contributor commented:

pRetry's default exponential backoff (minTimeout=1s) runs before the custom sleep in onFailedAttempt, effectively ignoring Retry-After hints. Also, rethrowing error wraps failures in FailedAttemptError, hiding the original HTTP error shape from callers—should we throw error.originalError instead?

Suggested change:

```diff
 return pRetry(operation, {
   retries: maxRetries,
+  minTimeout: 0,
+  maxTimeout: 0,
+  factor: 1,
   onFailedAttempt: async (error) => {
-    const errorInfo = extractLLMErrorInfo(error);
+    const failedAttemptError = error as any;
+    const originalError = failedAttemptError.originalError ?? error;
+    const errorInfo = extractLLMErrorInfo(originalError);
     if (!errorInfo.retryable) {
-      throw error;
+      throw originalError;
     }
```

Finding types: Logical Bugs, Breaking Changes

Contributor commented:

Commit 0a7d9c7 addressed this comment partially. The diff adds minTimeout: 0 to the pRetry configuration (Hunk 2, line 230), which prevents pRetry's default 1-second exponential backoff from interfering with the custom Retry-After handling. However, the second concern about unwrapping FailedAttemptError to throw originalError instead of error was not addressed in this commit.

Owner (author) commented:

Fixed - added minTimeout: 0 to disable p-retry's built-in delay since we use custom delay logic via sleep().
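
In other words, the configuration ends up along these lines (an illustrative fragment; `computeDelayMs` is a hypothetical helper standing in for the backoff/Retry-After logic):

```ts
return pRetry(operation, {
  retries: maxRetries,
  minTimeout: 0, // no built-in p-retry delay; timing is owned by sleep() below
  onFailedAttempt: async (error) => {
    const errorInfo = extractLLMErrorInfo(error);
    if (!errorInfo.retryable) {
      throw error;
    }
    // Hypothetical helper: exponential backoff unless Retry-After dictates the wait.
    await sleep(computeDelayMs(error.attemptNumber, errorInfo.retryAfterMs));
  },
});
```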

@cubic-dev-ai bot left a comment:

2 issues found across 3 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="apps/web/utils/llms/retry.ts">

<violation number="1" location="apps/web/utils/llms/retry.ts:221">
P2: p-retry applies its own exponential backoff delay (default 1s base) after `onFailedAttempt` returns, causing double-delaying. Add `minTimeout: 0` to disable p-retry's built-in delay when using custom delay logic via `sleep()`.</violation>
</file>

<file name="apps/web/utils/llms/retry.test.ts">

<violation number="1" location="apps/web/utils/llms/retry.test.ts:239">
P2: Weak test assertion: `.rejects.toBeDefined()` only verifies that an error was thrown, not that it's the expected error. Use `.rejects.toThrow("Invalid API key")` or `.rejects.toThrow(authError)` to ensure the correct error propagates.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

```ts
): Promise<T> {
  const { label, maxRetries = 3 } = options;

  return pRetry(operation, {
```
@cubic-dev-ai bot commented Jan 14, 2026:

P2: p-retry applies its own exponential backoff delay (default 1s base) after onFailedAttempt returns, causing double-delaying. Add minTimeout: 0 to disable p-retry's built-in delay when using custom delay logic via sleep().

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/web/utils/llms/retry.ts, line 221:

<comment>p-retry applies its own exponential backoff delay (default 1s base) after `onFailedAttempt` returns, causing double-delaying. Add `minTimeout: 0` to disable p-retry's built-in delay when using custom delay logic via `sleep()`.</comment>

<file context>

```diff
@@ -132,3 +133,115 @@ export async function withNetworkRetry<T>(
+): Promise<T> {
+  const { label, maxRetries = 3 } = options;
+
+  return pRetry(operation, {
+    retries: maxRetries,
+    onFailedAttempt: async (error) => {
```

</file context>
Suggested change:

```diff
- return pRetry(operation, {
+ return pRetry(operation, {
+   minTimeout: 0,
```

✅ Addressed in 0a7d9c7

Owner (author) commented:

Fixed - added minTimeout: 0 to prevent double-delaying.

```ts
const authError = createError("Invalid API key", { status: 401 });
const fn = vi.fn().mockRejectedValue(authError);

await expect(withLLMRetry(fn, { label: "test" })).rejects.toBeDefined();
```
@cubic-dev-ai bot commented Jan 14, 2026:

P2: Weak test assertion: .rejects.toBeDefined() only verifies that an error was thrown, not that it's the expected error. Use .rejects.toThrow("Invalid API key") or .rejects.toThrow(authError) to ensure the correct error propagates.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/web/utils/llms/retry.test.ts, line 239:

<comment>Weak test assertion: `.rejects.toBeDefined()` only verifies that an error was thrown, not that it's the expected error. Use `.rejects.toThrow("Invalid API key")` or `.rejects.toThrow(authError)` to ensure the correct error propagates.</comment>

<file context>

```diff
@@ -0,0 +1,266 @@
+    const authError = createError("Invalid API key", { status: 401 });
+    const fn = vi.fn().mockRejectedValue(authError);
+
+    await expect(withLLMRetry(fn, { label: "test" })).rejects.toBeDefined();
+    expect(fn).toHaveBeenCalledTimes(1);
+  });
```

</file context>
Suggested change:

```diff
- await expect(withLLMRetry(fn, { label: "test" })).rejects.toBeDefined();
+ await expect(withLLMRetry(fn, { label: "test" })).rejects.toThrow("Invalid API key");
```

✅ Addressed in 0a7d9c7

Contributor commented:

Commit 0a7d9c7 addressed this comment. The weak .rejects.toBeDefined() assertion was replaced with .rejects.toMatchObject({ error: { message: "Invalid API key" } }), which now verifies the specific error message rather than just checking that any error was thrown. This ensures the correct error propagates as requested in the comment.

Owner (author) commented:
Fixed - changed to .rejects.toMatchObject({ error: { message: "Invalid API key" } }) to verify the correct error is thrown (accounting for p-retry's FailedAttemptError wrapper).
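
A sketch of the strengthened test as described (assuming vitest; `createError` and `withLLMRetry` come from the files under review, and the test name is illustrative):

```ts
import { expect, it, vi } from "vitest";

it("does not retry non-retryable errors", async () => {
  const authError = createError("Invalid API key", { status: 401 });
  const fn = vi.fn().mockRejectedValue(authError);

  // p-retry surfaces the final failure wrapped, so match the nested
  // original message instead of asserting on the wrapper itself.
  await expect(withLLMRetry(fn, { label: "test" })).rejects.toMatchObject({
    error: { message: "Invalid API key" },
  });
  expect(fn).toHaveBeenCalledTimes(1);
});
```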

Contributor commented:

Thanks for the feedback! I've saved this as a new learning to improve future reviews.

- Add statusCode fallback for AI SDK APICallError compatibility
- Add responseHeaders fallback for different error shapes
- Add minTimeout: 0 to prevent p-retry double-delaying
- Strengthen test assertion for non-retryable errors
elie222 merged commit 7b0b97a into main on Jan 14, 2026
38 checks passed
elie222 deleted the fix/llm-rate-limit-retry branch on January 14, 2026 at 01:33

Development

Successfully merging this pull request may close these issues.

fix: Add exponential backoff for LLM rate limits and preserve failed emails for retry
