fix(custom-resources): add retry logic for Lambda initialization errors in AwsCustomResource#36885

Draft
sai-ray wants to merge 1 commit into main from fix-35931

@sai-ray sai-ray commented Feb 4, 2026

Issue # (if applicable)

Closes #35931.

Reason for this change

When AwsCustomResource invokes a target Lambda function that is in a cold start state (inactive), the Lambda service returns an error "Lambda is initializing your function. It will be ready to invoke shortly." The current handler immediately fails without retrying, causing CloudFormation deployments to fail intermittently.

This is particularly problematic for:

  • Infrequently deployed stacks
  • Lambda functions that go inactive due to low traffic
  • Database migration Lambdas triggered during deployments

Description of changes

Added retry logic with exponential backoff to the AwsCustomResource handler to handle Lambda cold start errors. The implementation follows the established pattern from log-retention-handler.

Changes:

  • Added retry helper functions to utils.ts:
    • isRetryableError() - Detects Lambda initialization errors
    • calculateDelay() - Exponential backoff with jitter
    • sleep() - Async delay helper
    • makeWithRetry() - Retry wrapper function
  • Wrapped apiCall.invoke() in retry logic in index.ts
  • Added comprehensive unit tests for retry behavior
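
The diff itself is not reproduced in this conversation, so the following is only a minimal sketch of what a retry wrapper of this kind could look like. The names `retrySketch` and `RetrySketchOptions`, and the injectable `sleep` parameter, are hypothetical and are not taken from the PR.

```typescript
// Hypothetical option names; the PR's actual signatures are not shown in this thread.
interface RetrySketchOptions {
  /** Number of retries allowed after the first attempt. */
  maxRetries: number;
  /** Decides whether a thrown error is worth retrying. */
  shouldRetry: (error: unknown) => boolean;
  /** Delay in milliseconds before retry number `attempt` (zero-based). */
  delayMs: (attempt: number) => number;
  /** Injectable sleep so tests can avoid real timers (an assumption of this sketch). */
  sleep?: (ms: number) => Promise<void>;
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function retrySketch<T>(
  fn: () => Promise<T>,
  options: RetrySketchOptions,
): Promise<T> {
  const sleep = options.sleep ?? defaultSleep;
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Rethrow when the retry budget is spent or the error is not retryable.
      if (attempt >= options.maxRetries || !options.shouldRetry(error)) {
        throw error;
      }
      await sleep(options.delayMs(attempt));
    }
  }
}
```

Under these assumptions, the handler would pass a closure around its SDK call as `fn`, and a non-retryable error would propagate on the first failure without any delay.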

Retry Configuration:

  • Max retries: 5 attempts
  • Initial delay: 1000ms (1 second)
  • Max delay: 30000ms (30 seconds)
  • Backoff: 2x exponential with jitter
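
To make the numbers above concrete, here is a rough sketch of a delay calculation using them. The function name `backoffDelayMs`, the zero-based attempt index, and the choice of "full jitter" (a uniform random value between 0 and the capped delay) are assumptions; the PR only says "with jitter" and does not state which variant it uses.

```typescript
// Values taken from the retry configuration listed above.
const INITIAL_DELAY_MS = 1000;
const MAX_DELAY_MS = 30000;
const BACKOFF_FACTOR = 2;

/**
 * Delay before retry number `attempt` (zero-based, an assumption of this sketch).
 * `random` is injectable so the result can be made deterministic in tests.
 */
function backoffDelayMs(
  attempt: number,
  random: () => number = Math.random,
): number {
  const exponential = INITIAL_DELAY_MS * Math.pow(BACKOFF_FACTOR, attempt);
  const capped = Math.min(exponential, MAX_DELAY_MS);
  // Full jitter (assumed variant): pick uniformly from [0, capped).
  return Math.floor(random() * capped);
}
```

If the first retry starts from the initial delay, as assumed here, the upper bounds for five retries are 1, 2, 4, 8, and 16 seconds, so the 30-second cap would only take effect if the retry count were raised.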

Retryable Errors:

  • ResourceNotReadyException error name
  • Error message containing "Lambda is initializing" (case-insensitive)
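
The two conditions above could be checked roughly as follows. This is a sketch rather than the PR's code: the name `isLambdaInitError` is hypothetical, and it assumes the SDK error exposes `name` and `message` properties the way a standard `Error` does.

```typescript
// Hypothetical helper; assumes errors carry `name` and `message` like a standard Error.
function isLambdaInitError(error: unknown): boolean {
  if (typeof error !== 'object' || error === null) {
    return false;
  }
  const { name, message } = error as { name?: unknown; message?: unknown };
  // Condition 1: the error name matches exactly.
  if (name === 'ResourceNotReadyException') {
    return true;
  }
  // Condition 2: the message contains the phrase, ignoring case.
  return (
    typeof message === 'string' &&
    message.toLowerCase().includes('lambda is initializing')
  );
}
```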

Describe any new or updated permissions being added

N/A - No IAM permission changes. This is a runtime behavior improvement in the Lambda handler.

Description of how you validated changes

  • Unit tests: Added 33 new unit tests covering:

    • isRetryableError() - 5 tests for error detection
    • calculateDelay() - 3 tests for backoff calculation
    • makeWithRetry() - 9 tests for retry wrapper behavior
    • Handler retry behavior - 9 tests including:
      • Success on first attempt (no retry needed)
      • Success after retries on Lambda initialization error
      • Success after retries on ResourceNotReadyException
      • Failure after max retries exceeded
      • Non-retryable errors fail immediately
      • ignoreErrorCodesMatching continues to work correctly
      • All request types (Create, Update, Delete) work with retry
      • Case-insensitive error message matching
  • Integration tests: Existing integration tests continue to pass unchanged

Checklist


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@github-actions github-actions bot added the bug, effort/medium, p2, and beginning-contributor labels Feb 4, 2026
@aws-cdk-automation aws-cdk-automation requested a review from a team February 4, 2026 22:47
@mergify mergify bot added the contribution/core label Feb 4, 2026

@aws-cdk-automation aws-cdk-automation left a comment

The pull request linter fails with the following errors:

❌ Fixes must contain a change to an integration test file and the resulting snapshot.

If you believe this pull request should receive an exemption, please comment and provide a justification. A comment requesting an exemption should contain the text Exemption Request. Additionally, if clarification is needed, add Clarification Request to a comment.

✅ An exemption has been requested. Please wait for a maintainer's review.

sai-ray commented Feb 4, 2026

Exemption Request

This PR does not include integration test changes because:

  1. Runtime-only change: This fix modifies the Lambda handler runtime behavior, not CloudFormation template generation. Integration tests in CDK verify synthesized CloudFormation templates, but this change produces identical templates.

  2. No snapshot changes: Since no CloudFormation resources or properties are modified, there are no snapshot changes to capture.

  3. Cannot be tested via integration tests: The "Lambda is initializing" error is a transient, timing-dependent condition that occurs while a Lambda function is in a cold start state. It cannot be reproduced reliably in integration tests because:

    • It requires the target Lambda to be inactive for an extended period (typically 15+ minutes)
    • The error window is very short (1-10 seconds during initialization)
    • AWS Lambda's cold start behavior is non-deterministic
  4. Comprehensive unit test coverage: The PR includes 33 new unit tests that thoroughly cover:

    • Retry logic with mocked Lambda initialization errors
    • Exponential backoff calculation
    • Error detection for retryable vs non-retryable errors
    • All request types (Create, Update, Delete)
    • Edge cases (max retries exceeded, case-insensitive matching)
  5. Follows established pattern: This implementation follows the same pattern as log-retention-handler, which also validates its retry logic with unit tests rather than integration tests.

The fix is a transparent improvement to the handler's resilience that doesn't affect the CloudFormation contract or require user code changes.

@aws-cdk-automation aws-cdk-automation added the pr-linter/exemption-requested label Feb 4, 2026

sai-ray commented Feb 4, 2026

This PR was created with Kiro and is here for evaluation purposes. We have not yet decided whether we want to introduce this into the CDK.

@sai-ray sai-ray marked this pull request as draft February 5, 2026 14:01

Development

Successfully merging this pull request may close these issues.

cdk-custom-resource: Error for Lambda is initializing your function

2 participants