fix(custom-resources): add retry logic for Lambda initialization errors in AwsCustomResource #36885
The pull request linter fails with the following errors:
❌ Fixes must contain a change to an integration test file and the resulting snapshot.
If you believe this pull request should receive an exemption, please comment and provide a justification. A comment requesting an exemption should contain the text Exemption Request. Additionally, if clarification is needed, add Clarification Request to a comment.
✅ An exemption request has been requested. Please wait for a maintainer's review.
Exemption Request
This PR does not include integration test changes because:
The fix is a transparent improvement to the handler's resilience that doesn't affect the CloudFormation contract or require user code changes.
This PR was created with Kiro and is here for evaluation purposes. We have not yet decided whether we want to introduce this into the CDK.
Issue # (if applicable)
Closes #35931.
Reason for this change
When AwsCustomResource invokes a target Lambda function that is in a cold start state (inactive), the Lambda service returns the error "Lambda is initializing your function. It will be ready to invoke shortly." The current handler fails immediately without retrying, which causes CloudFormation deployments to fail intermittently.
This is particularly problematic for:
Description of changes
Added retry logic with exponential backoff to the AwsCustomResource handler to handle Lambda cold start errors. The implementation follows the established pattern from log-retention-handler.
Changes:
- utils.ts:
  - isRetryableError() - Detects Lambda initialization errors
  - calculateDelay() - Exponential backoff with jitter
  - sleep() - Async delay helper
  - makeWithRetry() - Retry wrapper function
- index.ts: apiCall.invoke() wrapped in retry logic
Retry Configuration:
Retryable Errors:
- ResourceNotReadyException error name
Describe any new or updated permissions being added
N/A - No IAM permission changes. This is a runtime behavior improvement in the Lambda handler.
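For readers who want a picture of that runtime behavior, here is a minimal sketch of how the four helpers named under "Description of changes" could fit together. It is not the code from this PR: the function names come from the description, but the bodies, the RetryOptions shape, the default values, the use of full jitter, and the message-substring check are all assumptions made for illustration.

```typescript
// Hypothetical sketch. Helper names mirror the PR description; everything
// else (option names, defaults, jitter strategy) is assumed, not sourced.

interface RetryOptions {
  maxRetries: number;  // assumed: retries allowed after the first attempt
  baseDelayMs: number; // assumed: delay ceiling for attempt 0
  maxDelayMs: number;  // assumed: upper bound on any delay ceiling
}

// Assumed defaults; the PR's actual retry configuration is not shown here.
const DEFAULT_RETRY_OPTIONS: RetryOptions = {
  maxRetries: 5,
  baseDelayMs: 1000,
  maxDelayMs: 30000,
};

// Retryable if the error name is ResourceNotReadyException or (assumption)
// the message contains the Lambda "initializing" text quoted in the PR.
function isRetryableError(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  const { name, message } = error as { name?: unknown; message?: unknown };
  if (name === "ResourceNotReadyException") return true;
  return (
    typeof message === "string" &&
    message.includes("Lambda is initializing your function")
  );
}

// Exponential backoff with full jitter: a random delay in
// [0, min(maxDelayMs, baseDelayMs * 2^attempt)). The random source is
// injectable so the calculation can be tested deterministically.
function calculateDelay(
  attempt: number,
  opts: RetryOptions,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Async delay helper.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Wraps an async function so retryable failures are retried with backoff.
// Non-retryable errors, and the last retryable one, are rethrown unchanged.
function makeWithRetry<T>(
  fn: () => Promise<T>,
  opts: RetryOptions = DEFAULT_RETRY_OPTIONS,
): () => Promise<T> {
  return async () => {
    for (let attempt = 0; ; attempt++) {
      try {
        return await fn();
      } catch (error) {
        if (!isRetryableError(error) || attempt >= opts.maxRetries) throw error;
        await sleep(calculateDelay(attempt, opts));
      }
    }
  };
}
```

Under these assumptions, the handler change would amount to calling something like makeWithRetry(() => apiCall.invoke())() in place of apiCall.invoke() directly. Because non-retryable errors are rethrown untouched, existing error handling downstream of the call would see the same errors as before.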
Description of how you validated changes
Unit tests: Added 33 new unit tests covering:
- isRetryableError() - 5 tests for error detection
- calculateDelay() - 3 tests for backoff calculation
- makeWithRetry() - 9 tests for retry wrapper behavior
- ignoreErrorCodesMatching continues to work correctly
Integration tests: Existing integration tests continue to pass unchanged
Checklist
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license