taskresource: wait for execution role credentials upon agent restart#4827
Merged
Conversation
8372714 to
5ae4c0b
Compare
446d0c2 to
b6b3a0a
Compare
b6b3a0a to
fa40298
Compare
fa40298 to
cb0b3da
Compare
cb0b3da to
9e4f457
Compare
9e4f457 to
d382456
Compare
d382456 to
0289725
Compare
0289725 to
7e34d5c
Compare
sparrc
reviewed
Dec 18, 2025
| // RequiresExecutionRoleCredentials returns true if the resource requires execution role credentials. | ||
| // We only need execution role credentials when there's an external config to pull from S3 | ||
| func (firelens *FirelensResource) RequiresExecutionRoleCredentials() bool { | ||
| return firelens.externalConfigType == ExternalConfigTypeS3 |
Contributor
There was a problem hiding this comment.
Can this not fallback to instance credentials?
Contributor
Author
There was a problem hiding this comment.
It can, but today it does not:
amazon-ecs-agent/agent/taskresource/firelens/firelens_unix.go
Lines 519 to 522 in 64b39fa
Contributor
Author
There was a problem hiding this comment.
I would consider adding the fallback logic as a separate enhancement
sparrc
reviewed
Dec 18, 2025
ShelbyZ
reviewed
Dec 18, 2025
amogh09
approved these changes
Dec 18, 2025
amogh09
reviewed
Dec 18, 2025
ShelbyZ
approved these changes
Dec 18, 2025
sparrc
approved these changes
Dec 18, 2025
Merged
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR fixes the following race condition in ECS agent -
When the ECS agent receives a task payload from ACS, the payload contains the execution role credentials. The ECS agent uses these credentials to create task resources, before the task can be run.
If the ECS agent is restarted between the time when a task payload arrived and the task's resources were created, currently the task fails with error like "unable to find execution role credentials".
This is because ECS agent only saves the credentials ID to its state during shutdown, not the credentials themselves. So after a restart, ECS agent needs to wait for the execution role credentials to (re)arrive from ACS, instead of trying to progress the task too soon and failing.
Implementation details
Updated the task manager's resource transition logic to check whether the execution role is needed for a resource creation, and if so, whether it is available. If not, it will wait, like it waits for execution role credentials to pull container images of a task. The wait timeout is configured to be 1 minute.
Also, added a
RequiresExecutionRoleCredentialsmethod to the taskresource interface that tells whether a resource requires execution role credentials. This is used by the task manager during resource transitions.Testing
New tests cover the changes: yes, added unit and integ tests. The integ test reproduces the race condition and now asserts that the fix works.
Description for the changelog
bugfix: wait for execution role credentials to create task resource upon agent restart
Additional Information
Does this PR include breaking model changes? If so, Have you added transformation functions? No
Does this PR include the addition of new environment variables in the README? No
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.