Sync Step Function input files when content changed only with exec id metadata #530
Merged
sbkok
merged 8 commits into
awslabs:master
from
sbkok:fix/account-creation-wait-for-bootstrap-to-complete
Sep 20, 2022
8c0dea7 to
35e1e08
Compare
javydekoning
previously approved these changes
Sep 14, 2022
StewartW
requested changes
Sep 14, 2022
src/lambda_codebase/initial_commit/bootstrap_repository/adf-bootstrap/deployment/global.yml
Outdated
src/lambda_codebase/initial_commit/bootstrap_repository/adf-build/shared/helpers/sync_to_s3.py
Outdated
src/lambda_codebase/initial_commit/bootstrap_repository/adf-build/shared/helpers/sync_to_s3.py
Outdated
**Why?**
`aws s3 sync/cp` copies files when they have changed. However, because the file metadata is also taken into account, it also uploads files whose content did not change.

Additionally, as described in awslabs#518, we would like to insert metadata when a file is changed. If we relied on the `aws s3 sync/cp` logic, it would also update the metadata whenever the metadata itself changed. Therefore, we cannot use it to add the necessary execution id to files only when they are uploaded.

**What?**
The `sync_to_s3.py` script is added to support syncing the files to S3. This script will:

1. Upload a single file, or walk through a directory recursively.
2. Determine the SHA-256 hash of each file it finds.
3. List the objects in the S3 bucket, under an optional prefix, to determine which objects exist.
4. Upload a file if its object is missing.
5. If a file already exists as an object, check whether the SHA-256 hashes match. If they do not, upload the new version.
6. If an object exists but the file does not, optionally delete the object from the S3 bucket.

When it uploads a file to S3, it adds the metadata requested through the `--upload-with-metadata` argument. Additionally, it adds the `sha256_hash` metadata, which is used to determine whether the content changed.

The deployment maps and account configuration processes rely on AWS Step Functions. The process that syncs these files is updated to rely on the `sync_to_s3.py` script. This way we can retrieve the `execution_id` and insert it in the invocation id of the Step Function State Machine.
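The comparison steps in the list above can be sketched without touching S3. This is a minimal illustration, not the code from `sync_to_s3.py`: the function names `sha256_of_file` and `plan_sync` and the dictionary shapes are assumptions made for the example. Only the `sha256_hash` metadata key comes from the commit message.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple


def sha256_of_file(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_sync(
    local_hashes: Dict[str, str],
    remote_metadata: Dict[str, Dict[str, str]],
    delete: bool = False,
) -> Tuple[List[str], List[str]]:
    """Decide which keys to upload and which objects to delete.

    local_hashes maps object key -> SHA-256 of the local file.
    remote_metadata maps object key -> user metadata of the existing object.
    Both shapes are hypothetical, chosen to keep the sketch self-contained.
    """
    to_upload = []
    for key, local_hash in local_hashes.items():
        # A missing object has no metadata, so the lookup yields None and
        # the file is scheduled for upload, same as a hash mismatch.
        if remote_metadata.get(key, {}).get("sha256_hash") != local_hash:
            to_upload.append(key)

    to_delete = []
    if delete:
        # Objects without a matching local file are removed only on request.
        to_delete = [key for key in remote_metadata if key not in local_hashes]

    return sorted(to_upload), sorted(to_delete)
```

In a real run the remote metadata would have to be fetched per object, and the upload itself would attach the requested metadata together with `sha256_hash`. Both are left out here to keep the sketch runnable offline.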
**Why?** To support matching both `.yml` and `.yaml` file extensions. **What?** Added support for passing multiple `-e` or `--extension` arguments.
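One way a repeatable `-e`/`--extension` option could be handled is sketched below. The page does not show how `sync_to_s3.py` parses its arguments, so the use of `argparse`, the helper names `parse_extensions` and `matches`, and the choice to match every file when no extension is given are all assumptions.

```python
import argparse
from typing import List, Optional


def parse_extensions(argv: Optional[List[str]] = None) -> List[str]:
    """Collect every -e/--extension occurrence into a list."""
    parser = argparse.ArgumentParser(description="Illustrative sketch only")
    parser.add_argument(
        "-e",
        "--extension",
        action="append",  # each occurrence appends to the same list
        dest="extensions",
        default=None,
        help="File extension to include; may be passed multiple times.",
    )
    args = parser.parse_args(argv)
    return args.extensions or []


def matches(file_name: str, extensions: List[str]) -> bool:
    """True when no filter is set or the name ends with a listed extension."""
    # Assumption: an empty filter means "match everything".
    return not extensions or file_name.endswith(tuple(extensions))
```

With this sketch, passing `-e .yml -e .yaml` yields both extensions, so `matches` accepts files with either suffix.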
…ates **Why?** When files were synced to S3, they only triggered an update of the account management or pipeline generator when the file content changed. If ADF made changes to the pipeline structure, the pipelines and account management should be retriggered to apply them. **What?** By adding the `adf_version` metadata to the files that are synced, we ensure that an unchanged file triggers an update only when the ADF version changes.
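The resulting upload decision might look like the sketch below. It is an interpretation of the commit message rather than the merged code: the function name `needs_upload` and its parameters are hypothetical, and treating every other metadata key (such as the execution id) as irrelevant to the decision is an assumption.

```python
from typing import Dict, Optional


def needs_upload(
    local_hash: str,
    requested_metadata: Dict[str, str],
    remote_metadata: Optional[Dict[str, str]],
    version_key: str = "adf_version",
) -> bool:
    """Return True when the file should be (re)uploaded.

    remote_metadata is None when the object does not exist yet.
    """
    if remote_metadata is None:
        return True  # object is missing
    if remote_metadata.get("sha256_hash") != local_hash:
        return True  # content changed
    wanted_version = requested_metadata.get(version_key)
    # Assumption: only a version difference forces a re-upload; other
    # metadata differences, such as a new execution id, are ignored.
    return (
        wanted_version is not None
        and remote_metadata.get(version_key) != wanted_version
    )
```

Ignoring the execution id in the comparison is what keeps unchanged files from being re-uploaded on every run, which is the behaviour the first commit message asks for.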
648cf9b to
6e14fe2
Compare
Collaborator, Author
@StewartW I've updated the PR and included the ADF version in the metadata of the files, as we discussed on Friday.
StewartW
approved these changes
Sep 20, 2022
javydekoning
approved these changes
Sep 20, 2022
**Why?**
By using `aws s3 sync/cp`, it would copy the files when these were changed. However, as the file metadata is also taken into account, it would upload also if the content did not change.

Additionally, as described in #518, we would like to insert metadata when a file is changed. If we would rely on the `aws s3 sync/cp` logic, it will also update the metadata if the metadata itself is changed. Therefore, we cannot add the necessary execution id to the files upon an upload only.

**What?**
The `sync_to_s3.py` script is added to support syncing the files to S3. This script will:

1. Upload a single file, or walk through a directory recursively.
2. Check each of the files it finds and determine the SHA-256 hash of these.
3. Parse the S3 bucket with an optional prefix, to determine which objects exist.
4. If a file is missing, it will upload the file.
5. If a file exists as an object already, it will check if the SHA-256 hashes match. If they do not, it will upload the new version.
6. If an object exists, but the file does not exist, it will optionally delete the object from the S3 bucket.

When it uploads a file to S3, it will add the metadata that is requested through the `--upload-with-metadata` argument. Additionally, it will add the `sha256_hash` metadata to determine if the content changed.

The deployment maps and account configuration processes rely on AWS Step Functions. When these are synced, the process is updated to rely on the `sync_to_s3.py` script. This way we can retrieve the `execution_id` and insert that in the invocation id of the Step Function State Machine.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.