sbkok commented Sep 14, 2022

Step 1 of fixing #518.

Why?

When using `aws s3 sync`/`cp`, files were copied whenever they changed. However, because file metadata such as the last-modified time is also taken into account, a file would be uploaded even if its content had not changed.

Additionally, as described in #518, we would like to insert metadata when a file changes. If we relied on the `aws s3 sync`/`cp` logic, a change to the metadata itself would also trigger an update. Therefore, we cannot use it to add the necessary execution id only when a file is actually uploaded.

What?

The `sync_to_s3.py` script is added to support syncing the files to S3. This script will:

  1. Upload a single file, or walk through a directory recursively.
  2. Determine the SHA-256 hash of each file it finds.
  3. List the objects in the S3 bucket, optionally under a given prefix, to determine which objects already exist.
  4. Upload any file that does not exist as an object yet.
  5. If a file already exists as an object, check whether the SHA-256 hashes match. If they do not, upload the new version.
  6. If an object exists but the corresponding file does not, optionally delete the object from the S3 bucket.
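A minimal Python sketch of the hashing and comparison steps above could look like the following. It is not the actual `sync_to_s3.py` implementation: the function names (`sha256_of_file`, `collect_local_hashes`, `plan_sync`) are hypothetical, and it assumes the existing objects and their user metadata have already been fetched from S3 into a plain dictionary, so the S3 calls themselves are left out.

```python
import hashlib
from pathlib import Path
from typing import Dict, Set, Tuple


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collect_local_hashes(source: Path, prefix: str = "") -> Dict[str, str]:
    """Map each target S3 key to the SHA-256 hash of its local file.

    Assumption: a single file maps to prefix + file name, and files in a
    directory map to prefix + their path relative to that directory.
    """
    if source.is_file():
        return {prefix + source.name: sha256_of_file(source)}
    hashes: Dict[str, str] = {}
    for path in sorted(source.rglob("*")):  # recursive walk
        if path.is_file():
            key = prefix + path.relative_to(source).as_posix()
            hashes[key] = sha256_of_file(path)
    return hashes


def plan_sync(
    local: Dict[str, str],
    remote: Dict[str, Dict[str, str]],
    delete: bool = False,
) -> Tuple[Set[str], Set[str]]:
    """Return (keys to upload, keys to delete).

    `remote` maps each existing object key (under the prefix) to its user
    metadata; only the stored `sha256_hash` value is compared.
    """
    # Upload when the object is missing or its stored hash differs.
    to_upload = {
        key
        for key, digest in local.items()
        if remote.get(key, {}).get("sha256_hash") != digest
    }
    # Optionally remove objects that no longer have a local file.
    to_delete = set(remote) - set(local) if delete else set()
    return to_upload, to_delete
```

In a real run, note that S3's `ListObjectsV2` does not return user-defined metadata, so the stored `sha256_hash` of each existing object would have to be read separately (for example with `head_object`).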

When it uploads a file to S3, the script adds the metadata requested through the `--upload-with-metadata` argument. Additionally, it adds the `sha256_hash` metadata, which is used to determine whether the content changed.
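To illustrate how the requested metadata and the `sha256_hash` could be combined for an upload, here is a sketch. It assumes, hypothetically, that `--upload-with-metadata` takes `key=value` pairs; the helper names are made up for this example. It only builds the keyword arguments for boto3's S3 `put_object`, whose `Metadata` parameter takes a dict of strings, so it runs without AWS credentials.

```python
import hashlib
from typing import Dict, List


def parse_metadata_args(pairs: List[str]) -> Dict[str, str]:
    """Turn hypothetical ["key=value", ...] arguments into a metadata dict."""
    metadata: Dict[str, str] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"Expected key=value, got {pair!r}")
        metadata[key] = value
    return metadata


def build_put_object_kwargs(
    bucket: str, key: str, body: bytes, extra: Dict[str, str]
) -> dict:
    """Build keyword arguments for boto3's S3 put_object call.

    The requested metadata is copied and the content hash is added as
    `sha256_hash`, so a later sync can detect content changes.
    """
    metadata = dict(extra)
    metadata["sha256_hash"] = hashlib.sha256(body).hexdigest()
    return {"Bucket": bucket, "Key": key, "Body": body, "Metadata": metadata}
```

The result would then be passed on as `boto3.client("s3").put_object(**kwargs)`.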

The deployment map and account configuration processes rely on AWS Step Functions. The process that syncs these files to S3 is updated to use the `sync_to_s3.py` script. This way, we can retrieve the `execution_id` and insert it into the invocation id of the Step Functions state machine.


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

sbkok added this to the v3.2.0 milestone on Sep 14, 2022
sbkok force-pushed the fix/account-creation-wait-for-bootstrap-to-complete branch from 8c0dea7 to 35e1e08 on September 14, 2022 13:26
javydekoning previously approved these changes Sep 14, 2022
**Why?**

To support matching both .yml and .yaml file extensions.

**What?**

Added support for passing multiple `-e` or `--extension` arguments.
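One way to accept a repeatable option with Python's `argparse` is `action="append"`, which collects every occurrence into a list. The sketch below is an assumption about how this could be wired up, not the script's actual parser; `build_parser` and `matches_extension` are hypothetical names, and extensions are accepted with or without the leading dot.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: a source path plus a repeatable -e/--extension."""
    parser = argparse.ArgumentParser(description="Extension filter sketch")
    parser.add_argument("source")
    parser.add_argument(
        "-e",
        "--extension",
        action="append",  # every -e/--extension occurrence is appended
        help="File extension to include; may be given multiple times",
    )
    return parser


def matches_extension(path: Path, extensions: Optional[List[str]]) -> bool:
    """True if no filter was given, or the file suffix is in the filter."""
    if not extensions:
        return True
    wanted = {ext if ext.startswith(".") else "." + ext for ext in extensions}
    return path.suffix in wanted
```

With this parser, `-e .yml -e .yaml` yields `['.yml', '.yaml']`. When no `-e` is given, the attribute is `None`, which the sketch treats as "match every file".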
…ates

**Why?**

When files were synced to S3, they only triggered an update of the account
management or pipeline generator when the file content changed.

If ADF itself changes the pipeline structure, the pipelines and account
management should be retriggered to apply those changes.

**What?**

By adding the `adf_version` metadata to the files that are synced, we ensure
that a file is also updated, and thus triggers a new run, when the ADF version
changes, even if its content did not.
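The resulting upload decision can be sketched as a single predicate. This assumes the object's user metadata has already been read into a dict with the `sha256_hash` and `adf_version` keys; the function name `needs_upload` is hypothetical and not taken from the ADF code base.

```python
from typing import Dict, Optional


def needs_upload(
    local_sha256: str,
    current_adf_version: str,
    remote_metadata: Optional[Dict[str, str]],
) -> bool:
    """Decide whether a file should be (re)uploaded to S3."""
    if remote_metadata is None:
        return True  # the object does not exist yet
    if remote_metadata.get("sha256_hash") != local_sha256:
        return True  # the file content changed
    # Same content, but a different ADF version is deployed: re-upload so
    # the new metadata triggers the state machine again.
    return remote_metadata.get("adf_version") != current_adf_version
```

Comparing the version separately from the hash means an ADF upgrade re-uploads unchanged files once; after that, both values match again and no further runs are triggered.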
sbkok force-pushed the fix/account-creation-wait-for-bootstrap-to-complete branch from 648cf9b to 6e14fe2 on September 19, 2022 11:43
sbkok commented Sep 19, 2022

@StewartW I've updated the PR and included the ADF version in the metadata of the files, as we discussed on Friday.
This allows us to trigger a new run of the Step Functions state machines when ADF is updated.

sbkok merged commit ebd8152 into awslabs:master on Sep 20, 2022
sbkok deleted the fix/account-creation-wait-for-bootstrap-to-complete branch on September 23, 2022 09:29