@sbkok sbkok commented Sep 26, 2022

This PR implements steps 2 to 4 of issue #518. Since all these code changes are related, I put them in one PR:

  1. Forward CodePipeline Execution Id to Account Mgmt Creation SFN

    Step 2 of fixing issue #518 (ADF-Bootstrap CodeBuild fails directly after creating a new account using adf-accounts logic).

    Why?

    As explained in issue #518, we need to forward the execution id of the CodePipeline
    that triggered the Account Management state machine so we can wait for it to
    complete.

    What?

    Adding the CodePipeline execution identifier to the Step Functions
    State Machine invocation to enable tracing state machines in progress at
    CodeBuild execution time.

  2. Only process account and deployment maps generated by same ADF version

    Why?

    When an account file or deployment map is updated, it should only be processed
    when the version equals the current ADF version. Otherwise it should skip the
    file.

    This will ensure we don't get into compatibility issues, where a file structure
    update will make the processing of the file fail.

    What?

    Checking the version number attached to the S3 object metadata against the
    current ADF version number. If they do not match, the file is skipped.

  3. Minor updates to sync_to_s3.py:

    • Update docstring help to include the arguments correctly.
    • Update helper requirements to be installed when sync_to_s3.py is called.
    • Use CodePipeline execution id instead of CodeBuild for sync_to_s3 ops.
    • Catch low-level error thrown by S3 Head Object.

  4. Await SFN executions before bootstrapping continues

    Why?

    As described in issue #518, the bootstrap pipeline fails to run the main.py
    code when the account creation or bootstrapping process is still in progress.

    What?

    The code changes ensure the script waits for any Step Functions executions
    that are triggered by the sync_to_s3.py process. It checks their status in a
    loop, waiting 30 seconds between checks, until they have all succeeded.

  5. Abort bootstrap pipeline when SFN error occurred

    Why?

    As the account management and bootstrapping steps are performed in Step
    Functions state machines, errors might not be noticed until a follow-up error
    occurs when trying to interact with one of the failing accounts.

    What?

    Modified the bootstrap pipeline to check whether these state machines have
    any aborted, timed out, or failed executions. If they do, it logs the error
    and instructs the user to look into the fault first.


By submitting this pull request, I confirm that you can use, modify, copy, and
redistribute this contribution, under the terms of your choice.

Step 2 of fixing awslabs#518.

**Why?**

As explained in awslabs#518, we need to forward the execution id of the CodePipeline
that triggered the Account Management state machine so we can wait for it to
complete.

**What?**

Adding the CodePipeline execution identifier to the Step Functions
State Machine invocation to enable tracing state machines in progress at
CodeBuild execution time.
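
A minimal sketch of what forwarding the execution id could look like, not the actual ADF code. The function name, the `account_file` payload, and the `pipeline_execution_id` input key are hypothetical; `sfn_client` is assumed to be a boto3 Step Functions client, whose `start_execution` call accepts `stateMachineArn` and a JSON string as `input`.

```python
import json


def start_account_management(sfn_client, state_machine_arn, account_file,
                             pipeline_execution_id):
    """Start the state machine and pass along the CodePipeline execution id."""
    payload = {
        "account_file": account_file,
        # Hypothetical key name: it lets later steps relate this state machine
        # execution to the pipeline run that triggered it.
        "pipeline_execution_id": pipeline_execution_id,
    }
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(payload),
    )
    return response["executionArn"]
```

Keeping the returned execution ARN makes it possible to look the execution up again later in the same CodeBuild run.
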
**Why?**

When an account file or deployment map is updated, it should only be processed
when the version equals the current ADF version. Otherwise it should skip the
file.

This will ensure we don't get into compatibility issues, where a file structure
update will make the processing of the file fail.

**What?**

Checking the version number attached to the S3 object metadata against the
current ADF version number. If they do not match, the file is skipped.
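
The version check can be sketched roughly as follows. This is an illustration under assumptions: the metadata key `adf_version` and both function names are hypothetical, and `s3_client` is assumed to be a boto3 S3 client, whose `head_object` response carries user-defined metadata under `Metadata`.

```python
def should_process(object_metadata, current_adf_version, metadata_key="adf_version"):
    """Return True only when the file was written by the running ADF version."""
    return object_metadata.get(metadata_key) == current_adf_version


def filter_by_version(s3_client, bucket, keys, current_adf_version):
    """Keep the keys whose S3 object metadata matches the current version."""
    selected = []
    for key in keys:
        head = s3_client.head_object(Bucket=bucket, Key=key)
        if should_process(head.get("Metadata", {}), current_adf_version):
            selected.append(key)
        else:
            print(f"Skipping {key}: not generated by ADF {current_adf_version}")
    return selected
```

An object without the metadata key is skipped as well, since a missing version cannot be shown to match.
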
**Why?**

Step Functions (SFN) limits execution names to 80 characters.
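
Assuming the limit referred to here is the 80-character maximum on Step Functions execution names, one way to stay within it is sketched below. The naming scheme and the function name are hypothetical and not taken from this PR.

```python
MAX_EXECUTION_NAME_LENGTH = 80  # Step Functions execution name limit


def build_execution_name(pipeline_execution_id, file_key):
    """Build an execution name that fits within the 80-character limit."""
    # Conservatively replace anything that is not a letter, digit, dash or
    # underscore, since execution names reject characters such as "/".
    safe_key = "".join(
        char if char.isalnum() or char in "-_" else "-" for char in file_key
    )
    # The pipeline execution id goes first so it survives truncation.
    return f"{pipeline_execution_id}-{safe_key}"[:MAX_EXECUTION_NAME_LENGTH]
```
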
**Why?**

As described in issue awslabs#518, the bootstrap pipeline fails to run the main.py
code when the account creation or bootstrapping process is still in progress.

**What?**

The code changes ensure the script waits for any Step Functions executions
that are triggered by the sync_to_s3.py process. It checks their status in a
loop, waiting 30 seconds between checks, until they have all succeeded.
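
The waiting behaviour described above could look roughly like this. It is a sketch, not the code from this PR: the function name is hypothetical, and `sfn_client` is assumed to be a boto3 Step Functions client, whose `describe_execution` response contains a `status` field. The `sleep` parameter only exists to make the loop easy to test.

```python
import time


def wait_for_executions(sfn_client, execution_arns, poll_seconds=30,
                        sleep=time.sleep):
    """Block until every execution has succeeded, polling at a fixed interval."""
    pending = set(execution_arns)
    while pending:
        # sorted() returns a list, so removing from the set while looping is safe.
        for arn in sorted(pending):
            status = sfn_client.describe_execution(executionArn=arn)["status"]
            if status == "SUCCEEDED":
                pending.discard(arn)
            elif status != "RUNNING":
                # Any other status means the execution will not succeed on its own.
                raise RuntimeError(f"Execution {arn} ended with status {status}")
        if pending:
            sleep(poll_seconds)
```

Raising on a non-running, non-succeeded status avoids waiting forever on an execution that has already failed.
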
**Why?**

As the account management and bootstrapping steps are performed in Step
Functions state machines, errors might not be noticed until a follow-up error
occurs when trying to interact with one of the failing accounts.

**What?**

Modified the bootstrap pipeline to check whether these state machines have
any aborted, timed out, or failed executions. If they do, it logs the error
and instructs the user to look into the fault first.
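
A rough sketch of such a check is shown below. It assumes executions can be related to the current pipeline run by a name prefix, which is a hypothetical convention; the function names are illustrative too. `sfn_client` is assumed to be a boto3 Step Functions client, whose `list_executions` call accepts `stateMachineArn`, `statusFilter`, and `nextToken`.

```python
FAILURE_STATUSES = ("FAILED", "TIMED_OUT", "ABORTED")


def find_failed_executions(sfn_client, state_machine_arn, name_prefix):
    """Collect executions of this run that ended in a failure status."""
    failed = []
    for status in FAILURE_STATUSES:
        kwargs = {"stateMachineArn": state_machine_arn, "statusFilter": status}
        while True:
            page = sfn_client.list_executions(**kwargs)
            failed.extend(
                execution for execution in page["executions"]
                if execution["name"].startswith(name_prefix)
            )
            if not page.get("nextToken"):
                break
            kwargs["nextToken"] = page["nextToken"]
    return failed


def abort_on_failures(failed):
    """Log each failed execution and stop the build if there are any."""
    for execution in failed:
        print(f"{execution['name']} ended with status {execution['status']}")
    if failed:
        raise SystemExit("Please look into the failed executions first.")
```
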
@sbkok sbkok added the enhancement New feature or request label Sep 26, 2022
@sbkok sbkok added this to the v3.2.0 milestone Sep 26, 2022
@sbkok sbkok changed the title from "Fix/account creation wait for bootstrap to complete" to "Fix account creation wait for bootstrap to complete" Sep 26, 2022
**Why?**

The `head_object` implementation throws a low-level
`botocore.exceptions.ClientError` instead of
the higher-level `s3_client.exceptions.NoSuchKey`.

**What?**

Properly caught the error using the low-level approach.
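
A sketch of the low-level approach, not the code from this PR: function names are hypothetical. For a missing key, `head_object` raises a `ClientError` whose `response["Error"]["Code"]` is `"404"`. To keep the sketch free of third-party imports it catches `Exception` and inspects the `response` attribute; real code would catch `botocore.exceptions.ClientError` instead.

```python
def is_missing_object_error(error):
    """Return True when the error's response payload signals a missing object."""
    response = getattr(error, "response", None) or {}
    code = str(response.get("Error", {}).get("Code", ""))
    return code in {"404", "NoSuchKey", "NotFound"}


def object_exists(s3_client, bucket, key):
    """Check for an object with head_object, treating a 404 as 'not there'."""
    try:
        s3_client.head_object(Bucket=bucket, Key=key)
    except Exception as error:  # narrow to botocore.exceptions.ClientError in real code
        if is_missing_object_error(error):
            return False
        raise  # access denied, throttling, etc. should still surface
    return True
```
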
@sbkok sbkok force-pushed the fix/account-creation-wait-for-bootstrap-to-complete branch from 319345e to 62d8115 Compare September 27, 2022 07:59
@StewartW StewartW merged commit 920b142 into awslabs:master Oct 3, 2022
@sbkok sbkok deleted the fix/account-creation-wait-for-bootstrap-to-complete branch December 22, 2022 12:52