Before you begin, ensure you have the following:
- AWS Account
- Basic understanding of AWS services
- Appropriate permissions to create IAM roles and policies
- An AWS Elastic Container Registry (ECR) repository
- An authenticated AWS CLI on your local machine
- A Terraform backend for the state
To create an IAM user with the correct permissions, follow these steps:
- Sign in to the AWS Console via your browser
- Navigate to IAM -> Users
- Create a new user
- Click the "Add permissions" dropdown and select "Create inline policy"
- Switch the policy editor from Visual to JSON
- Add the following policy and save it:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VisualEditor0",
"Effect": "Allow",
"Action": [
"iam:*",
"sns:*",
"s3:*",
"logs:*",
"lambda:*",
"ecs:*",
"ec2:*",
"ecr:*"
],
"Resource": "*"
}
]
}
- Now you have a user with the correct permissions
To create a repository for the images follow these steps:
- Sign in to the AWS Console via your browser
- Navigate to ECR (Elastic Container Registry)
- Create a new repository
- Select private and give it a name
- Now you have a repository that we will use later.
To get CLI access to your account, do the following:
- Install the AWS CLI
- Sign in to the AWS Console via your browser
- Head to the user we made before in the IAM settings
- Create access keys for that user
- In your local terminal, run `aws configure` with the access details for the user
- Now you are able to continue
To set up the Terraform backend for the state, follow these steps:
- Sign in to the AWS Console via your browser
- Navigate to S3
- Create a new bucket
- Add the configuration of that bucket to `artifacts/main.tf`
Before running the project, you need to set up and understand a few things.
Configurations are made in the `.env` file in the root of the project. The comments describe the expected values.
The project contains a number of shell scripts for running it, all prefixed with a number:
- `0_destroy_infrastructure.sh` -> Cleans up the Terraform setup
- `1_build_images.sh` -> Builds all images that will be used
- `2_init_terraform.sh` -> Initializes Terraform, only has to be run once
- `3_create_infrastructure.sh` -> Uses the artifacts to create all cloud resources
- `4_run_experiments.sh` -> Starts a container that publishes tasks to be run by the cloud functions
- `5_get_logs.sh` -> Pulls all relevant logs locally so we can process them later
- `6_preprocess.sh` -> Reads the local logs and puts them in a usable CSV format
Most of the functions expect a pickled model to be present. For this you can use the scripts in `model_training`. Simply run the script for one of the algorithms and the pickled model should appear in its directory.
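For illustration, a minimal training script could look like the sketch below; the scikit-learn usage, the random placeholder data, and the `model.pkl` file name are assumptions, so check the actual scripts in `model_training` for the real dataset and output location.

```python
# Hypothetical sketch of a model_training script for k-means.
# Trains a small model on random placeholder data and pickles it
# next to the script, where the corresponding function expects it.
import pickle

import numpy as np
from sklearn.cluster import KMeans

# Placeholder data; the real scripts train on the project's datasets.
data = np.random.rand(1000, 8)

model = KMeans(n_clusters=5, random_state=42)
model.fit(data)

with open("model.pkl", "wb") as f:  # assumed output file name
    pickle.dump(model, f)
```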
If you want to test some functions, this could be your workflow:
- Choose the things you want to benchmark and put them in the `.env` file (e.g. `k-means`, `pca`)
- Choose the other variables such as memory and batch size; for these we use the `sentiment.csv` dataset
- Build the images using script `1`
- Initialize Terraform if that hasn't been done yet using script `2`
- Create the infrastructure using script `3`
- Run the experiments with script `4`
- If those are done, pull the logs with script `5`
- Preprocess using script `6`
- Do your analysis on the output
There are a couple of ways to extend the project. To add a new algorithm:
- Navigate to the `functions` directory
- Identify which existing algorithm resembles your new one the most code-wise
- Copy that folder
- Change the `config.py` with the new name
- Change the `handler.py` with your code; only change the implementation of the `handler` and `initializer` functions (a minimal skeleton is sketched after this list)
- Check if the `requirements.txt` file is still correct
- If required, make a script that trains the model, pickles it, and saves it in the directory of your new function
- Done, you should be able to use this in the `.env` file with the directory name you used; make sure you rebuild the images after adding it to the `.env` file using script `1`
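As a rough orientation, a new `handler.py` might follow the shape sketched below; the exact function signatures, the `model.pkl` file name, and the batch format are assumptions here, so mirror whatever the folder you copied actually does.

```python
# Hypothetical skeleton of a new function's handler.py.
# Only the bodies of initializer() and handler() should need changes.
import pickle

model = None  # loaded once per container and reused across invocations


def initializer():
    """Load the pickled model that ships in this function's directory."""
    global model
    with open("model.pkl", "rb") as f:  # assumed file name
        model = pickle.load(f)


def handler(batch):
    """Run the new algorithm on one batch of records and return the results."""
    if model is None:
        initializer()
    # Replace with the actual prediction / transformation logic.
    return [model.predict([record])[0] for record in batch]
```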
Adding a dataset is slightly more involved.
- Add your dataset to the `datasets` directory
- In `artifacts/bucket.tf` add your dataset just like the ones that are already in there
- In the `experiments/src/main.go` change the function `getDataset` to also account for your new dataset key
- Done, use the dataset in the `.env` file and make sure to build the images again with script `1` so the new `experiments` image is available