Before you begin, ensure you have the following:
- AWS Account
- Basic understanding of AWS services
- Appropriate permissions to create IAM roles and policies
- An AWS Elastic Container Registry (ECR) repository
- An authenticated AWS CLI on your local machine
- A Terraform backend for the state
To create an IAM user with the correct permissions, follow these steps:
- Sign in to the AWS Console via your browser
- Navigate to IAM -> Users
- Create a new user
- Click the "Add permissions" dropdown and select "Create inline policy"
- Switch the policy editor from Visual to JSON
- Add the following policy and save it:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VisualEditor0",
"Effect": "Allow",
"Action": [
"iam:*",
"sns:*",
"s3:*",
"logs:*",
"lambda:*",
"ecs:*",
"ec2:*",
"ecr:*"
],
"Resource": "*"
}
]
}
- Now you have a user with the correct permissions
To create a repository for the images follow these steps:
- Sign in to the AWS Console via your browser
- Navigate to ECR (Elastic Container Registry)
- Create a new repository
- Select private and give it a name
- Now you have a repository that we will use later.
To get CLI access to your account, do the following:
- Install the AWS CLI
- Sign in to the AWS Console via your browser
- Head to the user we made before in the IAM settings
- Create access keys for that user
- In your local terminal, run `aws configure` with the access details for the user
- Now you are able to continue
To set up the Terraform backend for the state, follow these steps:
- Sign in to the AWS Console via your browser
- Navigate to S3
- Create a new bucket
- Add the configuration of that bucket to `artifacts/main.tf`
Before running the project, you need to set up and understand a few things.
Configurations are made in the `.env` file in the root of the project. The comments describe the expected values.
The project contains a number of shell scripts for running it, all prefixed with a number:
- `0_destroy_infrastructure.sh` -> Cleans up the Terraform setup
- `1_build_images.sh` -> Builds all images that will be used
- `2_init_terraform.sh` -> Initializes Terraform, only has to be run once
- `3_create_infrastructure.sh` -> Uses the artifacts to create all cloud resources
- `4_run_experiments.sh` -> Starts a container that publishes tasks to be run by the cloud functions
- `5_get_logs.sh` -> Pulls all relevant logs locally so we can process them later
- `6_preprocess.sh` -> Reads the local logs and puts them in a usable CSV format
Most of the functions expect a pickled model to be present. For this you can use the scripts in `model_training`. Simply run the script for one of the algorithms and the pickled model should appear in its directory.
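For illustration, a minimal training script could look like the sketch below; the scikit-learn usage, the random placeholder data, and the `model.pkl` file name are assumptions, so check the actual scripts in `model_training` for the real dataset and output location.

```python
# Hypothetical sketch of a model_training script for k-means.
# Trains a small model on random placeholder data and pickles it
# next to the script, where the corresponding function expects it.
import pickle

import numpy as np
from sklearn.cluster import KMeans

# Placeholder data; the real scripts train on the project's datasets.
data = np.random.rand(1000, 8)

model = KMeans(n_clusters=5, random_state=42)
model.fit(data)

with open("model.pkl", "wb") as f:  # assumed output file name
    pickle.dump(model, f)
```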
If you want to test some functions, this could be your workflow:
- Choose the things you want to benchmark and put them in the `.env` file (e.g. `k-means`, `pca`)
- Choose the other variables such as memory and batch size; for these we use the `sentiment.csv` dataset
- Build the images using script `1`
- Initialize Terraform if that hasn't been done yet using script `2`
- Create the infrastructure using script `3`
- Run the experiments with script `4`
- If those are done, pull the logs with script `5`
- Preprocess using script `6`
- Do your analysis on the output
There are a couple of ways to extend the project. To add a new algorithm:
- Navigate to the `functions` directory
- Identify which existing algorithm resembles your new one the most code-wise
- Copy that folder
- Change the `config.py` with the new name
- Change the `handler.py` with your code; only change the implementation of the `handler` and `initializer` functions (a minimal skeleton is sketched after this list)
- Check if the `requirements.txt` file is still correct
- If required, make a script that trains the model, pickles it, and saves it in the directory of your new function
- Done, you should be able to use this in the `.env` file with the directory name you used; make sure you rebuild the images after adding it to the `.env` file using script `1`
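As a rough orientation, a new `handler.py` might follow the shape sketched below; the exact function signatures, the `model.pkl` file name, and the batch format are assumptions here, so mirror whatever the folder you copied actually does.

```python
# Hypothetical skeleton of a new function's handler.py.
# Only the bodies of initializer() and handler() should need changes.
import pickle

model = None  # loaded once per container and reused across invocations


def initializer():
    """Load the pickled model that ships in this function's directory."""
    global model
    with open("model.pkl", "rb") as f:  # assumed file name
        model = pickle.load(f)


def handler(batch):
    """Run the new algorithm on one batch of records and return the results."""
    if model is None:
        initializer()
    # Replace with the actual prediction / transformation logic.
    return [model.predict([record])[0] for record in batch]
```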
Adding a dataset is slightly more involved.
- Add your dataset to the `datasets` directory
- In `artifacts/bucket.tf` add your dataset just like the ones that are already in there
- In the `experiments/src/main.go` change the function `getDataset` to also account for your new dataset key
- Done, use the dataset in the `.env` file and make sure to build the images again with script `1` so the new `experiments` image is available