Dataset URL: https://aws.amazon.com/blogs/big-data/a-public-data-lake-for-analysis-of-covid-19-data/
Steps:
- Explored the datasets to understand their contents.
- Uploaded the data to S3.
- Built crawlers using AWS Glue.
- Verified that the crawled data can be queried in Athena.
- Built a data model.
- Built a dimensional model using a star schema.
- Created dimension and fact tables in Python (pandas).
- Loaded data into those tables in Python (pandas).
- Saved the resulting CSV files to S3.
- Wrote an AWS Glue job using a Python shell script.
- Connected to Redshift.
- Created the dimension and fact table schemas.
- Loaded data from CSV files in S3 to Redshift.
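
The pandas and Redshift steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the input columns (`date`, `fips`, `state`, `cases`, `deaths`), the table names (`dim_region`, `dim_date`, `fact_covid`), and the helpers `to_csv_text` and `build_copy_sql` are all assumptions made for the example, and the real datasets may be shaped differently. Uploading to S3 and executing the SQL against Redshift are left out.

```python
import io

import pandas as pd

# Hypothetical flat input; the real column names in the data lake may differ.
raw = pd.DataFrame({
    "date": ["2020-03-01", "2020-03-01", "2020-03-02"],
    "fips": ["53", "36", "53"],
    "state": ["Washington", "New York", "Washington"],
    "cases": [13, 1, 18],
    "deaths": [1, 0, 6],
})
raw["date"] = pd.to_datetime(raw["date"])

# Dimension table: one row per region.
dim_region = raw[["fips", "state"]].drop_duplicates().reset_index(drop=True)

# Dimension table: one row per calendar date, with derived attributes.
dim_date = raw[["date"]].drop_duplicates().reset_index(drop=True)
dim_date["year"] = dim_date["date"].dt.year
dim_date["month"] = dim_date["date"].dt.month
dim_date["day_of_week"] = dim_date["date"].dt.dayofweek

# Fact table: measures plus the keys that point at each dimension.
fact_covid = raw[["fips", "date", "cases", "deaths"]].copy()


def to_csv_text(df: pd.DataFrame) -> str:
    """Serialize a DataFrame to CSV text (the payload one would upload to S3)."""
    buf = io.StringIO()
    df.to_csv(buf, index=False)
    return buf.getvalue()


def build_copy_sql(table: str, s3_uri: str, iam_role: str) -> str:
    """Build a Redshift COPY statement for a CSV file that has a header row."""
    return (
        f"COPY {table} FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role}' CSV IGNOREHEADER 1;"
    )


# Placeholder bucket and role, for illustration only.
copy_sql = build_copy_sql(
    "fact_covid",
    "s3://example-bucket/output/fact_covid.csv",
    "arn:aws:iam::123456789012:role/example-redshift-role",
)
```

Keeping the keys (`fips`, `date`) in the fact table and the descriptive attributes in the dimension tables is what makes this a star schema. The generated `COPY` statement would then be run once per table from the Glue job.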
