We will be implementing a pipeline that processes the customers and orders files and delivers the output to the customer service team for analysis; this output will also be stored on AWS S3.
- VSCode Editor 🧑💻
- Docker 🐳
- Amazon S3 🪣
- Hive 🐘
- Spark 🌟
- Airflow 💨
- HDFS 📦
- Gmail SMTP Server 📧
- Slack 🔔
Note*:- We will be using a Docker container for Hive, Spark, HDFS and Airflow, so you just need Docker for this 😉
Step 1: Checking if the orders and customers files are available in the S3 bucket
Step 2: Once the files are available, fetching them from the Amazon S3 bucket to our local system
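A minimal sketch of how these first two steps could look as Airflow tasks, assuming Airflow 2.x with the Amazon provider installed; the bucket name, keys, connection id and local paths are placeholders, not necessarily what customer360.py uses:

```python
from datetime import datetime

import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

BUCKET = "customer360-bucket"  # placeholder bucket name


def fetch_from_s3():
    """Download both input files from S3 to the local filesystem."""
    s3 = boto3.client("s3")
    s3.download_file(BUCKET, "orders.csv", "/tmp/orders.csv")
    s3.download_file(BUCKET, "customers.csv", "/tmp/customers.csv")


with DAG("customer360_sketch", start_date=datetime(2024, 1, 1),
         schedule_interval=None, catchup=False) as dag:

    # Step 1: wait until the orders file lands in the bucket
    # (a second sensor could watch customers.csv the same way)
    wait_for_orders = S3KeySensor(
        task_id="check_orders_in_s3",
        bucket_name=BUCKET,
        bucket_key="orders.csv",
        aws_conn_id="aws_default",  # the S3 connection created in the Airflow UI
        poke_interval=60,
        timeout=60 * 60,
    )

    # Step 2: pull the files down to the local system
    download_files = PythonOperator(
        task_id="download_files_from_s3",
        python_callable=fetch_from_s3,
    )

    wait_for_orders >> download_files
```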
Step 3: Moving both the customers and orders data to HDFS
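One way to sketch the HDFS load is a BashOperator (sitting inside the same DAG as above) that shells out to the hdfs CLI in the container; the HDFS paths are placeholders:

```python
from airflow.operators.bash import BashOperator

# Copy the downloaded files from the local filesystem into HDFS.
# The target directories are placeholders; adjust them to the paths your DAG expects.
move_to_hdfs = BashOperator(
    task_id="move_files_to_hdfs",
    bash_command=(
        "hdfs dfs -mkdir -p /user/customer360/orders /user/customer360/customers && "
        "hdfs dfs -put -f /tmp/orders.csv /user/customer360/orders/ && "
        "hdfs dfs -put -f /tmp/customers.csv /user/customer360/customers/"
    ),
)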
Step 3.1: Processing the orders data with Spark so that we keep only closed orders (this is an optional step you can skip if you want)
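If you keep this step, the Spark job can be as small as the sketch below; the column name order_status, the value CLOSED, and the HDFS paths are assumptions about the dataset:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("filter_closed_orders").getOrCreate()

# Read the raw orders file from HDFS (path and schema inference are assumptions)
orders = spark.read.csv("/user/customer360/orders/orders.csv",
                        header=True, inferSchema=True)

# Keep only orders whose status is CLOSED
closed_orders = orders.filter(col("order_status") == "CLOSED")

# Overwrite the HDFS location that the downstream Hive table will load from
closed_orders.write.mode("overwrite").csv("/user/customer360/orders_closed")

spark.stop()
```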
Step 4: Creating the customers and orders tables in Hive and loading them from HDFS
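A rough sketch of the Hive step using Spark's Hive support (the repo may instead run Hive QL scripts through Airflow's HiveOperator or the hive CLI); the table names, columns and HDFS paths are assumptions:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("create_hive_tables")
         .enableHiveSupport()
         .getOrCreate())

# Create the customers table and load it from HDFS (schema is an assumption)
spark.sql("""
    CREATE TABLE IF NOT EXISTS customers (
        customer_id INT, customer_fname STRING, customer_lname STRING,
        customer_city STRING, customer_state STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
""")
spark.sql("LOAD DATA INPATH '/user/customer360/customers' OVERWRITE INTO TABLE customers")

# Same idea for the (closed) orders data
spark.sql("""
    CREATE TABLE IF NOT EXISTS orders (
        order_id INT, order_date STRING, customer_id INT, order_status STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
""")
spark.sql("LOAD DATA INPATH '/user/customer360/orders_closed' OVERWRITE INTO TABLE orders")

spark.stop()
```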
Step 5: Joining the customers and orders tables using Spark to get our final output
Step 6: Storing the final output in final_table and saving it locally (you can store it in HDFS as well if you want 🤗)
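Steps 5 and 6 together could look roughly like this in PySpark; the join key customer_id and the local output path are assumptions, while the table name final_table follows the description above:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("build_final_output")
         .enableHiveSupport()
         .getOrCreate())

customers = spark.table("customers")
orders = spark.table("orders")

# Step 5: join customers with their (closed) orders on the customer id
final_df = customers.join(orders, "customer_id", "inner")

# Step 6: persist the result as a Hive table ...
final_df.write.mode("overwrite").saveAsTable("final_table")

# ... and also keep a single CSV copy on the local filesystem
final_df.coalesce(1).write.mode("overwrite").csv("file:///tmp/final_output", header=True)

spark.stop()
```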
Step 7: Uploading the final output from the local system to AWS S3
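The upload back to S3 can be a small boto3 call wrapped in a PythonOperator (the Amazon provider's S3Hook would work just as well); the bucket, key and local path are placeholders:

```python
import glob

import boto3


def upload_final_output():
    """Upload the single CSV part file produced by Spark to S3 (paths are placeholders)."""
    s3 = boto3.client("s3")
    # Spark writes part-*.csv files inside the output folder; pick the first one
    part_file = glob.glob("/tmp/final_output/part-*.csv")[0]
    s3.upload_file(part_file, "customer360-bucket", "final_output/final_output.csv")
```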
Step 8: Creating the table in AWS Athena and querying it there
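Creating the Athena table can also be scripted; here is a sketch with boto3 where the region, database, columns and S3 locations are assumptions (you can just as easily paste the same DDL into the Athena console):

```python
import boto3

# DDL for an external table pointing at the uploaded CSV; columns are assumptions
CREATE_TABLE_DDL = """
CREATE EXTERNAL TABLE IF NOT EXISTS final_table (
    customer_id INT,
    customer_fname STRING,
    customer_lname STRING,
    order_id INT,
    order_date STRING,
    order_status STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://customer360-bucket/final_output/'
"""

athena = boto3.client("athena", region_name="us-east-1")  # region is a placeholder
athena.start_query_execution(
    QueryString=CREATE_TABLE_DDL,
    QueryExecutionContext={"Database": "default"},
    ResultConfiguration={"OutputLocation": "s3://customer360-bucket/athena-results/"},
)
```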
- An AWS free tier account
- Docker should be installed
- Clone the repo and make sure your current working directory is customer360_pipeline
- Run the start.sh file inside the customer360_pipeline folder using bash start.sh (for Windows) / ./start.sh to start the Docker containers required for the project
- If your containers are healthy and running, you will be able to navigate to the Airflow UI (localhost:8080)
- Create the above connections there; the variables will be created automatically by our customer360.py (inside the dag folder), as sketched after this list.
- Just go to the UI now and trigger the DAG.
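For reference, variables can be registered from inside a DAG file roughly like this; the variable names and values below are hypothetical placeholders, the real ones come from customer360.py:

```python
from airflow.models import Variable

# Hypothetical variable names/values -- the actual ones are defined in customer360.py
Variable.set("bucket_name", "customer360-bucket")
Variable.set("orders_key", "orders.csv")
Variable.set("customers_key", "customers.csv")

# Downstream tasks can read them back with Variable.get("bucket_name")
```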