Massively Parallel Algorithms Assignments

This repo contains the code for both Massively Parallel Algorithms (MPA) assignments.

Getting started

First, install the Python dependencies, and make sure Spark is installed so that spark-submit is available on your PATH.

pip install -r requirements.txt

Download the McDonalds dataset. Then extract the CSV file into data/mcdonalds/raw.csv.

Then run

spark-submit prepare_data.py

to prepare the data and download the other datasets.

You can run the MST edge sampling with, for example,

spark-submit mst_edge_sampling.py housing
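
For readers unfamiliar with the idea, here is a minimal single-machine sketch of one common way to shrink an MST problem so that it fits in limited memory: split the edges into random chunks, keep only each chunk's minimum spanning forest, and repeat. Whether mst_edge_sampling.py works exactly this way is an assumption; the function names and the chunk_size parameter are hypothetical and not taken from the repo, and plain Python stands in for Spark.

```python
import random


def kruskal(num_vertices, edges):
    """Return the minimum spanning forest of `edges` as a sorted list of (w, u, v)."""
    parent = list(range(num_vertices))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((w, u, v))
    return forest


def filtered_mst(num_vertices, edges, chunk_size, seed=0):
    """Hypothetical helper: filter edges chunk by chunk, then solve the remainder.

    An edge that is not in the spanning forest of its own chunk is the heaviest
    edge on some cycle, so it cannot be in the global MST and is safe to drop.
    """
    rng = random.Random(seed)
    edges = list(edges)
    while len(edges) > chunk_size:
        rng.shuffle(edges)
        chunks = [edges[i:i + chunk_size] for i in range(0, len(edges), chunk_size)]
        survivors = [e for chunk in chunks for e in kruskal(num_vertices, chunk)]
        if len(survivors) == len(edges):
            break  # chunks are too small to discard anything; stop filtering
        edges = survivors
    return kruskal(num_vertices, edges)
```

In a Spark setting each chunk would correspond to a partition processed by one worker, and chunk_size to the number of edges a worker can hold in memory.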

Finally, you can run the k-means clustering with, for example,

spark-submit scalable_kmeans++.py housing 100

to run the k-means algorithm on the housing dataset with k=100.
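
As background, scalable k-means++ is generally described as a parallel variant of k-means++ seeding that samples several candidate centers per round instead of one. The sketch below shows only the sequential k-means++ seeding step (D² sampling) in plain Python. It is not the Spark implementation in scalable_kmeans++.py, and the function names are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_seed(points, k, seed=0):
    """Hypothetical helper: pick up to k initial centers by D^2 sampling."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest chosen center.
    d2 = [squared_distance(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            break  # every point coincides with a center; fewer than k distinct points
        # Sample the next center with probability proportional to d2.
        idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(points[idx])
        d2 = [min(old, squared_distance(p, points[idx])) for old, p in zip(d2, points)]
    return centers
```

The chosen centers would then be passed to the usual k-means iterations (assign each point to its nearest center, recompute centers, repeat).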
