smitssjors/mpa

Massively Parallel Algorithms Assignments

This repo contains the code for both MPA assignments.

Getting started

First install the dependencies. Also make sure you have Spark installed.

pip install -r requirements.txt

Download the McDonalds dataset. Then extract the CSV file into data/mcdonalds/raw.csv.

Then run

spark-submit prepare_data.py

to prepare the data and download the other datasets.

You can run the MST edge sampling with, for example,

spark-submit mst_edge_sampling.py housing
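For intuition only, here is a minimal single-machine sketch of one sampling-style way to compute an MST: randomly partition the edges into chunks, keep only each chunk's minimum spanning forest, and repeat until the remaining edges fit in one chunk. This is an assumption about the general idea, not the code in mst_edge_sampling.py; the function names and the `chunk_size` parameter are hypothetical, and no Spark is involved.

```python
import random


def kruskal(num_vertices, edges):
    """Return a minimum spanning forest of edges given as (weight, u, v)."""
    parent = list(range(num_vertices))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((w, u, v))
    return forest


def mst_by_edge_partitioning(num_vertices, edges, chunk_size, seed=0):
    """Hypothetical helper: shrink the edge set chunk by chunk, then finish.

    An edge that is not in the spanning forest of its chunk closes a cycle
    of lighter edges, so it cannot be in the overall MST and is dropped.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    rng = random.Random(seed)
    edges = list(edges)
    while len(edges) > chunk_size:
        rng.shuffle(edges)
        survivors = []
        for start in range(0, len(edges), chunk_size):
            chunk = edges[start:start + chunk_size]
            survivors.extend(kruskal(num_vertices, chunk))
        if len(survivors) == len(edges):
            break  # no progress; chunks are too small to discard anything
        edges = survivors
    # The remaining edges are few enough to handle in one pass.
    return kruskal(num_vertices, edges)
```

In a distributed setting each chunk would be handled by a separate worker; here the chunks are simply processed in a loop.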

Finally, you can run the k-means clustering with, for example,

spark-submit scalable_kmeans++.py housing 100

to run the k-means algorithm on the housing dataset with k=100.
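As background, the sketch below shows plain k-means++ seeding on a single machine: each new center is drawn with probability proportional to its squared distance from the nearest center chosen so far. It is only an illustration of the seeding idea that scalable k-means++ builds on, not the Spark implementation in scalable_kmeans++.py; the names `kmeans_pp_seeds` and `squared_distance` are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, seed=0):
    """Pick k initial centers from points using D^2 sampling."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # The first center is chosen uniformly at random.
    centers = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest center.
    d2 = [squared_distance(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a center; fall back to uniform.
            nxt = rng.choice(points)
        else:
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        d2 = [min(d, squared_distance(p, nxt)) for d, p in zip(d2, points)]
    return centers
```

For example, `kmeans_pp_seeds([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)], k=2)` returns two of the input points, with far-apart points being the more likely picks.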
