PageRank on Google Cloud Platform using Pig and Spark

Malo GRALL

Alex MAINGUY

Mathis ROCHER

Method
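PageRank repeatedly redistributes each page's rank across its outgoing links. As a reference point for the Pig and Spark versions, here is a minimal, generic PageRank sketch in plain Python. It is not the code from the `pig` or `pyspark` folders. It assumes the graph is an adjacency dict `{page: [outgoing pages]}`, a damping factor of 0.85 and a fixed number of iterations. It uses the non-normalized update rank = 0.15 + 0.85 × (sum of incoming contributions), which is the form used in Spark's bundled PageRank example. The function name `pagerank` is hypothetical.

```python
def pagerank(links, iterations=10, damping=0.85):
    """Generic iterative PageRank (illustrative sketch, not the repository's code).

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping every page (source or target) to its rank.
    """
    # Every page that appears as a source or a target starts with rank 1.0
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    ranks = {page: 1.0 for page in pages}

    for _ in range(iterations):
        # Each page splits its current rank evenly among its outgoing links.
        # Pages without outgoing links (dangling nodes) contribute nothing here.
        contribs = {page: 0.0 for page in pages}
        for page, targets in links.items():
            if targets:
                share = ranks[page] / len(targets)
                for target in targets:
                    contribs[target] += share
        # Non-normalized update: (1 - d) + d * (sum of incoming contributions)
        ranks = {page: (1 - damping) + damping * contribs[page] for page in pages}

    return ranks
```

For example, `pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}, iterations=1)` gives a = 1.0, b = 0.575 and c = 1.425 (up to floating-point rounding). A script such as `find-max-pagerank.sh` would then only need to pick the page with the highest value.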

Small dataset

Big dataset

The differences should be more visible on this dataset, but since we did not set up optimal partitioning, they do not show up clearly.
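Partitioning matters for PageRank because every iteration joins the link data with the rank data by page. The pure-Python sketch below illustrates the idea behind hash partitioning; it is not the code in the `pyspark` folder. If both datasets are split with the same hash function and the same number of partitions, every page lands at the same partition index on both sides, so the join can be done partition by partition without moving (shuffling) data. The helpers `partition_of`, `partition_by` and `local_join` are hypothetical, and crc32 is used only to get a hash that is stable across processes. In PySpark, the corresponding step would be something like calling `partitionBy(n)` on the links RDD and `persist()`-ing it, so the links are not re-shuffled on every iteration.

```python
import zlib


def partition_of(key, num_partitions):
    """Hypothetical hash partitioner: map a string key to a partition index.

    crc32 is used for a process-independent hash; PySpark's partitionBy
    uses its own default hash function instead.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_by(pairs, num_partitions):
    """Split (key, value) pairs into num_partitions lists according to the key hash."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[partition_of(key, num_partitions)].append((key, value))
    return partitions


def local_join(left_parts, right_parts):
    """Join two co-partitioned datasets partition by partition.

    Because both sides were partitioned with the same function and the same
    number of partitions, matching keys are always at the same partition
    index, so no data needs to move between partitions.
    """
    joined = []
    for left, right in zip(left_parts, right_parts):
        # Index the right-hand partition by key
        right_index = {}
        for key, value in right:
            right_index.setdefault(key, []).append(value)
        # Emit (key, (left_value, right_value)) for every matching pair
        for key, value in left:
            for other in right_index.get(key, []):
                joined.append((key, (value, other)))
    return joined
```

With `links = [("a", ["b", "c"]), ("b", ["c"]), ("c", ["a"])]` and `ranks = [("a", 1.0), ("b", 1.0), ("c", 1.0)]`, partitioning both with 4 partitions and calling `local_join` yields the same three pairs as a global join, without comparing keys across partitions.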

Problems

To get clearer results, we collected the Spark results and saved them to a separate file instead of printing them to the terminal with the Cloud Logging for Python feature of GCP.

With Pig, we had trouble debugging because the logs were not easily accessible in GCP. The Logging menu contained some logs, but they only captured the terminal output.
