dimitraseferiadi/Big-Data-Analysis

Semester Project – Big Data Analysis with Apache Spark & Hadoop - Group 19

This is the 2024–2025 semester project for the Advanced Database Systems course at the National Technical University of Athens. The project applies Apache Spark and Hadoop to large-scale datasets, including Los Angeles crime data, census data, and police station records, and implements its queries with the Spark SQL, DataFrame, and RDD APIs.

Executing the Files

The project was developed in JupyterLab running on Amazon Web Services (AWS). The provided queries analyze crime patterns, police department efficiency, income disparities, and geospatial trends. All queries are written in Python and can be executed in a distributed computing environment.

Requirements

To run this project, you will need:

  • Apache Hadoop (version ≥ 3.0)
  • Apache Spark (version ≥ 3.5)
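Before running the queries, it can help to confirm that an installation meets the minimum versions listed above. The following is a minimal sketch and not part of the project itself: the names `MINIMUMS`, `parse_version`, and `meets_minimum` are hypothetical helpers introduced here for illustration, and the version strings are assumed to start with dot-separated numbers (for example `3.5.1`).

```python
import re

# Minimum versions taken from the requirements list above.
# The dictionary and the helper names are illustrative assumptions.
MINIMUMS = {"hadoop": (3, 0), "spark": (3, 5)}


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(component, installed):
    """Return True if `installed` is at least the minimum for `component`."""
    minimum = MINIMUMS[component]
    found = parse_version(installed)
    # Pad short versions with zeros so that "3" is compared as (3, 0).
    padded = found + (0,) * (len(minimum) - len(found))
    # Compare only as many components as the minimum specifies.
    return padded[:len(minimum)] >= minimum
```

Inside a PySpark session, the Spark check could be called as `meets_minimum("spark", spark.version)`, where `spark` is an existing `SparkSession`. The Hadoop version string would have to be obtained separately, for example from the output of `hadoop version`.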
