This is the 2024–2025 semester project for the Advanced Database Systems course at the National Technical University of Athens. The project focuses on big data analysis with Apache Spark and Hadoop: it processes large-scale datasets, such as Los Angeles crime data, census data, and police station records, through the Spark SQL, DataFrame, and RDD APIs.
The project was developed in JupyterLab running on Amazon Web Services (AWS). The provided queries analyze crime patterns, police department efficiency, income disparities, and geospatial trends. All queries are written in Python and can be executed in a distributed computing environment.
To run this project, you will need:
- Apache Hadoop (version ≥ 3.0)
- Apache Spark (version ≥ 3.5)
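Before running the queries, it can help to confirm that the installed versions meet the minimums listed above. The following is a minimal sketch, not part of the project itself: the helper names (`parse_version`, `meets_minimum`, `check_requirements`) and the sample version strings are hypothetical, and it assumes version strings begin with dot-separated numbers (for example `3.5.1`).

```python
import re

# Minimum versions taken from the requirements list above.
MINIMUM_VERSIONS = {"hadoop": "3.0", "spark": "3.5"}


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum`, compared component by component."""
    have, need = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so that "3.5" and "3.5.0" compare as equal.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def check_requirements(installed):
    """Map each required component to True/False given a dict of installed versions.

    A component missing from `installed` is reported as False.
    """
    return {
        name: name in installed and meets_minimum(installed[name], minimum)
        for name, minimum in MINIMUM_VERSIONS.items()
    }


if __name__ == "__main__":
    # Hypothetical version strings, used here only for illustration.
    print(check_requirements({"hadoop": "3.3.6", "spark": "3.5.1"}))
```

In practice, the installed versions can be read from the output of `hadoop version` and from `pyspark.__version__` (or `spark-submit --version`), then passed to `check_requirements`.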