Data-Modeling

This repo demonstrates of creating a structured data model by showing how a large dataset containing taxi trip data (2009–2023) can be transformed from a flat file format into a dimensional model using Python and visualization tools like Lucidchart.

Overview of Data:

Data Source: https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page
Data Columns: Includes trip details (vendor ID, pickup/drop datetime, passenger count, trip distance, locations, rate codes, payment types, and charges).
Data Dictionary: https://www.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf
Objective: Extract and understand the dataset's structure using a data dictionary.

Steps

Understanding Columns: Review data and identify key columns for dimensional modeling.
Dimensional Modeling:
- Create fact and dimension tables.
- Dimension tables contain static/descriptive values (e.g., pickup/drop locations, rate codes, payment types).
- Fact tables include transactional data (e.g., IDs, passenger counts, trip distances, fare amounts).
Tools Used:
- Lucidchart: For designing the data model.
- Python (Pandas): To transform the flat file into the fact-dimension schema:
  - Convert datetime columns to proper formats.
  - Extract metadata (e.g., hours, months).
  - Drop duplicates and reset indices.
  - Map descriptive values (e.g., rate codes) using dictionaries.
Code Implementation:
- Sequentially create and populate dimension tables.
- Merge dimension tables into a fact table using common keys.
- Validate the schema by outputting sample data.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
data		data
README.md		README.md
data_model.jpg		data_model.jpg
dim and fact table.ipynb		dim and fact table.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Data-Modeling

Overview of Data:

Steps

About

Uh oh!

Releases

Packages

Uh oh!

Languages

SreeHarsha72/Data-Modeling

Folders and files

Latest commit

History

Repository files navigation

Data-Modeling

Overview of Data:

Steps

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages