AI-4-UX - Data Cleaning and Analysis Repository

This repository contains a collection of Python scripts and CSV files used for data cleaning, transformation, and statistical analysis. The workflow includes preprocessing open-ended responses, normalizing data, computing statistical measures, and generating correlation matrices.

Repository Structure

Data Cleaning Scripts

cleanup.py – Cleans and processes structured data, ensuring uniform column names and removing unnecessary entries.
scale_mappings.py – Maps scale-based responses to numerical values.
convert_to_numbers.py – Converts categorical survey responses into numerical values for analysis.
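The cleaning and mapping steps above can be sketched with pandas as follows. This is a minimal illustration, not the code in cleanup.py, scale_mappings.py or convert_to_numbers.py. The column-name rules, the 5-point LIKERT_5 labels and the helper names normalize_columns and map_scale are all assumptions. Answers missing from the mapping become NaN, so they can be reviewed instead of being silently coerced.

```python
import pandas as pd

# Hypothetical 5-point Likert mapping; the real labels live in scale_mappings.py.
LIKERT_5 = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}


def normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with trimmed, lower-case, underscore-separated column names."""
    out = df.copy()
    out.columns = (
        out.columns.str.strip()
        .str.lower()
        .str.replace(r"\s+", "_", regex=True)
    )
    return out


def map_scale(series: pd.Series, mapping: dict) -> pd.Series:
    """Map text answers to numbers (case- and whitespace-insensitive).

    Answers not found in the mapping, including missing values, become NaN.
    """
    return series.astype(str).str.strip().str.lower().map(mapping)
```

With this sketch, a column named " Ease of Use " becomes ease_of_use, and an answer of "Agree" maps to 4.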

Statistical Analysis Scripts

shapiro_wilk.py – Performs a Shapiro-Wilk test to assess normality in the dataset.
spearman.py – Computes Spearman’s correlation and generates a heatmap visualization.
skewness_kurtosis.py – Computes skewness and kurtosis values for evaluating data distribution.
iqr_median.py – Computes median and interquartile range (IQR) to understand data spread.
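All four statistical scripts rely on standard SciPy and pandas routines. A minimal sketch of the per-variable computations might look like the following. The function names, the 0.05 significance level and the output layout are assumptions, not the repository's actual code or CSV format.

```python
import pandas as pd
from scipy import stats


def describe_distribution(df: pd.DataFrame, alpha: float = 0.05) -> pd.DataFrame:
    """Per-variable Shapiro-Wilk test, skewness, kurtosis, median and IQR."""
    rows = []
    for col in df.select_dtypes("number").columns:
        x = df[col].dropna()
        w, p = stats.shapiro(x)  # Shapiro-Wilk needs at least 3 observations
        q1, median, q3 = x.quantile([0.25, 0.5, 0.75])
        rows.append({
            "variable": col,
            "n": len(x),
            "shapiro_w": w,
            "shapiro_p": p,
            # True means normality was NOT rejected at level alpha
            "normal": bool(p > alpha),
            "skewness": stats.skew(x),
            # Fisher (excess) kurtosis: 0 for a normal distribution
            "kurtosis": stats.kurtosis(x),
            "median": median,
            "iqr": q3 - q1,
        })
    return pd.DataFrame(rows).set_index("variable")


def spearman_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Pairwise Spearman rank correlations between the numeric columns."""
    return df.select_dtypes("number").corr(method="spearman")
```

Note that scipy.stats.kurtosis returns excess kurtosis by default (pass fisher=False for Pearson's definition). scipy.stats.skew is the biased estimator unless bias=False is given, so its results can differ slightly from pandas' .skew() and .kurt(). A heatmap similar to spearman_correlation_heatmap.png could then be drawn with seaborn.heatmap(spearman_matrix(df), annot=True, vmin=-1, vmax=1, cmap="coolwarm"). The colormap and annotation settings are assumptions.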

Qualitative Scripts

clean_open.py – Cleans open-ended responses by merging multiline answers, removing unwanted values, and formatting the text.
translate.py – Translates open-ended responses from French to English using deep_translator.
feature.py – Extracts key features from the dataset for further statistical analysis.[1]
themes.py – Extracts key themes from the dataset for further statistical analysis.[1]

[1] feature.py and themes.py should not be considered reliable, because their extraction is limited to the keywords hard-coded in each script. Use human analysis to identify features and themes.
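Because feature.py and themes.py work from fixed keyword lists, their approach can be sketched roughly as below, together with the multiline merging performed by clean_open.py. The keyword lists, the placeholder values treated as empty and all function names are hypothetical. The translation helper assumes deep_translator's GoogleTranslator backend, which requires network access.

```python
import re

import pandas as pd

# Hypothetical keyword lists; the real ones are hard-coded in feature.py / themes.py.
THEME_KEYWORDS = {
    "usability": ["easy", "intuitive", "confusing"],
    "trust": ["trust", "reliable", "accurate"],
    "speed": ["fast", "slow", "quick"],
}

# Hypothetical placeholder answers that are treated as "no answer".
EMPTY_ANSWERS = {"", "-", "n/a", "na", "none"}


def clean_answer(text):
    """Merge a multiline answer into one line; return None for empty answers."""
    if pd.isna(text):
        return None
    merged = re.sub(r"\s+", " ", str(text)).strip()
    return None if merged.lower() in EMPTY_ANSWERS else merged


def tag_themes(text, keywords=THEME_KEYWORDS):
    """Return the themes whose keywords occur as whole words in the answer."""
    if not text:
        return []
    lowered = text.lower()
    return [
        theme
        for theme, words in keywords.items()
        if any(re.search(rf"\b{re.escape(w)}\b", lowered) for w in words)
    ]


def translate_fr_to_en(text):
    """Translate one answer from French to English (needs network access)."""
    from deep_translator import GoogleTranslator  # imported lazily
    return GoogleTranslator(source="fr", target="en").translate(text)
```

Whole-word matching avoids some false positives (for example, "trust" does not match "untrustworthy"), but the approach still misses synonyms and negations. For that reason, human coding remains necessary.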

CSV Data Files

open_ended.csv – Raw open-ended responses before cleaning.
open_ended_cleaned.csv – Cleaned open-ended responses after text processing.
open_ended_translated.csv – Translated open-ended responses (French → English).
your_data.csv – Main dataset containing survey responses.
your_data+openended.csv – Combined dataset including structured and open-ended responses.
data_no_outliers.csv – Preprocessed dataset with outliers removed.
features.csv – Extracted features from the dataset.
median_iqr_results.csv – Computed median and IQR values.
normality_test_results.csv – Results of the Shapiro-Wilk normality test.
skewness_kurtosis_results.csv – Skewness and kurtosis analysis results.
spearman_correlation.csv – Spearman’s correlation matrix.
spearman_correlation_heatmap.png – Heatmap visualization of Spearman's correlation.
themes.csv – Processed thematic analysis results.
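This README does not state how outliers were removed for data_no_outliers.csv. One common approach is Tukey's fences, which drops values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]. The sketch below implements that rule on the assumption that it, or something similar, was used. The function name and the choice to keep missing values are hypothetical.

```python
import pandas as pd


def drop_iqr_outliers(df: pd.DataFrame, columns=None, k: float = 1.5) -> pd.DataFrame:
    """Drop rows with any value outside [Q1 - k*IQR, Q3 + k*IQR] in the given columns.

    Defaults to all numeric columns. Missing values are kept, not treated as outliers.
    """
    cols = list(columns) if columns is not None else list(df.select_dtypes("number").columns)
    q1 = df[cols].quantile(0.25)
    q3 = df[cols].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Comparisons align the per-column bounds with the DataFrame's columns
    inside = df[cols].ge(lower) & df[cols].le(upper)
    keep = (inside | df[cols].isna()).all(axis=1)
    return df[keep]
```

For ordinal Likert data with only a few distinct values, this rule can flag legitimate extreme answers, so the removed rows should be inspected before they are discarded.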

Usage

Data Cleaning & Preparation

- Run cleanup.py to standardize structured data.
- Use convert_to_numbers.py and scale_mappings.py to transform categorical responses into numerical values.

Statistical Analysis

- Run shapiro_wilk.py for normality tests.
- Run spearman.py for correlation analysis.
- Run skewness_kurtosis.py for distribution analysis.
- Use iqr_median.py for median and IQR computations.

Preprocessing Open-Ended Responses

- Run clean_open.py to clean open-ended responses.
- Run translate.py to translate responses into English.
- Run feature.py to extract the key features.
- Run themes.py to extract the key themes.

Requirements

To run these scripts, install the required dependencies using:

pip install pandas numpy scipy matplotlib seaborn deep_translator

Contributions

Feel free to contribute by submitting pull requests to improve data processing and analysis workflows.

License

This project is open-source and licensed under the MIT License.

danve93/AI-4-UX
