Customer Segmentation Analysis

This repository contains an analysis of customer segmentation using K-Means and Mean Shift clustering, with PCA for dimensionality reduction. The goal of this project is to identify distinct customer segments based on their purchasing behavior and to visualize the results.

Introduction

Customer segmentation is a common task in marketing and business strategy. By identifying distinct customer segments, businesses can tailor their marketing efforts to each group and improve customer satisfaction. In this project, we segment customers by their purchasing behavior using K-Means and Mean Shift clustering.

Dataset

The dataset used in this analysis is the "Online Retail" dataset, which contains transactional data for a UK-based online retail store. The dataset includes information such as invoice number, stock code, description, quantity, invoice date, unit price, customer ID, and country.

Data Preprocessing

Loading the Dataset: The dataset is loaded from an Excel file using pandas.
Cleaning the Data: Missing values are removed, and the index is reset.
Encoding Categorical Features: Categorical features are encoded into numerical values using LabelEncoder.
Normalizing the Data: Features are scaled to a range of 1 to 5 using MinMaxScaler.
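
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the `preprocess` helper, the tiny made-up frame, and its column selection are assumptions. In the real analysis the frame would come from the Excel file via `pd.read_excel`.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop missing rows, label-encode non-numeric columns, scale to [1, 5]."""
    # Remove rows with missing values and reset the index.
    clean = df.dropna().reset_index(drop=True)

    # Encode each non-numeric column as integer labels.
    for col in clean.select_dtypes(exclude="number").columns:
        clean[col] = LabelEncoder().fit_transform(clean[col].astype(str))

    # Scale every feature to the range 1 to 5.
    scaler = MinMaxScaler(feature_range=(1, 5))
    scaled = scaler.fit_transform(clean)
    return pd.DataFrame(scaled, columns=clean.columns)


# Hypothetical stand-in for the frame loaded from the Excel file.
raw = pd.DataFrame({
    "Quantity": [6, 2, None, 12],
    "UnitPrice": [2.55, 3.39, 1.25, 7.65],
    "Country": ["United Kingdom", "France", "France", "Germany"],
})
normalized = preprocess(raw)
print(normalized)
```

Note that `MinMaxScaler` defaults to the range 0 to 1, so the 1 to 5 range has to be requested explicitly through `feature_range`.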

Exploratory Data Analysis

Histograms: Histograms are created for all features to visualize their distributions.
Pair Plots: Pair plots are generated to visualize relationships between pairs of features.
Correlation Heatmap: A heatmap is created to visualize the correlation between features.
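
A rough sketch of these three plots, using only pandas and matplotlib, is shown below. The notebook may well use a different plotting library; the synthetic `features` frame and its `Total` column are assumptions made purely for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the preprocessed feature table.
rng = np.random.default_rng(0)
features = pd.DataFrame({
    "Quantity": rng.integers(1, 50, size=200),
    "UnitPrice": rng.uniform(0.5, 20.0, size=200),
})
features["Total"] = features["Quantity"] * features["UnitPrice"]

# Histograms: one panel per feature.
features.hist(bins=20, figsize=(9, 3), layout=(1, 3))

# Pair plot: scatter plot for every pair of features.
pd.plotting.scatter_matrix(features, figsize=(6, 6))

# Correlation heatmap: Pearson correlation between features.
corr = features.corr()
fig, ax = plt.subplots()
image = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
ax.set_xticks(range(len(corr)))
ax.set_xticklabels(corr.columns)
ax.set_yticks(range(len(corr)))
ax.set_yticklabels(corr.columns)
fig.colorbar(image)
```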

Clustering

K-Means Clustering: K-Means clustering is performed for different values of K (2 to 11). The Elbow method is used to determine the optimal number of clusters.
Evaluation Metrics: Clustering performance is evaluated using Silhouette Score, Calinski-Harabasz Score, and Davies-Bouldin Score.
Visualization: Scatter plots are created to visualize the clusters and their centroids.
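
The K sweep and the three metrics might look like the sketch below. It runs on synthetic blobs rather than the retail data, and `n_init`, `random_state`, and the `results` structure are illustrative choices, not values taken from the notebook.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Synthetic stand-in for the normalized customer features.
X, _ = make_blobs(n_samples=600, centers=4, cluster_std=0.8, random_state=42)

results = []
for k in range(2, 12):  # K = 2, 3, ..., 11
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    labels = model.labels_
    results.append({
        "k": k,
        # Inertia (within-cluster sum of squares) is what the Elbow plot shows.
        "inertia": model.inertia_,
        "silhouette": silhouette_score(X, labels),                # higher is better
        "calinski_harabasz": calinski_harabasz_score(X, labels),  # higher is better
        "davies_bouldin": davies_bouldin_score(X, labels),        # lower is better
    })

for row in results:
    print(row)
```

Plotting `inertia` against `k` gives the Elbow curve. The other three scores offer a cross-check, since the elbow is often ambiguous to read by eye.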

Principal Component Analysis (PCA)

Dimensionality Reduction: PCA is performed to reduce the dimensionality of the data while retaining most of the variance.
Explained Variance: The explained variance for different numbers of principal components is evaluated.
Visualization: 2D and 3D scatter plots are created to visualize the PCA results.
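
As an illustration of the explained-variance step, the sketch below fits PCA on synthetic data and finds the smallest number of components that reaches a variance threshold. The 95% threshold and the synthetic data are assumptions; the notebook's actual cut-off is not stated here.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 5 features driven by 2 latent factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 5))

# Fit with all components to inspect the explained variance.
pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)
print("cumulative explained variance:", cumulative)

# Smallest number of components whose cumulative variance reaches 95%.
n_components = int(np.searchsorted(cumulative, 0.95) + 1)
print("components needed:", n_components)

# Project to two components for a 2D scatter plot.
X_2d = PCA(n_components=2).fit_transform(X)
```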

Mean Shift Clustering

Sampling the Data: A random sample of 5000 rows is taken from the normalized DataFrame.
Estimating Bandwidth: The bandwidth parameter for the Mean Shift algorithm is estimated.
Clustering: Mean Shift clustering is performed on the sampled data.
Visualization: Scatter plots are created to visualize the clusters and their centroids.
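
A minimal sketch of the sampling, bandwidth estimation, and clustering steps follows. The synthetic `normalized` frame, the `quantile` value, and `bin_seeding=True` are assumptions for illustration. Only the sample size of 5000 comes from the description above.

```python
import pandas as pd
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Synthetic stand-in for the normalized DataFrame.
X, _ = make_blobs(n_samples=8000, centers=3, cluster_std=0.6, random_state=0)
normalized = pd.DataFrame(X, columns=["f1", "f2"])

# Mean Shift scales poorly with row count, so work on a random sample.
sample = normalized.sample(n=5000, random_state=42)
points = sample.to_numpy()

# Estimate the kernel bandwidth from pairwise distances in the sample.
bandwidth = estimate_bandwidth(points, quantile=0.2, n_samples=1000, random_state=42)

# bin_seeding speeds things up by seeding from a coarse grid of the points.
ms = MeanShift(bandwidth=bandwidth, bin_seeding=True).fit(points)

print("bandwidth:", bandwidth)
print("clusters found:", len(ms.cluster_centers_))
```

Unlike K-Means, Mean Shift does not take the number of clusters as input. The bandwidth determines how many clusters emerge, so the `quantile` setting is worth varying.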

Results

The analysis identified distinct customer segments based on their purchasing behavior. The optimal number of clusters was determined using the Elbow method together with the Silhouette, Calinski-Harabasz, and Davies-Bouldin scores. The clusters were visualized using scatter plots and PCA projections, which show how the data points are distributed across clusters.

Conclusion

This project demonstrates how clustering techniques can be applied to customer segmentation. By identifying distinct customer segments, businesses can tailor their marketing efforts and improve customer satisfaction. PCA and the evaluation metrics serve as checks on the quality of the clustering results; they do not by themselves guarantee that the segments are meaningful for the business.

Usage

Clone the repository:
```
git clone https://github.com/your-username/your-repository.git
```
Navigate to the repository directory:
```
cd your-repository
```
Install the required dependencies:
```
pip install -r requirements.txt
```
Run the Jupyter notebook to perform the analysis and visualize the results. Feel free to explore the code and modify it to suit your needs. If you have any questions or suggestions, please open an issue or submit a pull request.

Happy analyzing!
