Apply unsupervised machine-learning strategies to cluster banking customers by their demographics and financial behavior (using PCA and KMeans).
Determine customer segments by:
- Demographics (using only the twm_customer table)
- Banking behavior (using engineered features from all available data)
Process:
- Clean data and engineer features for clustering (numpy, pandas)
- Use KMeans to find 3-5 clusters per category: demographics, banking behavior
- Reduce dimensions for plotting with PCA
- Visualize results by plotting clusters in 2D and in radar charts
- Present findings and insights on the clustered groups
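The clustering and projection steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `cluster_and_project`, the synthetic columns (`age`, `income`, `years_with_bank`), and the use of the silhouette score to choose between 3 and 5 clusters are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def cluster_and_project(features, k_range=range(3, 6), random_state=0):
    """Scale features, pick k by silhouette score, and project to 2D with PCA."""
    # KMeans is distance-based, so put every feature on a comparable scale first.
    scaled = StandardScaler().fit_transform(features)

    best_score, best_k, best_labels = None, None, None
    for k in k_range:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(scaled)
        # Assumption: silhouette score as the selection criterion for k.
        score = silhouette_score(scaled, labels)
        if best_score is None or score > best_score:
            best_score, best_k, best_labels = score, k, labels

    # PCA is used only to get two plotting coordinates, not for clustering.
    coords = PCA(n_components=2).fit_transform(scaled)
    projected = pd.DataFrame(
        {"pc1": coords[:, 0], "pc2": coords[:, 1], "cluster": best_labels},
        index=features.index,
    )
    return projected, best_k


# Synthetic stand-in for cleaned demographic features (hypothetical columns).
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    {
        "age": rng.integers(18, 80, size=200),
        "income": rng.normal(40_000, 15_000, size=200),
        "years_with_bank": rng.integers(0, 30, size=200),
    }
)
projected, best_k = cluster_and_project(demo)
```

The `projected` frame can then be passed to any plotting library, coloring points by the `cluster` column.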
Financial transaction data from here.
The data contains the following tables:
- twm_customer - information about customers
- twm_accounts - information about accounts
- twm_checking_accounts - information about checking accounts (subset of twm_accounts)
- twm_credit_accounts - information about credit accounts (subset of twm_accounts)
- twm_savings_accounts - information about savings accounts (subset of twm_accounts)
- twm_transactions - information about financial transactions
- twm_savings_tran - information about savings transactions (subset of twm_transactions)
- twm_checking_tran - information about checking transactions (subset of twm_transactions)
- twm_credit_tran - information about credit transactions (subset of twm_transactions)
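As a rough sketch of how per-customer behavior features might be engineered from these tables, the example below joins transactions to accounts and aggregates by customer. The column names (`cust_id`, `acct_nbr`, `tran_amt`), the helper `engineer_behavior_features`, and the tiny in-memory frames are assumptions for illustration only; the real tables may use different names and would be loaded from the data files.

```python
import pandas as pd

# Tiny stand-ins for twm_accounts and twm_transactions (hypothetical columns).
accounts = pd.DataFrame({"acct_nbr": [10, 11, 12], "cust_id": [1, 1, 2]})
transactions = pd.DataFrame(
    {"acct_nbr": [10, 10, 11, 12], "tran_amt": [-20.0, 50.0, -5.0, 100.0]}
)


def engineer_behavior_features(accounts, transactions):
    """Aggregate transaction activity to one row per customer."""
    # Attach the owning customer to every transaction via the account number.
    merged = transactions.merge(accounts, on="acct_nbr", how="left")

    # Summarize transaction count, total, and mean amount per customer.
    per_customer = merged.groupby("cust_id")["tran_amt"].agg(
        tran_count="count", tran_total="sum", tran_mean="mean"
    )

    # Number of distinct accounts each customer holds.
    n_accounts = accounts.groupby("cust_id")["acct_nbr"].nunique().rename("n_accounts")

    # Outer join keeps customers with accounts but no transactions; fill with 0.
    return per_customer.join(n_accounts, how="outer").fillna(0)


features = engineer_behavior_features(accounts, transactions)
```

The resulting frame has one row per customer and can be scaled and clustered like any other feature matrix.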