This project aims to predict whether a house will be sold based on various features such as price, location, amenities, and community factors. Using a machine learning pipeline, the project involves data preprocessing, exploratory data analysis (EDA), outlier detection, and model building to achieve reliable predictions.
House selling predictions are critical for real estate businesses to understand market trends and improve decision-making. This project uses machine learning techniques to predict sales status (Sold) using a dataset of housing features.
The dataset contains 18 features describing properties and their surroundings:
- Numerical Features: `price`, `room_num`, `age`, `n_hos_beds`, `n_hot_rooms`, etc.
- Categorical Features: `waterbody`, `bus_ter`, `airport`.
- Target Variable: `Sold`, whether the house was sold (1 = Yes, 0 = No).
- Loaded data using `pandas`.
- Handled missing values and inconsistent data entries.
- Scaled numerical variables and encoded categorical variables.
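
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the small inline DataFrame stands in for the real dataset (its values are made up), and median imputation, `StandardScaler`, and `OneHotEncoder` are assumed choices, since the specific techniques are not named here.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy stand-in for the housing data (values are made up for illustration).
df = pd.DataFrame({
    "price": [24.0, 21.6, 34.7, 33.4],
    "n_hos_beds": [5.5, np.nan, 7.4, 9.3],
    "airport": ["YES", "NO", "NO", "YES"],
    "waterbody": ["River", "Lake", "River", "Lake"],
})

numeric_cols = ["price", "n_hos_beds"]
categorical_cols = ["airport", "waterbody"]

# Numeric columns: fill gaps with the column median, then standardize.
numeric_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: one-hot encode; categories unseen at fit time are ignored.
preprocessor = ColumnTransformer([
    ("num", numeric_pipe, numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Result: 2 scaled numeric columns plus 4 one-hot columns.
X = preprocessor.fit_transform(df)
```

Wrapping the steps in a `ColumnTransformer` keeps imputation and scaling statistics tied to the data they were fitted on, which makes it easier to apply the same transformation to new rows later.
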
- Identified skewness and outliers in features such as `n_hot_rooms` and `rainfall`.
- Visualized categorical variables to analyze their impact on the target variable.
- Observations included:
  - Missing values in `n_hos_beds`.
  - Skewness in `age` and `n_hot_rooms`.
  - Uniform values in `bus_ter`.
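
Checks like the ones behind these observations can be reproduced with a few `pandas` calls. The snippet below is a sketch on a made-up DataFrame, not the project's notebook; the grouped sale rate is used here as a text-only substitute for the categorical plots, which is an assumption about how that analysis might be summarized.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the housing data (values are made up for illustration).
df = pd.DataFrame({
    "age": [10.0, 12.0, 15.0, 18.0, 95.0],
    "n_hot_rooms": [11.0, 12.0, 13.0, 14.0, 101.0],
    "n_hos_beds": [5.5, np.nan, 7.4, 9.3, 6.1],
    "bus_ter": ["YES", "YES", "YES", "YES", "YES"],
    "airport": ["YES", "NO", "YES", "NO", "YES"],
    "Sold": [1, 0, 1, 0, 0],
})

# Count missing values per column.
missing = df.isna().sum()

# Sample skewness of numeric columns; values far from 0 suggest a long tail.
skewness = df.select_dtypes(include="number").skew()

# Columns with a single distinct value carry no information for a model.
constant_cols = [c for c in df.columns if df[c].nunique(dropna=True) == 1]

# Share of sold houses per category, a numeric view of a categorical plot.
sold_rate = df.groupby("airport")["Sold"].mean()

print(missing, skewness, constant_cols, sold_rate, sep="\n\n")
```

A column such as `bus_ter` that shows up in `constant_cols` is a natural candidate for dropping before modeling.
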
- Used statistical methods and visualization tools (e.g., boxplots) for outlier detection.
- Applied transformations and imputation to handle outliers.
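
One common way to carry out the outlier steps above is the boxplot (1.5 × IQR) rule, followed by capping, a log transform, or replacement with the median. The sketch below assumes that approach; the thresholds and the three handling options are illustrative choices, and the data is made up.

```python
import numpy as np
import pandas as pd

# Toy column with one extreme value (made-up numbers for illustration).
s = pd.Series([11.0, 12.0, 13.0, 14.0, 15.0, 101.0], name="n_hot_rooms")

# Boxplot rule: points beyond 1.5 * IQR from the quartiles are flagged.
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (s < lower) | (s > upper)

# Option 1: cap values at the whisker limits.
capped = s.clip(lower=lower, upper=upper)

# Option 2: compress the right tail with a log transform.
logged = np.log1p(s)

# Option 3: treat flagged values as missing and impute the median.
imputed = s.mask(is_outlier, s.median())
```

Capping keeps every row while limiting the influence of extreme values; the log transform changes the scale of the whole column, so it is better suited to features that are skewed throughout rather than ones with a few isolated extremes.
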
- Built machine learning models to predict `Sold` using algorithms like:
  - Logistic Regression: for baseline binary classification.
  - K-Nearest Neighbors
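
A minimal sketch of fitting and comparing the two classifiers with scikit-learn is shown below. It uses synthetic data from `make_classification` in place of the preprocessed housing features, and the split ratio, `n_neighbors=5`, and `max_iter=1000` are assumed settings rather than the project's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary data standing in for the preprocessed housing features.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Both models are scale-sensitive (KNN especially), so scaling is part of each pipeline.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

# Fit each model and record held-out accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

print(scores)
```

Putting the scaler inside each pipeline means it is fitted on the training split only, which avoids leaking test-set statistics into the model.
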
- Feature Importance: Identified key factors influencing sales, such as proximity to amenities and property age.
- Model Performance: Achieved high accuracy with ensemble models.
- Actionable Insights: Recommendations for real estate stakeholders to improve property marketing.