This project implements Support Vector Machine (SVM) classifiers with both Linear and RBF kernels. It includes preprocessing, model training, hyperparameter tuning, evaluation, and optional 2D decision boundary visualization.
If no dataset is provided, it defaults to the Breast Cancer Wisconsin dataset from scikit-learn.
- Load dataset from CSV (`--csv`) and specify target (`--target`), or use the default dataset.
- Preprocessing: handle missing values, drop obvious ID/unnamed columns, and encode non-numeric targets.
- Feature scaling with `StandardScaler`.
- Train Linear SVM and RBF SVM.
- Hyperparameter tuning for RBF SVM using `GridSearchCV` over `C` and `gamma`.
- Cross-validated metrics, confusion matrices, and classification reports saved to disk.
- Decision boundary plots when exactly two features are used.
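
The training and tuning steps listed above can be sketched roughly as follows. This is a minimal illustration, not the contents of `src/svm_model.py`: the train/test split, the `param_grid` values, the 5-fold setting, and the variable names are all assumptions chosen for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Default dataset named in this README.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Keeping the scaler inside the pipeline means each cross-validation fold
# is scaled using statistics from its own training portion only.
linear_svm = make_pipeline(StandardScaler(), SVC(kernel="linear"))
linear_svm.fit(X_train, y_train)

# Grid over C and gamma for the RBF kernel (values are illustrative assumptions).
rbf_pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.01, 0.1],
}
rbf_search = GridSearchCV(rbf_pipeline, param_grid, cv=5, scoring="accuracy")
rbf_search.fit(X_train, y_train)

linear_acc = linear_svm.score(X_test, y_test)
rbf_acc = rbf_search.score(X_test, y_test)
print(f"Linear SVM test accuracy: {linear_acc:.3f}")
print(f"RBF SVM test accuracy:    {rbf_acc:.3f}")
print("Best RBF parameters:", rbf_search.best_params_)
```

The `svc__` prefix in the grid keys comes from the step name that `make_pipeline` assigns to the `SVC` estimator.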
svm_classification_project/
│
├── data/
├── outputs/
├── src/
│ └── svm_model.py
├── README.md
└── requirements.txt
python -m venv venv
# Windows
venv\Scripts\activate
# macOS/Linux
source venv/bin/activate
pip install -r requirements.txt

# Default dataset
python src/svm_model.py
# Custom CSV
python src/svm_model.py --csv data/your_dataset.csv --target target_column_name
# Custom CSV with two selected features
python src/svm_model.py --csv data/your_dataset.csv --target target_column_name --features feature1 feature2

- outputs/metrics.json – accuracy and best parameters
- outputs/confusion_matrix_linear.png
- outputs/confusion_matrix_rbf.png
- outputs/decision_boundary_linear.png (if 2D)
- outputs/decision_boundary_rbf.png (if 2D)
- outputs/accuracy_bar.png
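
For orientation, the command-line flags used in the commands above could be parsed along these lines. Only the flag names `--csv`, `--target`, and `--features` come from this README; the `build_parser` helper, the help strings, and the `plot_boundaries` variable are hypothetical, and the actual script may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags documented in this README."""
    parser = argparse.ArgumentParser(
        description="Train Linear and RBF SVM classifiers."
    )
    parser.add_argument("--csv", default=None,
                        help="Path to a CSV dataset; omit to use the default dataset.")
    parser.add_argument("--target", default=None,
                        help="Name of the target column in the CSV.")
    parser.add_argument("--features", nargs="+", default=None,
                        help="Optional subset of feature columns to train on.")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args([
    "--csv", "data/your_dataset.csv",
    "--target", "target_column_name",
    "--features", "feature1", "feature2",
])

# Decision boundary plots apply only when exactly two features are used.
plot_boundaries = args.features is not None and len(args.features) == 2
print(args.csv, args.target, args.features, plot_boundaries)
```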
- For categorical targets (e.g., "M"/"B"), the script automatically encodes them.
- Missing numeric values are imputed with the column mean.
- If you do not pass `--target` with a CSV, the script will prompt you to specify it.
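
The preprocessing behavior described in these notes can be approximated with the sketch below. The tiny DataFrame is made-up sample data, and the rule for spotting ID/unnamed columns (a column named `id` or starting with `unnamed`) is an assumed heuristic; the script's real checks may differ.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up frame standing in for a user-supplied CSV.
df = pd.DataFrame({
    "id": [101, 102, 103, 104],
    "Unnamed: 0": [0, 1, 2, 3],
    "radius": [14.2, None, 11.8, 20.1],
    "texture": [19.5, 21.0, None, 25.3],
    "diagnosis": ["M", "B", "B", "M"],
})
target = "diagnosis"

# Drop obvious ID / unnamed index columns (assumed heuristic).
drop_cols = [
    c for c in df.columns
    if c != target and (c.lower() == "id" or c.lower().startswith("unnamed"))
]
df = df.drop(columns=drop_cols)

# Impute missing numeric feature values with the column mean.
X = df.drop(columns=[target])
numeric_cols = X.select_dtypes(include="number").columns
X[numeric_cols] = X[numeric_cols].fillna(X[numeric_cols].mean())

# Encode a non-numeric target such as "M"/"B" as integer class labels.
y = df[target]
if not pd.api.types.is_numeric_dtype(y):
    encoder = LabelEncoder()
    y = pd.Series(encoder.fit_transform(y), name=target)

print(X)
print(y.tolist())
```

`LabelEncoder` assigns integers in sorted class order, so here "B" maps to 0 and "M" maps to 1.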