A machine learning project that uses CatBoost to classify iris flower species based on their physical characteristics. This implementation demonstrates gradient boosting for multiclass classification on the classic Iris dataset.
This project implements a CatBoost classifier to predict iris species (Setosa, Versicolor, and Virginica) based on four features:
- Sepal length
- Sepal width
- Petal length
- Petal width
The model achieves high accuracy using gradient boosting techniques with minimal code and configuration.
- Dataset Loading: Uses the built-in Iris dataset from scikit-learn
- Data Inspection: Displays dataset shape, features, and class distribution
- Train/Test Split: 80/20 split for model validation
- CatBoost Implementation: Gradient boosting classifier with multiclass support
- Performance Evaluation: Accuracy metrics on test data
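A minimal sketch of the loading, inspection, and split steps, assuming the standard scikit-learn and pandas calls (variable names here are illustrative, not necessarily those used in `iris_classifier.py`):

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the built-in Iris dataset and convert it to a pandas DataFrame
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)  # 150 rows x 4 features
y = pd.Series(iris.target)                               # species labels 0, 1, 2

# Inspect shape and class distribution
print(X.shape, y.shape)   # (150, 4) (150,)
print(y.value_counts())   # 50 samples per class

# 80/20 train/test split for validation (random_state assumed for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```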
- pandas
- scikit-learn
- catboost

- Clone this repository:
```bash
git clone https://github.com/javsan77/CatBoost-Iris-Classifier.git
cd catboost-iris-classifier
```

- Install dependencies:
```bash
pip install pandas scikit-learn catboost
```

Run the script:

```bash
python iris_classifier.py
```

Expected output:

```text
--- Dataset Inspection ---
X shape (rows, columns): (150, 4)
y shape (samples): (150,)
First 5 rows of X (Features):
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
...
Value counts of y (Target Species 0, 1, 2):
0 50
1 50
2 50
--------------------------
Starting CatBoost model training on the CPU...
Training complete!
Model accuracy on the test set: 1.0000
```
CatBoost Classifier Parameters:
- iterations: 100 trees
- learning_rate: 0.1
- depth: 6 (tree depth)
- loss_function: MultiClass (for 3 species)
- random_seed: 42 (for reproducibility)
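These options map directly onto the `CatBoostClassifier` constructor. A sketch, reusing the `X_train`/`y_train` split from the snippet above (`verbose=False` is not in the parameter list and is added here only to keep the training log quiet):

```python
from catboost import CatBoostClassifier

model = CatBoostClassifier(
    iterations=100,              # number of boosted trees
    learning_rate=0.1,
    depth=6,                     # tree depth
    loss_function="MultiClass",  # three iris species
    random_seed=42,              # reproducibility
    verbose=False,               # assumption: silence per-iteration output
)
model.fit(X_train, y_train)
```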
Iris Dataset:
- Samples: 150 (50 per species)
- Features: 4 numeric attributes
- Classes: 3 (Setosa, Versicolor, Virginica)
- Split: 120 training samples, 30 test samples
The model typically achieves ~97-100% accuracy on the test set, demonstrating CatBoost's effectiveness for this classification task.
- Data Loading: Imports the Iris dataset and converts to pandas DataFrame
- Data Inspection: Prints dataset dimensions and distribution
- Data Splitting: Divides data into training (80%) and testing (20%) sets
- Model Training: Trains CatBoost classifier with 100 iterations
- Prediction: Makes predictions on unseen test data
- Evaluation: Calculates and displays accuracy score
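A sketch of the prediction and evaluation steps, continuing from the model and split above (CatBoost's `predict` can return a column vector, hence the `ravel`):

```python
from sklearn.metrics import accuracy_score

# Predict on the held-out 20% and compute accuracy
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred.ravel())
print(f"Model accuracy on the test set: {accuracy:.4f}")
```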
- High Performance: Often outperforms other gradient boosting libraries
- Easy to Use: Minimal hyperparameter tuning required
- Built-in Categorical Support: Handles categorical features automatically
- Fast Training: Optimized for CPU training
- Robust: Less prone to overfitting
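The built-in categorical support is not needed for Iris (all four features are numeric), but as a hypothetical illustration unrelated to this project, CatBoost accepts raw string columns directly when they are declared via `cat_features`:

```python
import pandas as pd
from catboost import CatBoostClassifier

# Toy data (not part of this project): one categorical and one numeric column
df = pd.DataFrame({
    "color":  ["red", "blue", "red", "green", "blue", "red"],
    "length": [5.1, 4.9, 6.0, 5.5, 4.7, 5.8],
    "label":  [0, 1, 0, 1, 1, 0],
})

clf = CatBoostClassifier(iterations=10, verbose=False)
# No one-hot encoding required; just name the categorical column(s)
clf.fit(df[["color", "length"]], df["label"], cat_features=["color"])
```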
This project helps you understand:
- Gradient boosting for classification
- CatBoost library implementation
- Multiclass classification problems
- Model evaluation techniques
- Data preprocessing with pandas
Contributions are welcome! Feel free to:
- Report bugs
- Suggest improvements
- Add new features
- Improve documentation
This project is licensed under the MIT License - see the LICENSE file for details.
Javier Sanchez Ayte - https://www.linkedin.com/in/javier-sanchez-ayte/
- Built with CatBoost
- Dataset from scikit-learn
- Inspired by gradient boosting techniques
⭐ If you found this helpful, please give it a star!