clustermatic is a Python library designed to accelerate clustering tasks using scikit-learn. It serves as a quick tool for selecting the optimal clustering algorithm and its hyperparameters, providing visualizations and metrics for comparison.
- Clustering Algorithms: Analyzes six clustering algorithms from scikit-learn: KMeans, DBSCAN, MiniBatchKMeans, AgglomerativeClustering, OPTICS, and SpectralClustering.
- Optimization Methods: Includes Bayesian optimization and random search for hyperparameter tuning.
- Flexible Preprocessing: Lets users customize how the data is preprocessed, including steps such as scaling, normalization, and dimensionality reduction.
- Evaluation Metrics: Supports evaluation with silhouette, calinski_harabasz, and davies_bouldin scores.
- Report Generation: Generates reports in HTML format after optimization.
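To illustrate what the three evaluation metrics measure, here is a minimal sketch that computes them with scikit-learn directly for two of the listed algorithms. It is not clustermatic's internal code; the helper name `score_labels` and the hyperparameter values are assumptions chosen for the example.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.preprocessing import StandardScaler

# Toy data, standardized before clustering (one possible preprocessing step).
X, _ = make_moons(n_samples=200, noise=0.1, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

# Two candidate models; the hyperparameters here are illustrative only.
candidates = {
    "KMeans": KMeans(n_clusters=2, n_init=10, random_state=42),
    "AgglomerativeClustering": AgglomerativeClustering(n_clusters=2),
}


def score_labels(X, labels):
    """Hypothetical helper: compute the three metrics for one labeling."""
    return {
        # Higher is better, range [-1, 1].
        "silhouette": silhouette_score(X, labels),
        # Higher is better, unbounded above.
        "calinski_harabasz": calinski_harabasz_score(X, labels),
        # Lower is better, minimum 0.
        "davies_bouldin": davies_bouldin_score(X, labels),
    }


results = {
    name: score_labels(X_scaled, model.fit_predict(X_scaled))
    for name, model in candidates.items()
}

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Note that the metrics point in different directions: silhouette and calinski_harabasz are maximized, while davies_bouldin is minimized, so any comparison across algorithms has to account for that.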
To install clustermatic, use pip:
pip install clustermatic
For a quick start, use the following code snippet:
from clustermatic import AutoClusterizer
# Load data
from sklearn.datasets import make_moons
X, _ = make_moons(n_samples=200, noise=0.1, random_state=42)
# Initialize AutoClusterizer
ac = AutoClusterizer()
# Fit the data
ac.fit(X)
# Generate report
ac.evaluate()
For a more detailed walkthrough, check out this example Jupyter Notebook
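To give a feel for what random-search hyperparameter tuning involves, here is a minimal sketch written against scikit-learn directly. It is not how clustermatic implements its search; the function `random_search_dbscan`, the sampling ranges, and the choice to score only non-noise points with the silhouette score are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.metrics import silhouette_score


def random_search_dbscan(X, n_trials=30, seed=0):
    """Hypothetical helper: sample DBSCAN settings at random, keep the best."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_trials):
        # Sampling ranges are illustrative assumptions, not library defaults.
        params = {
            "eps": float(rng.uniform(0.05, 0.5)),
            "min_samples": int(rng.integers(3, 11)),
        }
        labels = DBSCAN(**params).fit_predict(X)

        # DBSCAN marks noise as -1; score only the clustered points.
        mask = labels != -1
        if len(set(labels[mask])) < 2:
            # Silhouette is undefined for fewer than two clusters.
            continue

        score = silhouette_score(X[mask], labels[mask])
        if best is None or score > best["score"]:
            best = {"params": params, "score": float(score)}
    return best


X, _ = make_moons(n_samples=200, noise=0.1, random_state=42)
best = random_search_dbscan(X)
print(best)
```

Ignoring noise points keeps the sketch short, but it can favor settings that discard many samples; a fuller version would penalize the noise fraction or use a different objective.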
