Real-time Data Quality & Anomaly Detection Platform
DataMetronome is an open-source, community-driven platform that provides real-time data quality monitoring, anomaly detection, and comprehensive analytics. Built with modern Python technologies, it's designed to help data engineers, DevOps teams, and data scientists ensure their data pipelines are healthy and reliable.
- asyncpg - Lightning-fast async PostgreSQL driver
- psycopg3 - Modern, feature-rich PostgreSQL connector
- SQLAlchemy - ORM integration with async support
- UUID optimization - Distributed system ready
- Connection pooling - Enterprise-grade performance
- Isolation Forest algorithm for statistical outliers
- Real-time monitoring of data quality metrics
- Statistical analysis with configurable thresholds
- Pattern recognition across multiple data sources
- Automated alerting for data quality issues
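As a rough illustration of the Isolation Forest approach named above, here is a minimal sketch using scikit-learn's `IsolationForest`. The synthetic data, the `contamination` value, and the variable names are assumptions made for this example; this is not DataMetronome's actual detection code.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic metric: 200 "normal" order amounts plus two planted outliers.
rng = np.random.default_rng(seed=0)
normal_amounts = rng.normal(loc=50.0, scale=5.0, size=200)
amounts = np.concatenate([normal_amounts, [500.0, -400.0]])

# scikit-learn expects a 2-D array of shape (n_samples, n_features).
X = amounts.reshape(-1, 1)

# contamination is the expected share of anomalies; 0.02 is an assumed value.
model = IsolationForest(n_estimators=100, contamination=0.02, random_state=0)
labels = model.fit_predict(X)        # -1 = anomaly, 1 = normal
scores = model.decision_function(X)  # lower score = more anomalous

flagged = sorted(float(v) for v in amounts[labels == -1])
print(f"flagged {len(flagged)} of {len(amounts)} values")
```

In practice the `contamination` setting plays the role of a configurable threshold: raising it flags more points, lowering it flags fewer.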
- Modern web UI built for real-time monitoring
- Responsive layouts tuned for analysts and SREs
- Chart.js visualizations for trends, anomalies, and forecasting
- Interactive drilldowns across clefs, staves, and incident timelines
- Dark/light themes with professional styling out of the box
- Modular design - Easy to extend and customize
- Async-first - High-performance, non-blocking operations
- Clean interfaces - Simple, consistent APIs
- Standalone testing - Each datapulse has comprehensive, independent tests
- Docker support - Easy deployment and testing
See it in action!
cd ui-nuxt
npm install
npm run dev
- Data Engineers - Build robust, monitored data pipelines
- DevOps Teams - Monitor data infrastructure health
- Data Scientists - Ensure data quality for ML models
- Startups - Get enterprise-grade tools on a budget
- Open Source Contributors - Extend and improve the platform
- Enterprise Teams - Deploy in production environments
- Python 3.13+
- Docker and Docker Compose
- uv package manager
git clone https://github.com/datametronome/datametronome.git
cd datametronome
docker-compose -f docker-compose.test.yml up -d
uv pip install -e ./datametronome/pulse/core
uv pip install -e ./datametronome/pulse/postgres
cd ui-nuxt
npm install
npm run dev

The dashboard will open at http://localhost:3000 with full anomaly detection capabilities!
DataMetronome uses a standalone testing approach where each datapulse contains its own comprehensive test suite. This allows you to:
- Test independently - Each datapulse can be tested without the entire ecosystem
- Plug in and out - Easily add or remove datapulses as needed
- Deploy separately - Each datapulse can be a standalone package
- Maintain independently - Isolated dependencies and test coverage
# Test the core datapulse
cd datametronome/pulse/core
make install && make test
# Test the PostgreSQL datapulse (AsyncPG)
cd datametronome/pulse/postgres
make install && make test
# Test the PostgreSQL datapulse (Psycopg3)
cd datametronome/pulse/postgres-psycopg3
make install && make test
# Test the PostgreSQL datapulse (SQLAlchemy)
cd datametronome/pulse/postgres-sqlalchemy
make install && make test

For detailed testing information, see TESTING_ARCHITECTURE.md.
The DataMetronome dashboard is organized into five tabs that cover the whole platform:
- Real-time system health metrics
- Data quality score with beautiful visualizations
- Key performance indicators and statistics
- Professional metric cards with gradients
- Live anomaly detection from PostgreSQL
- Statistical analysis of data quality issues
- Detailed breakdown by table and issue type
- Actionable insights for immediate remediation
- Machine learning powered detection using Isolation Forest
- Advanced outlier detection for numerical data
- Interactive visualizations showing normal vs anomalous patterns
- ML performance metrics and confidence scores
- Data Distribution Analysis - Histograms with anomaly highlighting
- Time Series Analysis - User registrations and orders over time
- Correlation Analysis - Age vs order amount relationships with trend lines
- Anomaly Pattern Analysis - Heatmaps and trend analysis over time
- Custom SQL queries for deep data exploration
- Data profiling tools for comprehensive table analysis
- Sample data viewing for quick insights
- Interactive data exploration capabilities
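To make the data profiling idea concrete, the following sketch computes row count, null rate, and distinct count per column. It uses the standard-library `sqlite3` module as a stand-in for PostgreSQL, and `profile_table` and the `users` table are hypothetical names chosen for this example.

```python
import sqlite3


def profile_table(conn: sqlite3.Connection, table: str) -> dict[str, dict[str, float]]:
    """Return row count, null rate, and distinct count for each column."""
    # Identifiers are interpolated into SQL, so only pass trusted table names.
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    total = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    profile = {}
    for col in columns:
        # COUNT(col) and COUNT(DISTINCT col) both ignore NULL values.
        non_null, distinct = conn.execute(
            f'SELECT COUNT("{col}"), COUNT(DISTINCT "{col}") FROM "{table}"'
        ).fetchone()
        profile[col] = {
            "rows": total,
            "null_rate": (total - non_null) / total if total else 0.0,
            "distinct": distinct,
        }
    return profile


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "a@example.com", 34),
        (2, None, 29),
        (3, "c@example.com", None),
        (4, "a@example.com", 34),
    ],
)
report = profile_table(conn, "users")
for column, stats in report.items():
    print(column, stats)
```

The same aggregate queries work on PostgreSQL; only the column lookup (here `PRAGMA table_info`) is SQLite-specific.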
- Interactive Histograms with anomaly highlighting
- Time Series Charts with trend analysis
- Scatter Plots with correlation analysis and trend lines
- Heatmaps for anomaly distribution patterns
- Real-time Metrics with professional styling
- Responsive Design that works on any device
- DataPulse Core - Abstract interfaces and base classes
- PostgreSQL Connectors - High-performance database drivers
- Anomaly Detection Engine - Statistical + ML algorithms
- Web Dashboard - Dedicated operational console
- API Layer - FastAPI backend for integrations
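One way to picture the "abstract interfaces and base classes" layer is the sketch below. `DataPulseConnector` and `SQLiteConnector` are hypothetical names invented for this example, and the real DataPulse Core API may look different; the point is only to show how an async-first interface lets connectors be swapped and tested on their own.

```python
import asyncio
import sqlite3
from abc import ABC, abstractmethod
from typing import Any


class DataPulseConnector(ABC):
    """Hypothetical base interface; the real DataPulse Core API may differ."""

    @abstractmethod
    async def connect(self) -> None: ...

    @abstractmethod
    async def query(self, sql: str, params: tuple[Any, ...] = ()) -> list[tuple[Any, ...]]: ...

    @abstractmethod
    async def close(self) -> None: ...


class SQLiteConnector(DataPulseConnector):
    """Illustrative implementation backed by the standard-library sqlite3 module."""

    def __init__(self, path: str = ":memory:") -> None:
        self._path = path
        self._conn = None

    async def connect(self) -> None:
        # check_same_thread=False allows use from asyncio worker threads.
        self._conn = sqlite3.connect(self._path, check_same_thread=False)

    async def query(self, sql: str, params: tuple[Any, ...] = ()) -> list[tuple[Any, ...]]:
        if self._conn is None:
            raise RuntimeError("call connect() first")
        # sqlite3 is blocking, so run it in a worker thread to keep the event loop free.
        return await asyncio.to_thread(lambda: self._conn.execute(sql, params).fetchall())

    async def close(self) -> None:
        if self._conn is not None:
            self._conn.close()
            self._conn = None


async def main() -> list[tuple[Any, ...]]:
    connector: DataPulseConnector = SQLiteConnector()
    await connector.connect()
    try:
        await connector.query("CREATE TABLE orders (id INTEGER, amount REAL)")
        await connector.query("INSERT INTO orders VALUES (?, ?)", (1, 19.99))
        return await connector.query("SELECT id, amount FROM orders")
    finally:
        await connector.close()


rows = asyncio.run(main())
print(rows)
```

Because calling code depends only on the base class, an in-memory connector like this one can stand in for a PostgreSQL connector in unit tests.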
- Language: Python 3.13 (latest features)
- Database: PostgreSQL 15+ with UUID extensions
- ML Framework: scikit-learn for anomaly detection
- Frontend: SPA web application
- Charts: Chart.js for interactive visualizations
- Containerization: Docker for easy deployment
- Package Management: uv for fast dependency resolution
graph TB
subgraph "DataMetronome Platform"
A[UI Dashboard] --> B[DataPulse Connectors]
B --> C[PostgreSQL Database]
B --> D[Anomaly Detection Engine]
D --> E[ML Algorithms]
D --> F[Statistical Analysis]
A --> G[Real-time Monitoring]
G --> H[Alert System]
end
subgraph "Data Sources"
I[PostgreSQL]
J[SQLite]
K[Custom Connectors]
end
C --> I
B --> J
B --> K
style A fill:#ff6b6b
style D fill:#4ecdc4
style E fill:#45b7d1
style F fill:#96ceb4
- Higher throughput than ORM-based access: in our benchmarks, asyncpg processed about 2.3x as many records per second as SQLAlchemy
- Real-time monitoring with sub-second response
- Scalable architecture for enterprise workloads
- Optimized UUID handling for distributed systems
Our benchmark results for the three PostgreSQL connectors:
- asyncpg: 34,981 records/sec (winner)
- SQLAlchemy: 15,137 records/sec
- psycopg3: 1,615 records/sec
- psycopg3: 788 queries/sec (winner)
- asyncpg: 515 queries/sec
- SQLAlchemy: 451 queries/sec
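The benchmark methodology is not described here, so the following is only a generic sketch of how a records/sec figure can be measured: time a batch of inserts and divide the record count by the elapsed time. It uses an in-memory SQLite database as a stand-in, and `measure_insert_throughput` is a hypothetical helper, so its numbers are not comparable to the PostgreSQL results above.

```python
import sqlite3
import time


def measure_insert_throughput(n_records: int) -> float:
    """Insert n_records rows into an in-memory table and return records/sec."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    rows = [(i, f"event-{i}") for i in range(n_records)]

    start = time.perf_counter()
    with conn:  # commits the transaction on success
        conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    elapsed = time.perf_counter() - start

    # Verify that every row landed before reporting a rate.
    inserted = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    conn.close()
    if inserted != n_records:
        raise RuntimeError(f"expected {n_records} rows, found {inserted}")
    return n_records / elapsed


rate = measure_insert_throughput(10_000)
print(f"{rate:,.0f} records/sec")
```

For a fair driver comparison, each connector would need the same table, batch size, and hardware, with several repetitions per run.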
- Star the repository on GitHub
- Report bugs and request features
- Contribute code and documentation
- Join discussions in our community
- Read the documentation
- Try the quick start guide
- Explore the dashboard features
- Customize for your use case
- Documentation Hub - Complete documentation index
- Quick Start Guide - Get started in 5 minutes
- API Reference - Complete API documentation
- Architecture Guide - System design and diagrams
- Development Guide - Contributing to DataMetronome
- Deployment Guide - Production deployment strategies
- Roadmap - Future plans and priorities
- Contributing Guidelines - How to contribute
- Community Demo - Full demonstration
- Proactive monitoring - Catch issues before they become problems
- Real-time insights - Immediate visibility into data health
- Easy integration - Works with existing PostgreSQL databases
- Extensible platform - Add custom anomaly detection rules
- Infrastructure monitoring - Track database health and performance
- Automated alerting - Get notified of data quality issues
- Performance metrics - Monitor query performance and bottlenecks
- Easy deployment - Docker support for containerized environments
- Data quality assurance - Ensure ML models have clean data
- Anomaly detection - Identify outliers and data drift
- Statistical analysis - Built-in statistical tools and visualizations
- ML integration - Use our algorithms or integrate your own
- Q1 2024 - Core DataPulse connectors, basic anomaly detection, UI prototype
- Q2 2024 - Advanced ML algorithms, real-time streaming, alert system
- Q3 2024 - Multi-database support, advanced analytics, API integrations
- Q4 2024 - Community features, plugin system, advanced reporting
This project is licensed under the MIT License - see the LICENSE file for details.
- Team: [email protected]
- Website: https://datametronome.dev
- GitHub: https://github.com/datametronome
- Community: https://community.datametronome.dev
DataMetronome - Making data quality better for everyone!
Built with ❤️ by the open source community
