New to this project? Start with GETTING_STARTED.md for the fastest setup!
⚠️ IMPORTANT DISCLAIMER: This is a demonstration and educational tool only. It is NOT production-ready software. Do not use for production code reviews or critical decisions. Use at your own risk.
A comprehensive demonstration of the Mixture of Agents pattern using AutoGen, inspired by feed-forward neural network architectures. This demo showcases a multi-perspective code review system in which specialized agents, organized in layers, analyze code from complementary angles.
Purpose: This project is intended for:
- Education: Learning about multi-agent systems and Mixture of Agents patterns
- Research: Exploring agent-based architectures and explainability
- Experimentation: Testing and understanding AI agent collaboration
- Demonstration: Showcasing concepts and patterns

NOT intended for:
- ❌ Production code reviews
- ❌ Critical decision-making
- ❌ Real-world deployment without significant modifications
The Mixture of Agents is a multi-agent design pattern that solves complex tasks by leveraging the collective intelligence of multiple, specialized agents across a series of processing stages. This implementation demonstrates the pattern through a practical code review use case.
Complete multi-layer architecture diagram showing the full Mixture of Agents workflow: User Input β Orchestrator Agent β Layer 1 (4 specialized agents: Security, Performance, Quality, Best Practices) β Layer 1 Synthesis β Layer 2 (3 refinement agents: Validation, Priority, Enhancement) β Layer 2 Synthesis β Layer 3 (Integration Agent) β Final Review Report. The diagram illustrates the hierarchical flow with all agent interactions and synthesis points.
Interactive Code Review Chat Interface featuring: Welcome message with robot hand icon, Quick Start guide (3 steps), "What I Analyze" section explaining the four analysis areas (Security 🔴, Performance 🟡, Code Quality 🔵, Best Practices 🟢), and prompt to try sample code. The interface uses a clean white background with clear navigation tabs at the top.
Review processing interface showing: "How It Works" information box explaining the multi-agent parallel analysis process, prominent "Review This Code" button, "Starting Code Review..." status with robot icon, and Layer 3 Final Synthesis progress box with yellow/orange styling showing "Creating comprehensive review report..." with a nearly complete blue progress bar below.
Detailed Review Results dashboard displaying: Review Summary section with overall score card showing 0/100 with "Needs Improvement" status, warning message indicating "Code quality needs improvement. Many issues found", and Total Issues count of 143. The Issues by Category section shows four color-coded cards: Security (43 issues - red), Performance (31 issues - yellow), Quality (32 issues - blue), and Best Practices (37 issues - green), each with expandable "View X issues" buttons.
Comprehensive Code Review Report interface with collapsible sections: Executive Summary, Layer-by-Layer Analysis, and expanded Security Agent analysis showing "Comprehensive Security Analysis" with detailed vulnerability findings. Highlights include SQL Injection (CVE-2022-2615) in the login function with Critical severity, risk assessment, and specific recommendations for parameterized queries and ORM usage. Also shows Cross-Site Scripting (XSS) vulnerability details with code location references and actionable remediation steps.
```mermaid
graph TB
User[User Input<br/>Code to Review] --> Orchestrator[Orchestrator Agent]
Orchestrator --> L1A1[Security Agent<br/>Layer 1]
Orchestrator --> L1A2[Performance Agent<br/>Layer 1]
Orchestrator --> L1A3[Code Quality Agent<br/>Layer 1]
Orchestrator --> L1A4[Best Practices Agent<br/>Layer 1]
L1A1 --> L1Synthesis[Layer 1 Synthesis]
L1A2 --> L1Synthesis
L1A3 --> L1Synthesis
L1A4 --> L1Synthesis
L1Synthesis --> L2A1[Validation Agent<br/>Layer 2]
L1Synthesis --> L2A2[Priority Agent<br/>Layer 2]
L1Synthesis --> L2A3[Enhancement Agent<br/>Layer 2]
L2A1 --> L2Synthesis[Layer 2 Synthesis]
L2A2 --> L2Synthesis
L2A3 --> L2Synthesis
L2Synthesis --> L3A1[Integration Agent<br/>Layer 3]
L3A1 --> FinalReport[Final Review Report]
style Orchestrator fill:#4A90E2,stroke:#2E5C8A,stroke-width:3px
style L1Synthesis fill:#E8F4F8,stroke:#4A90E2,stroke-width:2px
style L2Synthesis fill:#E8F4F8,stroke:#4A90E2,stroke-width:2px
style FinalReport fill:#50C878,stroke:#2E7D4E,stroke-width:3px
```
This is a demonstration tool, not production software:
- Results may contain errors or inaccuracies
- The system is not optimized for performance
- No guarantees about reliability or correctness
- Use for learning and experimentation only
- Do not rely on outputs for production decisions
Recommended Hardware:
- CPU: Apple Silicon (M1/M2/M3) or modern multi-core processor
- RAM: 8GB minimum, 16GB+ recommended
- Storage: 5GB+ free space for models and dependencies
- GPU: Optional but recommended for faster processing (CUDA/ROCm/Metal)
Tested Configurations:
- ✅ Apple M1 Mac (16GB RAM) - Processing time: ~5 minutes per review
- ✅ Apple M2/M3 Mac - Similar performance to M1
- ✅ Linux/Windows with NVIDIA GPU - Faster processing with CUDA support
- ⚠️ CPU-only systems - Slower processing, may take 10-15 minutes per review
Performance Notes:
- GPU usage will increase significantly during agent processing (up to 81% GPU usage observed)
- Processing time varies based on:
- Hardware capabilities (CPU/GPU performance)
- Model size (larger models = slower but more accurate)
- Code complexity (longer code = more processing time)
- System load (other applications running)
- On an M1 Mac with 16GB RAM, expect ~5 minutes for a typical code review
- GPU acceleration is automatically used when available (Metal on Mac, CUDA on NVIDIA)
- Python 3.8 or higher
- Ollama installed and running locally
- An Ollama model downloaded (see Model Recommendations below)
1. Clone the repository:

   ```bash
   git clone https://github.com/khaosans/mixture-of-agents.git
   cd mixture-of-agents
   ```

2. Set up a virtual environment (recommended):

   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Set up Ollama:

   ```bash
   # Make sure Ollama is running
   ollama serve

   # Pull a model (see Model Recommendations below)
   ollama pull llama3.2:3b   # Recommended for most users
   # or
   ollama pull llama3.2      # Larger model, better quality
   ```
For Apple Silicon (M1/M2/M3 Mac):
- Recommended: `llama3.2:3b` (3B parameters) - Good balance of speed and quality
- Alternative: `llama3.2` (larger model) - Better quality but slower
- Fast Option: `phi3:mini` - Faster processing, good for quick reviews

For High-End Systems (NVIDIA GPU, 16GB+ RAM):
- Recommended: `llama3.2` or `llama3.1:8b` - Best quality results
- Alternative: `mistral:7b` - Excellent code understanding

For CPU-Only Systems:
- Recommended: `llama3.2:3b` or `phi3:mini` - Smaller models for faster CPU processing
- Note: Processing will be slower (10-15 minutes); consider using smaller models
Model Size vs. Performance:
- Small models (3B): Faster processing (~3-5 min), good quality
- Medium models (7B-8B): Moderate speed (~5-8 min), better quality
- Large models (13B+): Slower processing (~10-15 min), best quality
💡 Tip: Start with `llama3.2:3b` and upgrade to larger models if you need better analysis quality and have the hardware to support it.
5. Run the application:

   ```bash
   # Option 1: Using the start script
   ./scripts/start_app.sh

   # Option 2: Direct Streamlit command
   streamlit run ui_app.py
   ```

6. Open your browser: Navigate to http://localhost:8501

For automated setup, use the provided script:

```bash
./scripts/setup_environment.sh
```

This script will:
- Check Python version
- Verify Ollama installation
- Check for required models
- Set up the environment
On Apple M1 Mac (16GB RAM):
- Small code snippets (< 100 lines): ~3-4 minutes
- Medium code files (100-500 lines): ~5-6 minutes
- Large code files (500+ lines): ~7-10 minutes
On Systems with NVIDIA GPU:
- Small code snippets: ~1-2 minutes
- Medium code files: ~2-4 minutes
- Large code files: ~4-6 minutes
On CPU-Only Systems:
- Small code snippets: ~8-12 minutes
- Medium code files: ~12-18 minutes
- Large code files: ~18-25 minutes
During code review processing, you can expect:
- CPU Usage: 40-60% (multi-core utilization across agent processing)
- GPU Usage: 70-85% (when GPU acceleration available - Metal on Mac, CUDA on NVIDIA)
- RAM Usage: 2-4GB (varies by model size and code complexity)
- Storage: Minimal (cached results stored in SQLite database)
Why GPU Usage Increases: The Mixture of Agents system processes multiple LLM calls in parallel across different layers. Each agent makes independent LLM requests to analyze code from different perspectives (security, performance, quality, best practices). When GPU acceleration is available:
- Metal (Apple Silicon): Automatically utilized by Ollama for faster inference
- CUDA (NVIDIA): Used when CUDA-enabled Ollama is installed
- Parallel Processing: Multiple agents running simultaneously = higher GPU utilization
This high GPU usage (up to 81% observed) is normal and expected - it indicates the system is efficiently using available hardware to process multiple agent analyses in parallel. The GPU handles the neural network computations for each LLM request, making processing significantly faster than CPU-only execution.
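To make the parallelism concrete, here is a minimal sketch of fanning several agent prompts out to Ollama's HTTP API concurrently; running multiple inferences at once is what drives GPU utilization up. This assumes the `httpx` library and a locally running Ollama server; the function names and prompt wording are illustrative, not the project's actual implementation.

```python
import asyncio
import httpx

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

async def ask_agent(client: httpx.AsyncClient, role: str, code: str) -> str:
    """Send one agent's prompt to Ollama and return the raw model response."""
    payload = {
        "model": "llama3.2:3b",
        "prompt": f"You are a {role} reviewer. Analyze this code:\n{code}",
        "stream": False,
    }
    response = await client.post(OLLAMA_URL, json=payload, timeout=600)
    return response.json()["response"]

async def review_in_parallel(code: str) -> dict:
    """Run the four Layer 1 perspectives concurrently (this is what loads the GPU)."""
    roles = ["security", "performance", "code quality", "best practices"]
    async with httpx.AsyncClient() as client:
        results = await asyncio.gather(*(ask_agent(client, r, code) for r in roles))
    return dict(zip(roles, results))

# Example: asyncio.run(review_in_parallel("def login(user, pwd): ..."))
```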
💡 Performance Tips:
- Close other GPU-intensive applications during processing for best performance
- Use smaller models (`llama3.2:3b`) if processing time is a concern
- Ensure adequate RAM (16GB+ recommended for best performance)
- GPU acceleration happens automatically - no configuration needed
- Processing time scales with code complexity and model size
- Security Agent: Identifies vulnerabilities, injection risks, authentication issues
- Performance Agent: Analyzes algorithmic complexity, bottlenecks, optimization opportunities
- Code Quality Agent: Reviews readability, maintainability, code smells, design patterns
- Best Practices Agent: Checks language conventions, documentation, error handling
- Validation Agent: Cross-validates findings, checks for false positives
- Priority Agent: Ranks issues by severity and impact
- Enhancement Agent: Suggests specific improvements with code examples
- Integration Agent: Combines all perspectives into a cohesive review report
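For illustration, the specializations above can be expressed as per-agent system prompts, as in the sketch below. The exact prompt wording lives in the project's agent code (`mixture_of_agents/core/agents.py`) and may differ; this dictionary layout is an assumption.

```python
# Illustrative role prompts for the layered agents (wording is hypothetical).
LAYER1_ROLES = {
    "security": "Identify vulnerabilities, injection risks, and authentication issues.",
    "performance": "Analyze algorithmic complexity, bottlenecks, and optimization opportunities.",
    "code_quality": "Review readability, maintainability, code smells, and design patterns.",
    "best_practices": "Check language conventions, documentation, and error handling.",
}

LAYER2_ROLES = {
    "validation": "Cross-validate the Layer 1 findings and flag likely false positives.",
    "priority": "Rank issues by severity and impact (Critical, High, Medium, Low).",
    "enhancement": "Suggest concrete improvements with short code examples.",
}

LAYER3_ROLES = {
    "integration": "Combine all perspectives into one cohesive review report.",
}
```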
- User submits code for review
- Checkpoint Created: System creates initial checkpoint
- Orchestrator dispatches code to Layer 1 agents (parallel processing)
- Checkpoint Saved: After Layer 1 completes
- Orchestrator synthesizes Layer 1 results
- Synthesized results are dispatched to Layer 2 agents (parallel processing)
- Checkpoint Saved: After Layer 2 completes
- Orchestrator synthesizes Layer 2 results
- Layer 3 agent creates final comprehensive report
- Checkpoint Cleaned: Upon successful completion
- User receives detailed review with actionable recommendations
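The flow above can be summarized in a small control-flow sketch. Everything here is illustrative: `run_agent`, `synthesize`, and `save_checkpoint` are placeholders for the real implementations in the package, not its actual API.

```python
import asyncio

async def run_agent(role: str, payload: str) -> str:
    """Placeholder for a single agent's LLM call."""
    ...

def synthesize(findings: list) -> str:
    """Placeholder for merging one layer's outputs."""
    ...

def save_checkpoint(stage: str, data) -> None:
    """Placeholder for persisting progress after each layer."""
    ...

async def review(code: str) -> str:
    save_checkpoint("started", code)                     # initial checkpoint

    # Layer 1: four specialized agents run in parallel
    layer1 = await asyncio.gather(*(run_agent(r, code) for r in
                                    ["security", "performance", "quality", "best_practices"]))
    save_checkpoint("layer1_complete", layer1)
    layer1_summary = synthesize(layer1)

    # Layer 2: refinement agents work on the synthesized findings
    layer2 = await asyncio.gather(*(run_agent(r, layer1_summary) for r in
                                    ["validation", "priority", "enhancement"]))
    save_checkpoint("layer2_complete", layer2)
    layer2_summary = synthesize(layer2)

    # Layer 3: a single integration agent produces the final report
    report = await run_agent("integration", layer2_summary)
    save_checkpoint("completed", report)
    return report
```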
The application includes automatic checkpointing:
- Automatic Saving: Checkpoints saved after each layer completes
- Resume Capability: Interrupted reviews can be resumed from any checkpoint
- Progress Preservation: No work is lost if the review is interrupted
- Status Tracking: Monitor checkpoint status (in_progress, completed, failed)
See CHECKPOINT_SYSTEM.md for detailed documentation.
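As a rough sketch of per-layer checkpointing with status tracking, the snippet below writes progress to a JSON file so an interrupted review can resume. The real mechanism is described in CHECKPOINT_SYSTEM.md; the file location and field names here are assumptions for illustration only.

```python
import json
import time
from pathlib import Path

CHECKPOINT_FILE = Path("checkpoints/current_review.json")  # hypothetical location

def save_checkpoint(stage: str, data, status: str = "in_progress") -> None:
    """Write the latest completed stage so an interrupted review can resume."""
    CHECKPOINT_FILE.parent.mkdir(parents=True, exist_ok=True)
    CHECKPOINT_FILE.write_text(json.dumps({
        "stage": stage,          # e.g. "layer1_complete"
        "status": status,        # in_progress | completed | failed
        "data": data,
        "timestamp": time.time(),
    }))

def load_checkpoint():
    """Return the saved checkpoint, or None if starting fresh."""
    if CHECKPOINT_FILE.exists():
        return json.loads(CHECKPOINT_FILE.read_text())
    return None
```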
- Interactive Web Interface: Streamlit-based UI for easy interaction with chat-style interface
- Real-time Visualization: See agents working through Mermaid diagrams and Plotly charts
- Explainability: Understand which agent found which issue and how results were synthesized
- Comprehensive Reporting: Detailed review reports with categorized findings and priority rankings
- Sample Code Library: Pre-loaded examples demonstrating different issue types
- Checkpoint System: Automatic checkpointing allows resuming interrupted reviews
- Model Detection: Automatic model resolution with fallback support
- Progress Tracking: Real-time progress bars and status updates during processing
- Database Integration: SQLite for review history and ChromaDB for semantic search
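To illustrate the review-history side of the database integration, the sketch below stores finished reviews in SQLite using only the standard library. The table and column names are hypothetical; the actual schema lives in `mixture_of_agents/database/database.py`.

```python
import sqlite3

def save_review(db_path: str, code: str, report: str, score: int) -> None:
    """Persist a completed review so it can be browsed later (illustrative schema)."""
    with sqlite3.connect(db_path) as conn:
        conn.execute("""
            CREATE TABLE IF NOT EXISTS reviews (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                code TEXT,
                report TEXT,
                score INTEGER,
                created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
            )
        """)
        conn.execute(
            "INSERT INTO reviews (code, report, score) VALUES (?, ?, ?)",
            (code, report, score),
        )
```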
This implementation is based on research in Mixture of Experts architectures. For complete references and citations, see Research Documentation.
- Zhang, Y., Davoodi, A., & Hu, J. (2018). A mixture of expert approach for low-cost customization of deep neural networks. arXiv preprint arXiv:1811.00056. https://arxiv.org/abs/1811.00056
  Demonstrates MoE for DNN customization with Global and Local Experts.
- Jordan, M. I., & Jacobs, R. A. (1994). Hierarchical mixtures of experts and the EM algorithm. Neural Computation, 6(2), 181-214. https://doi.org/10.1162/neco.1994.6.2.181
  Foundational work establishing the mathematical framework for hierarchical MoE architectures.
- Yuksel, S. E., Wilson, J. N., & Gader, P. D. (2012). Twenty years of mixture of experts. IEEE Transactions on Neural Networks and Learning Systems, 23(8), 1177-1193. https://doi.org/10.1109/TNNLS.2012.2200299
  Comprehensive survey covering two decades of MoE evolution and applications.
- Dimik, D. (2024, November 19). Using small language models in a mixture of experts paradigm. AI Engineering Meeting. Albina Public Library, Room 1A, Portland, Oregon.
  Presentation discussing practical applications of MoE with smaller models, highlighting how combining multiple specialized language models within an MoE framework can achieve performance comparable to larger models while reducing computational requirements.
- Microsoft. (2024). AutoGen: A framework for enabling next-generation LLM applications with multi-agent conversations. https://github.com/microsoft/autogen
  Official AutoGen design patterns documentation for the Mixture of Agents implementation.
For complete research references, citations, and additional reading, see Research Documentation.
To understand how research concepts were translated into code, see Research Implementation Guide (includes visual diagrams and code examples).
When reviewing code, different engineers focus on different aspects: security vulnerabilities, performance bottlenecks, code quality/maintainability, and adherence to best practices. A single agent might miss critical issues that a specialized agent would catch.
- Real Need: Teams need comprehensive code reviews but lack time for multiple reviewers
- Specialization Required: Security, performance, and quality require different expertise
- Iterative Refinement: Initial findings need validation and prioritization
- Actionable Output: Produces structured review reports with prioritized recommendations
The system produces comprehensive review reports including:
- Executive summary
- Categorized findings (Security, Performance, Quality, Best Practices)
- Priority rankings (Critical, High, Medium, Low)
- Specific code suggestions with examples
- Overall code health score
- Actionable recommendations
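For orientation, a finished review might be represented roughly like the dictionary below. This shape is illustrative only (the category counts echo the screenshot described above); consult `docs/api.md` for the actual structure the system returns.

```python
# Hypothetical shape of a finished review (field names are illustrative).
example_report = {
    "summary": "Code quality needs improvement. Many issues found.",
    "overall_score": 0,                      # 0-100 health score
    "total_issues": 143,
    "issues_by_category": {
        "security": 43,
        "performance": 31,
        "quality": 32,
        "best_practices": 37,
    },
    "findings": [
        {
            "category": "security",
            "severity": "Critical",          # Critical | High | Medium | Low
            "title": "SQL injection in login function",
            "recommendation": "Use parameterized queries or an ORM.",
        },
    ],
}
```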
```
mixture-of-agents/
├── README.md                    # This file - main documentation
├── LICENSE                      # MIT License
├── pyproject.toml               # Modern Python packaging configuration
├── setup.py                     # Package setup script
├── MANIFEST.in                  # Package manifest
├── requirements.txt             # Production dependencies
├── requirements-dev.txt         # Development dependencies
├── docker-compose.yml           # Docker Compose configuration
├── Dockerfile                   # Docker container definition
│
├── mixture_of_agents/           # Main Python package
│   ├── __init__.py              # Package initialization
│   ├── core/                    # Core functionality
│   │   ├── __init__.py
│   │   ├── agents.py            # MixtureOfAgentsSystem, Orchestrator, WorkerAgent
│   │   ├── config.py            # Configuration management
│   │   └── exceptions.py        # Custom exception classes
│   ├── utils/                   # Utility modules
│   │   ├── __init__.py
│   │   ├── data_utils.py        # Data normalization utilities
│   │   ├── error_handling.py    # Error handling utilities
│   │   └── streamlit_utils.py   # Streamlit utility functions
│   ├── providers/               # External service providers
│   │   ├── __init__.py
│   │   ├── slm_provider.py      # Hugging Face SLM support
│   │   └── model_detector.py    # Model detection and resolution
│   ├── database/                # Database modules
│   │   ├── __init__.py
│   │   └── database.py          # SQLite + ChromaDB implementation
│   ├── ui/                      # UI components
│   │   ├── __init__.py
│   │   └── app.py               # Streamlit web interface
│   ├── visualization/           # Visualization modules
│   │   ├── __init__.py
│   │   └── charts.py            # Plotly charts and Mermaid diagrams
│   └── explainability/          # Explainability features
│       ├── __init__.py
│       └── analyzer.py          # Explainability analyzer
│
├── case_studies/                # Use case implementations
│   └── code_review.py           # Code review use case
│
├── sample_code/                 # Sample code files for testing
│   ├── vulnerable_code.py
│   ├── slow_code.py
│   ├── messy_code.py
│   └── incomplete_code.py
│
├── docs/                        # All documentation
│   ├── README.md                # Documentation index
│   ├── architecture.md          # Technical architecture deep dive
│   ├── changelog.md             # Version history and changes
│   ├── contributing.md          # Contribution guidelines
│   ├── code_of_conduct.md       # Community guidelines
│   ├── disclaimer.md            # Important disclaimers
│   ├── quickstart.md            # Quick start guide
│   ├── api.md                   # API documentation
│   ├── examples.md              # Usage examples
│   ├── troubleshooting.md       # Troubleshooting guide
│   ├── guides/                  # Implementation guides
│   │   ├── checkpoint_system.md
│   │   ├── model_detection.md
│   │   ├── environment_setup.md
│   │   ├── threading_analysis.md
│   │   ├── resilience_improvements.md
│   │   ├── python_best_practices.md
│   │   └── package_reorganization.md
│   ├── summaries/               # Implementation summaries
│   │   ├── implementation_summary.md
│   │   ├── reorganization_summary.md
│   │   └── verification_complete.md
│   └── images/                  # Images and screenshots
│       ├── screenshots/         # Application screenshots
│       └── environment_setup/   # Setup-related images
│
├── scripts/                     # Utility scripts
│   ├── check_errors.py          # Error checking utility
│   ├── setup_environment.sh     # Environment setup
│   ├── start_app.sh             # Application launcher
│   ├── run_tests.sh             # Test runner
│   └── verify_and_fix.sh        # Verification script
│
└── tests/                       # Test files
    ├── test_basic.py            # Basic unit tests
    └── test_ui.py               # UI tests
```
### Key Directories
- **Root Level:** Core application files and configuration
- **`docs/`:** All documentation including API docs, examples, and images
- **`sample_code/`:** Example code files for testing the review system
- **`scripts/`:** Utility scripts for setup, testing, and maintenance
- **`tests/`:** Test suites for validation
## Configuration
Configuration is managed through `config.py` and environment variables:
- `OLLAMA_BASE_URL`: Ollama server URL (default: `http://localhost:11434`)
- `OLLAMA_MODEL`: Model to use (default: `llama3.2`)
- `NUM_LAYER1_AGENTS`: Number of Layer 1 agents (default: 4)
- `NUM_LAYER2_AGENTS`: Number of Layer 2 agents (default: 3)
- `NUM_LAYER3_AGENTS`: Number of Layer 3 agents (default: 1)
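A minimal sketch of how these variables might be read with the documented defaults, assuming plain `os.environ` lookups; the project's actual loader in `config.py` may differ.

```python
import os

# Minimal sketch of environment-driven configuration (defaults match the list above).
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "llama3.2")
NUM_LAYER1_AGENTS = int(os.getenv("NUM_LAYER1_AGENTS", "4"))
NUM_LAYER2_AGENTS = int(os.getenv("NUM_LAYER2_AGENTS", "3"))
NUM_LAYER3_AGENTS = int(os.getenv("NUM_LAYER3_AGENTS", "1"))
```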
## Usage Examples
### Basic Usage
```python
import asyncio

from mixture_of_agents_demo import MixtureOfAgentsSystem

async def main():
    system = MixtureOfAgentsSystem()
    result = await system.review_code("""
def example():
    pass
""")
    print(result["final_result"])

asyncio.run(main())
```
- Start the UI: `streamlit run ui_app.py`
- Navigate to the "Code Review" tab
- Enter code or load a sample
- Click "Start Code Review"
- View results, visualizations, and explainability analysis
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
This project is licensed under the MIT License - see the LICENSE file for details.
- Microsoft AutoGen team for the framework
- Daniel Dimik for the inspiring presentation on Small Language Models in MoE
- The research community for foundational work on Mixture of Experts architectures
- Not Production-Ready: This is a demo/educational tool
- Performance: Not optimized for speed or efficiency
- Accuracy: Results may contain errors or incomplete analyses
- Reliability: System may fail or produce unexpected outputs
- Security: Not designed with security best practices for production
- Scalability: Not designed to handle large-scale deployments
✅ Appropriate Uses:
- Learning about multi-agent systems
- Understanding Mixture of Agents patterns
- Educational demonstrations
- Research and experimentation
- Prototyping concepts
❌ Inappropriate Uses:
- Production code reviews
- Critical decision-making
- Security-critical code analysis
- Automated code approval processes
- Any scenario requiring guaranteed accuracy
THIS SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND. The authors and contributors are not responsible for any damages or issues arising from the use of this software. Use at your own risk.
Performance Optimization:
- Implement caching layer for agent responses to reduce redundant LLM calls
- Add parallel processing optimization for faster multi-agent execution
- Optimize database queries and indexing for review history
- Implement incremental review updates for large codebases
User Experience:
- Add support for multiple programming languages (JavaScript, Java, Go, Rust)
- Implement real-time progress updates with WebSocket connections
- Add export functionality for review reports (PDF, Markdown, JSON)
- Create comparison view for before/after code changes
- Add custom agent configuration options
Code Quality:
- Expand test coverage for core agent functionality
- Add integration tests for end-to-end workflows
- Implement comprehensive error handling and recovery
- Add logging and monitoring capabilities
Advanced Features:
- Support for multi-file code reviews and project-level analysis
- Integration with popular IDEs (VS Code, PyCharm, IntelliJ)
- Git integration for automatic review on commits/PRs
- Custom agent creation interface for domain-specific reviews
- Support for team collaboration and review sharing
Model Improvements:
- Fine-tune models on code review datasets
- Implement model selection based on code type/complexity
- Add support for larger context windows
- Experiment with different MoE architectures
Infrastructure:
- Docker containerization for easy deployment
- Kubernetes support for scalable deployments
- Cloud deployment guides (AWS, GCP, Azure)
- API server mode for programmatic access
Research & Development:
- Explore advanced MoE patterns (hierarchical, dynamic routing)
- Research explainability techniques for agent decisions
- Investigate federated learning for distributed agent training
- Study human-in-the-loop feedback mechanisms
Production Readiness:
- Comprehensive security audit and hardening
- Performance benchmarking and optimization
- Scalability testing and load balancing
- Production deployment documentation
- SLA and reliability guarantees
Community & Ecosystem:
- Plugin system for custom agent types
- Marketplace for pre-trained agent configurations
- Community-contributed agent templates
- Integration with CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins)
We welcome contributions! Areas where help is especially appreciated:
- Documentation: Improving guides, tutorials, and API documentation
- Testing: Adding test cases and improving coverage
- Performance: Optimizing agent execution and reducing latency
- Features: Implementing items from the future work list above
- Bug Fixes: Identifying and fixing issues
See CONTRIBUTING.md for guidelines on how to contribute.
This project provides several research opportunities:
- Agent Coordination: Study optimal communication patterns between agents
- Synthesis Strategies: Research best practices for combining multi-agent outputs
- Explainability: Develop techniques to explain agent reasoning
- Evaluation Metrics: Create benchmarks for multi-agent code review systems
- Efficiency: Investigate ways to reduce computational costs while maintaining quality
For issues, questions, or contributions, please open an issue on GitHub.
Note: This is a demonstration project. Support is provided on a best-effort basis for educational purposes only.
Built with ❤️ to demonstrate the power of multi-agent systems