Predict attrition, personalize growth, and answer HR queries with generative AI, using either Gemini 2.5 Pro or a local model served by Ollama.
MindCare AI is an offline-first HR analytics platform that combines predictive modeling with local Large Language Models to help organizations retain talent, identify growth opportunities, and streamline HR operations without compromising data privacy.
- Attrition Risk Radar: Predict attrition with explainable insights
- Sentiment & Theme Mining: Analyze feedback with local NLP
- Career Pathing & Upskilling: AI-powered development recommendations
- HR Policy Copilot: RAG-powered chatbot for instant policy Q&A
- Privacy-First: 100% local processing, no external API calls
┌────────────────────────────────────────┐
│  HR Data Sources (local)               │
│  • Pulse surveys (CSV)                 │
│  • HRIS exports (CSV/Parquet)          │
│  • PTO & timesheets                    │
│  • L&D catalog (CSV/Docs)              │
│  • HR policy PDFs/Docs                 │
└────────────────────────────────────────┘
                    │ (local file share)
                    ▼
┌────────────────────────────────────────────────────────────────────────┐
│  Data & Feature Layer (offline)                                        │
│  • Ingestion: Python + DuckDB / SQLite                                 │
│  • Cleansing & Anonymization: PII masking, hashing                     │
│  • Features: survey_sentiment, workload_ratio, skill_gap_score,        │
│    manager_1on1_cadence, tenure_weeks, internal_moves                  │
│  • Embeddings store for RAG: local (FAISS/Chroma on disk)              │
└────────────────────────────────────────────────────────────────────────┘
                    │
                    ▼
┌──────────────────────────┐   ┌──────────────────────────────────────────┐
│  Ollama (LLMs local)     │   │  Classical Models (local, Python)        │
│  • mistral / llama3      │   │  • Sentiment: VADER/TextBlob/spaCy       │
│  • codellama for prompts │   │  • Attrition: XGBoost/LogReg (sklearn)   │
│  • RAG over HR docs      │   │  • Recommender: rules + similarity       │
└──────────────────────────┘   └──────────────────────────────────────────┘
             │                                       │
             └───────────────────┬───────────────────┘
                                 ▼
             ┌──────────────────────────────────────┐
             │  Streamlit/Gradio UI (local)         │
             │  • Risk dashboard & drilldowns       │
             │  • HR Copilot chat (RAG+LLM)         │
             │  • Career paths & export (CSV/JSON)  │
             └──────────────────────────────────────┘
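The "Cleansing & Anonymization" step in the data layer mentions PII masking and hashing. As a rough sketch of what that could look like (not the project's actual implementation), the snippet below pseudonymizes employee IDs with a keyed hash and masks e-mail addresses in free-text comments. The function names, the `SECRET_KEY` value, and the e-mail pattern are all assumptions made for illustration.

```python
import hashlib
import hmac
import re

# Hypothetical secret; in practice it would be loaded from local configuration,
# never committed to source control.
SECRET_KEY = b"replace-with-a-local-secret"

# Simple e-mail pattern (an assumption; real PII detection needs broader rules).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymize_id(employee_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable keyed hash so records can be joined without the raw ID."""
    digest = hmac.new(key, employee_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability in exports


def mask_emails(text: str) -> str:
    """Replace e-mail addresses in free-text survey comments with a placeholder."""
    return EMAIL_RE.sub("[EMAIL]", text)


if __name__ == "__main__":
    print(pseudonymize_id("E_1042"))
    print(mask_emails("Please contact jane.doe@example.com about my workload."))
```

A keyed hash (HMAC) is used here rather than a bare SHA-256 because short, guessable IDs such as `E_1042` could otherwise be reversed by hashing every candidate ID.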
- Python 3.10+
- Google AI Studio's Gemini 2.5 Pro
- Ollama installed locally
- 8GB+ RAM (recommended), 10GB+ free disk space
- Clone the repository
git clone https://github.com/yourusername/mindcare-ai.git
cd mindcare-ai
- Set up Python environment
python -m venv .venv
# macOS/Linux
source .venv/bin/activate
# Windows (PowerShell)
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
- Install and verify Ollama
# Follow install steps at https://ollama.ai
ollama pull mistral
ollama pull llama3
# Quick check
ollama run mistral "Hello, test message"
- Prepare sample data (optional)
python scripts/generate_sample_data.py
- Launch the app
streamlit run app/ui_app.py
- Open the dashboard at http://localhost:8501
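The `ollama run` quick check above can also be done from Python through Ollama's local HTTP API. The sketch below assumes the server listens on the default port 11434 and reads `OLLAMA_HOST` and `OLLAMA_PORT` as separate variables. The helper names are hypothetical and are not part of the repository.

```python
import json
import os
import urllib.request


def ollama_url(path: str = "/api/generate") -> str:
    """Build the endpoint from OLLAMA_HOST / OLLAMA_PORT (defaults are assumptions)."""
    host = os.environ.get("OLLAMA_HOST", "localhost")
    port = os.environ.get("OLLAMA_PORT", "11434")
    return f"http://{host}:{port}{path}"


def build_payload(prompt: str, model: str = "mistral") -> bytes:
    """Encode a non-streaming generate request as JSON bytes."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")


def generate(prompt: str, model: str = "mistral", timeout: float = 60.0) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    request = urllib.request.Request(
        ollama_url(),
        data=build_payload(prompt, model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]
```

With the server running and the `mistral` model pulled, calling `generate("Hello, test message")` should return a short reply. A connection error usually means Ollama is not running.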
mindcare/
├── app/
│   ├── ui_app.py                # Main Streamlit application
│   ├── rag.py                   # RAG pipeline for HR Copilot
│   ├── attrition.py             # Attrition prediction models
│   ├── features.py              # Feature engineering utilities
│   ├── recommender.py           # Career pathing recommendations
│   └── storage.py               # Data management utilities
├── data/
│   ├── employees.csv            # Employee master data
│   ├── surveys.csv              # Pulse survey responses
│   ├── timesheets.csv           # Work hours tracking
│   ├── skills.csv               # Employee skills matrix
│   └── policies/                # HR policy documents
├── models/
│   ├── vector_index/            # FAISS/Chroma embeddings
│   └── attrition_lr.joblib      # Trained attrition model
├── scripts/
│   ├── generate_sample_data.py
│   └── data_pipeline.py
├── tests/
├── requirements.txt
└── README.md
Attrition Risk Radar
- Org-wide attrition risk heatmap
- Drilldowns by practice/team/individual
- Top risk drivers (explainable AI)
- Export as CSV
Career Pathing & Upskilling
- Personalized development plans
- Course/mentor recommendations
- Project rotation suggestions
- Track skill progression
HR Policy Copilot
- Natural-language Q&A over policy docs
- Source citations with RAG
- Handles complex scenarios
- Maintains conversation context
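To make "source citations with RAG" concrete, here is a minimal retrieval sketch. It is only an illustration under stated assumptions: a bag-of-words cosine similarity stands in for the embedding model and the FAISS/Chroma index, and the policy snippets, file names, and function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[dict], top_k: int = 4) -> list[dict]:
    """Return up to top_k chunks with non-zero similarity, best first."""
    q = tokenize(question)
    scored = [(cosine(q, tokenize(c["text"])), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:top_k] if score > 0]


def build_prompt(question: str, hits: list[dict]) -> str:
    """Assemble a prompt that asks the LLM to cite sources by file name."""
    context = "\n".join(f"[{h['source']}] {h['text']}" for h in hits)
    return (
        "Answer using only the context below and cite the source in brackets.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


# Hypothetical policy snippets; real chunks would come from data/policies/.
CHUNKS = [
    {"source": "leave_policy.pdf", "text": "Parental leave is available to all full-time employees."},
    {"source": "pto_policy.pdf", "text": "Vacation days accrue monthly and are tracked in timesheets."},
    {"source": "travel_policy.pdf", "text": "Travel expenses require manager approval."},
]

question = "What is our parental leave policy?"
hits = retrieve(question, CHUNKS, top_k=2)
print(build_prompt(question, hits))
```

Keeping the source file name attached to each chunk is what lets the answer cite where a policy statement came from. The resulting prompt would then be sent to the local LLM.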
Risk Analysis
- "Show me teams with highest attrition risk"
- "What are the top drivers for the Cloud practice?"
Career Development
- "Recommend growth plan for employee E_1042"
- "What skills are most in demand for Data Engineers?"
Policy Questions
- "What is our parental leave policy?"
- "How many vacation days do I get?"
- "What's the process for requesting sabbatical?"
Create a .env file in the repo root:
# Ollama Configuration
OLLAMA_HOST=localhost
OLLAMA_PORT=11434
OLLAMA_MODEL=mistral
# Data Configuration
DATA_PATH=./data
VECTOR_INDEX_PATH=./models/vector_index
# Privacy Settings
ENABLE_PII_MASKING=true
ANONYMIZATION_LEVEL=high
Edit config/models.yaml:
attrition:
  model_type: "logistic_regression"
  features:
    - "tenure_weeks"
    - "survey_sentiment"
    - "workload_ratio"
    - "manager_1on1_cadence"
  threshold: 0.3
rag:
  chunk_size: 512
  chunk_overlap: 50
  top_k: 4
  embedding_model: "sentence-transformers/all-MiniLM-L6-v2"
Attrition Risk Export (attrition_risk_by_team.csv)
| team | avg_risk | top_driver_1 | top_driver_2 | top_driver_3 |
|---|---|---|---|---|
| Cloud | 0.31 | workload_ratio | low_1on1 | low_sentiment |
| Data | 0.27 | low_growth | workload_ratio | pto_spike |
| AI | 0.22 | low_recognition | low_1on1 | tenure_transition |
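A team-level export like the one above could be derived from per-employee scores. The sketch below is an illustration only: the coefficients, intercept, and sample rows are hypothetical, the features are assumed to be standardized, and drivers are reported by feature name rather than by the labels used in the table (such as `low_1on1`). The 0.3 cut-off mirrors `attrition.threshold` in config/models.yaml.

```python
import csv
import io
import math
from collections import Counter, defaultdict

# Hypothetical coefficients for standardized features; a trained model would supply these.
COEFFICIENTS = {"workload_ratio": 0.9, "survey_sentiment": -0.7, "manager_1on1_cadence": -0.5}
INTERCEPT = -1.2
THRESHOLD = 0.3  # mirrors attrition.threshold in config/models.yaml


def risk_score(features: dict) -> float:
    """Logistic function applied to the weighted sum of features."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * features[name] for name in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-z))


def top_driver(features: dict) -> str:
    """Feature with the largest contribution (coefficient times value) to the score."""
    return max(COEFFICIENTS, key=lambda name: COEFFICIENTS[name] * features[name])


def team_summary(rows: list[dict]) -> list[dict]:
    """Average risk, number of flagged employees, and most common driver per team."""
    by_team = defaultdict(list)
    for row in rows:
        by_team[row["team"]].append(row["features"])
    summary = []
    for team, members in sorted(by_team.items()):
        scores = [risk_score(f) for f in members]
        drivers = Counter(top_driver(f) for f in members)
        summary.append({
            "team": team,
            "avg_risk": round(sum(scores) / len(scores), 2),
            "flagged": sum(s >= THRESHOLD for s in scores),
            "top_driver_1": drivers.most_common(1)[0][0],
        })
    return summary


def to_csv(summary: list[dict]) -> str:
    """Serialize the summary rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["team", "avg_risk", "flagged", "top_driver_1"])
    writer.writeheader()
    writer.writerows(summary)
    return buffer.getvalue()


# Hypothetical, already-anonymized rows.
ROWS = [
    {"team": "Cloud", "features": {"workload_ratio": 1.5, "survey_sentiment": -0.5, "manager_1on1_cadence": -1.0}},
    {"team": "Cloud", "features": {"workload_ratio": 0.5, "survey_sentiment": 0.0, "manager_1on1_cadence": 0.0}},
    {"team": "Data", "features": {"workload_ratio": 0.2, "survey_sentiment": 1.0, "manager_1on1_cadence": 0.5}},
]

print(to_csv(team_summary(ROWS)))
```

For a linear model, ranking features by coefficient times value is one simple way to surface drivers. Tree-based models such as XGBoost would need a different attribution method.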
Career Plan JSON (career_plan_E_1042.json)
{
  "employee_id": "E_1042",
  "current_role": "Data Engineer",
  "target_role": "Senior Data Engineer",
  "skill_gaps": ["Snowflake perf tuning", "dbt testing"],
  "courses": ["Snowflake Performance Deep Dive", "Advanced dbt: Testing & CI"],
  "mentor": "M_309 (Senior DE, Bengaluru)",
  "project_rotation": "FinTech ETL Revamp (4 weeks)"
}
Note: Sample outputs and benchmarks are illustrative; update with your own measurements.
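As a rough illustration of the rules part of a "rules + similarity" recommender, the sketch below derives the `skill_gaps` and `courses` fields of a career plan from a set difference. The role requirements, the course mapping, and the `build_plan` helper are assumptions for the example. Mentor matching and project rotation are left out.

```python
import json

# Hypothetical role requirements; a real catalog might come from data/skills.csv.
ROLE_SKILLS = {
    "Senior Data Engineer": {"SQL", "Python", "Snowflake perf tuning", "dbt testing"},
}

# Hypothetical skill-to-course mapping taken from the L&D catalog.
COURSES = {
    "Snowflake perf tuning": "Snowflake Performance Deep Dive",
    "dbt testing": "Advanced dbt: Testing & CI",
}


def build_plan(employee_id: str, current_role: str, target_role: str, skills: list[str]) -> dict:
    """Skill gaps are the target role's skills the employee does not yet have."""
    gaps = sorted(ROLE_SKILLS[target_role] - set(skills))
    return {
        "employee_id": employee_id,
        "current_role": current_role,
        "target_role": target_role,
        "skill_gaps": gaps,
        "courses": [COURSES[gap] for gap in gaps if gap in COURSES],
    }


plan = build_plan("E_1042", "Data Engineer", "Senior Data Engineer", ["SQL", "Python"])
print(json.dumps(plan, indent=2))
```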
- Local-only processing: No data leaves your infrastructure
- PII masking: Automatic anonymization of sensitive fields
- Role-based access: Configurable permissions
- Audit logging: Track system interactions
- Encryption: Optional at-rest encryption
| Model | Dataset Size | Training Time | Quality Metric | Inference Time |
|---|---|---|---|---|
| Attrition (LogReg) | 10K employees | 2.3s | 0.76 AUC | 15ms |
| Sentiment (VADER) | 50K responses | N/A | 0.82 F1 | 5ms |
| RAG Pipeline | 500 documents | 45s indexing | ~90% hit@k | <2s |
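The `hit@k` figure for the RAG pipeline can be read as the share of test questions for which at least one relevant chunk appears among the top `k` retrieved results. The sketch below shows one way to compute it. The chunk IDs and the tiny evaluation set are hypothetical, and `k=4` simply mirrors `top_k` in the RAG configuration.

```python
def hit_at_k(retrieved: list[list[str]], relevant: list[set[str]], k: int = 4) -> float:
    """Fraction of queries whose top-k retrieved IDs include at least one relevant ID."""
    if not retrieved:
        return 0.0
    hits = sum(1 for ids, rel in zip(retrieved, relevant) if rel & set(ids[:k]))
    return hits / len(retrieved)


# Hypothetical evaluation set: ranked chunk IDs per query and the relevant IDs.
retrieved = [
    ["pto.pdf#1", "leave.pdf#2"],
    ["travel.pdf#3", "pto.pdf#4"],
    ["pto.pdf#1"],
]
relevant = [{"leave.pdf#2"}, {"sabbatical.pdf#1"}, {"pto.pdf#1"}]

print(f"hit@4 = {hit_at_k(retrieved, relevant, k=4):.2f}")
print(f"hit@1 = {hit_at_k(retrieved, relevant, k=1):.2f}")
```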
# Unit tests
pytest tests/unit/
# Integration tests
pytest tests/integration/
# End-to-end tests
pytest tests/e2e/
# Performance tests
pytest tests/performance/ --benchmark
We welcome contributions!
- Fork the repo
- Create a feature branch:
git checkout -b feature/amazing-feature
- Commit:
git commit -m "Add amazing feature"
- Push:
git push origin feature/amazing-feature
- Open a Pull Request
- ✅ Basic attrition prediction
- ✅ HR policy RAG system
- ✅ Streamlit dashboard
- ✅ Local deployment
- Advanced ML models (XGBoost, Neural Networks)
- What-if scenario simulation
- HRIS integrations
- Mobile-responsive UI
- Fine-tuned domain-specific LLMs
- Real-time alerting
- Advanced analytics dashboard
- Multi-tenant support
- Privacy-First AI: Enterprise-grade AI without data leakage
- Practical Implementation: Actionable HR insights with accessible tech
- Open Innovation: Contribute to the open-source HR analytics ecosystem
- Local-First Architecture: Prove powerful AI can run fully offline
- Predictive HR Analytics
- Local Language Model Deployment
- Privacy-Preserving ML
- Explainable AI in HR Context
- Ganesh Sundaresan
- Shashank A
- Simran Singh
- Asma Khanum
Made with ❤️ by the MindCare AI Team