Transform complex legal documents into clear, understandable insights with AI-powered analysis.
An intelligent legal document analysis platform that helps individuals and small businesses understand contracts, agreements, and legal documents without expensive legal consultations. Upload your document, get instant AI analysis, risk assessment, and ask questions in plain language.
Legalese-to-Simplese democratizes legal document understanding by:
- Analyzing Legal Documents - Upload PDFs, DOCs, or paste text for instant analysis
- Risk Assessment - Identifies high, medium, and low-risk clauses with explanations
- Plain Language Translation - Converts legal jargon into simple, understandable terms
- Interactive Q&A - Ask questions about your contract and get AI-powered answers
- Smart Search - Retrieves relevant document sections using semantic search
- Real-time Processing - Get comprehensive analysis in under 60 seconds
Frontend (React + Vite)
- Modern, responsive web interface
- Real-time document upload and analysis
- Interactive chat interface for Q&A
- Visual risk assessment dashboard
API Gateway (FastAPI)
- RESTful API with automatic documentation
- CORS-enabled for cross-origin requests
- Structured logging and error handling
- File upload and processing pipeline
Backend Services
- Document Controller: Handles file uploads and orchestrates processing
- Upload Handler: Processes PDFs/TXT files and extracts text
- Q&A Service: Manages question-answering with context retrieval
- Distribution Service: Coordinates document analysis workflow
AI & Search Layer
- LLMs (Ollama):
  - gpt-oss for text generation and analysis
  - nomic-embed-text for semantic embeddings
- Elasticsearch: Vector search for document retrieval and context matching
- S3 Storage: Document persistence and backup
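As an illustration of the vector search mentioned above, the sketch below builds an Elasticsearch 8.x kNN search body. The `knn` section with `field`, `query_vector`, `k`, and `num_candidates` is the standard 8.x kNN search option; the field names `embedding` and `text` and the helper `build_knn_query` are hypothetical and not taken from this project's code.

```python
from typing import Any


def build_knn_query(
    query_vector: list[float],
    k: int = 5,
    num_candidates: int = 50,
    vector_field: str = "embedding",  # hypothetical field name
) -> dict[str, Any]:
    """Build a search body for Elasticsearch 8.x approximate kNN search."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if num_candidates < k:
        raise ValueError("num_candidates must be >= k")
    return {
        "knn": {
            "field": vector_field,
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
        },
        # Return only the chunk text, not the stored vector (field name assumed).
        "_source": ["text"],
    }
```

The query vector would come from embedding the user's question with the same embedding model used at indexing time (nomic-embed-text in this setup).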
Data Flow
- User uploads document via web portal
- API Gateway routes to Document Controller
- Upload Handler extracts text and creates embeddings
- Document chunks stored in Elasticsearch with metadata
- LLM analyzes document structure, risks, and key terms
- Results returned to frontend for display
- Q&A queries search Elasticsearch for relevant context
- LLM generates answers based on retrieved document sections
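The chunking step in this flow can be sketched as follows. This is a minimal example that assumes fixed-size character windows with a small overlap; the function name `chunk_text` and the default sizes are illustrative, and the project's actual chunking strategy may differ.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Overlapping windows reduce the chance that a clause is cut in half at a chunk boundary, at the cost of indexing some text twice.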
- Automatic Classification: Identifies document type (rental agreement, employment contract, NDA, etc.)
- Purpose Extraction: Summarizes the main objective in plain language
- Key Highlights: Extracts critical obligations, rights, and deadlines
- Risk Scoring: 1-10 scale with categorized risk breakdown
- High-Risk Identification: Flags potentially problematic clauses
- Medium-Risk Warnings: Highlights areas needing attention
- Low-Risk Notes: Documents minor concerns
- Detailed Explanations: Each risk includes title and description
- Legal Jargon Translation: Explains complex terms in simple language
- Contextual Definitions: Terms explained within document context
- Searchable Reference: Quick lookup for unfamiliar terminology
- Context-Aware Answers: AI references actual document content
- Suggested Questions: Pre-generated relevant questions
- Natural Language: Ask questions as you would to a lawyer
- Real-time Responses: Instant answers with typing indicators
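The categorized risk breakdown described above (high, medium, and low risks, each with a title and description) could be modeled roughly like this. The class `Risk`, the helper `group_by_severity`, and the lowercase severity labels are assumptions made for illustration, not the project's actual data model.

```python
from dataclasses import dataclass

SEVERITIES = ("high", "medium", "low")


@dataclass(frozen=True)
class Risk:
    severity: str  # expected to be one of SEVERITIES
    title: str
    description: str


def group_by_severity(risks: list[Risk]) -> dict[str, list[Risk]]:
    """Bucket risks by severity, always returning all three keys."""
    grouped: dict[str, list[Risk]] = {level: [] for level in SEVERITIES}
    for risk in risks:
        if risk.severity not in grouped:
            raise ValueError(f"unknown severity: {risk.severity!r}")
        grouped[risk.severity].append(risk)
    return grouped
```

Keeping all three keys present, even when a bucket is empty, lets a dashboard render each category without extra existence checks.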
- Python 3.12+ - Backend runtime
- Node.js 18+ - Frontend development
- Ollama - Local LLM runtime (install guide: https://ollama.ai/)
- Elasticsearch 8.x - Vector search engine
- Docker (optional) - For containerized deployment
git clone https://github.com/yourusername/legalese-to-simplese.git
cd legalese-to-simplese
# Install Ollama from https://ollama.ai/
# Pull required models
ollama pull gpt-oss:cloud
ollama pull nomic-embed-text
# Verify models are available
ollama list
# Option A: Using Docker
docker run -d \
  --name elasticsearch \
  -p 9200:9200 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  docker.elastic.co/elasticsearch/elasticsearch:8.11.0
# Option B: Using Docker Compose (included)
docker-compose up -d elasticsearch
# Verify Elasticsearch is running
curl http://localhost:9200
cd backend
# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Configure environment variables
cp .env.example .env
# Edit .env with your settings
# Run the backend
uvicorn main:app --reload --port 8000
Note: You could also use uv to manage dependencies more efficiently.
Backend will be available at: http://localhost:8000
API Documentation: http://localhost:8000/docs
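With the backend running, the Q&A endpoint documented later in this README (POST /api/qa/ask with `question` and `context` fields) can be called from Python. The sketch below uses only the standard library; the helper names `build_qa_request` and `ask` are illustrative, and it assumes the response carries an `answer` field as in the documented example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default backend address from this README


def build_qa_request(
    question: str, context: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build a JSON POST request for the /api/qa/ask endpoint."""
    payload = json.dumps({"question": question, "context": context}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/qa/ask",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, context: str, timeout: float = 60.0) -> str:
    """Send the question to a running backend and return the answer text."""
    request = build_qa_request(question, context)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["answer"]
```

Building the request separately from sending it keeps the payload easy to inspect and test without a live server.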
cd frontend
# Install dependencies
npm install
# Configure environment variables
cp .env.example .env
# Edit .env to point to your backend
# Run the development server
npm run dev
Frontend will be available at: http://localhost:5173
legalese-to-simplese/
├── backend/                          # FastAPI backend
│   ├── main.py                       # Application entry point
│   ├── routers/                      # API route handlers
│   │   ├── upload.py                 # Document upload endpoints
│   │   ├── qa.py                     # Q&A endpoints
│   │   └── health.py                 # Health check endpoints
│   ├── services/                     # Business logic layer
│   │   ├── UploadService.py          # Document processing
│   │   ├── qa_service.py             # Question answering
│   │   ├── llm_service.py            # LLM interactions
│   │   ├── elastic_search_service.py # Elasticsearch operations
│   │   └── logging/                  # Structured logging
│   ├── clients/                      # External service clients
│   │   ├── ollama.py                 # Ollama LLM client
│   │   └── aws_client.py             # AWS services (optional)
│   ├── utils/                        # Utility functions
│   │   └── helper.py                 # PDF processing, text extraction
│   ├── DTO/                          # Data transfer objects
│   │   └── DTO.py                    # Request/response models
│   ├── tests/                        # Test suite
│   └── requirements.txt              # Python dependencies
│
├── frontend/                         # React frontend
│   ├── src/
│   │   ├── pages/                    # Page components
│   │   │   ├── Home/                 # Landing page
│   │   │   ├── Upload/               # Document upload page
│   │   │   └── Analysis/             # Analysis results page
│   │   ├── components/               # Reusable components
│   │   │   └── CustomLoadingOverlay/
│   │   ├── contexts/                 # React contexts
│   │   │   ├── AnalysisContext.jsx
│   │   │   └── AnalysisProvider.jsx
│   │   ├── assets/                   # Static assets
│   │   └── main.jsx                  # Application entry point
│   ├── public/                       # Public assets
│   ├── .env.example                  # Environment variables template
│   └── package.json                  # Node dependencies
│
├── docker-compose.yaml               # Docker services configuration
├── architecture-diagram.png          # System architecture diagram
├── INTEGRATION_TASKLIST.md           # Development roadmap
└── README.md                         # This file
Create backend/.env:
# Elasticsearch Configuration
ELASTICSEARCH_URL=http://localhost:9200
ELASTICSEARCH_API_KEY= # Optional for local development
# Application Configuration
LOG_LEVEL=INFO
# AWS Configuration (Optional - for AWS services)
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_DEFAULT_REGION=us-east-1
Create frontend/.env:
# Backend API Configuration
VITE_API_BASE_URL=http://localhost:8000
# API Endpoints (relative to base URL)
VITE_UPLOAD_ENDPOINT=/api/documents/upload
VITE_QA_ENDPOINT=/api/qa/ask
VITE_HEALTH_ENDPOINT=/api/health
POST /api/documents/upload
Content-Type: multipart/form-data
Parameters:
- document: File (PDF, DOC, DOCX, TXT)
Response:
{
  "success": true,
  "document_id": "uuid",
  "filename": "contract.pdf",
  "document_analysis": {
    "Document_Type": "Rental Agreement",
    "Main_Purpose": "...",
    "Key_Highlights": [...],
    "Risk_Assessment": {...},
    "Key_Terms": [...],
    "Suggested_Questions": [...]
  },
  "extracted_text": "...",
  "metadata": {...}
}
POST /api/qa/ask
Content-Type: application/json
Body:
{
  "question": "What happens if I pay rent late?",
  "context": "Full document text..."
}
Response:
{
  "question": "What happens if I pay rent late?",
  "answer": "According to the contract, late payments...",
  "status": "success"
}
GET /api/health
Response:
{
  "service": "legalese-to-simplese",
  "status": "healthy",
  "timestamp": "2025-01-15T10:30:00Z"
}
cd backend
# Run all tests
pytest
# Run with coverage
pytest --cov=. --cov-report=html
# Run specific test file
pytest tests/test_upload.py -v
cd frontend
# Run tests (when implemented)
npm test
# Run with coverage
npm test -- --coverage
- Hero section with value proposition
- Feature highlights
- How it works section
- Call-to-action buttons
- Drag-and-drop file upload
- Paste text option
- File type validation
- Real-time processing status
- Security badges
- Summary Tab: Document overview and risk score
- Risk Assessment Tab: Categorized risks with severity levels
- Key Terms Tab: Legal terminology glossary
- Q&A Tab: Interactive chat interface
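One way to feed these tabs is to split the `document_analysis` object from the upload response into per-tab slices. The key names below come from the documented upload response; the grouping of keys into tabs and the helper `split_into_tabs` are assumptions for illustration, written in Python for consistency with the backend rather than as frontend code.

```python
from typing import Any

# Assumed mapping from UI tab to keys of "document_analysis".
TAB_FIELDS: dict[str, tuple[str, ...]] = {
    "summary": ("Document_Type", "Main_Purpose", "Key_Highlights"),
    "risk_assessment": ("Risk_Assessment",),
    "key_terms": ("Key_Terms",),
    "qa": ("Suggested_Questions",),
}


def split_into_tabs(upload_response: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Pick the analysis fields each tab needs, skipping absent ones."""
    analysis = upload_response.get("document_analysis") or {}
    return {
        tab: {key: analysis[key] for key in keys if key in analysis}
        for tab, keys in TAB_FIELDS.items()
    }
```

Skipping absent keys instead of raising keeps the view usable when the analysis is only partially populated.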
- Limited Data Persistence: Uploaded files are processed in memory; extracted chunks are indexed in Elasticsearch for search, and S3 backup is optional
- Local LLM: Uses Ollama for on-premise AI processing
- CORS Protection: Configured for specific origins
- File Validation: Type and size checks before processing
- Error Handling: Sanitized error messages to prevent information leakage
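The type and size checks listed above might look like the following sketch. The allowed extensions mirror the file types listed for the upload endpoint; the 10 MiB limit, the constant names, and the function `validate_upload` are assumptions, not values taken from the project.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".pdf", ".doc", ".docx", ".txt"}  # types listed for the upload endpoint
MAX_SIZE_BYTES = 10 * 1024 * 1024  # assumed 10 MiB limit, not from the project


def validate_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file type or size is not acceptable."""
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {extension or 'none'}")
    if size_bytes <= 0:
        raise ValueError("file is empty")
    if size_bytes > MAX_SIZE_BYTES:
        raise ValueError("file exceeds the size limit")
```

An extension check is only a first filter; inspecting the file's actual content is a reasonable additional step before parsing.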
- Document upload and text extraction
- LLM-based document analysis
- Risk assessment and categorization
- Interactive Q&A with context retrieval
- Elasticsearch integration for semantic search
- Frontend-backend integration
- Real-time loading states and error handling
- Centralized API service layer (frontend)
- Enhanced error handling and retry mechanisms
- User authentication and session management
- Document comparison feature
- Export analysis reports (PDF, DOCX)
- Multi-language support
- Conversation history and saved analyses
- Advanced analytics dashboard
- Mobile responsive improvements
- Batch document processing
- Custom risk threshold configuration
See INTEGRATION_TASKLIST.md for detailed development tasks.
We welcome contributions! Here's how you can help:
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Commit your changes: git commit -m 'Add amazing feature'
- Push to the branch: git push origin feature/amazing-feature
- Open a Pull Request
- Follow existing code style and conventions
- Add tests for new features
- Update documentation as needed
- Ensure all tests pass before submitting PR
Backend won't start
# Check if port 8000 is already in use
lsof -i :8000
# Verify Python version
python --version # Should be 3.12+
# Reinstall dependencies
pip install -r requirements.txt --force-reinstall
Frontend can't connect to backend
# Verify backend is running
curl http://localhost:8000/api/health
# Check CORS configuration in backend/main.py
# Ensure frontend URL is in allow_origins
# Verify .env file exists and has correct API URL
cat frontend/.env
Ollama models not found
# List installed models
ollama list
# Pull missing models
ollama pull gpt-oss:cloud
ollama pull nomic-embed-text
# Verify Ollama is running
curl http://localhost:11434/api/tags
Elasticsearch connection failed
# Check if Elasticsearch is running
curl http://localhost:9200
# Restart Elasticsearch
docker restart elasticsearch
# Check logs
docker logs elasticsearch
- Backend README - Detailed backend documentation
- Frontend README - Frontend setup and structure
- Integration Tasklist - Development tasks and status
- API Documentation - Interactive API docs (when running)
- React 18 - UI framework
- Vite - Build tool and dev server
- React Router - Client-side routing
- Context API - State management
- CSS3 - Styling with animations
- FastAPI - Modern Python web framework
- Pydantic - Data validation
- LangChain - LLM orchestration
- Ollama - Local LLM runtime
- PyPDF2 - PDF text extraction
- Elasticsearch - Vector search and document storage
- Docker - Containerization
- Uvicorn - ASGI server
- S3 (optional) - Document storage
- Ollama - For providing excellent local LLM runtime
- FastAPI - For the amazing Python web framework
- Elasticsearch - For powerful search capabilities
- React Team - For the robust frontend framework
- LangChain - For LLM orchestration tools
If you find this project helpful, please consider giving it a star! ⭐
Made with ❤️ for everyone who's ever been confused by legal documents
