A production-grade Medical Question Answering system built on Retrieval-Augmented Generation (RAG). MedAI retrieves relevant passages from a curated medical knowledge base, reasons over them using a local language model, and returns plain-language answers with source citations and confidence scoring — all running entirely on your machine, with no external API calls.
MedAI is designed around a straightforward principle: answers should be grounded in real medical literature, not hallucinated. Every response is traceable back to a specific source document and page number. The system also flags low-confidence retrievals and blocks responses that contain unsafe medical advice patterns.
Key capabilities
- Semantic search over medical PDFs using dense vector embeddings
- Retrieval-Augmented Generation with FAISS inner-product similarity
- Local LLM inference via Ollama — no cloud dependency
- Safety layer that intercepts dosage instructions and self-harm language
- Confidence scoring per response based on retrieval similarity
- Collapsible reasoning steps and source citations in the UI
- Persistent chat history with rename and delete
- Dark and light theme
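
The confidence score is described only as being based on retrieval similarity; the exact formula is not given here. One plausible reading, sketched below, takes the best similarity among the retrieved chunks and clamps it to the range 0 to 1. The names `confidence_from_scores` and `is_low_confidence`, and the cutoff of 0.5, are assumptions for illustration and are not taken from the project.

```python
from typing import Sequence

# Hypothetical cutoff for flagging weak retrievals; not a project value.
LOW_CONFIDENCE_CUTOFF = 0.5


def confidence_from_scores(scores: Sequence[float]) -> float:
    """Return the best retrieval similarity, clamped to [0, 1]."""
    if not scores:
        return 0.0  # nothing retrieved, so nothing grounds the answer
    return max(0.0, min(1.0, max(scores)))


def is_low_confidence(scores: Sequence[float]) -> bool:
    """Flag a retrieval whose best match falls below the cutoff."""
    return confidence_from_scores(scores) < LOW_CONFIDENCE_CUTOFF


print(confidence_from_scores([0.87, 0.61, 0.40]))
print(is_low_confidence([0.31, 0.22]))
```

Taking the maximum rather than the mean keeps one strong match from being diluted by weaker neighbours; a mean or a weighted score would be an equally reasonable choice.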
User Question
|
v
EmbeddingService sentence-transformers/all-MiniLM-L6-v2
(query embedding)
|
v
VectorStore.search FAISS IndexFlatIP
(top-k retrieval)
|
v
PromptBuilder Injects retrieved chunks into prompt
(context assembly)
|
v
GenerationService Ollama (phi3 / medgemma)
(LLM inference)
|
v
SafetyChecker Regex-based pattern matching
(response filtering)
|
v
AskResponse JSON -> Browser UI
(answer + sources + confidence)
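
The retrieval step in the diagram can be illustrated without FAISS. `IndexFlatIP` performs an exhaustive inner-product search; when vectors are scaled to unit length, the inner product equals cosine similarity. The sketch below reproduces that behaviour in plain Python on toy vectors. Whether the project normalises its embeddings is an assumption here, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def top_k(query: Sequence[float],
          corpus: Sequence[Sequence[float]],
          k: int = 3) -> List[Tuple[int, float]]:
    """Exhaustively score every corpus vector and keep the k best matches."""
    q = l2_normalize(query)
    scored = [(i, inner_product(q, l2_normalize(v))) for i, v in enumerate(corpus)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-D "embeddings"; real all-MiniLM-L6-v2 vectors have 384 dimensions.
corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for index, score in top_k([2.0, 0.0], corpus, k=2):
    print(index, round(score, 4))
```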
| Layer | Technology |
|---|---|
| Web framework | FastAPI 0.115 + Uvicorn 0.30 |
| Embedding model | sentence-transformers/all-MiniLM-L6-v2 |
| Vector index | FAISS (faiss-cpu 1.13) |
| LLM backend | Ollama (phi3 by default, medgemma supported) |
| PDF ingestion | pdfplumber 0.11 |
| Data validation | Pydantic v2 |
| Frontend | Vanilla HTML / CSS / JavaScript (no framework) |
| Testing | pytest 8.3 |
| Language | Python 3.10+ |
MedAI/
├── data/
│ ├── oncology/ Raw PDF source documents
│ └── oncology_mvp/ Subset used for the MVP index
├── src/
│ ├── api/
│ │ ├── main.py FastAPI app factory, routes
│ │ ├── schemas.py Pydantic request/response models
│ │ └── static/ Frontend (HTML, CSS, JS)
│ ├── embeddings/
│ │ └── service.py EmbeddingService (SentenceTransformer + fallback)
│ ├── generation/
│ │ └── service.py GenerationService (Ollama HTTP client)
│ ├── ingestion/
│ │ ├── build_index.py CLI entry point for index construction
│ │ ├── chunker.py Static and semantic dynamic chunking
│ │ ├── cleaner.py PDF text normalisation
│ │ ├── loader.py pdfplumber PDF loader
│ │ └── pipeline.py Orchestrates load -> clean -> chunk
│ ├── prompt/
│ │ └── builder.py Prompt assembly with context injection
│ ├── retrieval/
│ │ ├── retriever.py Query embedding + FAISS search
│ │ └── vector_store.py FAISS index build, save, load
│ ├── safety/
│ │ └── checker.py Pattern-based safety filtering
│ └── models.py Shared dataclasses (Chunk, GenerationResult, etc.)
├── vector_db/
│ ├── index.faiss Serialised FAISS index
│ └── metadata.json Chunk metadata (source, page, text)
├── tests/ pytest test suite
├── requirements.txt
└── README.md
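
Since metadata.json stores the source, page, and text of each chunk, prompt assembly can attach a citation label to every retrieved passage. The following is a minimal sketch of that idea, not the contents of builder.py: the `Chunk` fields, the `build_prompt` function, and the prompt wording are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    """Hypothetical shape, mirroring the (source, page, text) metadata."""
    source: str
    page: int
    text: str


def build_prompt(question: str, chunks: List[Chunk]) -> str:
    """Number each retrieved chunk so the model can cite it by index."""
    context_lines = [
        f"[{i}] ({c.source}, p. {c.page}) {c.text}"
        for i, c in enumerate(chunks, start=1)
    ]
    context = "\n".join(context_lines) if context_lines else "(no context retrieved)"
    return (
        "Answer the question using only the numbered context below. "
        "Cite passages by their number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


chunks = [Chunk("basics_of_oncology.pdf", 42, "A persistent cough is a common early sign.")]
print(build_prompt("What are the early symptoms of lung cancer?", chunks))
```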
Screenshots: Home, Chat interface, Source citations and confidence. A demo video is also available.
- Python 3.10 or higher
- Git
- Ollama installed and running locally
git clone https://github.com/2024yuva/MedAI.git
cd MedAI

python -m venv venv
venv\Scripts\activate # Windows
# source venv/bin/activate # macOS / Linux

pip install -r requirements.txt

ollama pull phi3

Any model listed in GenerationService.configured_models can be used. To switch models, update primary_model_name in src/generation/service.py.
Place your medical PDF files in data/oncology_mvp/, then run:
python -m src.ingestion.build_index --data-dir data/oncology_mvp --out-dir vector_db

This extracts text from every PDF, splits it into semantically coherent chunks, embeds each chunk, and writes vector_db/index.faiss and vector_db/metadata.json.
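
To make "semantically coherent chunks" concrete, here is a rough sketch of one common approach: walk through the sentences, and start a new chunk when the next sentence is dissimilar to the previous one or when the chunk would grow too large. It is not the logic in chunker.py. The bag-of-words cosine is a stand-in for embedding similarity, tokens are approximated by whitespace-separated words, and the function names are hypothetical; only the default numbers (120 to 420 tokens, threshold 0.72) come from the configuration table.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity over word counts; a stand-in for embedding similarity."""
    ca = Counter(re.findall(r"\w+", a.lower()))
    cb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def semantic_chunks(sentences: List[str],
                    min_tokens: int = 120,
                    max_tokens: int = 420,
                    threshold: float = 0.72,
                    similarity: Callable[[str, str], float] = bow_cosine) -> List[str]:
    """Greedily group sentences, splitting on topic shifts and size limits."""
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for sentence in sentences:
        n = len(sentence.split())  # crude token count
        if current:
            too_big = size + n > max_tokens
            topic_shift = similarity(current[-1], sentence) < threshold
            # Split on a topic shift only once the chunk is large enough.
            if too_big or (topic_shift and size >= min_tokens):
                chunks.append(" ".join(current))
                current, size = [], 0
        current.append(sentence)
        size += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

A sentence longer than `max_tokens` still becomes a chunk of its own in this sketch; a real implementation would need to decide how to split it.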
To use fixed-size chunking instead of semantic chunking:
python -m src.ingestion.build_index --data-dir data/oncology_mvp --out-dir vector_db --static-chunking

Start the API server:

python -m uvicorn src.api.main:app --host 127.0.0.1 --port 8000

Open http://127.0.0.1:8000 in your browser.
| Method | Endpoint | Description |
|---|---|---|
| GET | / | Serves the web UI |
| GET | /health | Overall system health (Ollama reachability, active model) |
| GET | /health/generation | Detailed generation backend status |
| POST | /ask | Submit a question, receive an answer |
| POST | /ablation | Run all 4 ablation experiments for one question |
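
For scripted use, the /ask endpoint can be called with the Python standard library alone. The sketch below builds the POST request separately from sending it, so the request can be inspected without a running server. The helper names and the client-side timeout are assumptions; the URL, the `question` field, and the response keys follow the examples in this README.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # host and port from the uvicorn command


def build_ask_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request for /ask without sending it."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/ask",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, timeout: float = 120.0) -> dict:
    """Send the question and decode the JSON response (server must be running)."""
    request = build_ask_request(question)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example, with the server running:
#   result = ask("What are the early symptoms of lung cancer?")
#   if not result["blocked"]:
#       print(result["answer"], result["confidenceScore"])
```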
Request
{
"question": "What are the early symptoms of lung cancer?"
}

Response
{
"answer": "Early symptoms of lung cancer include...",
"finalAnswer": "Early symptoms of lung cancer include...",
"reasoningSteps": ["...", "..."],
"sources": [
{
"sourceFile": "basics_of_oncology.pdf",
"pageNumber": 42,
"excerpt": "...",
"similarityScore": 0.87
}
],
"confidenceScore": 0.87,
"blocked": false,
"blockReason": null
}

Use the built-in ablation runner to compare:
- Experiment 1: Full pipeline (LAQA + MRL + RAG)
- Experiment 2: No LAQA
- Experiment 3: No LAQA + No MRL
- Experiment 4: No RAG (direct LLM)
Create a JSON file like data/ablation_dataset.sample.json:
[
{
"question": "What are the symptoms of lung cancer?",
"reference": "Common symptoms include persistent cough, coughing blood, chest pain, shortness of breath, unexplained weight loss..."
}
]

python -m src.ablation.study --dataset data/ablation_dataset.sample.json --output-dir reports

This writes flat result files directly into reports/:
- experiment_1_per_question.csv, experiment_1_summary.csv, experiment_1_summary.json
- experiment_2_per_question.csv, experiment_2_summary.csv, experiment_2_summary.json
- experiment_3_per_question.csv, experiment_3_summary.csv, experiment_3_summary.json
- experiment_4_per_question.csv, experiment_4_summary.csv, experiment_4_summary.json
- ablation_summary.csv, ablation_summary.json
- quality_metrics.png
- latency_breakdown.png
pytest tests/

| Setting | Location | Default |
|---|---|---|
| LLM model name | src/generation/service.py | phi3 |
| Ollama base URL | src/generation/service.py | http://127.0.0.1:11434 |
| Request timeout | src/generation/service.py | 120 seconds |
| Embedding model | src/embeddings/service.py | all-MiniLM-L6-v2 |
| Retrieval top-k | src/api/main.py | 3 |
| Chunk size (static) | src/ingestion/chunker.py | 500 tokens, 100 overlap |
| Chunk size (dynamic) | src/ingestion/chunker.py | 120–420 tokens, similarity threshold 0.72 |
| Vector DB path | src/api/main.py | vector_db/ |
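
The static chunking defaults (500 tokens with a 100-token overlap) imply a sliding window that advances 400 tokens at a time. A minimal sketch of that windowing follows; it assumes the input is already tokenised, and `static_chunks` is a hypothetical name rather than the function in chunker.py.

```python
from typing import List


def static_chunks(tokens: List[str], size: int = 500, overlap: int = 100) -> List[List[str]]:
    """Split tokens into fixed-size windows that share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    stride = size - overlap  # how far each window advances
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


tokens = [str(i) for i in range(1000)]
chunks = static_chunks(tokens)
print(len(chunks), [len(c) for c in chunks])
```

The overlap means a sentence cut at a window boundary still appears whole in one of the two neighbouring chunks, at the cost of indexing some text twice.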
The SafetyChecker scans every generated response before it reaches the user. Responses are blocked if they contain:
- Explicit dosage instructions (e.g. "take 500mg of...")
- Prescription drug names combined with dosage language
- Self-harm or suicide-related language
Blocked responses return a safe fallback message. The system appends a "This is not medical advice" disclaimer to all unblocked responses.
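
A pattern-based filter of this kind can be sketched in a few lines. The regular expressions, the fallback wording, and the names below are illustrative assumptions, not the rules in checker.py; a production filter would need a much broader and clinically reviewed pattern set.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Illustrative patterns only; the project's real rules are not shown here.
DOSAGE = re.compile(
    r"\b(take|administer|inject)\b[^.]{0,40}?\b\d+(\.\d+)?\s?(mg|mcg|g|ml|units?)\b",
    re.IGNORECASE,
)
SELF_HARM = re.compile(
    r"\b(kill (yourself|myself)|suicide|self[- ]harm)\b", re.IGNORECASE
)

DISCLAIMER = "This is not medical advice."
FALLBACK = "I can't help with that. Please consult a qualified professional."  # hypothetical wording


@dataclass
class SafetyResult:
    answer: str
    blocked: bool
    block_reason: Optional[str]


def check(response: str) -> SafetyResult:
    """Block unsafe responses; append the disclaimer to everything else."""
    if DOSAGE.search(response):
        return SafetyResult(FALLBACK, True, "dosage_instruction")
    if SELF_HARM.search(response):
        return SafetyResult(FALLBACK, True, "self_harm")
    return SafetyResult(f"{response}\n\n{DISCLAIMER}", False, None)


print(check("Take 500mg of ibuprofen twice a day.").blocked)
print(check("Lung cancer may cause a persistent cough.").blocked)
```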
- Support for additional LLMs (Llama 3, Mistral, MedGemma)
- Domain-specific embedding model for improved retrieval accuracy
- Multilingual query support
- Clinical report summarisation
- Voice input interface
- Mobile-optimised deployment
2024yuva - Yuvarrunjitha R S
B.E. Computer Science and Engineering (AI & ML)
MIT


