MedAI

A production-grade Medical Question Answering system built on Retrieval-Augmented Generation (RAG). MedAI retrieves relevant passages from a curated medical knowledge base, reasons over them using a local language model, and returns plain-language answers with source citations and confidence scoring. Everything runs on your machine, with no external API calls.

Overview

MedAI is designed around a straightforward principle: answers should be grounded in real medical literature, not hallucinated. Every response is traceable back to a specific source document and page number. The system also flags low-confidence retrievals and blocks responses that contain unsafe medical advice patterns.

Key capabilities

Semantic search over medical PDFs using dense vector embeddings
Retrieval-Augmented Generation with FAISS inner-product similarity
Local LLM inference via Ollama — no cloud dependency
Safety layer that intercepts dosage instructions and self-harm language
Confidence scoring per response based on retrieval similarity
Collapsible reasoning steps and source citations in the UI
Persistent chat history with rename and delete
Dark and light theme

Architecture

User Question
      |
      v
 EmbeddingService                  sentence-transformers/all-MiniLM-L6-v2
 (query embedding)
      |
      v
 VectorStore.search                FAISS IndexFlatIP
 (top-k retrieval)
      |
      v
 PromptBuilder                     Injects retrieved chunks into prompt
 (context assembly)
      |
      v
 GenerationService                 Ollama  (phi3 / medgemma)
 (LLM inference)
      |
      v
 SafetyChecker                     Regex-based pattern matching
 (response filtering)
      |
      v
 AskResponse                       JSON  ->  Browser UI
 (answer + sources + confidence)
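
The retrieval and confidence steps in the diagram can be sketched in plain Python. This is a minimal illustration, not the project's code: it mimics what an exhaustive inner-product index such as FAISS IndexFlatIP computes, assumes embeddings are L2-normalised so that the inner product equals cosine similarity, and assumes confidence is simply the best similarity score. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def search_top_k(
    query: Sequence[float],
    chunk_vectors: Sequence[Sequence[float]],
    k: int = 3,  # matches the documented default for retrieval top-k
) -> List[Tuple[int, float]]:
    """Exhaustive search: score every chunk, return the k best (index, score) pairs."""
    q = l2_normalize(query)
    scored = [
        (index, inner_product(q, l2_normalize(vec)))
        for index, vec in enumerate(chunk_vectors)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def confidence_from_hits(hits: List[Tuple[int, float]]) -> float:
    """Assumed rule: confidence is the best similarity, clamped to [0, 1]."""
    if not hits:
        return 0.0
    return max(0.0, min(1.0, hits[0][1]))
```

Calling `search_top_k(query_vector, all_chunk_vectors)` returns chunk indices that can be looked up in the chunk metadata to recover the source file and page for each hit.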

Technology Stack

Layer	Technology
Web framework	FastAPI 0.115 + Uvicorn 0.30
Embedding model	sentence-transformers/all-MiniLM-L6-v2
Vector index	FAISS (faiss-cpu 1.13)
LLM backend	Ollama (phi3 by default, medgemma supported)
PDF ingestion	pdfplumber 0.11
Data validation	Pydantic v2
Frontend	Vanilla HTML / CSS / JavaScript (no framework)
Testing	pytest 8.3
Language	Python 3.10+

Project Structure

MedAI/
├── data/
│   ├── oncology/               Raw PDF source documents
│   └── oncology_mvp/           Subset used for the MVP index
├── src/
│   ├── api/
│   │   ├── main.py             FastAPI app factory, routes
│   │   ├── schemas.py          Pydantic request/response models
│   │   └── static/             Frontend (HTML, CSS, JS)
│   ├── embeddings/
│   │   └── service.py          EmbeddingService (SentenceTransformer + fallback)
│   ├── generation/
│   │   └── service.py          GenerationService (Ollama HTTP client)
│   ├── ingestion/
│   │   ├── build_index.py      CLI entry point for index construction
│   │   ├── chunker.py          Static and semantic dynamic chunking
│   │   ├── cleaner.py          PDF text normalisation
│   │   ├── loader.py           pdfplumber PDF loader
│   │   └── pipeline.py         Orchestrates load -> clean -> chunk
│   ├── prompt/
│   │   └── builder.py          Prompt assembly with context injection
│   ├── retrieval/
│   │   ├── retriever.py        Query embedding + FAISS search
│   │   └── vector_store.py     FAISS index build, save, load
│   ├── safety/
│   │   └── checker.py          Pattern-based safety filtering
│   └── models.py               Shared dataclasses (Chunk, GenerationResult, etc.)
├── vector_db/
│   ├── index.faiss             Serialised FAISS index
│   └── metadata.json           Chunk metadata (source, page, text)
├── tests/                      pytest test suite
├── requirements.txt
└── README.md

Screenshots

Home (MedAI-Home.jpg)

Chat interface (MedAI-chat.jpg)

Source citations and confidence (MedAI-chat-sources.jpg)

Demo

The demo video is included in the repository as MedAI-demo_video.mp4.

Prerequisites

Python 3.10 or higher
Git
Ollama installed and running locally

Setup

1. Clone the repository

git clone https://github.com/2024yuva/MedAI.git
cd MedAI

2. Create and activate a virtual environment

python -m venv venv
venv\Scripts\activate        # Windows
# source venv/bin/activate   # macOS / Linux

3. Install dependencies

pip install -r requirements.txt

4. Pull a language model via Ollama

ollama pull phi3

Any model listed in GenerationService.configured_models can be used. To switch models, update primary_model_name in src/generation/service.py.
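
To check that a pulled model responds before starting MedAI, the Ollama server can be queried directly. The sketch below is an assumption about how such a call might look, not a copy of GenerationService: it uses Ollama's `/api/generate` endpoint with streaming disabled, and takes the base URL and timeout from the defaults listed under Configuration. The helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_BASE_URL = "http://127.0.0.1:11434"  # documented default base URL
REQUEST_TIMEOUT_S = 120                      # documented default timeout


def build_generate_request(prompt: str, model: str = "phi3") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{OLLAMA_BASE_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "phi3") -> str:
    """Send the prompt and return the generated text (needs Ollama running)."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=REQUEST_TIMEOUT_S) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False, Ollama returns one JSON object whose
    # "response" field holds the full completion.
    return body["response"]
```

For example, `print(generate("Say hello in one sentence."))` should print a short completion once `ollama pull phi3` has finished and the server is running.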

5. Build the vector index

Place your medical PDF files in data/oncology_mvp/, then run:

python -m src.ingestion.build_index --data-dir data/oncology_mvp --out-dir vector_db

This extracts text from every PDF, splits it into semantically coherent chunks, embeds each chunk, and writes vector_db/index.faiss and vector_db/metadata.json.

To use fixed-size chunking instead of semantic chunking:

python -m src.ingestion.build_index --data-dir data/oncology_mvp --out-dir vector_db --static-chunking
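
Fixed-size chunking with overlap can be sketched as a sliding window. This is an illustrative version only, not the contents of chunker.py: it treats whitespace-separated words as tokens (an assumption) and uses the documented static defaults of 500 tokens with a 100-token overlap. The function names are hypothetical.

```python
from typing import List


def chunk_tokens(tokens: List[str], size: int = 500, overlap: int = 100) -> List[List[str]]:
    """Split tokens into windows of `size` that share `overlap` tokens with the previous window."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        # Stop once a window reaches the end, so no chunk is a pure
        # repeat of the previous window's tail.
        if start + size >= len(tokens):
            break
    return chunks


def chunk_text(text: str, size: int = 500, overlap: int = 100) -> List[str]:
    """Whitespace-tokenise `text` and return the chunks joined back into strings."""
    return [" ".join(chunk) for chunk in chunk_tokens(text.split(), size, overlap)]
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, at the cost of indexing some text twice.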

6. Start the server

python -m uvicorn src.api.main:app --host 127.0.0.1 --port 8000

Open http://127.0.0.1:8000 in your browser.

API Reference

Method	Endpoint	Description
GET	`/`	Serves the web UI
GET	`/health`	Overall system health (Ollama reachability, active model)
GET	`/health/generation`	Detailed generation backend status
POST	`/ask`	Submit a question, receive an answer
POST	`/ablation`	Run all 4 ablation experiments for one question

POST /ask

Request

{
  "question": "What are the early symptoms of lung cancer?"
}

Response

{
  "answer": "Early symptoms of lung cancer include...",
  "finalAnswer": "Early symptoms of lung cancer include...",
  "reasoningSteps": ["...", "..."],
  "sources": [
    {
      "sourceFile": "basics_of_oncology.pdf",
      "pageNumber": 42,
      "excerpt": "...",
      "similarityScore": 0.87
    }
  ],
  "confidenceScore": 0.87,
  "blocked": false,
  "blockReason": null
}
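
A small client for this endpoint might look like the following sketch. It relies only on the request and response fields shown above and on the server address from the setup steps; the helper names and the 180-second client timeout are assumptions made for illustration.

```python
import json
import urllib.request
from typing import Any, Dict

API_BASE_URL = "http://127.0.0.1:8000"  # address used in the setup steps


def build_ask_request(question: str) -> urllib.request.Request:
    """Build the POST /ask request with the documented JSON body."""
    return urllib.request.Request(
        url=f"{API_BASE_URL}/ask",
        data=json.dumps({"question": question}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, timeout_s: float = 180.0) -> Dict[str, Any]:
    """Send the question to a running MedAI server and return the parsed JSON."""
    with urllib.request.urlopen(build_ask_request(question), timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize(response: Dict[str, Any]) -> str:
    """Render a response as one line: block reason, or answer plus citations."""
    if response.get("blocked"):
        return f"Blocked: {response.get('blockReason')}"
    citations = [
        f"{source['sourceFile']} p.{source['pageNumber']}"
        for source in response.get("sources", [])
    ]
    return (
        f"{response['finalAnswer']} "
        f"[confidence {response['confidenceScore']:.2f}; sources: {', '.join(citations)}]"
    )
```

With the server running, `print(summarize(ask("What are the early symptoms of lung cancer?")))` prints the answer followed by its confidence and cited pages.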

Ablation Study (Professional Report)

Use the built-in ablation runner to compare:

Experiment 1: Full pipeline (LAQA + MRL + RAG)
Experiment 2: No LAQA
Experiment 3: No LAQA + No MRL
Experiment 4: No RAG (direct LLM)

Dataset format

Create a JSON file like data/ablation_dataset.sample.json:

[
  {
    "question": "What are the symptoms of lung cancer?",
    "reference": "Common symptoms include persistent cough, coughing blood, chest pain, shortness of breath, unexplained weight loss..."
  }
]
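
Before a long ablation run it can help to check that the dataset matches this shape. The validator below is a hypothetical helper, not part of the repository; it assumes only what the sample shows, namely a JSON array of objects with non-empty string fields "question" and "reference".

```python
import json
from typing import Dict, List


def validate_dataset(items: object) -> List[Dict[str, str]]:
    """Return the dataset unchanged if well-formed, else raise ValueError."""
    if not isinstance(items, list) or not items:
        raise ValueError("dataset must be a non-empty JSON array")
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            raise ValueError(f"entry {index} is not a JSON object")
        for key in ("question", "reference"):
            value = item.get(key)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"entry {index} needs a non-empty string '{key}'")
    return items


def load_dataset(path: str) -> List[Dict[str, str]]:
    """Read a dataset file from disk and validate it."""
    with open(path, encoding="utf-8") as handle:
        return validate_dataset(json.load(handle))
```

For example, `load_dataset("data/ablation_dataset.sample.json")` either returns the list of question/reference pairs or reports the index of the first malformed entry.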

Run study with metrics + graphs

python -m src.ablation.study --dataset data/ablation_dataset.sample.json --output-dir reports

This writes flat result files directly into reports/:

experiment_1_per_question.csv, experiment_1_summary.csv, experiment_1_summary.json
experiment_2_per_question.csv, experiment_2_summary.csv, experiment_2_summary.json
experiment_3_per_question.csv, experiment_3_summary.csv, experiment_3_summary.json
experiment_4_per_question.csv, experiment_4_summary.csv, experiment_4_summary.json
ablation_summary.csv, ablation_summary.json
quality_metrics.png
latency_breakdown.png

Running Tests

pytest tests/

Configuration

Setting	Location	Default
LLM model name	`src/generation/service.py`	`phi3`
Ollama base URL	`src/generation/service.py`	`http://127.0.0.1:11434`
Request timeout	`src/generation/service.py`	120 seconds
Embedding model	`src/embeddings/service.py`	`all-MiniLM-L6-v2`
Retrieval top-k	`src/api/main.py`	3
Chunk size (static)	`src/ingestion/chunker.py`	500 tokens, 100 overlap
Chunk size (dynamic)	`src/ingestion/chunker.py`	120–420 tokens, similarity threshold 0.72
Vector DB path	`src/api/main.py`	`vector_db/`

Safety

The SafetyChecker scans every generated response before it reaches the user. Responses are blocked if they contain:

Explicit dosage instructions (e.g. "take 500mg of...")
Prescription drug names combined with dosage language
Self-harm or suicide-related language

Blocked responses return a safe fallback message. The system appends a "This is not medical advice" disclaimer to all unblocked responses.
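
The filtering step can be illustrated with a pair of regular expressions. The patterns, fallback wording, and reason labels below are hypothetical and far narrower than a real safety layer would need; they are not the patterns used in checker.py, and the drug-name check is omitted because it would require a drug list.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: an instruction verb followed, within the same
# sentence, by a number and a dosage unit (e.g. "take 500mg of ...").
DOSAGE_PATTERN = re.compile(
    r"\b(?:take|administer|inject)\b[^.\n]{0,40}?\b\d+(?:\.\d+)?\s?(?:mg|mcg|g|ml|iu|units?)\b",
    re.IGNORECASE,
)
# Hypothetical pattern for self-harm or suicide-related language.
SELF_HARM_PATTERN = re.compile(
    r"\b(?:suicide|self[- ]harm|kill (?:yourself|myself))\b",
    re.IGNORECASE,
)

FALLBACK_MESSAGE = (
    "I can't provide that information. Please consult a qualified healthcare professional."
)
DISCLAIMER = "This is not medical advice."


def check_response(text: str) -> Tuple[str, bool, Optional[str]]:
    """Return (text to show, blocked flag, block reason) for a generated answer."""
    if DOSAGE_PATTERN.search(text):
        return FALLBACK_MESSAGE, True, "dosage_instruction"
    if SELF_HARM_PATTERN.search(text):
        return FALLBACK_MESSAGE, True, "self_harm"
    # Unblocked answers pass through with the disclaimer appended.
    return f"{text}\n\n{DISCLAIMER}", False, None
```

Pattern matching of this kind is cheap and predictable, but it misses paraphrases and can block benign mentions, so the real pattern set should be tuned against actual model output.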

Roadmap

Support for additional LLMs (Llama 3, Mistral)
Domain-specific embedding model for improved retrieval accuracy
Multilingual query support
Clinical report summarisation
Voice input interface
Mobile-optimised deployment

Developer

2024yuva - Yuvarrunjitha R S
B.E. Computer Science and Engineering (AI & ML)

License

MIT
