
MedChat

Philip Liu, Sparsh Bansal, Jimmy Dinh, Aditya Pawar, Ramani Satishkumar, Shail Desai, Neeraj Gupta, Xin Wang, Shu Hu

A modular framework that combines modern vision back-ends with role-specialised LLM agents to draft glaucoma diagnostic reports from retinal fundus images. The core idea is for each agent to focus on a narrow clinical role, while a Director agent synthesises their opinions into a concise, clinically grounded report.


Method Overview

  1. Vision Pre-processing

    • Classifier (SwinV2) → glaucoma probability p (binned into “no glaucoma / possible glaucoma / likely glaucoma / glaucoma detected”).
    • Segmentor (SegFormer) → optic-cup & disc masks → cup-to-disc ratio (CDR).
  2. Core Prompt
    Natural-language sentences summarise p and the CDR, with clinician notes optionally appended.

  3. Role Generation
    A meta-prompt asks GPT-4.1 to list relevant clinical roles (e.g., Ophthalmologist, Optometrist, Pharmacist).

  4. Role-Specialised Sub-Reports
    Each role gets the core prompt plus narrow instructions and writes a focused sub-report.

  5. Director Synthesis
    Another GPT-4.1 instance combines all sub-reports, resolves minor conflicts, and produces one clean diagnostic report.

  6. Output & Interface
    The final report, probability, CDR, and sub-reports are returned to the client (e.g., MedChat front-end) for interactive Q&A and PDF export.
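The vision pre-processing and core-prompt steps (1–2) can be sketched as follows. The probability cut-offs and the vertical-CDR computation are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

def bin_probability(p: float) -> str:
    """Map the classifier's glaucoma probability to a coarse label.

    The 0.25 / 0.5 / 0.75 thresholds are illustrative; the paper does
    not specify exact cut-offs.
    """
    if p < 0.25:
        return "no glaucoma"
    elif p < 0.5:
        return "possible glaucoma"
    elif p < 0.75:
        return "likely glaucoma"
    return "glaucoma detected"

def cup_to_disc_ratio(cup_mask: np.ndarray, disc_mask: np.ndarray) -> float:
    """Vertical cup-to-disc ratio from binary segmentation masks."""
    cup_rows = np.flatnonzero(cup_mask.any(axis=1))
    disc_rows = np.flatnonzero(disc_mask.any(axis=1))
    if disc_rows.size == 0:
        return 0.0
    cup_h = (cup_rows.max() - cup_rows.min() + 1) if cup_rows.size else 0
    disc_h = disc_rows.max() - disc_rows.min() + 1
    return cup_h / disc_h

def core_prompt(p: float, cdr: float, notes: str = "") -> str:
    """Natural-language summary fed to every role agent (step 2)."""
    text = (f"The classifier estimates a glaucoma probability of {p:.2f} "
            f"({bin_probability(p)}). The measured cup-to-disc ratio is {cdr:.2f}.")
    if notes:
        text += f" Clinician notes: {notes}"
    return text
```

The same core prompt is then reused unchanged by every downstream agent, which is what keeps the language output anchored to the measured image features.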


✨ Key Contributions

| # | Contribution | Why it matters |
|---|--------------|----------------|
| 1 | Multi-agent reasoning: Ophthalmologist, Optometrist, Pharmacist, … plus a Director agent | Reduces hallucinations and reflects real-world clinical collaboration |
| 2 | Tight CAD ⇄ LLM loop: SwinV2 classifier (glaucoma probability) + SegFormer segmentor (optic-cup/optic-disc masks) | Keeps language output anchored to verifiable image features (e.g., CDR) |
| 3 | MedChat interface (browser-based) | Enables interactive Q&A and PDF report download for clinicians and learners; see frontend/ for code |
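The role-generation → sub-report → Director loop (steps 3–5 of the method) can be sketched with a generic `llm` callable standing in for a GPT-4.1 chat call; the prompt wording below is an assumption, not the project's actual prompts:

```python
from typing import Callable, List

# ``LLM`` stands in for any chat-completion call (e.g., GPT-4.1).
LLM = Callable[[str], str]

def generate_roles(llm: LLM, core: str) -> List[str]:
    """Step 3: a meta-prompt asks the model which clinical roles are relevant."""
    reply = llm("List relevant clinical roles, one per line, for this case:\n" + core)
    return [r.strip() for r in reply.splitlines() if r.strip()]

def sub_report(llm: LLM, role: str, core: str) -> str:
    """Step 4: each role writes a focused sub-report from the shared core prompt."""
    return llm(f"You are a {role}. Using only the findings below, write a "
               f"short sub-report from your clinical perspective.\n{core}")

def director_report(llm: LLM, reports: List[str]) -> str:
    """Step 5: the Director merges sub-reports and resolves minor conflicts."""
    joined = "\n---\n".join(reports)
    return llm("Synthesise these sub-reports into one concise, clinically "
               "grounded diagnostic report, resolving any minor conflicts:\n"
               + joined)

def run_pipeline(llm: LLM, core: str) -> str:
    """Run steps 3-5 end to end for a single case."""
    roles = generate_roles(llm, core)
    reports = [sub_report(llm, role, core) for role in roles]
    return director_report(llm, reports)
```

Because every function takes the model as a plain callable, the same loop can be exercised with a stub in tests and swapped to a real API client in production.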

Repository Layout

├── backend/              # Python API, multi-agent pipeline, model wrappers
│   ├── cad/              #  SwinV2 classifier & SegFormer segmentor
│   ├── agents/           #  Role prompts, Director logic
│   └── api.py            #  FastAPI / Flask endpoints
├── frontend/             #  Lightweight JS + HTML MedChat client
└── README.md

MedChat Interface

The repo ships with a minimal browser client that:

  1. uploads a fundus image + optional notes,
  2. streams sub-reports in real time,
  3. allows follow-up Q&A with full conversation memory, and
  4. exports the complete conversation as a styled PDF report.
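Follow-up Q&A with conversation memory (item 3) can be modelled minimally as below; the `Conversation` class and its prompt format are hypothetical, not the client's actual state management:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class Conversation:
    """Minimal Q&A memory keyed to one diagnostic report (illustrative only)."""
    report: str
    turns: List[Tuple[str, str]] = field(default_factory=list)

    def ask(self, llm: Callable[[str], str], question: str) -> str:
        # Replay the full history so each answer stays consistent with
        # both the report and earlier turns.
        history = "\n".join(f"Q: {q}\nA: {a}" for q, a in self.turns)
        answer = llm(f"Report:\n{self.report}\n{history}\nQ: {question}\nA:")
        self.turns.append((question, answer))
        return answer
```

The accumulated `turns` list is also what a PDF exporter would walk to render the styled conversation transcript.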

Citation

If you use this work, please cite:

@inproceedings{liu2025multiagent,
  title     = {Multi-Agent Diagnosis using Multimodal Large Language Models},
  author    = {Liu, Philip and Bansal, Sparsh and Dinh, Jimmy and Pawar, Aditya and Satishkumar, Ramani and Gupta, Neeraj and Wang, Xin and Hu, Shu},
  booktitle = {IEEE International Conference on Multimedia Information Processing and Retrieval (MIPR)},
  year      = {2025}
}

License

This project is released under the MIT License (see LICENSE).


Contact

For questions or collaboration requests, please open an issue.