
πŸ₯ Medical Chatbot β€” RAG-Powered Q&A

A production-ready Retrieval-Augmented Generation (RAG) medical chatbot that answers clinical questions by retrieving relevant context from a curated medical knowledge base, powered by Google Gemini and Pinecone vector search.


πŸ“– About

This project is a full-stack AI medical chatbot built using state-of-the-art LLM and vector database technologies. It ingests medical reference PDFs, indexes them into a Pinecone vector store, and uses Google Gemini 2.5 Flash to answer clinical questions with context-grounded, hallucination-resistant responses.

The system follows a clean RAG (Retrieval-Augmented Generation) architecture:

  • Ingest: Medical PDFs are chunked, embedded, and stored in Pinecone
  • Retrieve: User queries are matched against stored vectors using cosine similarity
  • Generate: Gemini LLM synthesizes a concise, grounded answer using the retrieved context
  • Serve: A Flask REST API and HTML chat UI provide a complete user interface

Built as a portfolio project to demonstrate expertise in LangChain, LLMs, vector databases, and production-grade Python web applications.


🧠 How It Works β€” RAG Pipeline

Medical PDFs  ──▢  load_pdf_file()          ← LangChain DirectoryLoader + PyPDFLoader
                       β”‚
                   filter_to_minimal_docs()  ← strips metadata noise
                       β”‚
                   text_split()             ← RecursiveCharacterTextSplitter (500 tokens, 20 overlap)
                       β”‚
                   download_embedding()     ← sentence-transformers/all-MiniLM-L6-v2 (384d)
                       β”‚
                   PineconeVectorStore      ← stores + indexes embeddings
                       β”‚
              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”˜
User Query ──▢│  similarity_search (k=3)
              └──▢  create_retrieval_chain()
                       β”‚
                   ChatGoogleGenerativeAI   ← Gemini 2.5 Flash (temp=0.3)
                       β”‚
                   Flask API  /chat         ← JSON response
                       β”‚
                   chat.html UI             ← real-time chat interface
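In code, the lower half of the pipeline roughly corresponds to the sketch below. This is a hedged approximation assuming a recent LangChain package layout; the repo's actual wiring lives in app.py and src/prompt.py, and the prompt text and function name here are illustrative, not the project's exact code.

```python
def build_rag_chain(embeddings, index_name="medical-bot"):
    # Imports are deferred so the sketch can be read (and imported)
    # without the heavy LangChain dependencies installed.
    from langchain.chains import create_retrieval_chain
    from langchain.chains.combine_documents import create_stuff_documents_chain
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_google_genai import ChatGoogleGenerativeAI
    from langchain_pinecone import PineconeVectorStore

    # Connect to the existing Pinecone index; retrieve the top-3 chunks.
    docsearch = PineconeVectorStore.from_existing_index(
        index_name=index_name, embedding=embeddings
    )
    retriever = docsearch.as_retriever(search_kwargs={"k": 3})

    # Gemini 2.5 Flash at low temperature for grounded, factual answers.
    llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash", temperature=0.3)

    # Illustrative system prompt; the real one is in src/prompt.py.
    prompt = ChatPromptTemplate.from_messages([
        ("system",
         "Answer using only the retrieved context below. "
         "If the context is insufficient, say you don't know.\n\n{context}"),
        ("human", "{input}"),
    ])

    # "Stuff" retrieved documents into the prompt, then wrap with retrieval.
    qa_chain = create_stuff_documents_chain(llm, prompt)
    return create_retrieval_chain(retriever, qa_chain)
```

The resulting chain is invoked as `chain.invoke({"input": question})`, with the generated text under the `"answer"` key of the response dict.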

✨ Features

  • πŸ“„ PDF Knowledge Ingestion β€” bulk-load any medical reference PDFs into Pinecone
  • πŸ” Semantic Search β€” cosine similarity retrieval (k=3 most relevant chunks)
  • πŸ€– Gemini 2.5 Flash LLM β€” fast, accurate, grounded answers with source context
  • πŸ›‘οΈ Hallucination Guard β€” explicitly says "I don't know" when context is insufficient
  • 🌐 Flask REST API β€” lightweight /chat endpoint for easy integration
  • πŸ’¬ Chat UI β€” clean HTML/JS chat interface
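The semantic-search step can be illustrated with a self-contained toy (pure Python, hand-rolled cosine similarity; in the actual app this work is delegated to Pinecone's `similarity_search` over 384-dimensional MiniLM embeddings):

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunks, k=3):
    # chunks: list of (text, embedding) pairs.
    # Returns the k chunk texts most similar to the query vector.
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

With toy 2-d "embeddings", `top_k([1.0, 0.0], [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])], k=2)` returns the two chunks whose vectors point most nearly in the query's direction.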

πŸš€ Quick Start

1. Clone the repository

git clone https://github.com/Ashwin14101/Medical-Chatbot-With-LLMs-LangChain-Pinecone-Flask-AWS.git
cd Medical-Chatbot-With-LLMs-LangChain-Pinecone-Flask-AWS

2. Install dependencies

pip install -r requirement.txt

3. Configure API keys

cp .env.example .env

Edit .env:

PINECONE_API_KEY=your_pinecone_api_key
GOOGLE_API_KEY=your_google_api_key

Get your keys from the Pinecone console and Google AI Studio.

4. Add your medical PDFs

Place your PDF files in the data/ directory.

5. Build the vector index (run once)

python store_index.py

This will:

  • Load & chunk your PDFs
  • Generate embeddings with all-MiniLM-L6-v2
  • Create a Pinecone index named medical-bot
  • Upsert all vectors
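The index-creation step can be sketched with the Pinecone client, using the parameters listed in the Tech Stack table (a hedged sketch assuming a recent pinecone SDK, not the exact contents of store_index.py):

```python
import os

def create_medical_index(index_name="medical-bot"):
    # Import deferred so the sketch reads standalone without pinecone installed.
    from pinecone import Pinecone, ServerlessSpec

    pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
    if not pc.has_index(index_name):
        pc.create_index(
            name=index_name,
            dimension=384,      # matches all-MiniLM-L6-v2 output size
            metric="cosine",    # cosine similarity for retrieval
            spec=ServerlessSpec(cloud="aws", region="us-east-1"),
        )
    return pc.Index(index_name)
```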

6. Launch the chatbot

python app.py

Open http://localhost:8080 in your browser.
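The shape of the /chat endpoint can be sketched as a minimal Flask app. The JSON field name `msg` and the stubbed `answer` function are assumptions for illustration, not the repo's exact request contract; the real handler invokes the RAG chain where the stub sits.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def answer(question: str) -> str:
    # Stand-in for the RAG chain invocation; the real app calls the
    # LangChain retrieval chain here and returns its "answer" field.
    return f"(answer to: {question})"

@app.route("/chat", methods=["POST"])
def chat():
    # Accept a JSON body like {"msg": "..."} and return a JSON answer.
    question = request.get_json(force=True).get("msg", "")
    return jsonify({"answer": answer(question)})
```

Run it with `app.run(host="0.0.0.0", port=8080)`, then POST a JSON body to http://localhost:8080/chat.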


πŸ“ Project Structure

Medical-Chatbot-With-LLMs-LangChain-Pinecone-Flask-AWS/
β”œβ”€β”€ app.py                  # Original Flask app (basic version)
β”œβ”€β”€ app1.py                 # Updated Flask app with JSON API + improved comments
β”œβ”€β”€ store_index.py          # One-time PDF ingestion + Pinecone indexing script
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ helper.py           # PDF loader, text splitter, embedding model
β”‚   └── prompt.py           # System prompt for the medical assistant
β”œβ”€β”€ templates/
β”‚   └── chat.html           # Frontend chat UI
β”œβ”€β”€ Static/                 # CSS / JS assets
β”œβ”€β”€ data/                   # Place your medical PDFs here (gitignored)
β”œβ”€β”€ research/
β”‚   └── trials.ipynb        # Jupyter notebook for experimentation
β”œβ”€β”€ requirement.txt         # Python dependencies
β”œβ”€β”€ setup.py                # Package setup
β”œβ”€β”€ template.sh             # Project scaffolding script
β”œβ”€β”€ .env.example            # Environment variable template
└── README.md

πŸ› οΈ Tech Stack

Layer         | Technology
------------- | -----------------------------------------------------------------
Backend       | Flask 3.x
LLM           | Google Gemini 2.5 Flash via langchain-google-genai
Vector DB     | Pinecone (Serverless, AWS us-east-1, cosine, dim=384)
Embeddings    | sentence-transformers/all-MiniLM-L6-v2 (HuggingFace)
RAG Framework | LangChain (create_retrieval_chain + create_stuff_documents_chain)
PDF Parsing   | LangChain PyPDFLoader + DirectoryLoader

βš™οΈ Configuration

Parameter       | Value      | Notes
--------------- | ---------- | --------------------------------------------------
Chunk size      | 500 tokens | Balances context vs. retrieval precision
Chunk overlap   | 20 tokens  | Prevents information loss at chunk boundaries
Retrieval k     | 3          | Top-3 most similar chunks per query
LLM temperature | 0.3        | Low temperature for factual, deterministic answers
Embedding dim   | 384        | Matches all-MiniLM-L6-v2 output size
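The chunk size and overlap parameters can be illustrated with a simplified sliding-window splitter. This is a character-based toy; the repo uses LangChain's RecursiveCharacterTextSplitter, which additionally respects separators like paragraph and sentence boundaries.

```python
def split_with_overlap(text, chunk_size=500, overlap=20):
    # Slide a window of `chunk_size` characters across the text,
    # stepping by (chunk_size - overlap) so each chunk repeats the
    # last `overlap` characters of the previous one. The repeated
    # region is what prevents information loss at chunk boundaries.
    step = chunk_size - overlap
    return [
        text[i:i + chunk_size]
        for i in range(0, max(len(text) - overlap, 1), step)
    ]
```

For example, `split_with_overlap("abcdefghij", chunk_size=6, overlap=2)` yields `["abcdef", "efghij"]`: the characters "ef" appear in both chunks, so a fact straddling the boundary is still retrievable from at least one chunk.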

🀝 Contributing

Pull requests are welcome! Please open an issue first for major changes.


πŸ‘€ Author

Ashwin Kotha


πŸ“„ License

MIT Β© Ashwin14101
