A production-ready Retrieval-Augmented Generation (RAG) medical chatbot that answers clinical questions by retrieving relevant context from a curated medical knowledge base, powered by Google Gemini and Pinecone vector search.
This project is a full-stack AI medical chatbot built using state-of-the-art LLM and vector database technologies. It ingests medical reference PDFs, indexes them into a Pinecone vector store, and uses Google Gemini 2.5 Flash to answer clinical questions with context-grounded, hallucination-resistant responses.
The system follows a clean RAG (Retrieval-Augmented Generation) architecture:
- Ingest: Medical PDFs are chunked, embedded, and stored in Pinecone
- Retrieve: User queries are matched against stored vectors using cosine similarity
- Generate: Gemini LLM synthesizes a concise, grounded answer using the retrieved context
- Serve: A Flask REST API and HTML chat UI provide a complete user interface
Built as a portfolio project to demonstrate expertise in LangChain, LLMs, vector databases, and production-grade Python web applications.
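The Ingest → Retrieve → Generate → Serve loop above can be sketched end-to-end with stand-in components. This is a toy illustration only: the bag-of-words "embedding" and stub LLM below replace the real all-MiniLM-L6-v2 model, Pinecone index, and Gemini call.

```python
from collections import Counter
import math

# Toy embedding: bag-of-words counts (stand-in for all-MiniLM-L6-v2).
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Ingest: chunks stored alongside their embeddings (stand-in for Pinecone).
chunks = [
    "Aspirin inhibits platelet aggregation.",
    "Metformin lowers hepatic glucose production.",
]
index = [(embed(c), c) for c in chunks]

# Retrieve: top-k chunks by cosine similarity (the real system uses k=3).
def retrieve(query: str, k: int = 3):
    q = embed(query)
    ranked = sorted(index, key=lambda e: cosine(q, e[0]), reverse=True)
    return [text for _, text in ranked[:k]]

# Generate: a stub LLM that only answers from the retrieved context.
def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    return f"Based on the context:\n{context}"

print(answer("How does metformin work?"))
```

The real pipeline swaps each stub for a production component but keeps exactly this shape: embed the query, rank stored chunks, and feed the winners to the LLM as grounding context.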
```
Medical PDFs ──▶ load_pdf_file()           ← LangChain DirectoryLoader + PyPDFLoader
                       │
              filter_to_minimal_docs()     ← strips metadata noise
                       │
              text_split()                 ← RecursiveCharacterTextSplitter (500 tokens, 20 overlap)
                       │
              download_embedding()         ← sentence-transformers/all-MiniLM-L6-v2 (384d)
                       │
              PineconeVectorStore          ← stores + indexes embeddings
                       │
User Query ──▶ similarity_search (k=3) ──▶ create_retrieval_chain()
                       │
              ChatGoogleGenerativeAI       ← Gemini 2.5 Flash (temp=0.3)
                       │
              Flask API /chat              ← JSON response
                       │
              chat.html UI                 ← real-time chat interface
```
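The `text_split()` step can be approximated with a simple sliding-window chunker. This is only an illustration of the 500-token-window / 20-token-overlap idea; the actual `RecursiveCharacterTextSplitter` splits recursively on separators rather than at fixed offsets.

```python
def split_with_overlap(tokens, chunk_size=500, overlap=20):
    """Yield fixed-size windows that share `overlap` tokens with the previous one."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks

tokens = [f"tok{i}" for i in range(1040)]
chunks = split_with_overlap(tokens)
# Adjacent chunks share their last/first 20 tokens, so a sentence
# straddling a boundary is never lost from both chunks.
```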
- PDF Knowledge Ingestion: bulk-load any medical reference PDFs into Pinecone
- Semantic Search: cosine-similarity retrieval (k=3 most relevant chunks)
- Gemini 2.5 Flash LLM: fast, accurate, grounded answers with source context
- Hallucination Guard: explicitly says "I don't know" when context is insufficient
- Flask REST API: lightweight `/chat` endpoint for easy integration
- Chat UI: clean HTML/JS chat interface
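The hallucination guard is prompt engineering: the system prompt (kept in `src/prompt.py`) tells the model to refuse when the retrieved context does not contain the answer. The wording below is a hypothetical reconstruction, not the repo's exact prompt:

```python
# Hypothetical sketch of the system prompt; the real text lives in
# src/prompt.py and may be worded differently.
SYSTEM_PROMPT = (
    "You are a medical assistant for question-answering tasks. "
    "Use only the retrieved context below to answer. "
    "If the answer is not in the context, say \"I don't know\". "
    "Keep the answer concise.\n\n"
    "Context:\n{context}"
)

def build_prompt(context_chunks):
    # Join the k retrieved chunks into the {context} slot.
    context = "\n\n".join(context_chunks) if context_chunks else "(no context retrieved)"
    return SYSTEM_PROMPT.format(context=context)
```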
```bash
git clone https://github.com/Ashwin14101/Medical-Chatbot-With-LLMs-LangChain-Pinecone-Flask-AWS.git
cd Medical-Chatbot-With-LLMs-LangChain-Pinecone-Flask-AWS
pip install -r requirement.txt
cp .env.example .env
```

Edit `.env`:

```
PINECONE_API_KEY=your_pinecone_api_key
GOOGLE_API_KEY=your_google_api_key
```

Get your keys:

- Pinecone: https://app.pinecone.io
- Google AI Studio: https://aistudio.google.com/app/apikey
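Assuming the app loads these keys with a dotenv-style helper (a common pattern; the repo's exact loading code may differ), each `KEY=value` line in `.env` simply becomes a process environment variable:

```python
import os

# Sketch of what a dotenv loader does (not python-dotenv itself):
# each non-comment KEY=value line becomes an environment variable.
def load_env_lines(lines):
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

load_env_lines([
    "PINECONE_API_KEY=your_pinecone_api_key",
    "GOOGLE_API_KEY=your_google_api_key",
])
```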
Place your PDF files in the `data/` directory.

```bash
python store_index.py
```

This will:

- Load and chunk your PDFs
- Generate embeddings with `all-MiniLM-L6-v2`
- Create a Pinecone index named `medical-bot`
- Upsert all vectors

```bash
python app.py
```

Open http://localhost:8080 in your browser.
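The `/chat` endpoint boils down to: parse the user's message from the request, run it through the RAG chain, and return JSON. A framework-free sketch of that handler logic (the chain is a stub here, and the `msg`/`answer` field names are assumptions about the repo's API):

```python
import json

def rag_chain_stub(question: str) -> str:
    # Stand-in for the LangChain retrieval chain + Gemini call.
    return f"(grounded answer to: {question})"

def handle_chat(request_body: str) -> str:
    """Mimic the Flask /chat view: JSON request in, JSON response out."""
    payload = json.loads(request_body)
    question = payload.get("msg", "").strip()
    if not question:
        return json.dumps({"error": "empty message"})
    return json.dumps({"answer": rag_chain_stub(question)})
```

In the real app the same logic sits inside a Flask route, with `request.get_json()` supplying the payload and `jsonify()` building the response.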
```
Medical-Chatbot-With-LLMs-LangChain-Pinecone-Flask-AWS/
├── app.py              # Original Flask app (basic version)
├── app1.py             # Updated Flask app with JSON API + improved comments
├── store_index.py      # One-time PDF ingestion + Pinecone indexing script
├── src/
│   ├── helper.py       # PDF loader, text splitter, embedding model
│   └── prompt.py       # System prompt for the medical assistant
├── templates/
│   └── chat.html       # Frontend chat UI
├── Static/             # CSS / JS assets
├── data/               # Place your medical PDFs here (gitignored)
├── research/
│   └── trials.ipynb    # Jupyter notebook for experimentation
├── requirement.txt     # Python dependencies
├── setup.py            # Package setup
├── template.sh         # Project scaffolding script
├── .env.example        # Environment variable template
└── README.md
```
| Layer | Technology |
|---|---|
| Backend | Flask 3.x |
| LLM | Google Gemini 2.5 Flash via `langchain-google-genai` |
| Vector DB | Pinecone (Serverless, AWS us-east-1, cosine, dim=384) |
| Embeddings | `sentence-transformers/all-MiniLM-L6-v2` (HuggingFace) |
| RAG Framework | LangChain (`create_retrieval_chain` + `create_stuff_documents_chain`) |
| PDF Parsing | LangChain `PyPDFLoader` + `DirectoryLoader` |
| Parameter | Value | Notes |
|---|---|---|
| Chunk size | 500 tokens | Balances context vs. retrieval precision |
| Chunk overlap | 20 tokens | Prevents information loss at boundaries |
| Retrieval k | 3 | Top-3 most similar chunks per query |
| LLM temperature | 0.3 | Low temperature for factual, deterministic answers |
| Embedding dim | 384 | Matches all-MiniLM-L6-v2 output size |
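These settings could be gathered into one config object; the bundle below is illustrative (the repo may hard-code the values instead). The cosine helper shows the similarity metric the Pinecone index is configured with, and why the index dimension must match the embedding model's output size:

```python
import math

# Illustrative config bundle; the repo may hard-code these values instead.
RAG_CONFIG = {
    "chunk_size": 500,
    "chunk_overlap": 20,
    "retrieval_k": 3,
    "llm_temperature": 0.3,
    "embedding_dim": 384,  # must match all-MiniLM-L6-v2's output size
}

def cosine_similarity(a, b):
    """Cosine similarity, the metric the Pinecone index uses."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# A vector of any other length would be rejected by the index, so the
# embedding model and the index dimension have to agree.
v = [0.1] * RAG_CONFIG["embedding_dim"]
```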
Pull requests are welcome! Please open an issue first for major changes.
Ashwin Kotha
- GitHub: @Ashwin14101
- Project: Medical-Chatbot-With-LLMs-LangChain-Pinecone-Flask-AWS
MIT © Ashwin14101