A production-ready multimodal RAG system for document search with advanced natural language understanding. Built with FastAPI, React, Meilisearch, and Temporal for scalable document processing and semantic search.
Key Features:
- Hybrid search (keyword + semantic) across multilingual documents
- Natural language queries with NER and geographical translation
- XML document processing (AFP/IPTC NewsML-G2 format)
- Async workflow orchestration with Temporal
- Multilingual support with language-specific indexes
- Distributed tracing and observability (Jaeger + OpenTelemetry)
The system follows a services-based architecture optimized for GPU workloads and scalability:
┌───────────────────────────────────────────────────────────────┐
│                       Frontend (React)                        │
│                    http://localhost:5173                      │
└───────────────────────────┬───────────────────────────────────┘
                            │ REST API
┌───────────────────────────▼───────────────────────────────────┐
│                  Main API (FastAPI + Docker)                   │
│                    http://localhost:5050                      │
├───────────────────────────────────────────────────────────────┤
│  • Document Upload & Processing                               │
│  • Natural Language Search (NER, Query Extraction)            │
│  • Hybrid Search (Keyword + Semantic)                         │
│  • Temporal Workflow Orchestration                            │
└─────────┬────────────────┬─────────────────┬──────────────────┘
          │                │                 │
          │                │                 ▼
          │                │      ┌──────────────────────────┐
          │                │      │   GPU Services (Host)    │
          │                │      │        Port 8001         │
          │                │      ├──────────────────────────┤
          │                │      │  • Speech-to-Text        │
          │                │      │    (Whisper)             │
          │                │      └──────────────────────────┘
          ▼                ▼
┌────────────────────┐  ┌────────────────────────┐
│    Meilisearch     │  │     LiteLLM Proxy      │
│     Port 7700      │  │       Port 4000        │
├────────────────────┤  ├────────────────────────┤
│  • Full-text       │  │  • Unified LLM API     │
│  • Vector Search   │  │  • OpenAI/Ollama       │
│  • Hybrid Search   │  │  • Model Switching     │
└────────────────────┘  └────────────────────────┘

┌──────────────────────────────────────────────────────┐
│              Background Services (Docker)            │
├──────────────────────────────────────────────────────┤
│  • Temporal Worker (Document Processing)             │
│  • Temporal Server (Workflow Orchestration)          │
│  • PostgreSQL (Temporal Persistence)                 │
│  • Redis (LiteLLM Cache)                             │
└──────────────────────────────────────────────────────┘
- Language: Python + FastAPI
- Deployment: Docker container
- Responsibilities:
- REST API endpoints for document management and search
- Natural language query processing with NER (Named Entity Recognition)
- Multilingual geographical entity translation
- Integration with Meilisearch for hybrid search
- Temporal workflow orchestration
- Deployment: Runs on host machine with GPU access
- Current Services:
- Speech-to-Text (STT): Whisper model for audio transcription (port 8001)
- Future Services:
- Text Embeddings
- Image Embeddings
- Advanced NER models
- Framework: React + Vite
- Features:
- Document upload with metadata support
- Multi-mode search (keyword, semantic, hybrid)
- Natural language queries
- Real-time search results with highlighting
- Meilisearch: High-performance search engine with vector support
- LiteLLM: Unified proxy for switching between OpenAI and Ollama models
- Temporal: Workflow orchestration for async document processing
- PostgreSQL: Persistence for Temporal workflows
- Redis: Caching layer for LiteLLM
- Jaeger: Distributed tracing with OpenTelemetry for observability
- GPU Optimization: GPU-intensive models run as host services for direct hardware access (Mac M1/M2/M3, NVIDIA CUDA)
- Services: Independent scaling and deployment of components
- Containerization: Docker for reproducible environments (except GPU services)
- Workflow Orchestration: Temporal for reliable async processing with retries
- Unified LLM API: LiteLLM proxy for seamless model switching (local ↔ cloud)
| Requirement | Version | Purpose |
|---|---|---|
| Docker & Docker Compose | Latest | Main services (API, Meilisearch, Temporal) |
| Python | 3.11+ | GPU services (optional, for audio transcription) |
| Node.js | 18+ | Frontend development |
| Make | Any | Convenience commands |
| GPU (Optional) | Mac M1/M2/M3 or NVIDIA CUDA | Speech-to-text service |
# Clone the repository
git clone <repository-url>
cd multimodal-rag
# Create backend environment file
cp backend/multimodal_rag_api/.env.example backend/multimodal_rag_api/.env
# Create frontend environment file
cp frontend/.env.example frontend/.env
# Edit backend/.env and add your API keys:
# - OPENAI_API_KEY (for embeddings via OpenAI)
# - VOYAGE_API_KEY (for multimodal embeddings)
# - Or configure Ollama for local models (see .env.example)

See the Configuration section for detailed environment variable setup.
If you want to use the Speech-to-Text feature:
# Install STT service dependencies
make gpu-install

# Start GPU services + all Docker services
make dev-full

# Start Docker services only (no audio transcription)
make dev

# (Optional) In another terminal, start GPU services
make gpu-start

This starts:
- Backend API: http://localhost:5050 (API docs at /docs)
- Meilisearch: http://localhost:7700
- Meilisearch UI: http://localhost:24900
- LiteLLM Proxy: http://localhost:4000
- Temporal UI: http://localhost:8080
- Jaeger UI: http://localhost:16686 (distributed tracing)
- GPU Services (if started): STT on port 8001
# Start frontend development server
make front-dev

Access the application at http://localhost:5173
- Start all services: make dev (skips audio features, faster start)
- Verify services are running:
  - Backend API: http://localhost:5050/docs (Swagger UI)
  - Meilisearch UI: http://localhost:24900 (inspect indexes)
  - Temporal UI: http://localhost:8080 (workflow monitoring)
  - Jaeger UI: http://localhost:16686 (distributed tracing)
- Start frontend: make front-dev
- Test document upload:
  - Place a test XML file in ./data/test.xml
  - Upload via the UI: file path = /app/data/test.xml
  - Monitor the workflow in the Temporal UI
  - Search for the content once indexed
Core files to read first (in order):
- API Routes: backend/.../api/controller/routes.py
  - All REST endpoints with detailed docstrings
  - Start here to understand the API surface
- Document Processing: backend/.../api/services/document_processor.py
  - How XML documents are parsed and chunked
  - Embedding generation and indexing flow
- Search Pipeline: backend/.../api/services/nl_search/pipeline/
  - ner_step.py: entity extraction (countries, cities, dates)
  - filter_builder_step.py: Meilisearch filter generation
  - Natural language query → structured filters
- Workflows: backend/.../temporal_worker/workflows.py
  - Document ingestion orchestration
  - Retry policies and error handling
  - Batch processing logic
- Configuration: docker-compose.yml
  - Service dependencies and architecture
  - Port mappings and environment variables
  - Well commented for easy understanding
- Add the route function in api/controller/routes.py
- Add the service logic in api/services/
- Test at http://localhost:5050/docs

- Edit api/services/document_processor.py
- Restart the Temporal worker: docker restart temporal-worker
- Test by uploading a document

- Modify api/services/meilisearch/client.py or the NL search pipeline
- The API auto-reloads (Docker watch mode enabled)
- Test via the frontend or Swagger UI
- Add the entity type to nl_search/pipeline/types.py
- Update NER in nl_search/pipeline/ner_step.py
- Update the filter builder in filter_builder_step.py
- Add the field mapping in nl_search/schema.py (a hedged sketch of these steps follows below)
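The exact types live in the pipeline modules listed above. As a rough, hypothetical illustration only, this is how a new entity type might thread through those steps; the Entity/EntityType shapes and field names below are invented for this sketch, not the project's real definitions:

```python
# Hypothetical sketch of adding an entity type to the NL search pipeline.
# See nl_search/pipeline/types.py and filter_builder_step.py for the real code.
from dataclasses import dataclass
from enum import Enum


class EntityType(str, Enum):
    COUNTRY = "country"
    CITY = "city"
    DATE = "date"
    ORGANIZATION = "organization"  # hypothetical new type declared in types.py


@dataclass
class Entity:
    type: EntityType
    value: str


def build_filter(entity: Entity) -> str:
    # The filter builder maps each entity type to a Meilisearch filter field;
    # "organization" and the field names below are illustrative.
    field_by_type = {
        EntityType.COUNTRY: "country_code",
        EntityType.CITY: "city",
        EntityType.ORGANIZATION: "org_name",
    }
    return f'{field_by_type[entity.type]} = "{entity.value}"'


print(build_filter(Entity(EntityType.COUNTRY, "FR")))  # country_code = "FR"
```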
- Open Temporal UI: http://localhost:8080
- Find workflow by ID (returned from upload API)
- View execution history, inputs, outputs
- Check Jaeger for distributed trace
- Factory Pattern: get_text_embedding_client(), get_meilisearch_client() (see the sketch after this list)
- Pipeline Pattern: NL search with composable steps
- Dependency Injection: FastAPI Depends() for services
- Async/Await: throughout, for I/O-bound operations
- Pydantic Models: type-safe data validation
- Retry Policies: Temporal for resilient async processing
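A minimal sketch of how the factory and dependency-injection patterns fit together. get_meilisearch_client() is named above, but its real signature, the client wrapper, and the construction details are assumptions here:

```python
# Sketch only: factory + FastAPI Depends(), not the project's actual code.
from functools import lru_cache

from fastapi import Depends, FastAPI


class MeilisearchClient:
    """Placeholder for the real Meilisearch client wrapper."""

    def __init__(self, url: str, api_key: str):
        self.url, self.api_key = url, api_key

    async def search(self, index: str, query: str) -> dict:
        return {"hits": []}  # the real wrapper would call Meilisearch here


@lru_cache
def get_meilisearch_client() -> MeilisearchClient:
    # Factory: one place decides how the client is built (URLs, keys, tracing).
    return MeilisearchClient("http://meilisearch:7700", "masterKey")


app = FastAPI()


@app.get("/search")
async def search(q: str, client: MeilisearchClient = Depends(get_meilisearch_client)):
    # FastAPI injects the client, so route handlers stay free of construction details.
    return await client.search(index="documents-en", query=q)
```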
# 1. Integration test via UI
make dev && make front-dev
# Upload document, search, verify results
# 2. API test via Swagger
# Visit http://localhost:5050/docs
# Try /search, /documents/upload endpoints
# 3. Check workflow execution
# Visit http://localhost:8080
# Verify no failed workflows
# 4. View traces
# Visit http://localhost:16686
# Check end-to-end request flow

- Code documentation: all critical files have comprehensive docstrings
- API docs: http://localhost:5050/docs (interactive, try endpoints)
- Workflow debugging: http://localhost:8080 (execution history)
- Service logs: docker-compose logs -f [service-name]
- External docs: links in Additional Resources
| Service | URL | Description |
|---|---|---|
| Frontend | http://localhost:5173 | React web application |
| Backend API | http://localhost:5050 | FastAPI REST endpoints |
| API Docs | http://localhost:5050/docs | Interactive Swagger UI |
| Meilisearch | http://localhost:7700 | Search engine API |
| Meilisearch UI | http://localhost:24900 | Search index dashboard |
| LiteLLM Proxy | http://localhost:4000 | Unified LLM API |
| Temporal UI | http://localhost:8080 | Workflow monitoring |
| Jaeger UI | http://localhost:16686 | Distributed tracing & observability |
| STT Service | http://localhost:8001 | Speech-to-text API |
Backend (backend/multimodal_rag_api/.env):
| Category | Variable | Default | Description |
|---|---|---|---|
| Core Services | TEMPORAL_HOST | temporal:7233 | Workflow orchestration server |
| | MEILISEARCH_URL | http://meilisearch:7700 | Search engine URL |
| | MEILISEARCH_API_KEY | masterKey | Meilisearch API key |
| | LITELLM_HOST | http://host.docker.internal:4000 | LLM proxy for embeddings |
| Text Embeddings | TEXT_EMBEDDING_MODEL | ollama/qwen3-embedding:0.6b | Local Ollama or openai/text-embedding-3-small |
| | TEXT_EMBEDDING_DIMENSIONS | 512 | Vector dimension (512, 768, 1536) |
| | TEXT_EMBEDDING_HOST | http://host.docker.internal:11434 | Ollama server URL |
| Image Embeddings | IMAGE_EMBEDDING_MODEL | voyage/voyage-multimodal-3 | Multimodal model for images |
| | IMAGE_EMBEDDING_API_KEY | Set your Voyage API key | Required for image embeddings |
| Chat/NL Search | CHAT_MODEL | ollama/qwen2.5:7b | Chat model for NL query processing |
| | CHAT_HOST | http://host.docker.internal:11434 | Ollama server |
| | CHAT_TEMPERATURE | 0.1 | Lower = more deterministic |
| GPU Services | STT_SERVICE_URL | http://host.docker.internal:8001 | Speech-to-text service |
| Observability | LANGFUSE_PUBLIC_KEY | (optional) | LLM observability platform |
| | LANGFUSE_SECRET_KEY | (optional) | For production monitoring |
Frontend (frontend/.env):
| Variable | Default | Description |
|---|---|---|
| VITE_API_BASE_URL | http://localhost:5050 | Backend API URL |
| VITE_API_PREFIX | /multimodal-rag | API route prefix |
| VITE_ENABLE_UPLOAD | true | Show document upload UI |
| VITE_ENABLE_HEALTH_CHECK | true | Show system health status |
# Use local Ollama for embeddings and chat
TEXT_EMBEDDING_MODEL=ollama/qwen3-embedding:0.6b
TEXT_EMBEDDING_HOST=http://host.docker.internal:11434
CHAT_MODEL=ollama/qwen2.5:7b
CHAT_HOST=http://host.docker.internal:11434
# No API keys required!

Setup Ollama:
# Install Ollama: https://ollama.com
ollama pull qwen3-embedding:0.6b
ollama pull qwen2.5:7b
ollama serve  # Keep running in the background

# Use OpenAI for embeddings
TEXT_EMBEDDING_MODEL=openai/text-embedding-3-small
TEXT_EMBEDDING_DIMENSIONS=1536
OPENAI_API_KEY=sk-your-key-here
# Use OpenAI for chat
CHAT_MODEL=openai/gpt-4o-mini
OPENAI_API_KEY=sk-your-key-here

# Local Ollama for embeddings (free, private)
TEXT_EMBEDDING_MODEL=ollama/qwen3-embedding:0.6b
TEXT_EMBEDDING_HOST=http://host.docker.internal:11434
# Cloud for multimodal images (Voyage)
IMAGE_EMBEDDING_MODEL=voyage/voyage-multimodal-3
VOYAGE_API_KEY=pa-your-voyage-key
# Cloud for chat (better quality)
CHAT_MODEL=openai/gpt-4o-mini
OPENAI_API_KEY=sk-your-key-here

- host.docker.internal: Docker's special DNS name for reaching services that run on the host machine (such as Ollama). Use it to connect from Docker containers to host services.
- LiteLLM Proxy: all embedding and chat requests go through the LiteLLM proxy (litellm_config.yaml), which handles model routing and caching. Modify this file to add new providers.
- Production Security:
  - Change MEILISEARCH_API_KEY from masterKey
  - Use a secrets manager, not .env files
  - Enable HTTPS with a reverse proxy
  - Set resource limits in docker-compose.yml
- Vector Dimensions: must match between:
  - TEXT_EMBEDDING_DIMENSIONS in .env
  - the Meilisearch embedder configuration
  - the model's native dimension (truncate if supported); see the sketch below
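To keep those places in sync, the dimension registered on the Meilisearch embedder must equal TEXT_EMBEDDING_DIMENSIONS. A hedged sketch using the Meilisearch settings API; the index name (documents-en), embedder name (default), and the userProvided source are assumptions, and the project may configure its embedders differently:

```python
# Sketch: push the same dimension used for embedding generation to Meilisearch.
import os

import requests

dimensions = int(os.environ.get("TEXT_EMBEDDING_DIMENSIONS", "512"))

requests.patch(
    "http://localhost:7700/indexes/documents-en/settings",
    headers={"Authorization": "Bearer masterKey"},
    json={"embedders": {"default": {"source": "userProvided", "dimensions": dimensions}}},
    timeout=10,
)
```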
- Place documents in the ./data/ directory (auto-mounted into the Docker containers)
- Navigate to the Upload tab in the frontend
- Enter file paths relative to /app/data/ (e.g., data/document.pdf)
- Optionally add metadata in JSON format (e.g., {"category": "research"})
- Click "Upload Documents" to start processing
- Monitor workflow status in the Temporal UI at http://localhost:8080
Supported Formats: PDF, images (JPEG, PNG), audio files (with STT service)
The system supports advanced natural language queries with:
- Named Entity Recognition (NER): Detects countries, cities, dates, organizations
- Geographical Translation: Translates place names to multiple languages
- Date Parsing: Understands relative dates ("last month", "yesterday")
- Filter Generation: Converts entities to Meilisearch filters
Examples:
"documents from France last month"
"reports about Paris from 2024"
"contracts with USA companies"
- Keyword Search: Traditional full-text search
- Semantic Search: AI-powered vector similarity search
- Hybrid Search: Combines keyword + semantic (adjustable ratio)
# Health check
curl http://localhost:5050/multimodal-rag/health
# Natural language search
curl -X POST "http://localhost:5050/multimodal-rag/search" \
-H "Content-Type: application/json" \
-d '{
"q": "solar panels from France",
"search_type": "hybrid",
"federated": true,
"semanticRatio": 0.8
}'
# Upload documents
curl -X POST "http://localhost:5050/multimodal-rag/documents/upload" \
-H "Content-Type: application/json" \
-d '{
"file_paths": ["data/document.pdf"],
"metadata": {"category": "research"},
"batch_size": 100
}'
# Check workflow status
curl http://localhost:5050/multimodal-rag/jobs/{workflow_id}/status

multimodal-rag/
├── backend/
│   ├── multimodal_rag_api/              # Main API package (Docker)
│   │   ├── src/multimodal_rag_api/
│   │   │   ├── api/                     # REST endpoints & services
│   │   │   │   ├── controller/          # API routes
│   │   │   │   └── services/            # Business logic
│   │   │   │       ├── chunking.py
│   │   │   │       ├── document_processor.py
│   │   │   │       ├── embedding/       # Embedding services
│   │   │   │       ├── meilisearch/     # Search integration
│   │   │   │       ├── nl_search/       # NL query processing
│   │   │   │       │   └── pipeline/    # NER, translation, filters
│   │   │   │       └── stt/             # STT client
│   │   │   ├── temporal_worker/         # Workflow definitions
│   │   │   ├── meilisearch_utils/       # Search utilities
│   │   │   └── models/                  # Data models & settings
│   │   ├── pyproject.toml               # Package dependencies
│   │   ├── docker-compose.yml           # Full stack orchestration
│   │   ├── Dockerfile
│   │   └── .env.example
│   │
│   ├── gpu_services/                    # GPU services (Host)
│   │   ├── stt_service/                 # Speech-to-Text
│   │   │   ├── src/stt_service/
│   │   │   │   ├── api.py
│   │   │   │   ├── backend/             # Whisper backends
│   │   │   │   └── main.py
│   │   │   └── pyproject.toml
│   │   ├── start-services.sh            # Service manager
│   │   └── logs/
│   │
│   └── .env.gpu-services.example        # GPU config reference
│
├── frontend/                            # React application
│   ├── src/
│   │   ├── components/                  # UI components
│   │   ├── services/                    # API client
│   │   └── App.jsx
│   ├── package.json
│   └── .env.example
│
├── data/                                # Document storage (mounted)
├── config/                              # Configuration files
├── Makefile                             # Development commands
└── README.md
# Display all commands
make help
# Docker Services
make dev # Start Docker services only
make dev-full # Start GPU + Docker services
# GPU Services
make gpu-install # Install GPU dependencies (first time)
make gpu-start # Start GPU services
make gpu-stop # Stop GPU services
make gpu-status # Check GPU services status
make gpu-logs # View GPU services logs (real-time)
# Frontend
make front-dev # Start development server (port 5173)
make front        # Build & preview production (port 4173)

- Port conflicts: ensure ports 5050, 5173, 7700, 8080, 4000, 8001 are available
- Docker issues: run docker-compose down -v for a full reset
- GPU services: check the logs with make gpu-logs
- Ollama connection: ensure Ollama is running on the host (ollama serve)
- File permissions: verify the ./data/ directory is accessible
# Backend
curl http://localhost:5050/multimodal-rag/health
# Meilisearch
curl http://localhost:7700/health
# LiteLLM
curl http://localhost:4000/health
# STT Service (if running)
curl http://localhost:8001/health

# All Docker services
docker-compose -f backend/multimodal_rag_api/docker-compose.yml logs -f
# Specific service
docker-compose -f backend/multimodal_rag_api/docker-compose.yml logs -f multimodal-rag-api
# GPU services
make gpu-logs
# Temporal workflows
# Visit http://localhost:8080

# Stop all services
docker-compose -f backend/multimodal_rag_api/docker-compose.yml down
make gpu-stop
# Full reset (removes volumes)
docker-compose -f backend/multimodal_rag_api/docker-compose.yml down -v
# Remove Meilisearch data
rm -rf backend/multimodal_rag_api/.meili_data

- Start services: make dev-full
- Start frontend: make front-dev
- Upload a test document via the UI
- Monitor workflow in Temporal UI (http://localhost:8080)
- Perform natural language search
- Verify results with proper filtering and highlighting
Interactive API documentation at http://localhost:5050/docs
- Temporal UI: http://localhost:8080 (workflow execution, history, debugging)
- Meilisearch UI: http://localhost:24900 (indexes, document counts, search testing)
- Jaeger UI: http://localhost:16686 (distributed tracing, performance analysis)
- Langfuse (optional): https://cloud.langfuse.com (LLM observability)
# GPU services on host, main API in Docker
STT_SERVICE_URL=http://host.docker.internal:8001
NL_SEARCH_OLLAMA_BASE_URL=http://host.docker.internal:11434/v1

# GPU services on a separate instance
STT_SERVICE_URL=https://gpu-server.example.com:8001
NL_SEARCH_PROVIDER=openai
OPENAI_API_KEY=sk-...

- Deploy the main API, Meilisearch, and Temporal as pods
- Run GPU services on GPU-enabled nodes
- Use service discovery for inter-service communication
- Replace masterKey with a secure Meilisearch API key
- Configure a production database for Temporal (not SQLite)
- Set up proper secrets management (not .env files)
- Enable HTTPS with a reverse proxy (nginx, Caddy)
- Configure resource limits in docker-compose.yml
- Set up monitoring and alerting
- Configure backup strategy for Meilisearch data
- Use production-grade LLM API keys with rate limits
# Build and preview
make front
# Build only
cd frontend && npm run build
# Serve with nginx, Caddy, or any static file server

- Meilisearch: https://www.meilisearch.com/docs/
- Temporal: https://docs.temporal.io/
- LiteLLM: https://docs.litellm.ai/
- FastAPI: https://fastapi.tiangolo.com/
- React: https://react.dev/
- Ollama: https://ollama.com/
Understanding the "why" behind design choices helps make better decisions when extending the system:
Decision: Separate indexes per language (documents-en, documents-es, etc.)
Why:
- Better tokenization (language-specific word splitting, stemming)
- Language-specific stopwords ("the", "and" in English vs "le", "et" in French)
- Easier to tune relevance per language
- Cleaner federated search results (merged via guid deduplication); a federated-search sketch follows below

Alternative considered: a single multilingual index with a language field
- Worse search quality (generic tokenization)
- Can't be optimized per language
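A rough sketch of federated search across per-language indexes with client-side deduplication on guid, using Meilisearch's multi-search endpoint. The index names and the guid field come from this README; the exact request shape and dedup step used by the project may differ:

```python
# Sketch: query per-language indexes in one federated request, dedup by guid.
import requests

resp = requests.post(
    "http://localhost:7700/multi-search",
    headers={"Authorization": "Bearer masterKey"},
    json={
        "federation": {},
        "queries": [
            {"indexUid": "documents-en", "q": "solar panels"},
            {"indexUid": "documents-es", "q": "solar panels"},
        ],
    },
    timeout=10,
)

seen, unique_hits = set(), []
for hit in resp.json().get("hits", []):
    if hit.get("guid") not in seen:  # keep one result per document across languages
        seen.add(hit.get("guid"))
        unique_hits.append(hit)
```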
Decision: STT service runs directly on host machine, not in Docker container
Why:
- Direct GPU access (MPS on Mac, CUDA on NVIDIA)
- Avoid Docker GPU passthrough complexity
- Better performance (no virtualization overhead)
- Simpler debugging (native Python environment)
Trade-off: Less portable, requires host setup
- Worth it for the 2-3x performance gain
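For reference, the containerized API reaches the host-side STT service over host.docker.internal. The sketch below is hypothetical: the /transcribe endpoint and the multipart field name are assumptions, not the actual STT service API (see backend/gpu_services/stt_service/src/stt_service/api.py):

```python
# Hypothetical sketch of calling the host STT service from inside a container.
import requests

STT_SERVICE_URL = "http://host.docker.internal:8001"  # from .env when calling from Docker

with open("data/interview.wav", "rb") as audio:
    resp = requests.post(f"{STT_SERVICE_URL}/transcribe", files={"file": audio}, timeout=120)
print(resp.json())
```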
Decision: Use Temporal workflows instead of simple async tasks
Why:
- Reliability: Automatic retries with exponential backoff
- Visibility: Track execution history, debug failures
- Scalability: Process thousands of documents with continue-as-new
- State persistence: Workflows survive crashes and restarts
- Batching: control concurrency (max_in_flight_documents); a workflow sketch follows below

Alternative considered: Celery or plain async tasks
- Manual retry logic, no execution history, harder to debug
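A minimal, illustrative Temporal workflow showing the retry-policy and bounded-concurrency ideas above. The activity name, batching scheme, and timeouts are assumptions; the real workflow lives in temporal_worker/workflows.py:

```python
# Sketch only: per-activity retries with exponential backoff, bounded concurrency.
import asyncio
from datetime import timedelta

from temporalio import workflow
from temporalio.common import RetryPolicy


@workflow.defn
class IngestDocumentsWorkflow:
    @workflow.run
    async def run(self, file_paths: list[str]) -> int:
        retry = RetryPolicy(maximum_attempts=5, backoff_coefficient=2.0)
        max_in_flight = 10  # mirrors the max_in_flight_documents idea above
        done = 0
        for i in range(0, len(file_paths), max_in_flight):
            batch = file_paths[i : i + max_in_flight]
            await asyncio.gather(
                *[
                    workflow.execute_activity(
                        "process_document",  # activity registered on the worker
                        path,
                        start_to_close_timeout=timedelta(minutes=10),
                        retry_policy=retry,
                    )
                    for path in batch
                ]
            )
            done += len(batch)
        return done
```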
Decision: Route all LLM/embedding calls through LiteLLM proxy
Why:
- Provider switching: change from Ollama to OpenAI without code changes
- Caching: Redis cache saves API costs (30-70% hit rate)
- Unified API: One interface for OpenAI, Cohere, Anthropic, Ollama, etc.
- Observability: Built-in tracing and cost tracking
Alternative considered: Direct API calls per provider
- Provider lock-in, no caching, inconsistent APIs
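Because the proxy speaks the OpenAI protocol, application code can use a standard OpenAI client pointed at port 4000 and stay unchanged when the underlying provider switches in litellm_config.yaml. A small sketch; whether the api_key is checked depends on how the proxy is configured:

```python
# Sketch: talk to the LiteLLM proxy with the standard OpenAI client.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="anything")

resp = client.embeddings.create(
    model="ollama/qwen3-embedding:0.6b",  # the README's default embedding model
    input="solar panels in France",
)
print(len(resp.data[0].embedding))
```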
Decision: Split large documents into smaller chunks (1000-2000 characters)
Why:
- Token limits: Embedding models have max input size (512-8192 tokens)
- Search precision: Match specific passages, not entire documents
- Better highlighting: Show relevant excerpts to users
- Reduced noise: Avoid diluting relevance with irrelevant content
How it works: hierarchical splitting (paragraphs → sentences → characters) with overlap; a toy sketch follows below
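A toy sketch of the idea; the real logic lives in api/services/chunking.py and differs in detail:

```python
# Illustrative hierarchical chunking with overlap (not the project's implementation).
def chunk_text(text: str, max_chars: int = 1500, overlap: int = 200) -> list[str]:
    # Split on paragraph boundaries first, then fall back to a sliding
    # character window for paragraphs that are still too long.
    chunks: list[str] = []
    for paragraph in text.split("\n\n"):
        if len(paragraph) <= max_chars:
            chunks.append(paragraph)
            continue
        start = 0
        while start < len(paragraph):
            chunks.append(paragraph[start : start + max_chars])
            start += max_chars - overlap  # overlap keeps context across chunk borders
    return [c for c in chunks if c.strip()]
```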
Decision: Use dictionary matching before ML models for entity extraction
Why:
- Speed: 10-100x faster than ML models
- Accuracy: 99%+ for known entities (countries, cities)
- No hallucination: Exact matches only
- Offline: No API calls needed
- ML fallback: Available for ambiguous cases
Strategy: gazetteer → country codes → dates → ML model (only for uncovered text); a sketch follows below
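A simplified sketch of the dictionary-first strategy; the gazetteer contents and the ml_ner fallback are placeholders:

```python
# Illustrative gazetteer-first entity extraction with an ML fallback stub.
GAZETTEER = {"france": ("country", "FR"), "paris": ("city", "Paris"), "usa": ("country", "US")}


def extract_entities(query: str) -> list[tuple[str, str]]:
    entities, leftover = [], []
    for token in query.lower().split():
        if token in GAZETTEER:
            entities.append(GAZETTEER[token])  # fast, exact, no hallucination
        else:
            leftover.append(token)
    # entities += ml_ner(" ".join(leftover))   # ML model only for uncovered text
    return entities


print(extract_entities("documents from France last month"))  # [('country', 'FR')]
```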
Decision: Support truncating embeddings to smaller dimensions
Why:
- Storage: 512D uses 4x less space than 2048D
- Speed: Faster similarity search (fewer dimensions)
- Flexibility: Tune storage/accuracy trade-off per use case
- No retraining: Models like nomic-embed support this natively
Example: 2048D → 512D with minimal quality loss (~2-3% accuracy drop); a truncation sketch follows below
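A small sketch of the truncate-and-renormalize step; illustrative only, and only valid for models trained to support it (such as the nomic-embed family mentioned above):

```python
# Keep the first N dimensions of an embedding and re-normalize to unit length.
import math


def truncate_embedding(vector: list[float], dims: int = 512) -> list[float]:
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]  # unit-length vector in the smaller space
```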
Before contributing, please:
- Read the Developer Onboarding section
- Understand the Key Architectural Decisions
- Check code documentation (all critical files have comprehensive docstrings)
Contribution workflow:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes with clear commit messages
- Add docstrings to new functions (follow existing patterns)
- Test with make dev and make front-dev
- Verify workflows in the Temporal UI (no failed executions)
- Check Jaeger traces for performance issues
- Push to your branch and open a Pull Request
Code style:
- Follow existing patterns (factory functions, dependency injection, async/await)
- Add type hints (Pydantic models preferred)
- Write docstrings for public functions (see routes.py for examples)
- Use descriptive variable names
- Keep functions focused (single responsibility)
MIT