Universal Docker setup that works on any platform with Docker support.
> **Warning: Important Limitations**
>
> - **macOS**: Docker does not support GPU acceleration. For 10x better performance, use the macOS native setup.
> - **Linux**: GPU acceleration requires the NVIDIA Container Toolkit.
**Prerequisites**

- Docker and Docker Compose installed
- At least 8 GB RAM available for Docker
- 10 GB free disk space
- For GPU: NVIDIA Container Toolkit (installation guide)
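Before starting the services on Linux, you can confirm that the NVIDIA Container Toolkit is wired up correctly. This sketch assumes an NVIDIA driver on the host; the CUDA image tag is just an example:

```bash
# If the toolkit is installed correctly, this prints the host's GPU table.
# The image tag is illustrative; any recent nvidia/cuda base image works.
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```

If this command fails, the `cuda` profile below will also fail to see the GPU.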
1. Start all services with GPU acceleration:

   ```bash
   docker compose -f docker/docker-compose.yml --profile cuda up
   ```

   Or for CPU-only:

   ```bash
   docker compose -f docker/docker-compose.yml --profile cpu up
   ```

2. Check that the services are running:

   ```bash
   docker compose -f docker/docker-compose.yml logs
   ```

3. Install agent-cli:

   ```bash
   uv tool install agent-cli -p 3.13  # or: pip install agent-cli
   ```

4. Test the setup:

   ```bash
   agent-cli autocorrect "this has an eror"
   ```
The Docker setup provides:
| Service | Image | Port | Purpose |
|---|---|---|---|
| whisper | agent-cli-whisper (custom) | 10300/10301 | Speech-to-text (Faster Whisper) |
| tts | agent-cli-tts (custom) | 10200/10201 | Text-to-speech (Kokoro/Piper) |
| transcribe-proxy | agent-cli-transcribe-proxy | 61337 | ASR proxy for iOS/external apps |
| rag-proxy | agent-cli-rag-proxy | 8000 | Document-aware chat (RAG) |
| memory-proxy | agent-cli-memory-proxy | 8100 | Long-term memory chat |
| ollama | ollama/ollama | 11434 | LLM server |
| openwakeword | rhasspy/wyoming-openwakeword | 10400 | Wake word detection |
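You can probe the published ports from the host to confirm each service is reachable. This is a sketch assuming the default ports above and a local Docker host; the Wyoming services speak a binary protocol, so a plain TCP check is used for those:

```bash
# Ollama exposes a REST API; /api/tags lists installed models.
curl -s http://localhost:11434/api/tags

# Wyoming services (whisper, tts, openwakeword) speak the Wyoming
# protocol, not HTTP, so just check that the TCP port accepts connections.
nc -z localhost 10300 && echo "whisper (Wyoming) reachable"
nc -z localhost 10200 && echo "tts (Wyoming) reachable"
nc -z localhost 10400 && echo "openwakeword (Wyoming) reachable"
```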
Services are configured through environment variables:

```bash
# Whisper ASR
WHISPER_MODEL=large-v3  # Model: tiny, base, small, medium, large-v3
WHISPER_TTL=300         # Seconds before unloading idle model

# TTS
TTS_MODEL=kokoro        # For CUDA: kokoro, For CPU: en_US-lessac-medium
TTS_BACKEND=kokoro      # Backend: kokoro (GPU), piper (CPU)
TTS_TTL=300             # Seconds before unloading idle model

# Transcription Proxy
PROXY_PORT=61337        # Port for transcription proxy
ASR_PROVIDER=wyoming    # ASR provider: wyoming, openai, gemini
ASR_WYOMING_IP=whisper  # Wyoming server hostname (container name in compose)
ASR_WYOMING_PORT=10300  # Wyoming server port
LLM_PROVIDER=ollama     # LLM provider: ollama, openai, gemini
LLM_OLLAMA_MODEL=gemma3:4b            # Ollama model name
LLM_OLLAMA_HOST=http://ollama:11434   # Ollama server URL (container name)
LLM_OPENAI_MODEL=gpt-4.1-nano         # OpenAI model (if using openai provider)
OPENAI_API_KEY=sk-...                 # OpenAI API key (if using openai provider)

# RAG Proxy
RAG_PORT=8000           # Port for RAG proxy
RAG_LIMIT=3             # Number of document chunks per query
RAG_ENABLE_TOOLS=true   # Enable read_full_document tool
EMBEDDING_MODEL=text-embedding-3-small  # Embedding model for RAG/memory

# Memory Proxy
MEMORY_PORT=8100             # Port for memory proxy
MEMORY_TOP_K=5               # Number of memories per query
MEMORY_MAX_ENTRIES=500       # Max entries per conversation before eviction
MEMORY_SUMMARIZATION=true    # Enable fact extraction from conversations
MEMORY_GIT_VERSIONING=true   # Enable git versioning for memory changes
```

The CUDA profile automatically enables GPU for Whisper and TTS. For Ollama GPU support, edit the compose file and uncomment the `deploy` section under the `ollama` service.
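One way to override these defaults without editing the compose file is an env file passed via Compose's `--env-file` flag. A minimal sketch, assuming the compose file interpolates these variables (the specific values below are just examples):

```bash
# Write a few overrides to a local .env file.
cat > .env <<'EOF'
WHISPER_MODEL=small
TTS_BACKEND=piper
LLM_OLLAMA_MODEL=gemma3:4b
EOF

# Start with the overrides applied.
docker compose --env-file .env -f docker/docker-compose.yml --profile cpu up -d
```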
```bash
# Start services in background
docker compose -f docker/docker-compose.yml --profile cuda up -d

# Stop services
docker compose -f docker/docker-compose.yml --profile cuda down

# View logs
docker compose -f docker/docker-compose.yml logs -f

# Rebuild from source
docker compose -f docker/docker-compose.yml --profile cuda up --build
```

Services store data in Docker volumes:

- `agent-cli-whisper-cache` - Whisper models
- `agent-cli-tts-cache` - TTS models and voices
- `agent-cli-ollama-data` - Ollama models
- `agent-cli-openwakeword-data` - Wake word models
- `agent-cli-rag-docs` - Documents to index for RAG
- `agent-cli-rag-db` - RAG vector database (ChromaDB)
- `agent-cli-rag-cache` - RAG embedding models
- `agent-cli-memory-data` - Memory entries and vector index
- `agent-cli-memory-cache` - Memory embedding models
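Named volumes are not directly visible on the host filesystem, but you can read or seed them through a throwaway container. A sketch (the `my-notes` source directory is hypothetical):

```bash
# Copy documents from the host into the RAG documents volume for indexing.
docker run --rm \
  -v agent-cli-rag-docs:/docs \
  -v "$PWD/my-notes":/src:ro \
  alpine sh -c 'cp /src/*.md /docs/'

# List what a volume currently contains, and show its metadata.
docker run --rm -v agent-cli-ollama-data:/data alpine ls /data
docker volume inspect agent-cli-ollama-data
```

Note that `docker compose ... down -v` deletes these volumes, including downloaded models and memory data, so use it only for a full reset.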
| Port | Service | Protocol |
|---|---|---|
| 8000 | RAG Proxy | HTTP API |
| 8100 | Memory Proxy | HTTP API |
| 10200 | TTS | Wyoming |
| 10201 | TTS | HTTP API |
| 10300 | Whisper | Wyoming |
| 10301 | Whisper | HTTP API |
| 10400 | OpenWakeWord | Wyoming |
| 11434 | Ollama | HTTP API |
| 61337 | Transcription Proxy | HTTP API |
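If one of these ports is already taken on your host, Compose override files let you remap the published port without touching the main compose file. A sketch, assuming the service is named `rag-proxy` in `docker/docker-compose.yml` (with an explicit `-f`, the override file must also be passed explicitly):

```yaml
# docker/docker-compose.override.yml (hypothetical)
services:
  rag-proxy:
    ports:
      - "8080:8000"  # host port 8080 -> container port 8000
```

Then start with both files: `docker compose -f docker/docker-compose.yml -f docker/docker-compose.override.yml --profile cuda up`.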
For better performance, consider platform-specific native installation:
- macOS Native Setup - Metal GPU acceleration
- Linux Native Setup - NVIDIA GPU acceleration