Low-latency, interruptible, full-duplex (talk & listen at the same time) voice assistant with a web UI, streaming ASR, TTS, and LLM orchestration. Built for real conversations, barge-in, and hands-free control.
- Full-duplex audio: talk and listen simultaneously (barge-in / interruption supported).
- Streaming ASR: incremental transcripts while you speak.
- Streaming TTS: assistant responds with audio before text finishes.
- LLM orchestration: tool use/function calls and stateful dialog.
- Web UI: mic capture, waveforms, and live captions in-browser.
- Production-ready stack: Traefik reverse proxy + auto TLS, Nginx static hosting, FastAPI backend.
- Single-command deployment: `docker compose up -d`.
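Barge-in means playback has to be cancellable the moment the user starts talking. The sketch below shows one way to do that with `asyncio` task cancellation. It is not the project's actual implementation: `speak`, `run_turn`, and the timings are hypothetical, and a timer stands in for real voice-activity detection.

```python
import asyncio


async def speak(chunks, played):
    """Pretend to play TTS chunks one by one; `played` records what was heard."""
    for chunk in chunks:
        await asyncio.sleep(0.02)  # stands in for the time needed to play one chunk
        played.append(chunk)


async def run_turn(chunks, interrupt_after):
    """Play a reply, cancelling playback if the user starts talking.

    `interrupt_after` is the delay in seconds before simulated user speech,
    or None when the user stays silent (hypothetical stand-in for VAD).
    """
    played = []
    playback = asyncio.create_task(speak(chunks, played))
    if interrupt_after is not None:
        await asyncio.sleep(interrupt_after)
        playback.cancel()  # barge-in: stop the assistant mid-sentence
    try:
        await playback
    except asyncio.CancelledError:
        pass  # expected when the user interrupted
    return played


if __name__ == "__main__":
    reply = ["Sure,", "the", "weather", "today", "is", "sunny."]
    print(asyncio.run(run_turn(reply, None)))   # uninterrupted: every chunk is played
    print(asyncio.run(run_turn(reply, 0.05)))   # interrupted: only the first chunks
```

Running playback as its own task keeps the listening path free, which is what lets the assistant hear the user while it is still speaking.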
Browser (Web UI)
├─ Mic capture (WebAudio) → WebSocket → Assistant (FastAPI)
│                                           │
│                                  partial transcripts
│                                           ▼
├─ Live captions ← ASR (streaming via Assistant)
│                                           │
│                                           ▼
├─ TTS audio playback ← TTS (streaming chunks)
│                                           ▲
│                                           │
└─ Controls/Events → LLM Orchestrator
          ┌───────────────────────────┐
          │         Internet          │
          └────────────┬──────────────┘
                       │ :80 / :443
                       ▼
               ┌─────────────────┐
               │     Traefik     │
               │ (Reverse Proxy) │
               └───────┬─────────┘
         ┌─────────────┼─────────────┐
         │             │             │
┌────────▼───┐   ┌─────▼─────┐    ┌──▼────────┐
│     /      │   │   /api    │    │   /ws     │
│   Web UI   │   │ Assistant │    │ Assistant │
│  (Nginx)   │   │ (FastAPI) │    │ (FastAPI) │
└────────────┘   └───────────┘    └───────────┘
- traefik: reverse proxy, automatic HTTPS via Let’s Encrypt.
- web: static frontend (served by Nginx).
- assistant: FastAPI backend (ASR, TTS, LLM orchestration, WebSockets).
- init_letsencrypt: bootstrap storage for ACME certificates.
- Docker & Docker Compose
- A domain pointing to your server (com-cloud.cloud), with DNS A/AAAA records configured
- API keys for ASR, TTS, and LLM providers
Create `src/assistant/.env` with your secrets:
# LLM / Orchestrator
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...
# ASR
ASR_PROVIDER=openai_realtime
ASR_API_KEY=...
# TTS
TTS_PROVIDER=openai_realtime
TTS_API_KEY=...
# CORS / ORIGINS
ALLOWED_ORIGINS=https://com-cloud.cloud
# Optional
LOG_LEVEL=info
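A backend would typically read these variables once at startup and fail early if a secret is missing. The following is a minimal sketch of that step, not the project's actual loader: the `Settings` class and `load_settings` function are hypothetical, and treating `ALLOWED_ORIGINS` as a comma-separated list and defaulting `LOG_LEVEL` to `info` are assumptions.

```python
import os
from dataclasses import dataclass

# Variables the sketch treats as mandatory (an assumption based on the example above).
REQUIRED = (
    "LLM_PROVIDER", "OPENAI_API_KEY",
    "ASR_PROVIDER", "ASR_API_KEY",
    "TTS_PROVIDER", "TTS_API_KEY",
)


@dataclass(frozen=True)
class Settings:
    llm_provider: str
    asr_provider: str
    tts_provider: str
    allowed_origins: tuple
    log_level: str


def load_settings(env=None):
    """Build Settings from a mapping of environment variables (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        # Fail fast with a clear message instead of erroring on the first request.
        raise RuntimeError("missing required variables: " + ", ".join(missing))
    # Assumption: several origins may be given, separated by commas.
    origins = tuple(
        o.strip() for o in env.get("ALLOWED_ORIGINS", "").split(",") if o.strip()
    )
    return Settings(
        llm_provider=env["LLM_PROVIDER"],
        asr_provider=env["ASR_PROVIDER"],
        tts_provider=env["TTS_PROVIDER"],
        allowed_origins=origins,
        log_level=env.get("LOG_LEVEL", "info"),
    )
```

The API keys are validated but deliberately not copied into `Settings`, so the object can be logged without leaking secrets.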
cd src/assistant
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
uvicorn assistant.app:app --reload --host 0.0.0.0 --port 8000
cd web
npm install
npm run dev
Open https://com-cloud.cloud.
Click the orb to connect; this establishes the WebSocket session.
Speak naturally; you can interrupt the assistant mid-sentence.
Watch the live captions and hear real-time TTS playback.
Don't forget to close the tab when you're done!
Key options:
ASR: model, language hints, VAD sensitivity.
TTS: voice, speed, sample rate.
LLM: model, temperature, tool schemas.
Traefik: TLS challenge type, timeouts, rate limits.
GET /healthz – service health
WS /ws/asr – audio in ↔ transcript out
WS /ws/assistant – dialog orchestration (events + responses)
WS /ws/tts – text in ↔ audio out
POST /api/tools/<name> – trigger server-side tool functions
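A client has to use `wss://` for the `/ws/...` endpoints and `https://` for the rest when the site is served over TLS. The helper below is a small sketch of that mapping; `endpoint_url` is a hypothetical name, and the rule that every path under `/ws` is a WebSocket route is an assumption taken from the list above.

```python
from urllib.parse import urlsplit, urlunsplit


def endpoint_url(base, path):
    """Return the full URL for `path`, switching to ws/wss for paths under /ws."""
    parts = urlsplit(base)
    scheme = parts.scheme
    if path == "/ws" or path.startswith("/ws/"):
        # WebSocket endpoints mirror the security of the page: https -> wss, http -> ws.
        scheme = {"https": "wss", "http": "ws"}.get(scheme, scheme)
    return urlunsplit((scheme, parts.netloc, path, "", ""))


if __name__ == "__main__":
    base = "https://com-cloud.cloud"
    for path in ("/healthz", "/ws/asr", "/ws/assistant", "/ws/tts"):
        print(endpoint_url(base, path))
```

The same helper works against a local backend, for example with `http://localhost:8000` as the base.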
HTTPS enforced (TLS via Let’s Encrypt + Traefik).
Strict CORS (limited to https://com-cloud.cloud).
API rate limiting enabled (/api).
Secrets kept in .env (not in frontend).
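Browsers do not apply CORS to WebSocket handshakes, so a server that wants the same origin restriction on `/ws` has to compare the `Origin` header itself. The sketch below shows such a check under that assumption; `is_allowed_origin` is a hypothetical helper, and whether this project performs the check in the backend or elsewhere is not stated here.

```python
def is_allowed_origin(origin, allowed):
    """Exact, case-insensitive match of an Origin header against an allow-list."""
    if not origin:
        return False  # a missing Origin header is rejected
    normalized = origin.strip().rstrip("/").lower()
    return normalized in {a.strip().rstrip("/").lower() for a in allowed}


if __name__ == "__main__":
    allowed = ["https://com-cloud.cloud"]
    for candidate in (
        "https://com-cloud.cloud",       # the site itself
        "http://com-cloud.cloud",        # wrong scheme
        "https://evil.com-cloud.cloud",  # lookalike subdomain
    ):
        print(candidate, is_allowed_origin(candidate, allowed))
```

Matching the whole origin string, rather than checking a suffix or substring, is what keeps lookalike hosts out.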
Reverse proxy: Traefik v3 with ACME TLS challenge.
Certificates stored in ./letsencrypt/acme.json.
Static frontend served by Nginx (web service).
Backend served via assistant (FastAPI) behind Traefik.
Scale out with Docker Swarm or Kubernetes if needed.
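Traefik expects its ACME storage file to be readable and writable by the owner only (mode 600) and will refuse to use it otherwise. The sketch below shows what bootstrapping `./letsencrypt/acme.json` could look like; `ensure_acme_storage` is a hypothetical helper, and whether the `init_letsencrypt` service does exactly this is an assumption.

```python
import os
import stat


def ensure_acme_storage(path):
    """Create an empty ACME storage file with owner-only permissions if it is missing.

    Returns the resulting permission bits so callers can verify them.
    """
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    if not os.path.exists(path):
        # Create the file with restrictive permissions from the start.
        fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
        os.close(fd)
    os.chmod(path, 0o600)  # tighten permissions even if the file already existed
    return stat.S_IMODE(os.stat(path).st_mode)


if __name__ == "__main__":
    mode = ensure_acme_storage("./letsencrypt/acme.json")
    print(oct(mode))
```

The function is safe to run repeatedly: an existing file keeps its contents and only has its permissions reset.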
Wake-word hotword detection
Speaker diarization
Plug-and-play tool registry
Persistent transcripts
Multi-voice TTS
Fork this repo
Create a feature branch
Submit a PR, with screenshots or logs if the UI or backend is affected