
Commit b468427

deardarlingoose, dearlordylord, smankovsky authored
feat: local llm support + standalone-script doc/draft (#856)
* feat: local LLM via Ollama + structured output response_format
  - Add setup script (scripts/setup-local-llm.sh) for one-command Ollama setup. Mac: native Metal GPU; Linux: containerized via docker-compose profiles
  - Add ollama-gpu and ollama-cpu docker-compose profiles for Linux
  - Add extra_hosts to server/hatchet-worker-llm for host.docker.internal
  - Pass response_format JSON schema in StructuredOutputWorkflow.extract(), enabling grammar-based constrained decoding on Ollama/llama.cpp/vLLM/OpenAI
  - Update .env.example with Ollama as the default LLM option
  - Add Ollama PRD and local dev setup docs
* refactor: move Ollama services to docker-compose.standalone.yml
  The Ollama profiles (ollama-gpu, ollama-cpu) are only for Linux standalone deployment; Mac devs never use them. A separate file keeps the main compose clean and provides a natural home for future standalone services (MinIO, etc.).
  - Linux: docker compose -f docker-compose.yml -f docker-compose.standalone.yml --profile ollama-gpu up -d
  - Mac: docker compose up -d (native Ollama, no standalone file needed)
* fix: correct PRD goal (demo/eval, not dev replacement) and processor naming
* chore: remove completed PRD, rename setup doc, drop response_format tests
  - Remove docs/01_ollama.prd.md (implementation complete)
  - Rename local-dev-setup.md -> standalone-local-setup.md
  - Remove TestResponseFormat class from test_llm_retry.py
* docs: resolve standalone storage step — skip S3 for live-only mode
* docs: add TASKS.md for standalone env defaults + setup script work
* feat: add unified setup-local-dev.sh for standalone deployment
  A single script takes a fresh clone to a working Reflector: Ollama/LLM setup, env file generation (server/.env + www/.env.local), docker compose up, and health checks. No Hatchet in standalone; the live pipeline is pure Celery.
* chore: rename to setup-standalone, remove redundant setup-local-llm.sh
* feat: add custom S3 endpoint support + Garage standalone storage
  Add TRANSCRIPT_STORAGE_AWS_ENDPOINT_URL setting to enable S3-compatible backends (Garage, MinIO). When set, it uses path-style addressing and routes all requests to the custom endpoint. When unset, AWS behavior is unchanged.
  - AwsStorage: accept aws_endpoint_url, pass it to all 6 session.client() calls, configure path-style addressing and base_url
  - Fix 4 direct AwsStorage constructions in Hatchet workflows to pass endpoint_url (they would have silently targeted the wrong endpoint)
  - Standalone: add Garage service to docker-compose.standalone.yml; the setup script initializes layout/bucket/key and writes credentials
  - Fix compose_cmd() bug: the Mac path was missing the standalone yml
  - garage.toml template with runtime secret generation via openssl
* fix: standalone setup — garage config, symlink handling, healthcheck
  - garage.toml: fix the rpc_secret field name (was secret_transmitter), move it to top level per the Garage v1.1.0 spec, remove unused [s3_web]
  - setup-standalone.sh: resolve symlinked .env files before writing, always ensure all standalone-critical vars via env_set, fix garage key create/info syntax (positional arg, not --name), avoid overwriting the key secret with "(redacted)" on re-run, use compose_cmd in the health check
  - docker-compose.standalone.yml: fix the garage healthcheck (no curl in the image; use /garage stats instead)
* docs: update standalone md — symlink handling, garage config template
* docs: add troubleshooting section + port conflict check in setup script
  Port conflicts from stale next dev or other worktree processes silently shadow Docker container port mappings, making env vars appear to be ignored.
* fix: invalidate transcript query on STATUS websocket event
  Without this, the processing page never redirects after completion because the redirect logic watches the REST query data, not the WebSocket status state. Cherry-picked from feat-dag-progress (faec509).
* fix: local env setup (#855)
* Ensure rate limit
* Increase nextjs compilation speed
* Fix daily no content handling
* Simplify daily webhook creation
* Fix webhook request validation
* feat: add local pyannote file diarization processor (#858)
  Enables file diarization without Modal by using pyannote.audio locally. Downloads the model bundle from S3 on first use, caches it locally, and patches the config to use local paths. Set DIARIZATION_BACKEND=pyannote to enable.
* fix: standalone setup enables pyannote diarization and public mode
  Replace DIARIZATION_ENABLED=false with DIARIZATION_BACKEND=pyannote so file uploads get speaker diarization out of the box. Add PUBLIC_MODE=true so unauthenticated users can list/browse transcripts.
* fix: touch env files before first compose_cmd in standalone setup
  docker-compose.yml references www/.env.local as an env_file, but the setup script only creates it in step 4. compose_cmd calls in step 3 (Garage) fail on a fresh clone when the file doesn't exist yet.
* feat: standalone uses self-hosted GPU service for transcription+diarization
  Replace the in-process pyannote approach with the self-hosted gpu/self_hosted/ service. Same HTTP API as Modal — TRANSCRIPT_URL/DIARIZATION_URL just point at a local container.
  - Add gpu/self_hosted/Dockerfile.cpu (the GPU Dockerfile minus NVIDIA CUDA)
  - Add S3 model bundle fallback in diarizer.py when HF_TOKEN is not set
  - Add gpu service to docker-compose.standalone.yml with compose env overrides
  - Fix /browse being empty in PUBLIC_MODE (search+list queries filtered out roomless transcripts)
  - Remove audio_diarization_pyannote.py, file_diarization_pyannote.py and their tests
  - Remove pyannote-audio from server local deps
* fix: allow unauthenticated GPU requests when no API key configured
  OAuth2PasswordBearer with auto_error=True rejects requests without an Authorization header before apikey_auth can check whether auth is needed.
* fix: rename standalone gpu service to cpu to match Dockerfile.cpu usage
* docs: add programmatic testing section and fix gpu->cpu naming in setup script/docs
  - Add a "Testing programmatically" section to the standalone docs with curl commands for creating a transcript, uploading audio, polling status, and checking the result
  - Fix setup-standalone.sh to reference the `cpu` service (it was still `gpu` after the rename)
  - Update all docs references from gpu to cpu service naming
* Fix websocket disconnect errors
* Fix event loop is closed in Celery workers
* Allow reprocessing idle multitrack transcripts
* feat: add local pyannote file diarization processor
* feat: standalone uses self-hosted GPU service for transcription+diarization
* fix: set source_kind to FILE on audio file upload
  The upload endpoint left source_kind at the default LIVE even when a file was uploaded. Now it is set to FILE when the upload completes.
* Add hatchet env vars
* fix: improve port conflict detection and ollama model check in standalone setup
  - Filter OrbStack/Docker Desktop PIDs from the port conflict check (false positives on Mac)
  - Check all infra ports (5432, 6379, 3900, 3903), not just app ports
  - Fix ollama model detection to match on the name column only
  - Document OrbStack and cross-project port conflicts in troubleshooting
* fix: processing page auto-redirect after file upload completes
  Three fixes for the processing page not redirecting when status becomes "ended":
  - Add useWebSockets to the processing page so it receives STATUS events
  - Remove OAuth2PasswordBearer from auth_none — it broke WebSocket endpoints (500)
  - Reconnect stale Redis in ws_manager when a Celery worker reuses a dead event loop
* fix: mock Celery broker in idle transcript validation test
  test_validation_idle_transcript_with_recording_allowed called validate_transcript_for_processing without mocking task_is_scheduled_or_active, which attempts a real Celery broker connection (AMQP port 5672). Other tests in the same file already mock this — apply the same pattern here.
* Enable server host mode
* Fix webrtc connection
* Remove turbopack
* fix: standalone GPU service connectivity with host network mode
  The server runs with network_mode: host and can't resolve Docker service names. Publish the cpu port as 8100 on the host and point the server at localhost:8100. The worker stays on the bridge network using cpu:8000. Add a dummy TRANSCRIPT_MODAL_API_KEY since the OpenAI SDK requires it even for local endpoints.

Co-authored-by: Igor Loskutov <[email protected]>
Co-authored-by: Sergey Mankovsky <[email protected]>
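The commit message's structured-output change passes a response_format JSON schema so Ollama/llama.cpp/vLLM/OpenAI-compatible backends do grammar-constrained decoding. A minimal sketch of the payload shape, following the OpenAI chat-completions structured-outputs format; the `schema` below is a hypothetical example, not the schema Reflector actually uses:

```python
import json

# Hypothetical extraction schema (illustrative only).
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "topics": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "topics"],
    "additionalProperties": False,
}

# The response_format payload passed alongside the chat request,
# e.g. client.chat.completions.create(..., response_format=response_format).
response_format = {
    "type": "json_schema",
    "json_schema": {"name": "extraction", "strict": True, "schema": schema},
}

print(json.dumps(response_format, indent=2))
```

On backends that honor it, this constrains token sampling so the model can only emit output matching the schema, rather than validating JSON after the fact.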
1 parent cd2255c commit b468427
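The commit's auth fix notes that OAuth2PasswordBearer with auto_error=True rejects requests without an Authorization header before the API-key check can decide whether auth is needed at all. A toy sketch of the intended decision logic (not Reflector's actual code); with auto_error=False the framework passes None through instead of failing early, letting a check like this run:

```python
from typing import Optional

def allow_request(configured_api_key: Optional[str],
                  bearer_token: Optional[str]) -> bool:
    """Return True if the request may proceed."""
    if configured_api_key is None:
        # No API key configured: auth is disabled, so accept the
        # request even when no Authorization header was sent.
        return True
    # A key is configured: require a matching bearer token.
    return bearer_token == configured_api_key
```

The bug was ordering: with auto_error=True the missing-header rejection fires before the `configured_api_key is None` branch is ever reached.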

38 files changed

Lines changed: 1217 additions & 896 deletions

docker-compose.standalone.yml

Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
+# Standalone services for fully local deployment (no external dependencies).
+# Usage: docker compose -f docker-compose.yml -f docker-compose.standalone.yml up -d
+#
+# On Linux with NVIDIA GPU, also pass: --profile ollama-gpu
+# On Linux without GPU: --profile ollama-cpu
+# On Mac: Ollama runs natively (Metal GPU) — no profile needed, services here unused.
+
+services:
+  garage:
+    image: dxflrs/garage:v1.1.0
+    ports:
+      - "3900:3900" # S3 API
+      - "3903:3903" # Admin API
+    volumes:
+      - garage_data:/var/lib/garage/data
+      - garage_meta:/var/lib/garage/meta
+      - ./data/garage.toml:/etc/garage.toml:ro
+    restart: unless-stopped
+    healthcheck:
+      test: ["CMD", "/garage", "stats"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+      start_period: 5s
+
+  ollama:
+    image: ollama/ollama:latest
+    profiles: ["ollama-gpu"]
+    ports:
+      - "11434:11434"
+    volumes:
+      - ollama_data:/root/.ollama
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: all
+              capabilities: [gpu]
+    restart: unless-stopped
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+
+  ollama-cpu:
+    image: ollama/ollama:latest
+    profiles: ["ollama-cpu"]
+    ports:
+      - "11434:11434"
+    volumes:
+      - ollama_data:/root/.ollama
+    restart: unless-stopped
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+
+  # Override server/worker/beat to use self-hosted GPU service for transcription+diarization.
+  # compose `environment:` overrides values from `env_file:` — no need to edit server/.env.
+  server:
+    environment:
+      TRANSCRIPT_BACKEND: modal
+      TRANSCRIPT_URL: http://localhost:8100
+      TRANSCRIPT_MODAL_API_KEY: local
+      DIARIZATION_BACKEND: modal
+      DIARIZATION_URL: http://localhost:8100
+
+  worker:
+    environment:
+      TRANSCRIPT_BACKEND: modal
+      TRANSCRIPT_URL: http://cpu:8000
+      TRANSCRIPT_MODAL_API_KEY: local
+      DIARIZATION_BACKEND: modal
+      DIARIZATION_URL: http://cpu:8000
+
+  cpu:
+    build:
+      context: ./gpu/self_hosted
+      dockerfile: Dockerfile.cpu
+    ports:
+      - "8100:8000"
+    volumes:
+      - gpu_cache:/root/.cache
+    restart: unless-stopped
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:8000/docs"]
+      interval: 15s
+      timeout: 5s
+      retries: 10
+      start_period: 120s
+
+  gpu-nvidia:
+    build:
+      context: ./gpu/self_hosted
+    profiles: ["gpu-nvidia"]
+    volumes:
+      - gpu_cache:/root/.cache
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: all
+              capabilities: [gpu]
+    restart: unless-stopped
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:8000/docs"]
+      interval: 15s
+      timeout: 5s
+      retries: 10
+      start_period: 120s
+
+volumes:
+  garage_data:
+  garage_meta:
+  ollama_data:
+  gpu_cache:
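Per the header comments of the file above, bring-up differs by platform. As a command sketch:

```shell
# Linux with NVIDIA GPU:
docker compose -f docker-compose.yml -f docker-compose.standalone.yml --profile ollama-gpu up -d

# Linux without GPU:
docker compose -f docker-compose.yml -f docker-compose.standalone.yml --profile ollama-cpu up -d

# Mac (Ollama runs natively via Metal; no profile or standalone file needed for it):
docker compose up -d
```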

docker-compose.yml

Lines changed: 33 additions & 10 deletions
@@ -2,15 +2,20 @@ services:
   server:
     build:
       context: server
-    ports:
-      - 1250:1250
+    network_mode: host
     volumes:
       - ./server/:/app/
       - /app/.venv
     env_file:
       - ./server/.env
     environment:
       ENTRYPOINT: server
+      DATABASE_URL: postgresql+asyncpg://reflector:reflector@localhost:5432/reflector
+      REDIS_HOST: localhost
+      CELERY_BROKER_URL: redis://localhost:6379/1
+      CELERY_RESULT_BACKEND: redis://localhost:6379/1
+      HATCHET_CLIENT_SERVER_URL: http://localhost:8889
+      HATCHET_CLIENT_HOST_PORT: localhost:7078
 
   worker:
     build:
@@ -22,6 +27,11 @@ services:
       - ./server/.env
     environment:
       ENTRYPOINT: worker
+      HATCHET_CLIENT_SERVER_URL: http://hatchet:8888
+      HATCHET_CLIENT_HOST_PORT: hatchet:7077
+    depends_on:
+      redis:
+        condition: service_started
 
   beat:
     build:
@@ -33,6 +43,9 @@ services:
       - ./server/.env
     environment:
       ENTRYPOINT: beat
+    depends_on:
+      redis:
+        condition: service_started
 
   hatchet-worker-cpu:
     build:
@@ -44,6 +57,8 @@ services:
       - ./server/.env
     environment:
       ENTRYPOINT: hatchet-worker-cpu
+      HATCHET_CLIENT_SERVER_URL: http://hatchet:8888
+      HATCHET_CLIENT_HOST_PORT: hatchet:7077
     depends_on:
       hatchet:
         condition: service_healthy
@@ -57,6 +72,8 @@ services:
       - ./server/.env
     environment:
       ENTRYPOINT: hatchet-worker-llm
+      HATCHET_CLIENT_SERVER_URL: http://hatchet:8888
+      HATCHET_CLIENT_HOST_PORT: hatchet:7077
     depends_on:
       hatchet:
         condition: service_healthy
@@ -75,10 +92,16 @@ services:
     volumes:
       - ./www:/app/
       - /app/node_modules
+      - next_cache:/app/.next
     env_file:
       - ./www/.env.local
     environment:
       - NODE_ENV=development
+      - SERVER_API_URL=http://host.docker.internal:1250
+    extra_hosts:
+      - "host.docker.internal:host-gateway"
+    depends_on:
+      - server
 
   postgres:
     image: postgres:17
@@ -94,21 +117,22 @@ services:
       - ./server/docker/init-hatchet-db.sql:/docker-entrypoint-initdb.d/init-hatchet-db.sql:ro
     healthcheck:
       test: ["CMD-SHELL", "pg_isready -d reflector -U reflector"]
-      interval: 10s
-      timeout: 10s
-      retries: 5
-      start_period: 10s
+      interval: 5s
+      timeout: 5s
+      retries: 10
+      start_period: 15s
 
   hatchet:
     image: ghcr.io/hatchet-dev/hatchet/hatchet-lite:latest
+    restart: on-failure
     ports:
       - "8889:8888"
       - "7078:7077"
     depends_on:
       postgres:
         condition: service_healthy
     environment:
-      DATABASE_URL: "postgresql://reflector:reflector@postgres:5432/hatchet?sslmode=disable"
+      DATABASE_URL: "postgresql://reflector:reflector@postgres:5432/hatchet?sslmode=disable&connect_timeout=30"
       SERVER_AUTH_COOKIE_DOMAIN: localhost
       SERVER_AUTH_COOKIE_INSECURE: "t"
       SERVER_GRPC_BIND_ADDRESS: "0.0.0.0"
@@ -128,6 +152,5 @@ services:
       retries: 5
       start_period: 30s
 
-networks:
-  default:
-    attachable: true
+volumes:
+  next_cache:
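The standalone overlay relies on Docker Compose's precedence rule: keys in a service's `environment:` mapping win over the same keys loaded via `env_file:`, so server/.env never needs editing. A toy emulation of that merge order (not Compose's implementation); the env values here are hypothetical:

```python
def effective_env(env_file: dict, environment: dict) -> dict:
    """Emulate Compose precedence: environment: entries override env_file: values."""
    merged = dict(env_file)     # start from the env_file contents
    merged.update(environment)  # environment: mapping wins on conflicts
    return merged

# Hypothetical values, for illustration only.
env_file = {
    "TRANSCRIPT_URL": "https://example.modal.run",  # from server/.env
    "LLM_MODEL": "some-model",
}
override = {"TRANSCRIPT_URL": "http://localhost:8100"}  # from the overlay

print(effective_env(env_file, override)["TRANSCRIPT_URL"])
# -> http://localhost:8100
```

This is why the overlay can repoint TRANSCRIPT_URL/DIARIZATION_URL at the local container while every key it does not mention keeps its env_file value.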
