A production-ready Model Context Protocol (MCP) server that provides completely local, offline speech-to-text capabilities. Optimized for x86_64 production deployment with macOS development support.
- Production Systems - x86_64 Linux servers with full offline capabilities
- Hong Kong Users - No regional restrictions or blocking
- Privacy-Conscious - All processing happens locally
- Cost-Conscious - Zero API costs after initial setup
- n8n Workflows - Direct MCP integration
| Platform | Speech Engine | Offline Mode | Internet Required | Production Ready |
|---|---|---|---|---|
| x86_64 Linux | Vosk + SpeechRecognition | ✅ Full | ❌ No | ✅ Yes |
| ARM64 Linux | SpeechRecognition | ❌ No | ✅ Yes | |
| macOS (Dev) | SpeechRecognition | ❌ No | ✅ Yes | 🚧 Dev Only |
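The entry point auto-detects the platform and picks the appropriate engine. A minimal sketch of that selection logic (the function name is illustrative; the actual `__main__.py` in this repo may differ):

```python
import platform

def pick_engine() -> str:
    """Choose a speech engine from the CPU architecture.

    Sketch of the platform detection described in the table above;
    the real __main__.py may use different names or logic.
    """
    machine = platform.machine().lower()
    if machine in ("x86_64", "amd64"):
        return "vosk"  # full offline support on x86_64 Linux
    return "speech_recognition"  # ARM64 / macOS fallback (online)
```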
**Best Choice:** Full offline capabilities with Docker
```bash
# 1. Clone repository
git clone https://github.com/michaelyuwh/mcp-speech-to-text.git
cd mcp-speech-to-text

# 2. Deploy with Docker Compose (automatic platform detection)
docker compose up -d

# 3. Verify deployment
./scripts/test-deployment.sh

# 4. Check status
docker compose ps
docker compose logs -f mcp-speech-to-text
```

**For Development and Testing**
```bash
# 1. Clone and setup
git clone https://github.com/michaelyuwh/mcp-speech-to-text.git
cd mcp-speech-to-text

# 2. Install with uv (recommended for macOS)
uv sync
uv run python -c "from src.mcp_speech_to_text.server_sr import SpeechToTextServer; print('✅ Ready')"

# 3. Run development server
uv run python -m mcp_speech_to_text
```

**Docker Compose**

```bash
docker compose up -d     # Start services
docker compose logs -f   # View logs
docker compose down      # Stop services
```

**Manual Docker Build (x86_64)**

```bash
./scripts/build-x86_64.sh  # Build for x86_64
docker run -d --name mcp-speech mcp-speech-to-text:x86_64-latest
```

**Local Run**

```bash
# With uv (macOS recommended)
uv sync && uv run python -m mcp_speech_to_text

# With pip
pip install -e . && python -m mcp_speech_to_text
```

**Available MCP Tools**

- `transcribe_audio_offline` - Vosk offline transcription (x86_64 only)
- `transcribe_audio_file` - SpeechRecognition transcription (all platforms)
- `record_and_transcribe` - Live microphone recording and transcription
- `get_supported_engines` - List available speech engines
- `convert_audio_format` - Convert between audio formats
- `test_microphone` - Test microphone functionality and list devices
- `get_supported_formats` - List supported audio formats
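Vosk models typically expect 16 kHz mono 16-bit PCM WAV input. A small stdlib helper can verify a file before transcription (the helper name is illustrative; files that fail the check can be converted with the `convert_audio_format` tool):

```python
import wave

def check_wav(path: str) -> bool:
    """Return True if the WAV file is 16 kHz mono 16-bit PCM,
    the format Vosk models are typically trained on.

    Illustrative helper, not part of this repo's API.
    """
    with wave.open(path, "rb") as wf:
        return (wf.getframerate() == 16000
                and wf.getnchannels() == 1
                and wf.getsampwidth() == 2)
```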
```bash
# Test current setup
./scripts/test-deployment.sh

# Test Docker image
docker run --rm mcp-speech-to-text:latest python -c "
from src.mcp_speech_to_text.server import OfflineSpeechToTextServer
server = OfflineSpeechToTextServer()
print('✅ Server healthy')
"

# Test SpeechRecognition setup
uv run python -c "
from src.mcp_speech_to_text.server_sr import SpeechToTextServer
server = SpeechToTextServer()
print('✅ Development environment ready')
"
```

**Vosk Engine (x86_64 Production)**

- Startup Time: 10-15 seconds (model loading)
- Memory Usage: 200-300MB (with small model)
- CPU Usage: 5-10% during transcription
- Accuracy: Very good for offline recognition
- Latency: Near real-time (< 1 second)
- Internet: Not required after setup
**SpeechRecognition Engine (Development)**

- Startup Time: 2-3 seconds
- Memory Usage: 50-100MB
- CPU Usage: 2-5%
- Accuracy: Excellent (Google API)
- Latency: 1-3 seconds
- Internet: Required for operation
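The latency figures above can be reproduced with a simple timer around any transcription callable. In this sketch, `transcribe` is a stand-in for whichever engine path you wire up (Vosk or SpeechRecognition):

```python
import time

def average_latency(transcribe, audio_path: str, runs: int = 3) -> float:
    """Average wall-clock seconds per transcription run.

    `transcribe` is any callable taking a file path; this is a
    measurement sketch, not part of the server's API.
    """
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        transcribe(audio_path)
        total += time.perf_counter() - start
    return total / runs
```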
```
mcp-speech-to-text/
├── src/mcp_speech_to_text/
│   ├── server.py            # Vosk server (x86_64 production)
│   ├── server_sr.py         # SpeechRecognition server (dev/fallback)
│   ├── __main__.py          # Auto-detecting entry point
│   └── models/              # Vosk models (auto-downloaded)
├── scripts/
│   ├── build-x86_64.sh      # Production build script
│   └── test-deployment.sh   # Comprehensive testing
├── .github/workflows/
│   └── build-x86_64.yml     # CI/CD for x86_64 builds
├── Dockerfile               # Multi-platform container
├── docker-compose.yml       # Production deployment config
├── DEPLOYMENT_X86_64.md     # Detailed production guide
└── README.md                # This file
```
| Variable | Default | Description |
|---|---|---|
| `SPEECH_ENGINE` | `auto` | `vosk`, `google`, or `auto` |
| `VOSK_MODEL_PATH` | `/app/models` | Path to Vosk models |
| `MCP_SERVER_PORT` | `8000` | Server port |
| Host Path | Container Path | Purpose |
|---|---|---|
| `./audio_files` | `/app/audio_files` | Audio file storage |
| `./models` | `/app/models/custom` | Custom Vosk models |
Expected behavior - macOS ARM doesn't support Vosk. Use SpeechRecognition:

```bash
uv run python -m mcp_speech_to_text.server_sr
```

Use a platform-specific build:

```bash
docker buildx build --platform linux/amd64 .
```

Add device access:

```bash
docker run --device /dev/snd your-image
```

- Install audio packages: `apt-get install portaudio19-dev`
- Verify model download: `ls src/mcp_speech_to_text/models/`
- Check container logs: `docker logs mcp-speech-to-text`
- Install portaudio: `brew install portaudio`
- Use the development server: `server_sr.py`
- Enable microphone permissions in System Settings
```bash
# Basic deployment
docker compose up -d

# Scale up containers
docker compose up -d --scale mcp-speech-to-text=3
```

See `DEPLOYMENT_X86_64.md` for Kubernetes manifests and advanced deployment patterns.
- ✅ No Data Transmission - All processing happens locally
- ✅ No API Keys - No external service dependencies
- ✅ Container Security - Runs as non-root user
- ✅ Minimal Attack Surface - Only required ports exposed
- ✅ Audio Privacy - Files never leave your infrastructure
- DEPLOYMENT_X86_64.md - Comprehensive production deployment guide
- GitHub Actions - Automated testing and building
- Docker Hub - Pre-built images (coming soon)
- No Vendor Lock-in - Independent of OpenAI, Google, Azure
- Predictable Costs - Zero ongoing API charges
- Data Privacy - Audio processing never leaves your infrastructure
- High Availability - No dependency on external services
- Regional Independence - Works anywhere, including Hong Kong
- Enterprise Environments - Privacy and compliance requirements
- Cost-Sensitive Projects - High-volume speech processing
- Offline Environments - Air-gapped or limited connectivity
- Regional Restrictions - Areas with limited API access
- Development Teams - Consistent dev/prod environments
- Fork the repository
- Develop on macOS using `server_sr.py`
- Test on x86_64 Linux using Docker
- Submit a pull request with platform testing
MIT License - Complete freedom to use, modify, and distribute.
Ready to deploy speech-to-text without the cloud? Choose your platform above and get started! 🚀