Add Smallest AI STT and TTS service integrations #4014

Draft
markbackman wants to merge 6 commits into main from markbackman/smallest-settings-cleanup

@markbackman
Contributor

Summary

Adds Smallest AI WebSocket-based STT and TTS services, aligned with current Pipecat conventions.

Based on and supersedes #3897 by @harshitajain165. Key changes from the original PR:

  • WebSocket-only: Removed HTTP variants in favor of WebSocket services for real-time use
  • Settings pattern: Replaced InputParams with SmallestSTTSettings/SmallestTTSSettings dataclasses
  • Init-only vs runtime-updatable: encoding is init-only; voice/language/speed/etc. are runtime-updatable
  • _update_settings overrides: STT reconnects on any change; TTS reconnects only on model change
  • TTS keepalive: Added flush keepalive every 30s to prevent 60s idle timeout
  • Removed redundant overrides: _handle_interruption and duplicate TTFB metrics
  • TTS settings default to None: API applies its own defaults for speed/consistency/similarity/enhancement
  • Full STT language support: All 32 documented languages with resolve_language fallback
  • Model enums for defaults: SmallestSTTModel.PULSE and SmallestTTSModel.LIGHTNING_V3_1
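The settings behaviour described above (unset fields left as None so the API applies its own defaults, STT reconnecting on any change, TTS reconnecting only on a model change) can be illustrated with a minimal sketch. This is not the code in this PR: `SketchTTSSettings`, `build_payload`, `changed_fields`, and the two `*_needs_reconnect` helpers are hypothetical names, and the field list is an assumption made for illustration only.

```python
from dataclasses import asdict, dataclass
from typing import Optional, Set


@dataclass
class SketchTTSSettings:
    """Hypothetical stand-in for a TTS settings dataclass (fields are assumed)."""

    model: Optional[str] = None
    voice: Optional[str] = None
    language: Optional[str] = None
    speed: Optional[float] = None  # None means "let the API pick its default"


def build_payload(settings: SketchTTSSettings) -> dict:
    """Omit unset (None) fields so the remote API applies its own defaults."""
    return {key: value for key, value in asdict(settings).items() if value is not None}


def changed_fields(old: SketchTTSSettings, new: SketchTTSSettings) -> Set[str]:
    """Return the names of fields whose values differ between two settings objects."""
    old_values, new_values = asdict(old), asdict(new)
    return {key for key in new_values if new_values[key] != old_values[key]}


def tts_needs_reconnect(changed: Set[str]) -> bool:
    """TTS policy from the summary: reconnect only when the model changes."""
    return "model" in changed


def stt_needs_reconnect(changed: Set[str]) -> bool:
    """STT policy from the summary: reconnect on any settings change."""
    return bool(changed)
```

Under these assumptions, changing only `speed` would leave the TTS connection open, while the same kind of change on the STT side would trigger a reconnect.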

Test plan

  • Verify STT transcription with Smallest API key
  • Verify TTS synthesis with Smallest API key
  • Confirm keepalive prevents idle timeout on both services
  • Test runtime settings update triggers reconnect appropriately
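For the keepalive check in the test plan, a rough sketch of the idea may help: a background task sends a flush at a fixed interval shorter than the idle timeout. The 30s interval and 60s timeout come from the summary; `keepalive_loop`, `count_flushes`, and the `send_flush` callback are hypothetical names, and the real service presumably sends its flush over the WebSocket rather than through a callback.

```python
import asyncio
from typing import Awaitable, Callable

KEEPALIVE_INTERVAL_S = 30.0  # interval stated in the PR summary
IDLE_TIMEOUT_S = 60.0  # idle timeout stated in the PR summary


async def keepalive_loop(
    send_flush: Callable[[], Awaitable[None]],
    interval: float = KEEPALIVE_INTERVAL_S,
) -> None:
    """Call send_flush every `interval` seconds until the task is cancelled."""
    while True:
        await asyncio.sleep(interval)
        await send_flush()


async def count_flushes(duration: float, interval: float) -> int:
    """Run the loop against a fake flush for `duration` seconds and count calls."""
    sent = 0

    async def fake_flush() -> None:
        nonlocal sent
        sent += 1

    task = asyncio.create_task(keepalive_loop(fake_flush, interval))
    await asyncio.sleep(duration)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass  # expected: the loop only ends by cancellation
    return sent
```

A test could call `asyncio.run(count_flushes(0.25, 0.05))` with shortened timings and assert that several flushes were sent, instead of waiting on the real 30s interval.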

Harshita Jain and others added 6 commits March 12, 2026 18:22
- STT: Update model from lightning to pulse with new API URL
- STT: Add SmallestRealtimeSTTService using Pulse WebSocket API
  for low-latency streaming transcription
- TTS: Add lightning-v3.1 model and set as default
- stt_latency: Add SMALLEST_TTFS_P99 constant

Made-with: Cursor
Migrate STT/TTS services from deprecated set_model_name()/set_voice() to the
new ServiceSettings pattern (STTSettings/TTSSettings). Add default voice_id
("sophia") for TTS services, fix voice references, and include two foundational
example scripts showing WebSocket and HTTP usage.

Made-with: Cursor
@codecov

codecov bot commented Mar 13, 2026

Codecov Report

❌ Patch coverage is 36.33634% with 212 lines in your changes missing coverage. Please review.

Files with missing lines                 Patch %   Lines
src/pipecat/services/smallest/tts.py     32.77%    121 Missing ⚠️
src/pipecat/services/smallest/stt.py     40.13%    91 Missing ⚠️

Files with missing lines                 Coverage Δ
src/pipecat/services/stt_latency.py      100.00% <100.00%> (ø)
src/pipecat/services/smallest/stt.py     40.13% <40.13%> (ø)
src/pipecat/services/smallest/tts.py     32.77% <32.77%> (ø)

... and 1 file with indirect coverage changes

@markbackman
Contributor Author

@harshitajain165 I started from your branch and focused on the services best suited for real-time use, that is, the WebSocket services.

The TTS service is working well and would be ready for production use.

I'm struggling with the STT service: the transcriptions are usually missing words, and I get hallucinations that appear ~3 seconds after the final transcript arrives. Have you seen this with other implementations, and do you have any recommendations? As it stands, the STT service is not fit for production, so I'd like to iron out the issues before merging this.

@markbackman markbackman marked this pull request as draft March 13, 2026 12:06