Lightweight streaming text-to-speech with the Kokoro engine.
- Streaming TTS: Real-time audio synthesis with callback support
- Voice Blending: Mix multiple voices with weighted formulas
- Pause Tags: Insert natural pauses with [pause:1.5s] syntax
- Text Normalization: Convert URLs, emails, numbers, and money amounts to spoken form
- Smart Chunking: Token-aware text splitting for optimal quality
- Multi-Format Output: Export to WAV, MP3, Opus, FLAC, AAC (with the [audio] extra)
- 54 Voices: American and British English, plus 7 other languages
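To illustrate the pause-tag syntax from the list above, here is a minimal sketch of how tags such as `[pause:1.5s]` or `[pause:500ms]` could be separated from the surrounding text. The helper `split_pause_tags` and its regular expression are hypothetical and written only for this illustration; they are not the library's parser, which may accept other forms.

```python
import re

# Hypothetical pattern (an assumption): a number followed by "s" or "ms".
PAUSE_TAG = re.compile(r"\[pause:(\d+(?:\.\d+)?)(ms|s)\]")


def split_pause_tags(text):
    """Return text segments (str) and pause durations in seconds (float), in order."""
    items = []
    pos = 0
    for match in PAUSE_TAG.finditer(text):
        # Text between the previous tag and this one.
        segment = text[pos:match.start()].strip()
        if segment:
            items.append(segment)
        value = float(match.group(1))
        # Convert milliseconds to seconds; seconds pass through unchanged.
        items.append(value / 1000.0 if match.group(2) == "ms" else value)
        pos = match.end()
    # Any text after the last tag.
    tail = text[pos:].strip()
    if tail:
        items.append(tail)
    return items


print(split_pause_tags("Hello! [pause:1s] How are you? [pause:500ms] Fine."))
# ['Hello!', 1.0, 'How are you?', 0.5, 'Fine.']
```

Returning strings and floats in one ordered list keeps the speak/pause sequence explicit, so a caller can feed the strings to the engine and sleep for the floats.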
pip install streaming-tts

For development installation:

pip install -e .

# Japanese language support
pip install streaming-tts[jp]
# Chinese language support
pip install streaming-tts[zh]
# Korean language support
pip install streaming-tts[ko]
# All features
pip install streaming-tts[all]

Note: Non-English languages require espeak-ng to be installed on your system.
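Before installing one of the language extras, it may help to confirm that espeak-ng is reachable. The sketch below only checks whether an `espeak-ng` executable is on `PATH`, using the standard library. The helper name `espeak_ng_available` is made up for this example, and it is an assumption that the library finds espeak-ng through `PATH`; it may locate it another way.

```python
import shutil


def espeak_ng_available():
    """Return True if an `espeak-ng` executable is found on PATH."""
    return shutil.which("espeak-ng") is not None


if espeak_ng_available():
    print("espeak-ng found")
else:
    print("espeak-ng not found; install it with your system package manager")
```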
from streaming_tts import TextToAudioStream, KokoroEngine
# Initialize the engine
engine = KokoroEngine(voice="af_heart")
# Create stream and play
stream = TextToAudioStream(engine)
stream.feed("Hello, world! This is a test of streaming text to speech.").play()

Insert natural pauses in your text:
text = "Hello! [pause:1s] How are you? [pause:500ms] I hope you're well."
stream.feed(text).play()

Automatically convert special content to spoken form:
from streaming_tts import normalize_text, NormalizationOptions
options = NormalizationOptions(normalize=True)
# URLs, emails, numbers, money, etc.
text = "Visit https://example.com or email user@test.com. Price: $42.50"
normalized = normalize_text(text, options)
# -> "Visit https example dot com or email user at test dot com. Price: forty-two dollars and fifty cents"

Split long text into optimal chunks for TTS:
from streaming_tts import smart_split, process_text_with_pauses
import time
# Process text with pauses
for item in process_text_with_pauses(text, normalize=True):
    if isinstance(item, float):
        time.sleep(item)  # Pause
    else:
        stream.feed(item).play()  # Speak

from streaming_tts import StreamingAudioWriter
# Requires: pip install streaming-tts[audio]
writer = StreamingAudioWriter("mp3", sample_rate=24000)
for audio_chunk in audio_chunks:
    mp3_data = writer.write_chunk(audio_chunk)
    # Stream or save mp3_data
final_data = writer.write_chunk(finalize=True)
writer.close()

from streaming_tts import TextToAudioStream, KokoroEngine
def on_audio_chunk(chunk):
    # Process audio chunk (e.g., send over websocket)
    pass

def on_stream_stop():
    print("Audio stream finished")

engine = KokoroEngine(voice="af_heart")
stream = TextToAudioStream(engine, on_audio_stream_stop=on_stream_stop)
stream.feed("Hello world").play(muted=True, on_audio_chunk=on_audio_chunk)

- American English (female): af_heart, af_alloy, af_aoede, af_bella, af_jessica, af_kore, af_nicole, af_nova, af_river, af_sarah, af_sky
- American English (male): am_adam, am_echo, am_eric, am_fenrir, am_liam, am_michael, am_onyx, am_puck, am_santa
- British English (female): bf_alice, bf_emma, bf_isabella, bf_lily
- British English (male): bm_daniel, bm_fable, bm_george, bm_lewis
- Japanese: jf_alpha, jf_gongitsune, jf_nezumi, jf_tebukuro, jm_kumo
- Chinese: zf_xiaobei, zf_xiaoni, zf_xiaoxiao, zf_xiaoyi, zm_yunjian, zm_yunxi, zm_yunxia, zm_yunyang
- Spanish: ef_dora, em_alex, em_santa
- French: ff_siwis
- Hindi: hf_alpha, hf_beta, hm_omega, hm_psi
- Italian: if_sara, im_nicola
- Portuguese: pf_dora, pm_alex, pm_santa
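The voice names above appear to follow a two-letter prefix convention: the first letter indicates the language and the second the gender (`f` or `m`). The sketch below decodes a name on that assumption. The mapping is inferred from the voice list and the feature summary (the `a` and `b` prefixes are taken to mean American and British English), and `describe_voice` is a hypothetical helper, not part of the package.

```python
# Prefix meanings inferred from the voice list above; not an official mapping.
LANGUAGES = {
    "a": "American English",
    "b": "British English",
    "j": "Japanese",
    "z": "Chinese",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "p": "Portuguese",
}
GENDERS = {"f": "female", "m": "male"}


def describe_voice(voice):
    """Hypothetical helper: decode a voice name such as 'af_heart'."""
    prefix, _, name = voice.partition("_")
    if len(prefix) != 2 or not name:
        raise ValueError(f"unexpected voice name: {voice!r}")
    try:
        return LANGUAGES[prefix[0]], GENDERS[prefix[1]], name
    except KeyError:
        raise ValueError(f"unknown voice prefix: {prefix!r}") from None


print(describe_voice("af_heart"))  # ('American English', 'female', 'heart')
print(describe_voice("jm_kumo"))   # ('Japanese', 'male', 'kumo')
```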
You can blend multiple voices using weighted formulas:
engine = KokoroEngine(voice="0.3*af_sarah + 0.7*am_adam")

KokoroEngine(
    voice="af_heart",      # Voice name or blend formula
    default_speed=1.0,     # Speech speed multiplier
    trim_silence=True,     # Remove silence from audio
    debug=False            # Enable debug output
)

TextToAudioStream(
    engine,                       # KokoroEngine instance
    on_audio_stream_start=None,   # Callback when audio starts
    on_audio_stream_stop=None,    # Callback when audio stops
    on_audio_chunk=None,          # Callback for each audio chunk
    on_word=None,                 # Callback for word timing
    muted=False                   # Disable speaker output
)

- Python 3.9-3.12
- PyAudio (may require system dependencies)
- Torch
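The requirements above can be checked with a short script before installing. This is a minimal sketch using only the standard library. `check_requirements` is a hypothetical helper, and it only tests that the `pyaudio` and `torch` modules can be found, not that they work with your audio hardware.

```python
import importlib.util
import sys


def check_requirements():
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    # Supported interpreter range taken from the requirements list above.
    if not ((3, 9) <= sys.version_info[:2] <= (3, 12)):
        problems.append(
            f"Python {sys.version_info.major}.{sys.version_info.minor} "
            "is outside the supported range 3.9-3.12"
        )
    # find_spec returns None when a top-level module cannot be located.
    for module in ("pyaudio", "torch"):
        if importlib.util.find_spec(module) is None:
            problems.append(f"module {module!r} is not installed")
    return problems


for problem in check_requirements():
    print("WARNING:", problem)
```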
PyAudio on Windows may require Visual C++ Build Tools. If you encounter issues:
pip install pipwin
pipwin install pyaudio

License: MIT
