spark-tts

SparkTTS

SparkTTS is a high-quality text-to-speech (TTS) model that generates natural-sounding speech. This container packages the SparkTTS model optimized for Jetson devices and supports both standard TTS and zero-shot voice cloning.

System Requirements

  • Memory: at least 5 GB of available RAM
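
The memory requirement can be checked before starting the container. The following is a minimal sketch, not part of the container: `has_enough_memory` is a hypothetical helper, it assumes a Linux host that exposes `MemAvailable` in `/proc/meminfo`, and it interprets "5GB" as 5 GiB.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the container).
# Succeeds when MemAvailable is at least <min_gib> GiB.
# Arguments: [min_gib, default 5] [meminfo file, default /proc/meminfo]
has_enough_memory() {
    local min_gib="${1:-5}" meminfo="${2:-/proc/meminfo}"
    local available_kb
    # /proc/meminfo reports MemAvailable in kB
    available_kb="$(awk '/^MemAvailable:/ {print $2}' "$meminfo")"
    # Return 2 when the field is missing (e.g. a very old kernel)
    [ -n "$available_kb" ] || return 2
    [ "$available_kb" -ge $(( min_gib * 1024 * 1024 )) ]
}

# Usage:
#   has_enough_memory 5 || echo "Less than 5 GiB available" >&2
```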

Features

  • Natural-sounding speech synthesis
  • Adjustable pitch and speed
  • Gender selection for standard TTS
  • Zero-shot voice cloning from audio samples

Usage Examples

When using jetson-containers run, the generated audio files are automatically saved in the jetson-containers/data/audio/tts/spark-tts/ directory on your host system, and models are cached in jetson-containers/data/models/huggingface/.

Standard Text-to-Speech (CLI)

Generate speech from text with customizable parameters:

jetson-containers run $(autotag spark-tts) \
    --pitch "moderate" \
    --speed "moderate" \
    --gender "female" \
    --text "The quick brown fox jumps over the lazy dog"

Available options:

  • --pitch: "very_low", "low", "moderate", "high", "very_high"
  • --speed: "very_low", "low", "moderate", "high", "very_high"
  • --gender: "female", "male"
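
Since each option accepts only a fixed set of values, it can help to validate them on the host before the container starts. The sketch below is illustrative only: `is_valid_level`, `is_valid_gender`, and `build_tts_args` are hypothetical helper names, and the accepted values are copied from the list above. It assumes the text fits on a single line.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the container): validate option values
# against the documented lists before launching the container.

# Accepts the five levels shared by --pitch and --speed.
is_valid_level() {
    case "$1" in
        very_low|low|moderate|high|very_high) return 0 ;;
        *) return 1 ;;
    esac
}

# Accepts the two documented --gender values.
is_valid_gender() {
    case "$1" in
        female|male) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints one argument per line so the caller can read them into an array.
build_tts_args() {
    local pitch="$1" speed="$2" gender="$3" text="$4"
    is_valid_level "$pitch"   || { echo "invalid --pitch: $pitch" >&2; return 1; }
    is_valid_level "$speed"   || { echo "invalid --speed: $speed" >&2; return 1; }
    is_valid_gender "$gender" || { echo "invalid --gender: $gender" >&2; return 1; }
    printf '%s\n' --pitch "$pitch" --speed "$speed" --gender "$gender" --text "$text"
}

# Usage (bash, requires jetson-containers on the host):
#   mapfile -t args < <(build_tts_args low high male "Hello from Jetson")
#   jetson-containers run $(autotag spark-tts) "${args[@]}"
```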

Zero-shot Voice Cloning (CLI)

Clone a voice from a sample audio file. The audio file must be accessible inside the container, so place it under the jetson-containers/data directory, which is mounted at /data in the container:

jetson-containers run $(autotag spark-tts) \
    --prompt_speech_path "/data/audio/sample.wav" \
    --prompt_text "This is a sample prompt text that matches the audio sample..." \
    --speed "moderate" \
    --text "Hi, this is a test of voice cloning with Spark TTS!"
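
The value of --prompt_speech_path is a path inside the container, not on the host. The sketch below shows one way to translate between the two. It assumes the repository is cloned at `$HOME/jetson-containers` (override with `JC_ROOT`); both `JC_ROOT` and `to_container_path` are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Assumption: location of the jetson-containers checkout on the host.
JC_ROOT="${JC_ROOT:-$HOME/jetson-containers}"

# Hypothetical helper: map a host path under $JC_ROOT/data to the
# corresponding /data path inside the container.
to_container_path() {
    local host_path="$1"
    case "$host_path" in
        "$JC_ROOT"/data/*)
            printf '/data/%s\n' "${host_path#"$JC_ROOT"/data/}"
            ;;
        *)
            echo "not under $JC_ROOT/data: $host_path" >&2
            return 1
            ;;
    esac
}

# Usage:
#   mkdir -p "$JC_ROOT/data/audio"
#   cp my_voice.wav "$JC_ROOT/data/audio/sample.wav"
#   to_container_path "$JC_ROOT/data/audio/sample.wav"
```

The printed path can then be passed as --prompt_speech_path.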

Output Location

When using jetson-containers run, the following directories are automatically mounted and accessible:

  • Audio output: jetson-containers/data/audio/tts/spark-tts/
  • Model cache: jetson-containers/data/models/huggingface/

The generated audio files will be saved with timestamped filenames like 20250325230742.wav.
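
Because the filenames appear to be timestamps with the most significant fields first, sorting them as plain strings also sorts them by time. The following sketch, which assumes that naming pattern holds for every output file, picks the most recent one; `latest_wav` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the newest .wav file in a directory, relying on
# timestamped names (e.g. 20250325230742.wav) sorting chronologically.
latest_wav() {
    local dir="$1" newest="" f
    # Glob results are sorted by name, so the last match is the newest.
    for f in "$dir"/*.wav; do
        if [ -e "$f" ]; then
            newest="$f"
        fi
    done
    # Fail when the directory holds no .wav files.
    [ -n "$newest" ] || return 1
    printf '%s\n' "$newest"
}

# Usage (path relative to the parent of the jetson-containers checkout):
#   latest_wav jetson-containers/data/audio/tts/spark-tts
```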

Model Source

This container uses the SparkTTS model from Hugging Face: Spark-TTS by SparkAudio

CONTAINERS

spark-tts

  • Requires: L4T ['>=36.1.0']
  • Dependencies: build-essential pip_cache:cu126 cuda:12.6 cudnn python numpy cmake onnx pytorch:2.8 torchaudio torchvision huggingface_hub rust transformers
  • Dockerfile: Dockerfile
  • Notes: Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens – https://github.com/SparkAudio/Spark-TTS

RUN CONTAINER

To start the container, you can use jetson-containers run and autotag, or manually put together a docker run command:

# automatically pull or build a compatible container image
jetson-containers run $(autotag spark-tts)

# or if using 'docker run' (specify image and mounts/etc)
sudo docker run --runtime nvidia -it --rm --network=host spark-tts:36.4.0

  • jetson-containers run forwards arguments to docker run with some defaults added (such as --runtime nvidia, a mounted /data cache, and device detection).
  • autotag finds a container image that's compatible with your version of JetPack/L4T, whether it is available locally, pulled from a registry, or built from source.

To mount your own directories into the container, use the -v or --volume flags:

jetson-containers run -v /path/on/host:/path/in/container $(autotag spark-tts)

To launch the container running a command, as opposed to an interactive shell:

jetson-containers run $(autotag spark-tts) my_app --abc xyz

You can pass any options to it that you would to docker run, and it'll print out the full command that it constructs before executing it.

BUILD CONTAINER

If you use autotag as shown above, it'll ask to build the container for you if needed. To manually build it, first do the system setup, then run:

jetson-containers build spark-tts

The dependencies listed above will be built into the container, and it will be tested during the build. Run jetson-containers build with --help to see the build options.