SparkTTS is a high-quality text-to-speech synthesis model that provides natural-sounding speech generation. This container runs the SparkTTS model optimized for Jetson devices and offers both standard TTS and zero-shot voice cloning.
- Memory: Requires at least 5GB of available RAM
- Natural-sounding speech synthesis
- Adjustable pitch and speed
- Gender selection for standard TTS
- Zero-shot voice cloning from audio samples
When using jetson-containers run, the generated audio files are automatically saved in the jetson-containers/data/audio/tts/spark-tts/ directory on your host system, and models are cached in jetson-containers/data/models/huggingface/.
Generate speech from text with customizable parameters:
jetson-containers run $(autotag spark-tts) \
--pitch "moderate" \
--speed "moderate" \
--gender "female" \
--text "The quick brown fox jumps over the lazy dog"
Available options:
- --pitch: "very_low", "low", "moderate", "high", "very_high"
- --speed: "very_low", "low", "moderate", "high", "very_high"
- --gender: "female", "male"
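Because the option values form a small fixed set, a wrapper script can check them before starting the container. The following is a minimal sketch only; `is_valid_level`, `is_valid_gender`, and `check_tts_options` are hypothetical helper names, not part of jetson-containers or Spark-TTS, and the accepted values are copied from the list above.

```bash
# Hypothetical helper: succeeds only for the five documented pitch/speed levels.
is_valid_level() {
    case "$1" in
        very_low|low|moderate|high|very_high) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: succeeds only for the two documented genders.
is_valid_gender() {
    case "$1" in
        female|male) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: $1 = pitch, $2 = speed, $3 = gender.
# Prints the first problem to stderr and returns non-zero.
check_tts_options() {
    if ! is_valid_level "$1"; then
        echo "invalid --pitch value: $1" >&2
        return 1
    fi
    if ! is_valid_level "$2"; then
        echo "invalid --speed value: $2" >&2
        return 1
    fi
    if ! is_valid_gender "$3"; then
        echo "invalid --gender value: $3" >&2
        return 1
    fi
    return 0
}
```

A script could then guard the launch with `check_tts_options "$PITCH" "$SPEED" "$GENDER" &&` placed in front of the `jetson-containers run` command shown earlier, where the three variables are whatever the script collected from its user.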
Clone a voice from a sample audio file. The audio file must be accessible inside the container, so place it under the jetson-containers/data directory:
jetson-containers run $(autotag spark-tts) \
--prompt_speech_path "/data/audio/sample.wav" \
--prompt_text "This is a sample prompt text that matches the audio sample..." \
--speed "moderate" \
--text "Hi, this is a test of voice cloning with Spark TTS!"
When using jetson-containers run, the following directories are automatically mounted and accessible:
- Audio output: jetson-containers/data/audio/tts/spark-tts/
- Model cache: jetson-containers/data/models/huggingface/
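The voice cloning example passes a path as seen from inside the container ("/data/audio/sample.wav"), while the file itself lives on the host under jetson-containers/data. As a rough sketch, assuming the host data directory is mounted at /data (as the example path suggests), a hypothetical helper `to_container_path` could do the translation:

```bash
# Hypothetical helper, not part of jetson-containers.
# $1 = host data directory (e.g. /home/user/jetson-containers/data)
# $2 = host path of a file under that directory
# Prints the path as seen inside the container, assuming $1 is mounted at /data.
to_container_path() {
    data_dir=${1%/}              # tolerate a trailing slash
    case "$2" in
        "$data_dir"/*)
            rel=${2#"$data_dir"/}    # strip the host prefix
            printf '/data/%s\n' "$rel"
            ;;
        *)
            echo "not under $data_dir: $2" >&2
            return 1
            ;;
    esac
}
```

The non-zero return for files outside the data directory is deliberate: such a file would not be visible in the container unless it were mounted separately.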
The generated audio files will be saved with timestamped filenames like 20250325230742.wav.
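Since the filenames are fixed-width timestamps, sorting them by name also sorts them by time. A small sketch of picking the most recent output on the host follows; `latest_wav` is a hypothetical helper name, and it assumes every .wav file in the directory follows the 14-digit pattern shown above.

```bash
# Hypothetical helper: print the newest timestamped .wav file in a directory.
# $1 = output directory, e.g. jetson-containers/data/audio/tts/spark-tts
latest_wav() {
    latest=""
    # Glob results are sorted by name, so the last existing match is the newest
    # when every name is a fixed-width timestamp.
    for f in "$1"/*.wav; do
        if [ -e "$f" ]; then
            latest=$f
        fi
    done
    if [ -z "$latest" ]; then
        echo "no .wav files in $1" >&2
        return 1
    fi
    printf '%s\n' "$latest"
}
```

For example, `latest_wav jetson-containers/data/audio/tts/spark-tts` would print the path of the most recently generated file, which can then be handed to any audio player available on the host.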
This container uses the SparkTTS model from Hugging Face: Spark-TTS by SparkAudio
CONTAINERS
| spark-tts | |
|---|---|
| Requires | L4T ['>=36.1.0'] |
| Dependencies | build-essential pip_cache:cu126 cuda:12.6 cudnn python numpy cmake onnx pytorch:2.8 torchaudio torchvision huggingface_hub rust transformers |
| Dockerfile | Dockerfile |
| Notes | Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens – https://github.com/SparkAudio/Spark-TTS |
RUN CONTAINER
To start the container, you can use jetson-containers run and autotag, or manually put together a docker run command:
# automatically pull or build a compatible container image
jetson-containers run $(autotag spark-tts)
# or if using 'docker run' (specify image and mounts/etc)
sudo docker run --runtime nvidia -it --rm --network=host spark-tts:36.4.0
jetson-containers run forwards arguments to docker run with some defaults added (like --runtime nvidia, mounts a /data cache, and detects devices)
autotag finds a container image that's compatible with your version of JetPack/L4T - either locally, pulled from a registry, or by building it.
To mount your own directories into the container, use the -v or --volume flags:
jetson-containers run -v /path/on/host:/path/in/container $(autotag spark-tts)
To launch the container running a command, as opposed to an interactive shell:
jetson-containers run $(autotag spark-tts) my_app --abc xyz
You can pass any options to it that you would to docker run, and it'll print out the full command that it constructs before executing it.
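When calling docker run directly, the defaults that jetson-containers run adds have to be supplied by hand. The sketch below only prints a candidate command so it can be inspected first; `print_docker_cmd` is a hypothetical helper, and mounting the host data directory at /data is an assumption based on the /data paths used in this document. Paths containing spaces are not handled.

```bash
# Hypothetical helper: print (do not run) a docker run command with a data mount.
# $1 = host data directory, $2 = image tag (e.g. spark-tts:36.4.0)
print_docker_cmd() {
    printf 'sudo docker run --runtime nvidia -it --rm --network=host -v %s:/data %s\n' "$1" "$2"
}
```

Comparing this output with the full command that jetson-containers run prints before executing is a reasonable way to see which other defaults are still missing.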
BUILD CONTAINER
If you use autotag as shown above, it'll ask to build the container for you if needed. To manually build it, first do the system setup, then run:
jetson-containers build spark-tts
The dependencies from above will be built into the container, and it'll be tested during the build. Run it with --help for build options.