Support for OuteTTS 1.0 by edwko · Pull Request #12794 · ggml-org/llama.cpp

edwko · 2025-04-07T09:54:32Z

Since v1.0 has simplified processing, this implementation provides full feature support.

Changes and Features

JSON Speaker Loading:
- Added support for the new JSON speaker format, which includes an interface version.
- OuteTTS 1.0 is supported using interface version 3.
Text Chunking for Long Inputs:
- Enables processing of very long input texts by splitting them.
- Chunks respect minimum and maximum word counts (min = 10, max = 30).
- Supports multilingual text.
- Can be disabled via --tts-no-text-chunking (default: enabled).
Text Preprocessing & Prompt:
- While optional, a light cleanup and normalization step is included to improve output quality.
- Added the new prompt handling required by v1.0.
Code Organization:
- Implementation is located in: tts-outetts-v1.cpp.
- A default speaker is embedded as JSON in the header file default_speaker.h.
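
The text chunking described above could look roughly like the following. This is only a sketch under stated assumptions, not the code in tts-outetts-v1.cpp: the function names `chunk_text` and `ends_sentence` are hypothetical, words are split on whitespace only (so it would not handle languages written without spaces), and the rule "close a chunk at a sentence end once min words are reached, or unconditionally at max words" is an assumed interpretation of the min/max limits.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: true if the word ends with ASCII sentence punctuation.
static bool ends_sentence(const std::string & word) {
    if (word.empty()) {
        return false;
    }
    const char c = word.back();
    return c == '.' || c == '!' || c == '?';
}

// Hypothetical sketch of word-count based chunking (not the PR's function).
// A chunk is closed when it holds at least min_words and the last word ends
// a sentence, or as soon as it reaches max_words.
std::vector<std::string> chunk_text(const std::string & text,
                                    size_t min_words = 10,
                                    size_t max_words = 30) {
    std::vector<std::string> chunks;
    std::istringstream iss(text);
    std::string word;
    std::string current;
    size_t count = 0;

    while (iss >> word) {
        if (!current.empty()) {
            current += ' ';
        }
        current += word;
        ++count;

        const bool soft_break = count >= min_words && ends_sentence(word);
        const bool hard_break = count >= max_words;
        if (soft_break || hard_break) {
            chunks.push_back(current);
            current.clear();
            count = 0;
        }
    }

    // The trailing remainder may be shorter than min_words.
    if (!current.empty()) {
        chunks.push_back(current);
    }
    return chunks;
}
```

Each returned chunk would then be synthesized separately and the audio concatenated. Input shorter than the minimum comes back as a single chunk.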

TODO / Help Needed

DAC (Descript Audio Codec) Integration:
- The decoder layers from DAC need to be implemented:
  descript-audio-codec/dac/model/dac.py
- Model used:
  weights_24khz_1.5kbps_v1.0.pth
- DAC is supported by the transformers library and can be converted to safetensors, which might help implementation.
  Also, see this PR I submitted to fix a dependency issue in the conversion script for compatibility with newer PyTorch versions:
  transformers PR #36393
- Requesting assistance from @ngxson and @ggerganov for implementing this part.

Example Commands

Default generation (uses the default speaker and text chunking automatically):

build/bin/llama-tts-outetts-v1 -m "path/to/model.gguf" -p "A very very long text"

With text chunking disabled:

build/bin/llama-tts-outetts-v1 -m "path/to/model.gguf" -p "Hello, how are you doing?" --tts-no-text-chunking

With custom speaker file:

build/bin/llama-tts-outetts-v1 -m "path/to/model.gguf" -p "A very very long text" --tts-speaker-file "path/to/speaker.json"

ngxson · 2025-04-07T11:39:11Z

> The decoder layers from DAC need to be implemented

FYI, currently we're missing Snake1d which should be implemented via #12487
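
For context, the Snake activation used in the DAC reference code computes x + (1/alpha) * sin^2(alpha * x) with one learned alpha per channel. A minimal CPU sketch follows, assuming a channel-major [n_channels][n_samples] float layout; the name `snake1d` and its signature are hypothetical, and this is not the ggml operator from #12487.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference implementation of the Snake activation:
//   y = x + (1 / alpha) * sin^2(alpha * x)
// x is laid out channel-major: x[c * n_samples + t].
// alpha holds one value per channel.
std::vector<float> snake1d(const std::vector<float> & x,
                           size_t n_channels,
                           size_t n_samples,
                           const std::vector<float> & alpha) {
    std::vector<float> y(x.size());
    for (size_t c = 0; c < n_channels; ++c) {
        // Small epsilon guards against division by zero.
        const float inv_alpha = 1.0f / (alpha[c] + 1e-9f);
        for (size_t t = 0; t < n_samples; ++t) {
            const size_t i = c * n_samples + t;
            const float s = std::sin(alpha[c] * x[i]);
            y[i] = x[i] + inv_alpha * s * s;
        }
    }
    return y;
}
```

A plain loop like this is only useful as a correctness reference when comparing against a graph-based implementation.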

ggerganov · 2025-04-22T12:52:19Z

Does DAC replace WavTokenizer?

edwko · 2025-04-22T17:22:44Z

> Does DAC replace WavTokenizer?

Yes, since this model is multilingual, DAC is a better fit for reconstructing audio across languages.

Horschig · 2025-04-25T09:15:19Z

It would be really great if this got merged. However, I was wondering whether it would also be possible to add multilingual support to llama-server?

foldl · 2025-05-19T06:51:19Z

FYI: OuteTTS 1.0 is supported by chatllm.cpp. You can find DAC & SNAC implementations there.

squash-merge of ggml-org/llama.cpp PR ggml-org#12794 onto main. adds llama-tts-outetts-v1 binary for native text-to-speech using OuteTTS 1.0 model with WavTokenizer vocoder. files moved from examples/tts/ to tools/tts/ to match upstream rename.

checks daily for new llama.cpp releases. auto-rebases cherry-picks (audio ggml-org#18641, outetss ggml-org#12794, eagle-3 ggml-org#18039). creates tagged release on clean rebase, PR on conflicts. PR ggml-org#19460 (GLM-5 DSA) already merged upstream, not in cherry-pick list.


Remove PR ggml-org#12794 (OuteTTS 1.0) and PR ggml-org#18039 (Eagle-3 speculative decoding) from the cherry-pick list. Neither is used by any model in the registry. Only PR ggml-org#18641 (LFM2.5 audio) remains.

edwko added 2 commits April 7, 2025 09:29

OuteTTS 1.0 support

f420015

Revert tts.cpp

38126e9

edwko marked this pull request as draft April 7, 2025 09:55

github-actions bot added the examples and python (python script changes) labels Apr 7, 2025

Som-anon mentioned this pull request Apr 7, 2025

Eval bug: OuteTTS 0.3 and up crashes #12807

Closed

jhen0409 mentioned this pull request Jun 19, 2025

OuteTTS - isVocoderEnable is false mybigday/llama.rn#152

Closed

TimPietruskyRunPod mentioned this pull request Feb 14, 2026

ci: add automated rebase workflow runpod-labs/a2go-llamacpp#1

Closed

2 tasks

TimPietruskyRunPod pushed a commit to runpod-labs/a2go-llamacpp that referenced this pull request Feb 14, 2026

fix: disable stale OuteTTS v1 build (PR ggml-org#12794 needs rebase f…

663a6b3

…or current API)

This was referenced Apr 3, 2026

fix(ci): drop unused cherry-picks (OuteTTS, Eagle-3) runpod-labs/a2go-llamacpp#5

Closed

fix(ci): drop all cherry-picks for clean mainline rebase runpod-labs/a2go-llamacpp#6

Merged

edwko wants to merge 2 commits into ggml-org:master from edwko:master


5 participants
