[Feature] TTS: Support [pause:xx] Tags, Auto-Editor Cleanup, and Dependency Upgrade #164
base: master
The pause tag is a huge improvement and has made my workflow usable with Chatterbox. Thank you!
Hello, are there any specific instructions or guides I can follow to update my Chatterbox installation with this code? I badly need the pause tag capability.
You can click the branch above (in this case, https://github.com/EasyMetaAu/chatterbox/tree/master), then pull and build it. That's what I did.
@mylukin @feliscat
git clone https://github.com/EasyMetaAu/chatterbox.git
import torchaudio as ta
from chatterbox.tts import ChatterboxTTS
model = ChatterboxTTS.from_pretrained(device="cuda")
Then change your text to: This is [pause:1s] my test text
The correct format is [pause:Xs]
Is there a reason why this PR isn't being merged? Of course there are conflicts right now that need resolving, but has it been reviewed by official contributors?
Commenting because I really need this feature too; please merge it if possible.
I also just wanted to second that this PR would be extremely useful 😄 I would love to see it merged!
…encies for Gradio and Setuptools
…rtifact cleaning, and add support for custom pause tags in audio generation.
…omments to English and remove unused uv.lock file.
… top_p parameters to _generate_single_segment method and its calls, improving flexibility in audio output configuration.
This update introduces a new method for handling long text inputs by splitting them into segments and generating audio asynchronously. It includes enhancements for managing pause tags and cleaning audio segments, improving overall performance and flexibility in audio generation.
…h for better clarity and maintainability. Update documentation strings to reflect English parameters and return values.
…e support. Introduce language detection, sentence separator patterns, and punctuation handling for English, Chinese, Japanese, and Korean. Update split_by_word_boundary and merge_short_sentences functions to accommodate language-specific features, improving text segmentation for TTS processing.
Overview
This pull request adds pause tag support and audio artifact cleaning features to Chatterbox TTS, while maintaining full compatibility with the upstream multilingual implementation.
Status: ✅ Successfully rebased onto up/master (includes Multilingual v2 #295)
Key Features
1. Pause Tag Support ([pause:Xs])
Users can now insert pauses in generated audio using the [pause:Xs] syntax.
Implementation:
- parse_pause_tags() function parses pause markers from text (tts.py:643)
- create_silence() generates silent audio segments (tts.py:690)
2. Auto-Editor Artifact Cleaning
Removes unwanted audio artifacts while preserving pause boundaries.
Implementation:
- _clean_artifacts() method integrates the auto-editor tool (tts.py:579)
3. Long Text Async Processing
Handles long text generation efficiently.
New utility functions in text_utils.py:
- split_text_into_segments() - Intelligent text segmentation
- split_by_word_boundary() - Language-aware word boundary detection
- merge_short_sentences() - Combines short segments
- detect_language() - Auto-detects text language
Compatibility with Upstream
This PR has been successfully rebased onto the latest upstream master, which includes:
✅ Multilingual v2 Update (#295) - 23 language support
✅ ChatterboxMultilingualTTS - New multilingual TTS class
✅ MTLTokenizer - Multilingual tokenization
✅ All upstream bug fixes and improvements
Both feature sets work together.
Changes Summary
Modified Files
src/chatterbox/tts.py (+434 lines)
- parse_pause_tags() function
- create_silence() function
- _clean_artifacts() method
- generate() method with pause and artifact cleaning support
- New parameters: use_auto_editor, ae_threshold, ae_margin, disable_watermark, max_segment_length, max_workers
src/chatterbox/text_utils.py (NEW, 358 lines)
src/chatterbox/__init__.py
- Exports ChatterboxTTS and ChatterboxMultilingualTTS
- SUPPORTED_LANGUAGES (23 languages)
pyproject.toml
- Version: 0.1.4 (matching upstream)
- Python: >=3.10 (matching upstream)
- >=1.24.0,<1.26.0 (matching upstream)
- auto-editor>=27.0.0 (for artifact cleaning)
- resampy==0.4.3 (for audio resampling)
README.md
Testing
All features have been tested and verified:
✅ Python Syntax - All files compile successfully
✅ Pause Tag Parsing - Handles single/multiple/edge cases
✅ Multilingual Support - 23 languages correctly exported
✅ Text Utilities - All segmentation functions work
✅ Module Exports - All imports functional
✅ Dependencies - Correctly merged (32/32 tests passed)
Test Results: 100% pass rate (32/32 tests)
Usage Examples
Basic Pause Tags
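A minimal, self-contained sketch of the pause-tag flow described above: split the text on [pause:Xs] markers, synthesize each text chunk, and splice in digital silence of the requested length. It assumes a regex that accepts integer or decimal seconds and a caller-supplied synthesize(text) callback that returns samples. parse_pause_tags_sketch, create_silence_sketch, and render are hypothetical names, not the PR's parse_pause_tags() or create_silence(), and samples here are plain Python floats rather than tensors.

```python
import re
from typing import Callable, List, Union

# Assumed tag grammar: "[pause:1s]", "[pause:0.5s]" (seconds, int or decimal).
PAUSE_TAG = re.compile(r"\[pause:(\d+(?:\.\d+)?)s\]")

# A segment is either text to speak or a pause length in seconds.
Segment = Union[str, float]


def parse_pause_tags_sketch(text: str) -> List[Segment]:
    """Split text into speech strings and pause durations, in order."""
    segments: List[Segment] = []
    pos = 0
    for match in PAUSE_TAG.finditer(text):
        chunk = text[pos:match.start()].strip()
        if chunk:
            segments.append(chunk)
        segments.append(float(match.group(1)))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(tail)
    return segments


def create_silence_sketch(seconds: float, sample_rate: int) -> List[float]:
    """Return `seconds` of digital silence as zero-valued samples."""
    return [0.0] * int(round(seconds * sample_rate))


def render(text: str, synthesize: Callable[[str], List[float]],
           sample_rate: int) -> List[float]:
    """Concatenate synthesized speech and silence in source order."""
    audio: List[float] = []
    for seg in parse_pause_tags_sketch(text):
        if isinstance(seg, float):
            audio.extend(create_silence_sketch(seg, sample_rate))
        else:
            audio.extend(synthesize(seg))
    return audio
```

For example, parse_pause_tags_sketch("This is [pause:1s] my test text") returns ["This is", 1.0, "my test text"], so the synthesizer only ever sees tag-free text.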
With Artifact Cleaning
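The PR delegates cleaning to auto-editor; the sketch below swaps in a much simpler technique, an RMS energy gate, only to illustrate why cleaning each speech chunk before silence is spliced in leaves the [pause:Xs] durations intact. The threshold and margin_s parameters loosely mirror ae_threshold and ae_margin, whose exact units in the PR are not shown here. Unlike auto-editor, which can also cut quiet stretches inside a clip, this hypothetical trim_quiet_edges only trims quiet audio at the start and end.

```python
import math
from typing import List


def frame_rms(samples: List[float], start: int, size: int) -> float:
    """Root-mean-square level of one frame (0.0 for an empty frame)."""
    frame = samples[start:start + size]
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def trim_quiet_edges(samples: List[float], sample_rate: int,
                     threshold: float = 0.04, margin_s: float = 0.1,
                     frame_s: float = 0.01) -> List[float]:
    """Drop leading/trailing frames whose RMS is below `threshold`,
    keeping `margin_s` seconds of context around the loud region."""
    size = max(1, round(sample_rate * frame_s))
    n_frames = math.ceil(len(samples) / size)
    loud = [i for i in range(n_frames)
            if frame_rms(samples, i * size, size) >= threshold]
    if not loud:
        return []  # the whole chunk is below the threshold
    margin = round(sample_rate * margin_s)
    start = max(0, loud[0] * size - margin)
    end = min(len(samples), (loud[-1] + 1) * size + margin)
    return samples[start:end]
```

In a pipeline like the pause-tag flow, a trimmer of this kind would run on each synthesized chunk and never on the inserted silence, which is what keeps pause boundaries exact.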
Long Text Processing
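A sketch of long-text handling under stated assumptions: sentences are found with a naive punctuation regex (ASCII and full-width CJK marks; it will also split decimals such as 3.5), over-long sentences are broken at spaces, and sentences are packed greedily up to max_len characters, which also merges short ones. The segments are then synthesized concurrently with concurrent.futures.ThreadPoolExecutor. All function names are hypothetical. max_workers borrows the name of the PR's parameter, but its behaviour here is an assumption, and whether one loaded model can safely be called from several threads is not established.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Naive sentence pattern: a run of non-terminators plus any trailing terminators.
SENTENCE = re.compile(r"[^.!?。！？]+[.!?。！？]*")


def split_by_spaces(sentence: str, max_len: int) -> List[str]:
    """Break an over-long sentence at spaces into pieces of <= max_len chars
    (a single word longer than max_len is kept whole)."""
    pieces: List[str] = []
    current = ""
    for word in sentence.split():
        candidate = f"{current} {word}" if current else word
        if not current or len(candidate) <= max_len:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces


def split_text_into_segments_sketch(text: str, max_len: int = 200) -> List[str]:
    """Pack sentences greedily into segments of at most max_len characters."""
    sentences = [s.strip() for s in SENTENCE.findall(text) if s.strip()]
    segments: List[str] = []
    current = ""
    for sentence in sentences:
        pieces = split_by_spaces(sentence, max_len) if len(sentence) > max_len else [sentence]
        for piece in pieces:
            candidate = f"{current} {piece}" if current else piece
            if not current or len(candidate) <= max_len:
                current = candidate
            else:
                segments.append(current)
                current = piece
    if current:
        segments.append(current)
    return segments


def generate_long(text: str, synthesize: Callable[[str], List[float]],
                  max_len: int = 200, max_workers: int = 2) -> List[float]:
    """Synthesize segments in parallel and concatenate them in order."""
    segments = split_text_into_segments_sketch(text, max_len)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so audio stays in sentence order.
        results = list(pool.map(synthesize, segments))
    audio: List[float] = []
    for chunk in results:
        audio.extend(chunk)
    return audio
```

Packing joins sentences with a space, which suits English and Korean but not Chinese or Japanese; those need a language-aware splitter and joiner instead.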
Multilingual with Pause Tags
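A sketch of script-based language detection combined with pause tags, assuming Unicode ranges are enough to separate Korean, Japanese, and Chinese from space-delimited text: any Hangul means Korean, any kana means Japanese (kanji alone are ambiguous, so they default to Chinese), and everything else falls back to "en". detect_language_sketch and chunks_with_language are hypothetical; the PR's detect_language() may use a different method. Detecting per chunk lets each piece between pause tags choose its own language id for the multilingual model.

```python
import re
from typing import List, Tuple

# Unicode-range heuristics (an assumption, not the PR's implementation).
HANGUL = re.compile(r"[\uac00-\ud7af\u1100-\u11ff]")  # syllables + jamo
KANA = re.compile(r"[\u3040-\u30ff]")                 # hiragana + katakana
HAN = re.compile(r"[\u4e00-\u9fff]")                  # CJK unified ideographs

# Same assumed [pause:Xs] grammar as elsewhere; no capture group, so
# re.split() drops the tags themselves.
PAUSE_TAG = re.compile(r"\[pause:\d+(?:\.\d+)?s\]")


def detect_language_sketch(text: str) -> str:
    if HANGUL.search(text):
        return "ko"
    if KANA.search(text):
        return "ja"  # kana implies Japanese even when kanji are present
    if HAN.search(text):
        return "zh"
    return "en"  # fallback for everything else


def chunks_with_language(text: str) -> List[Tuple[str, str]]:
    """Split on pause tags and tag each non-empty chunk with a language code."""
    return [(chunk.strip(), detect_language_sketch(chunk))
            for chunk in PAUSE_TAG.split(text) if chunk.strip()]
```

For example, chunks_with_language("Hello [pause:1s] 你好") returns [("Hello", "en"), ("你好", "zh")].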
Migration Notes
This PR maintains backward compatibility:
- ChatterboxTTS continues to work unchanged
Acknowledgments
Checklist