MagpieTTS decoder model on top of NeMo main branch #15277
paarthneekhara wants to merge 94 commits into NVIDIA-NeMo:main
Conversation
nemo/collections/tts/modules/magpietts_inference/evaluate_generated_audio.py
@@ -0,0 +1,173 @@
name: Magpie-TTS-DecoderOnly-EN
Have we tested the non-Lhotse path?
blisc left a comment
Some more comments from WIP review
    """
    MagpieTTS Streaming Inference Test Script.

    This script tests the streaming TTS inference functionality, supporting both
    single sample (batch_size=1) and batched inference (batch_size>1).

    For batched inference, each item in the batch can have different context lengths
    and be in different processing phases (context, prompt, phoneme-only, audio).
Can you add a note here on how this differs from magpietts_inference.py?
        return [self._token2id[p] for p in ps]


    class IPABPETokenizer:
Should we subclass Tokenizer instead of instantiation within the class?
Subclassing Tokenizer may not be the best idea because we will have to reassign some internal methods/params to self for it to work correctly. If we call from_file() from the init function, we'll have to reassign normalizer, pre_tokenizer, post_processor and decoder of the loaded model to self. I am moving the imports to the top. It would look something like this if we were to subclass Tokenizer.
    import os
    from typing import List

    from tokenizers import Tokenizer


    class IPABPETokenizer(Tokenizer):
        """Simple IPA BPE tokenizer subclassing HuggingFace tokenizers.Tokenizer.

        Args:
            tokenizer_path: Path to the tokenizer.json file (or a directory containing it).
        """

        def __init__(self, tokenizer_path: str):
            if os.path.isdir(tokenizer_path):
                tokenizer_file = os.path.join(tokenizer_path, "tokenizer.json")
            else:
                tokenizer_file = tokenizer_path
            if not os.path.exists(tokenizer_file):
                raise ValueError(f"Tokenizer file not found: {tokenizer_file}")
            loaded = Tokenizer.from_file(tokenizer_file)
            super().__init__(loaded.model)
            # Reassign the loaded model's pipeline components to self.
            self.normalizer = loaded.normalizer
            self.pre_tokenizer = loaded.pre_tokenizer
            self.post_processor = loaded.post_processor
            self.decoder = loaded.decoder
            self.tokens = self.get_vocab()
            self.pad = self.tokens.get("<pad>", None)

        def encode(self, text: str) -> List[int]:
            """Encode IPA text to token IDs."""
            return super().encode(text).ids

        def decode(self, tokens: List[int]) -> str:
            """Decode token IDs back to IPA text."""
            return super().decode(tokens)
nemo/collections/common/tokenizers/text_to_speech/tts_tokenizers.py
    elif isinstance(tokenizer, PreTrainedTokenizerBase):
        _tokens = list(tokenizer.get_vocab().keys())
        tokens.extend(_tokens)
        num_tokens = len(_tokens)
        tokenizer_pad_ids[tokenizer_name] = tokenizer.pad_token_id + tokenizer_offset
        pad_token_id = tokenizer.pad_token_id if tokenizer.pad_token_id is not None else tokenizer.unk_token_id
        if pad_token_id is None:
            raise ValueError(
                f"Tokenizer '{tokenizer_name}' has no pad_token_id or unk_token_id. "
                "Please set one before using with AggregatedTTSTokenizer."
            )
        tokenizer_pad_ids[tokenizer_name] = pad_token_id + tokenizer_offset
Does this affect existing MagpieTTS checkpoints?
MagpieTTS should work the same way. Nemotron tokenizer has pad_token_id as None, so we are using the unk_token_id as the pad_token_id in EasyMagpie. In Magpie tokenizers, tokenizer.pad_token_id is not None (if it's ever None, the tokenizer setup would error out in the old code). So the code functionality should stay the same for MagpieTTS since the tokenizers have the pad_token_id.
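The fallback being discussed can be isolated as a small helper. This is a sketch of the diff's logic with a hypothetical name, not the actual NeMo code:

```python
from typing import Optional


def resolve_pad_id(pad_token_id: Optional[int], unk_token_id: Optional[int]) -> int:
    """Prefer pad_token_id; fall back to unk_token_id (the Nemotron case);
    raise if neither is set, as the diff above does."""
    resolved = pad_token_id if pad_token_id is not None else unk_token_id
    if resolved is None:
        raise ValueError("Tokenizer has no pad_token_id or unk_token_id.")
    return resolved
```

With Magpie tokenizers pad_token_id is always set, so the resolved id is unchanged; only the Nemotron-style tokenizer takes the unk fallback.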
scripts/tts_dataset_files/bpe_ipa_tokenizer_2048_en_de_es_fr_hi_it_vi_zh.json
Force-pushed 54d6283 to 06c516f
    def process_text_for_cer(input_text):
        """
        Normalizes text for CER/WER calculation.
        """
    def instantiate_phoneme_tokenizer(phoneme_tokenizer_config):
        phoneme_tokenizer = instantiate(phoneme_tokenizer_config)
        phoneme_vocab_size = len(phoneme_tokenizer.tokens)
        phoneme_tokenizer.bos_token_id = phoneme_vocab_size
        phoneme_tokenizer.eos_token_id = phoneme_vocab_size + 1
        phoneme_tokenizer.unk_token_id = phoneme_vocab_size + 2
        phoneme_tokenizer.vocab_size = phoneme_vocab_size + 3
        return phoneme_tokenizer
I'm not sure when you call this function, but this should be part of the tokenizer class, not a util function in the dataset.
However, this only exists in the Lhotse file but not the non-Lhotse file?
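One way to fold this into the tokenizer class itself, so both the Lhotse and non-Lhotse paths share it, is a small mixin or method. A hedged sketch with hypothetical names (the dummy class only illustrates the wiring):

```python
class SpecialTokenIdsMixin:
    """Hypothetical mixin: append BOS/EOS/UNK ids after the base vocab,
    mirroring what instantiate_phoneme_tokenizer does externally."""

    tokens: list  # provided by the concrete tokenizer

    def init_special_token_ids(self) -> None:
        base_size = len(self.tokens)
        self.bos_token_id = base_size
        self.eos_token_id = base_size + 1
        self.unk_token_id = base_size + 2
        self.vocab_size = base_size + 3


class DummyPhonemeTokenizer(SpecialTokenIdsMixin):
    """Minimal stand-in for a phoneme tokenizer, for illustration only."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.init_special_token_ids()
```

The dataset code would then just instantiate the tokenizer, with no post-hoc attribute patching.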
    dataset.phoneme_tokenizer = instantiate_phoneme_tokenizer(dataset.phoneme_tokenizer_config)


    class EasyMagpieTTSModel(ModelPT):
This is a really large file. Can we split it up? Some suggestions:
- Anything that's common with Encoder-Decoder Magpie, let's move to a separate base class:
  - the code manipulation functions
  - the local transformer functions
  - etc.
- Let's move the dataclasses to another file, although we can debate this.
- Let's move worker_init_fn too, since it should be common to both models.
- Could consider splitting training and inference into two classes as well.
    phoneme_input_type = 'gt' if random.random() < gt_phoneme_input_prob else 'pred'

    generation_start_time = time.perf_counter()
    print("Inference started")
Switch print statements to logging.
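A minimal sketch of the swap using stdlib logging (NeMo code would typically use `from nemo.utils import logging` instead; the wrapper function and its name are hypothetical):

```python
import logging
import time

logger = logging.getLogger(__name__)


def timed_generate(infer_fn, *args, **kwargs):
    """Run inference and log wall-clock time, instead of print()-ing it.
    Log records carry level/rank context and can be filtered or silenced."""
    start = time.perf_counter()
    logger.info("Inference started")
    result = infer_fn(*args, **kwargs)
    logger.info("Inference finished in %.3f s", time.perf_counter() - start)
    return result
```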
        snapshot[id(p)] = p.data.clone()
        return snapshot


    def _print_grad_weight_summary(self, metrics: Dict[str, float], step: int) -> None:
This function does not depend on self. Consider moving all helper print functions into a separate file and calling them from the model, instead of defining additional class methods.
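Since the helper never touches self, it can become a module-level function in a shared utils file. A sketch with hypothetical names, returning the formatted string so the model can log it (and so it is easy to test):

```python
from typing import Dict


def format_grad_weight_summary(metrics: Dict[str, float], step: int) -> str:
    """Build a one-line grad/weight summary; the model logs the result
    instead of defining a method that ignores self."""
    body = ", ".join(f"{k}={v:.4g}" for k, v in sorted(metrics.items()))
    return f"step {step}: {body}"
```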
Force-pushed f684fc3 to eeac2ce
    if do_backward:
        self.manual_backward(chunk_outputs['loss'] * chunk_weight)

    accumulated_loss = accumulated_loss + chunk_outputs['loss'].detach() * chunk_weight
Check failure (Code scanning / CodeQL): Superclass attribute shadows subclass method (Error)
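The chunked accumulation above follows a standard pattern: call backward on each weight-scaled chunk loss, while keeping a detached running total for logging. A framework-free sketch of the bookkeeping, with plain floats standing in for tensors (in the real code, detach() is what keeps the running total out of the autograd graph):

```python
def accumulate_chunk_losses(chunk_losses, chunk_weights, backward_fn=None):
    """Sum weight-scaled chunk losses; optionally invoke a per-chunk backward
    hook, as self.manual_backward is called in the snippet above."""
    total = 0.0
    for loss, weight in zip(chunk_losses, chunk_weights):
        weighted = loss * weight
        if backward_fn is not None:
            backward_fn(weighted)  # stands in for self.manual_backward(...)
        total += weighted  # real code accumulates loss.detach() * weight
    return total
```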
    waveforms: torch.Tensor,
    waveform_lens: torch.Tensor,
    prefix: str,
    sample_rate: int,
Check notice (Code scanning / CodeQL): Unused local variable (Note)
    audio_save_time_sec = time.perf_counter() - save_start_time
    audio_durations = [
        int(predicted_audio_lens[idx].item()) / self.output_sample_rate for idx in range(predicted_audio.size(0))
    ]
Check notice (Code scanning / CodeQL): Unused local variable (Note)
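The duration computation in that snippet is just samples divided by sample rate. As a plain-Python sketch, with ints standing in for the tensor lengths:

```python
def audio_durations_sec(predicted_audio_lens, output_sample_rate):
    """Convert per-item sample counts to seconds, as the snippet above does
    with predicted_audio_lens and self.output_sample_rate."""
    return [int(n) / output_sample_rate for n in predicted_audio_lens]
```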
Signed-off-by: Paarth Neekhara <[email protected]>
Signed-off-by: paarthneekhara <[email protected]>
Signed-off-by: Shehzeen Hussain <[email protected]>
* new base class
* Magpie models refactoring

Signed-off-by: Paarth Neekhara <[email protected]>
…kenizer class

Signed-off-by: Paarth Neekhara <[email protected]>
Signed-off-by: Shehzeen Hussain <[email protected]>
Force-pushed 81af95a to c8ad57a
    if not torch.distributed.is_initialized():
        print(
            f"[val_dataloader] rank={self.global_rank}: Distributed not initialized, skipping DistributedSampler wrap"
Check warning (Code scanning / CodeQL): Variable defined multiple times (Warning)
    import os
    import random
    from dataclasses import dataclass
    from typing import Dict, List, Optional, Tuple
Check notice (Code scanning / CodeQL): Unused import (Note)
    from nemo.collections.tts.models.easy_magpietts_inference import (
        EasyMagpieTTSInferenceModel,
        InferBatchOutput,
        StreamingFinalizeOutput,
        StreamingState,
        TrainingMode,
    )
Check notice (Code scanning / CodeQL): Unused import (Note)
* undo model pt
* remove test infer vs process batch
* undo inference changes for easy magpie to start fresh
* inference refactoring

Signed-off-by: Shehzeen Hussain <[email protected]>
Signed-off-by: shehzeen <[email protected]>
* clean up code, rename back to magpietts_inference.py
* bug fixes, inference runs now

Signed-off-by: Shehzeen Hussain <[email protected]>
…a/NeMo into magpietts_decoderonly_2601
No description provided.