
MagpieTTS decoder model on top of NeMo main branch #15277

Draft
paarthneekhara wants to merge 94 commits into NVIDIA-NeMo:main from paarthneekhara:magpietts_decoderonly_2601

Conversation

@paarthneekhara (Collaborator):

No description provided.

Comment on lines +42 to +50
from nemo.collections.tts.modules.nemotron_h_decoder import (
HybridMambaAttentionDynamicCache,
NemotronHConfig,
NemotronHForCausalLM,
NemotronHMLP,
NemotronHModel,
NemotronHMOE,
NemotronHTopkRouter,
)

Check notice (Code scanning / CodeQL): Unused import. Import of 'NemotronHMLP' is not used.
@@ -0,0 +1,173 @@
name: Magpie-TTS-DecoderOnly-EN
Collaborator:

Have we tested the non-Lhotse path?

@blisc (Collaborator) left a comment:

Some more comments from WIP review

Comment on lines +14 to +21
"""
MagpieTTS Streaming Inference Test Script.

This script tests the streaming TTS inference functionality, supporting both
single sample (batch_size=1) and batched inference (batch_size>1).

For batched inference, each item in the batch can have different context lengths
and be in different processing phases (context, prompt, phoneme-only, audio).
Collaborator:

Can you add a note here on how this differs from magpietts_inference.py?

return [self._token2id[p] for p in ps]


class IPABPETokenizer:
Collaborator:

Should we subclass Tokenizer instead of instantiation within the class?

Collaborator (Author):

Subclassing Tokenizer may not be the best idea, because we would have to reassign some internal methods/params to self for it to work correctly. If we call from_file() in the init function, we would have to reassign the normalizer, pre_tokenizer, post_processor, and decoder of the loaded model to self. I am moving the imports to the top. It would look something like this if we were to subclass Tokenizer:

import os
from typing import List

from tokenizers import Tokenizer


class IPABPETokenizer(Tokenizer):
    """Simple IPA BPE tokenizer subclassing HuggingFace tokenizers.Tokenizer.

    Args:
        tokenizer_path: Path to the tokenizer.json file (or directory containing it).
    """

    def __init__(self, tokenizer_path: str):
        if os.path.isdir(tokenizer_path):
            tokenizer_file = os.path.join(tokenizer_path, "tokenizer.json")
        else:
            tokenizer_file = tokenizer_path

        if not os.path.exists(tokenizer_file):
            raise ValueError(f"Tokenizer file not found: {tokenizer_file}")

        loaded = Tokenizer.from_file(tokenizer_file)
        super().__init__(loaded.model)
        self.normalizer = loaded.normalizer
        self.pre_tokenizer = loaded.pre_tokenizer
        self.post_processor = loaded.post_processor
        self.decoder = loaded.decoder

        self.tokens = self.get_vocab()
        self.pad = self.tokens.get("<pad>", None)

    def encode(self, text: str) -> List[int]:
        """Encode IPA text to token IDs."""
        return super().encode(text).ids

    def decode(self, tokens: List[int]) -> str:
        """Decode token IDs back to IPA text."""
        return super().decode(tokens)
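For comparison, the composition alternative being discussed can be sketched like this (the class name `IPABPETokenizerWrapped` is hypothetical, not from the PR); it wraps the loaded tokenizer instead of subclassing it, so none of its internals need to be reassigned:

```python
from typing import List

from tokenizers import Tokenizer


class IPABPETokenizerWrapped:
    """Composition variant: hold the loaded Tokenizer instead of subclassing it."""

    def __init__(self, tokenizer_file: str):
        # The loaded object keeps its own normalizer, pre_tokenizer,
        # post_processor, and decoder; nothing is reassigned.
        self._tokenizer = Tokenizer.from_file(tokenizer_file)
        self.tokens = self._tokenizer.get_vocab()
        self.pad = self.tokens.get("<pad>", None)

    def encode(self, text: str) -> List[int]:
        """Encode IPA text to token IDs."""
        return self._tokenizer.encode(text).ids

    def decode(self, tokens: List[int]) -> str:
        """Decode token IDs back to IPA text."""
        return self._tokenizer.decode(tokens)
```

The trade-off is that callers only see the small encode/decode surface rather than the full Tokenizer API.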

Comment on lines 1159 to +1169
elif isinstance(tokenizer, PreTrainedTokenizerBase):
_tokens = list(tokenizer.get_vocab().keys())
tokens.extend(_tokens)
num_tokens = len(_tokens)
tokenizer_pad_ids[tokenizer_name] = tokenizer.pad_token_id + tokenizer_offset
pad_token_id = tokenizer.pad_token_id if tokenizer.pad_token_id is not None else tokenizer.unk_token_id
if pad_token_id is None:
raise ValueError(
f"Tokenizer '{tokenizer_name}' has no pad_token_id or unk_token_id. "
"Please set one before using with AggregatedTTSTokenizer."
)
tokenizer_pad_ids[tokenizer_name] = pad_token_id + tokenizer_offset
Collaborator:

Does this affect existing MagpieTTS checkpoints?

Collaborator (Author):

@shehzeen Can you check this?

Collaborator:

MagpieTTS should work the same way. Nemotron tokenizer has pad_token_id as None, so we are using the unk_token_id as the pad_token_id in EasyMagpie. In Magpie tokenizers, tokenizer.pad_token_id is not None (if it's ever None, the tokenizer setup would error out in the old code). So the code functionality should stay the same for MagpieTTS since the tokenizers have the pad_token_id.

batch_size = batch['text'].size(0)
phoneme_stacking_factor = model.phoneme_stacking_factor
phoneme_vocab_size = model.phoneme_vocab_size

Check notice (Code scanning / CodeQL): Unused local variable. Variable T_phoneme is not used.
@shehzeen shehzeen force-pushed the magpietts_decoderonly_2601 branch from 54d6283 to 06c516f Compare February 12, 2026 00:12
@github-actions github-actions bot added the core Changes to NeMo Core label Feb 17, 2026
Comment on lines +948 to +951
def process_text_for_cer(input_text):
"""
Normalizes text for CER/WER calculation.
"""
Collaborator:

FYI: @rlangman @rfejgin since we were talking about this. Let's lift this from the decoder PR and move it to main early

Comment on lines +63 to +70
def instantiate_phoneme_tokenizer(phoneme_tokenizer_config):
phoneme_tokenizer = instantiate(phoneme_tokenizer_config)
phoneme_vocab_size = len(phoneme_tokenizer.tokens)
phoneme_tokenizer.bos_token_id = phoneme_vocab_size
phoneme_tokenizer.eos_token_id = phoneme_vocab_size + 1
phoneme_tokenizer.unk_token_id = phoneme_vocab_size + 2
phoneme_tokenizer.vocab_size = phoneme_vocab_size + 3
return phoneme_tokenizer
Collaborator:

I'm not sure when you call this function, but this should be part of the tokenizer class, not a util function in the dataset.

Collaborator:

However, this only exists in the Lhotse file but not the non-Lhotse file?
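A minimal sketch of what folding the special-token setup into the tokenizer class could look like (the class and names here are illustrative, not the actual NeMo tokenizer); defining it once on the class would make it available to both the Lhotse and non-Lhotse paths without a dataset-level util:

```python
from typing import List


class PhonemeTokenizer:
    """Illustrative sketch: special-token IDs assigned inside the class
    rather than patched on from a dataset helper function."""

    def __init__(self, tokens: List[str]):
        self.tokens = list(tokens)
        base = len(self.tokens)
        # Special tokens are appended after the base phoneme vocabulary,
        # mirroring the logic in instantiate_phoneme_tokenizer above.
        self.bos_token_id = base
        self.eos_token_id = base + 1
        self.unk_token_id = base + 2
        self.vocab_size = base + 3
        self._token2id = {t: i for i, t in enumerate(self.tokens)}

    def encode(self, phonemes: List[str]) -> List[int]:
        """Map phonemes to IDs, falling back to unk for unseen symbols."""
        return [self._token2id.get(p, self.unk_token_id) for p in phonemes]
```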

dataset.phoneme_tokenizer = instantiate_phoneme_tokenizer(dataset.phoneme_tokenizer_config)


class EasyMagpieTTSModel(ModelPT):
Collaborator:

This is a really large file. Can we split it up? Some suggestions

  • Anything that's common with Encoder-Decoder Magpie, let's move to a separate base class:
    • The code manipulation functions
    • The local transformer functions
    • etc
  • Let's move the dataclasses to another file, although we can debate this
  • Let's move worker_init_fn too since it should be common to both models
  • Could consider splitting training and inference into two classes as well

phoneme_input_type = 'gt' if random.random() < gt_phoneme_input_prob else 'pred'

generation_start_time = time.perf_counter()
print("Inference started")
Collaborator:

Switch print statements to logging
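A minimal sketch of the suggested swap; NeMo code would typically use `nemo.utils.logging`, but the stdlib `logging` module shown here follows the same pattern (`run_inference` is a hypothetical wrapper, not a function from the PR):

```python
import logging
import time

logger = logging.getLogger(__name__)


def run_inference(generate_fn):
    """Run a generation callable and log timing instead of printing."""
    generation_start_time = time.perf_counter()
    logger.info("Inference started")  # was: print("Inference started")
    result = generate_fn()
    logger.info("Inference finished in %.2fs", time.perf_counter() - generation_start_time)
    return result
```

Unlike print, the log level and destination can then be controlled globally, and rank-zero-only logging filters apply.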

snapshot[id(p)] = p.data.clone()
return snapshot

def _print_grad_weight_summary(self, metrics: Dict[str, float], step: int) -> None:
Collaborator:

This function does not depend on self. Consider moving all helper print functions into a separate file and calling them from the model, instead of defining additional class methods.

@paarthneekhara paarthneekhara force-pushed the magpietts_decoderonly_2601 branch from f684fc3 to eeac2ce Compare March 9, 2026 16:32
if do_backward:
self.manual_backward(chunk_outputs['loss'] * chunk_weight)

accumulated_loss = accumulated_loss + chunk_outputs['loss'].detach() * chunk_weight

Check failure (Code scanning / CodeQL): Superclass attribute shadows subclass method. This method is shadowed by attribute training_step in superclass ModelPT.
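For context, a toy reproduction of the pattern CodeQL flags here (class names are illustrative, not the actual ModelPT code): when a superclass `__init__` assigns an instance attribute with the same name as a subclass method, instance attribute lookup finds the attribute first and the method becomes unreachable through the instance:

```python
class Base:
    def __init__(self):
        # Instance attribute named like the subclass method below.
        self.training_step = None


class Child(Base):
    def training_step(self, batch):  # shadowed by the attribute set in Base.__init__
        return len(batch)


c = Child()
# c.training_step resolves to the instance attribute (None), not the method,
# because methods are non-data descriptors and lose to instance __dict__ entries.
```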
waveforms: torch.Tensor,
waveform_lens: torch.Tensor,
prefix: str,
sample_rate: int,

Check notice (Code scanning / CodeQL): Unused local variable. Variable time_id is not used.
audio_save_time_sec = time.perf_counter() - save_start_time
audio_durations = [
int(predicted_audio_lens[idx].item()) / self.output_sample_rate for idx in range(predicted_audio.size(0))
]

Check notice (Code scanning / CodeQL): Unused local variable. Variable gt_speaker_embeddings is not used.
paarthneekhara and others added 17 commits March 10, 2026 16:00
Signed-off-by: Paarth Neekhara <[email protected]>
shehzeen and others added 19 commits March 10, 2026 16:01
Signed-off-by: Shehzeen Hussain <[email protected]>
Signed-off-by: Paarth Neekhara <[email protected]>
* new base class

* Magpie models refactoring

Signed-off-by: Paarth Neekhara <[email protected]>
Signed-off-by: Shehzeen Hussain <[email protected]>
@shehzeen shehzeen force-pushed the magpietts_decoderonly_2601 branch from 81af95a to c8ad57a Compare March 10, 2026 23:04

if not torch.distributed.is_initialized():
print(
f"[val_dataloader] rank={self.global_rank}: Distributed not initialized, skipping DistributedSampler wrap"

Check warning (Code scanning / CodeQL): Variable defined multiple times. This assignment to 'source_index' is unnecessary as it is redefined before this value is used.
import os
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Check notice (Code scanning / CodeQL): Unused import. Import of 'Dict' is not used. Import of 'List' is not used.
Comment on lines +37 to +43
from nemo.collections.tts.models.easy_magpietts_inference import (
EasyMagpieTTSInferenceModel,
InferBatchOutput,
StreamingFinalizeOutput,
StreamingState,
TrainingMode,
)

Check notice (Code scanning / CodeQL): Unused import. Import of 'InferBatchOutput' is not used. Import of 'StreamingFinalizeOutput' is not used. Import of 'StreamingState' is not used.
* undo model pt

* remove test infer vs proces batch

* undo inference changes for easy magpie to start fresh

* inference refactoring

Signed-off-by: Shehzeen Hussain <[email protected]>
@github-actions github-actions bot removed the core Changes to NeMo Core label Mar 11, 2026
shehzeen and others added 7 commits March 11, 2026 17:31
* clean up code, rename back to magpietts_inference.py

* bug fixes, inference runs now

Signed-off-by: Shehzeen Hussain <[email protected]>
Signed-off-by: Paarth Neekhara <[email protected]>
3 participants