
[MagpieTTS] Magpietts longform unify #15477

Merged

subhankar-ghosh merged 34 commits into main from magpietts_longform_unify on Mar 11, 2026

Conversation

@subhankar-ghosh (Collaborator) commented Mar 9, 2026

Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do ?

Add a one-line overview of what this PR aims to accomplish.

Collection: [Note which collection this PR will affect]

Changelog

  • Add specific line-by-line info of high-level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

subhankar-ghosh and others added 29 commits February 2, 2026 09:51
For this PR it is important to merge main into this branch, as it is out of date.
Added regex to remove spaces in Japanese transcripts as a workaround for a bug in the Ja normalizer.

@github-actions (Contributor)

[🤖]: Hi @subhankar-ghosh 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

@subhankar-ghosh subhankar-ghosh enabled auto-merge (squash) March 10, 2026 16:50
@rfejgin (Collaborator) left a comment

Overall looks good but please see comments.
Also, I earlier confirmed that CERs for the frame stacking model look good after this change (same as pre-unification), so it looks like this fixes the issue we observed.


predicted_codes = torch.cat(state.all_predictions, dim=-1) # (B, C, F*T_steps)
num_steps = len(state.all_predictions)
default_frame_len = num_steps * self.frame_stacking_factor
rfejgin (Collaborator):

Could you add back the comment that was here originally, I think it got lost:
# Concatenate the list of predictions along the time dimension. Note that when frame stacking is on, this also undoes the stacking.

subhankar-ghosh (Author):

Added
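The frame-stacking behavior discussed above can be sketched as follows. This is a hypothetical illustration, not the PR's actual tensors: it assumes each decoding step emits `frame_stacking_factor` frames as a `(B, C, frame_stacking_factor)` prediction, so concatenating along the last dimension both joins the steps and undoes the stacking in one call.

```python
import torch

# Illustrative sizes (assumed, not from the PR).
B, C, frame_stacking_factor, num_steps = 2, 4, 2, 3

# One prediction per decoding step, shape (B, C, frame_stacking_factor).
all_predictions = [
    torch.randint(0, 1024, (B, C, frame_stacking_factor)) for _ in range(num_steps)
]

# Concatenate the list of predictions along the time dimension. When frame
# stacking is on, this also undoes the stacking: the result covers
# num_steps * frame_stacking_factor frames.
predicted_codes = torch.cat(all_predictions, dim=-1)
default_frame_len = num_steps * frame_stacking_factor
```

With `frame_stacking_factor = 1` the same call reduces to plain step-by-step concatenation, which is why a single code path can serve both modes.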

finished_texts_counter={},
attn_prior=initial_attn_prior,
)
chunk_end_frame_lens: Dict[int, int] = {}
rfejgin (Collaborator):

Could you add a comment saying what this tracks and whether it persists across chunked calls to generate_speech()? It appears that this one keeps state locally, unlike chunk_state, which is persistent between calls; that distinction isn't clear from the naming.

Side note, maybe we could find a better name than chunk_state since that structure appears not to be associated with a particular chunk but rather tracks overall inference state (I think). E.g. could call it inference_state or chunked_inference_state (the latter is admittedly kind of verbose).

subhankar-ghosh (Author):

Added. It does not maintain state across generate_speech calls. It is local only.
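The local-vs-persistent split being discussed could be sketched like this. All names here are illustrative, not the PR's exact API: the sketch assumes a `ChunkedInferenceState` object that persists across `generate_speech()` calls, while `chunk_end_frame_lens` is rebuilt locally on every call.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ChunkedInferenceState:
    """Persists across generate_speech() calls (hypothetical stand-in for chunk_state)."""
    finished_texts_counter: Dict[int, int] = field(default_factory=dict)
    history: List[int] = field(default_factory=list)

def generate_speech(state: ChunkedInferenceState, num_chunks: int) -> Dict[int, int]:
    # Local only: re-created on every call, never stored on `state`.
    chunk_end_frame_lens: Dict[int, int] = {}
    for chunk_idx in range(num_chunks):
        frame_len = (chunk_idx + 1) * 10  # placeholder for real decoding
        chunk_end_frame_lens[chunk_idx] = frame_len
        state.history.append(frame_len)   # persistent accumulation across calls
    return chunk_end_frame_lens
```

Under this sketch, calling `generate_speech` twice on the same state object starts `chunk_end_frame_lens` fresh each time while `state.history` keeps growing, which matches the "local only" answer above.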

Args:
chunk_state: Mutable state object tracking history across chunks.
audio_codes_next: Sampled audio codes. Shape: (B, num_codebooks).
audio_codes_next: Sampled audio codes. Shape: (B, num_codebooks) or (B, num_codebooks, frame_stacking_factor).
rfejgin (Collaborator):

Isn't it always 3-dimensional, with frame_stacking_factor being 1 if there's no frame stacking?
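The reviewer's point can be checked with a small shape experiment. This is an assumed illustration, not the PR's code: it treats the sampled codes as always 3-dimensional, with `frame_stacking_factor == 1` covering the no-stacking case.

```python
import torch

B, num_codebooks = 2, 8

# Whether or not frame stacking is active, the tensor stays 3-D; only the
# size of the last dimension changes.
for frame_stacking_factor in (1, 4):
    audio_codes_next = torch.zeros(B, num_codebooks, frame_stacking_factor, dtype=torch.long)
    assert audio_codes_next.dim() == 3
```

If that holds, the docstring could document the single shape `(B, num_codebooks, frame_stacking_factor)` rather than listing two alternatives.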

subhankar-ghosh and others added 3 commits March 10, 2026 23:46
@github-actions (Contributor)

[🤖]: Hi @subhankar-ghosh 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

@github-actions github-actions bot removed the Run CICD label Mar 11, 2026
@rfejgin (Collaborator) left a comment

Looks good to me.

@subhankar-ghosh subhankar-ghosh merged commit 80062ee into main Mar 11, 2026
131 checks passed
@subhankar-ghosh subhankar-ghosh deleted the magpietts_longform_unify branch March 11, 2026 18:11
nune-tadevosyan pushed a commit to nune-tadevosyan/NeMo that referenced this pull request Mar 13, 2026
* Refactor audio processing to include frame lengths for Framestacking

Signed-off-by: Subhankar Ghosh <[email protected]>

* Adding back Fix Japanese transcript normalization issue

Added regex to remove spaces in Japanese transcripts as a workaround for a bug in the Ja normalizer.

Signed-off-by: Subhankar Ghosh <[email protected]>

* Apply isort and black reformatting

Signed-off-by: subhankar-ghosh <[email protected]>

---------

Signed-off-by: Subhankar Ghosh <[email protected]>
Signed-off-by: subhankar-ghosh <[email protected]>
Co-authored-by: subhankar-ghosh <[email protected]>

4 participants