[MagpieTTS] Magpietts longform unify#15477
Conversation
Signed-off-by: subhankar-ghosh <[email protected]>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Signed-off-by: Subhankar Ghosh <[email protected]>
For the PR, it is important to merge main into this branch, as it is out of date.
Added regex to remove spaces in Japanese transcripts as a workaround for a bug in the Ja normalizer.
[🤖]: Hi @subhankar-ghosh 👋, we wanted to let you know that a CICD pipeline for this PR just finished successfully, so it might be time to merge this PR or get some approvals.
rfejgin left a comment:
Overall looks good but please see comments.
Also, I earlier confirmed that CERs for the frame stacking model look good after this change (same as pre-unification), so it looks like this fixes the issue we observed.
```python
predicted_codes = torch.cat(state.all_predictions, dim=-1)  # (B, C, F*T_steps)
num_steps = len(state.all_predictions)
default_frame_len = num_steps * self.frame_stacking_factor
```
Could you add back the comment that was here originally, I think it got lost:
# Concatenate the list of predictions along the time dimension. Note that when frame stacking is on, this also undoes the stacking.
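For context, a minimal sketch of the behavior that comment describes (shapes and values are illustrative, not taken from the actual model): each decoding step predicts a `(B, C, F)` block of codes, where `F` is the frame-stacking factor, so concatenating the per-step predictions along the last dimension both joins the steps and undoes the stacking in one operation.

```python
import torch

# Illustrative dimensions (hypothetical): batch, codebooks,
# frame_stacking_factor, and number of decoding steps.
B, C, F, num_steps = 2, 8, 3, 5

# Each step produces a (B, C, F) block of sampled codes.
all_predictions = [torch.randint(0, 1024, (B, C, F)) for _ in range(num_steps)]

# Concatenating along the time dimension joins the steps AND
# undoes the frame stacking, yielding (B, C, F * num_steps).
predicted_codes = torch.cat(all_predictions, dim=-1)
default_frame_len = num_steps * F

print(predicted_codes.shape)  # torch.Size([2, 8, 15])
print(default_frame_len)      # 15
```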
```python
    finished_texts_counter={},
    attn_prior=initial_attn_prior,
)
chunk_end_frame_lens: Dict[int, int] = {}
```
Could you add a comment saying what this tracks and whether it persists across chunked calls to generate_speech()? It appears that this one keeps state locally, unlike chunk_state, which is persistent between calls, but that is not clear from the naming.
Side note: maybe we could find a better name than chunk_state, since that structure appears not to be associated with a particular chunk but rather to track overall inference state (I think). E.g. we could call it inference_state or chunked_inference_state (the latter is admittedly kind of verbose).
Added. It does not maintain state across generate_speech calls. It is local only.
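The scoping distinction under discussion can be sketched as follows (all names besides `chunk_state` and `chunk_end_frame_lens` are hypothetical stand-ins, and the bodies are dummies, not the real inference logic): the persistent object survives across calls to `generate_speech()`, while the dict is re-created inside each call.

```python
class ChunkState:
    """Hypothetical persistent state carried across chunked calls."""

    def __init__(self):
        self.all_predictions = []


def generate_speech(chunk_state, batch_indices):
    # Local only: maps batch index -> end-of-chunk frame length,
    # rebuilt from scratch on every call.
    chunk_end_frame_lens = {}
    for b in batch_indices:
        chunk_end_frame_lens[b] = 10 * (b + 1)  # dummy frame lengths

    # Persistent: accumulates across calls via the shared object.
    chunk_state.all_predictions.append(len(batch_indices))
    return chunk_end_frame_lens


state = ChunkState()
first = generate_speech(state, [0, 1])
second = generate_speech(state, [0])

print(first)                  # {0: 10, 1: 20}
print(second)                 # {0: 10} -- local dict was rebuilt
print(state.all_predictions)  # [2, 1] -- persisted across both calls
```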
```python
Args:
    chunk_state: Mutable state object tracking history across chunks.
    audio_codes_next: Sampled audio codes. Shape: (B, num_codebooks) or (B, num_codebooks, frame_stacking_factor).
```
Isn't it always 3-dimensional, with frame_stacking_factor being 1 if there's no frame stacking?
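The convention the reviewer is asking about could be enforced with a small helper like the one below (the function name is hypothetical, not from the PR): always keep the codes 3-dimensional, with the trailing frame_stacking_factor dimension equal to 1 when stacking is off, so downstream code only ever handles one shape.

```python
import torch


def normalize_codes(audio_codes: torch.Tensor) -> torch.Tensor:
    """Ensure codes have shape (B, num_codebooks, frame_stacking_factor).

    A 2-D (B, num_codebooks) input is treated as the no-stacking case
    and gains a trailing dimension of size 1.
    """
    if audio_codes.dim() == 2:
        audio_codes = audio_codes.unsqueeze(-1)  # F = 1
    return audio_codes


print(normalize_codes(torch.zeros(2, 8)).shape)     # torch.Size([2, 8, 1])
print(normalize_codes(torch.zeros(2, 8, 3)).shape)  # torch.Size([2, 8, 3])
```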
* Refactor audio processing to include frame lengths for Framestacking
* Adding back fix for Japanese transcript normalization issue: added regex to remove spaces in Japanese transcripts as a workaround for a bug in the Ja normalizer
* Apply isort and black reformatting
Important
The "Update branch" button must only be pressed on very rare occasions. An outdated branch never blocks the merge of a PR.
Please reach out to the automation team before pressing that button.
What does this PR do ?
Add a one line overview of what this PR aims to accomplish.
Collection: [Note which collection this PR will affect]
Changelog
Usage
# Add a code snippet demonstrating how to use this
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines contain specific people who can review PRs to various areas.
Additional Information