@pzelasko (Collaborator) commented Jun 4, 2025

What does this PR do?

Adds tests and fixes an inconsistency in the ASR feature extractor and subsampling, where the same input produced different outputs depending on whether it was padded. Specifically, this PR:

  • decreases the feature extractor's sequence-length tensor by 1, because the previously computed value included an extra padding frame in the majority of cases
  • removes a Dirac-delta-like spike caused by missing masking in preemphasis (present only when audio_length < audio.shape["time"], i.e., when the input is padded)
  • replaces "reflect" padding with zero-padding so that the features do not depend on the padding length
  • replaces convolution with masked convolution in subsampling to discard frames beyond the sequence length
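The preemphasis and padding fixes above can be illustrated on a toy signal. This is a minimal sketch, not NeMo's implementation: it assumes a first-order preemphasis filter y[t] = x[t] - α·x[t-1] with a hypothetical α = 0.97, treats a padded batch row as the real samples followed by zeros, and uses hypothetical helper names (`preemphasis`, `masked_preemphasis`, `center_pad`). It shows that unmasked preemphasis leaves one nonzero spike right after the real audio, and that "reflect" padding makes the samples seen past the end of the utterance depend on how much zero padding follows, while constant (zero) padding does not.

```python
def preemphasis(x, alpha=0.97):
    """First-order preemphasis: y[0] = x[0], y[t] = x[t] - alpha * x[t - 1]."""
    return [x[0]] + [x[t] - alpha * x[t - 1] for t in range(1, len(x))]


def masked_preemphasis(x, length, alpha=0.97):
    """Preemphasis followed by zeroing every output sample at index >= length."""
    y = preemphasis(x, alpha)
    return [v if t < length else 0.0 for t, v in enumerate(y)]


def center_pad(x, pad, mode):
    """Pad both ends by `pad` samples, as a centered STFT would (requires pad < len(x))."""
    if mode == "reflect":
        # Mirror around the edge samples without repeating them (like numpy/torch "reflect").
        left, right = x[pad:0:-1], x[-2:-pad - 2:-1]
    elif mode == "constant":
        left = right = [0.0] * pad
    else:
        raise ValueError(f"unsupported mode: {mode}")
    return left + x + right


signal = [0.5, -0.25, 1.0, 0.75]
padded = signal + [0.0] * 3  # the same utterance as a zero-padded row in a batch
n = len(signal)

# Preemphasis: without masking, padded[n] becomes 0.0 - alpha * signal[-1], a spike.
unmasked = preemphasis(padded)
masked = masked_preemphasis(padded, length=n)

# STFT-style padding: compare everything up to `pad` samples past the real audio.
pad = 2
view = n + 2 * pad
reflect_alone = center_pad(signal, pad, "reflect")[:view]
reflect_batched = center_pad(padded, pad, "reflect")[:view]
zeros_alone = center_pad(signal, pad, "constant")[:view]
zeros_batched = center_pad(padded, pad, "constant")[:view]
```

Real front ends operate on windowed frames rather than raw samples, but the effect carries over: any analysis window that overlaps the end of the utterance can see different samples depending on how the row was padded.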

As a result, WER varies much less with batch size, although the results are still not 100% identical across batch sizes. For example, for parakeet-tdt-0.6b-v2, parakeet-rnnt-1.1b, and canary-180m-flash, the absolute WER difference between batch sizes 128 and 512 was 0.01%.
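Batch composition changes how much padding each utterance receives, so any layer that lets padded positions leak into valid frames makes results depend on batch size. The masking idea behind the subsampling fix can be sketched under simplifying assumptions: a 1-D, stride-1 convolution with "same" zero padding and hypothetical helper names (`conv1d_same`, `masked_conv1d_same`), whereas real subsampling modules use strided 2-D convolutions. Padded positions are filled with a nonzero value to mimic activations that earlier layers (e.g., a bias term) can leave there.

```python
def conv1d_same(x, kernel):
    """1-D cross-correlation with zero 'same' padding (odd kernel size)."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(x) + [0.0] * half
    return [sum(k * padded[i + j] for j, k in enumerate(kernel)) for i in range(len(x))]


def masked_conv1d_same(x, length, kernel):
    """Zero inputs at positions >= length, convolve, then zero outputs >= length."""
    x_masked = [v if t < length else 0.0 for t, v in enumerate(x)]
    y = conv1d_same(x_masked, kernel)
    return [v if t < length else 0.0 for t, v in enumerate(y)]


kernel = [0.25, 0.5, 0.25]
frames = [1.0, 2.0, 3.0]
# Padded positions hold nonzero "garbage", e.g. a bias added by an earlier layer.
short_batch = frames + [0.1]
long_batch = frames + [0.1] * 5

reference = conv1d_same(frames, kernel)  # the utterance processed on its own
# Unmasked: the last valid output (t=2) picks up the garbage at t=3.
plain_short = conv1d_same(short_batch, kernel)[:3]
# Masked: valid outputs match the reference regardless of padding length.
masked_short = masked_conv1d_same(short_batch, 3, kernel)[:3]
masked_long = masked_conv1d_same(long_batch, 3, kernel)[:3]
```

Masking both the input and the output keeps the padded region at exactly zero, so the next layer also starts from a padding-independent state.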

Comparison of all NVIDIA NeMo ASR models on the Open ASR Leaderboard (offline only):

| Model | Current WER (%) | New WER (%) | Relative diff |
|---|---|---|---|
| **CTC** | | | |
| nvidia/parakeet-ctc-1.1b | 7.4 | 7.39 | -0.14% |
| nvidia/parakeet-ctc-0.6b | 7.69 | 7.65 | -0.52% |
| nvidia/stt_en_fastconformer_ctc_large | 8.96 | 8.94 | -0.22% |
| nvidia/stt_en_conformer_ctc_large | 8.32 | 8.5 | +2.16% |
| nvidia/stt_en_conformer_ctc_small | 11.16 | 11.16 | 0.00% |
| **RNNT** | | | |
| nvidia/parakeet-tdt-0.6b-v2 | 6.05 | 6.06 | +0.17% |
| nvidia/parakeet-tdt-1.1b | 7.01 | 6.92 | -1.28% |
| nvidia/parakeet-rnnt-1.1b | 7.12 | 7.04 | -1.12% |
| nvidia/parakeet-rnnt-0.6b | 7.5 | 7.42 | -1.07% |
| nvidia/stt_en_fastconformer_transducer_large | 9.06 | 8.57 | -5.41% |
| stt_en_conformer_transducer_small | 10.26 | 9.75 | -4.97% |
| nvidia/parakeet-tdt_ctc-110m | 7.49 | 7.49 | 0.00% |
| **AED** | | | |
| nvidia/canary-1b-flash | 6.35 | 6.31 | -0.63% |
| nvidia/canary-180m-flash | 7.12 | 7.08 | -0.56% |
| nvidia/canary-1b | 6.5 | 6.47 | -0.46% |

I also checked the results on NVTalks for one cache-aware model:

| Model | Current WER (%) | New WER (%) | Relative diff |
|---|---|---|---|
| stt_en_fastconformer_hybrid_large_streaming_1040ms | 14.73 | 14.75 | +0.13% |

Collection: ASR

Signed-off-by: Piotr Żelasko <[email protected]>
@nithinraok (Collaborator) previously approved these changes Jun 4, 2025

LGTM. Checked with parakeet models as well.

@pytest.mark.skip(reason="Used only for debugging.")
@pytest.mark.parametrize("length", [16000])
def test_canary_invariant_to_padding(deterministic_rng, length):
    model = ASRModel.from_pretrained("nvidia/canary-180m-flash").eval()

no pretrained :)

github-actions bot removed the Run CICD label Jun 4, 2025
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: Piotr Żelasko <[email protected]>
@tango4j (Collaborator) commented Jun 27, 2025

Just commenting for future reference.
To make the code future-proof for speaker diarization (Sortformer), I imported the featurizer's parameters and used the same formula to calculate the total feature frame count.

For Sortformer, Lhotse-based inference is supported, but training is not supported yet.
I will update this with the Streaming Sortformer updates (feature frame calculation, etc.).
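Reusing the featurizer's parameters to compute the frame count, as described above, can be sketched as follows. This is an illustration under stated assumptions, not the Sortformer or NeMo code: the parameter names mirror common preprocessor settings (`sample_rate`, `window_stride`, `n_fft`), the helper `feature_frame_count` is hypothetical, and the old count assumes centered STFT framing with n_fft // 2 samples of padding on each side. The new count subtracts the one frame that the PR description identifies as an extra padding frame. For 1 s of 16 kHz audio with a 10 ms hop, the old count is 101 frames, the last one centered at sample 16000, just past the final sample; the new count is 100.

```python
def feature_frame_count(num_samples, sample_rate=16000, window_stride=0.01, n_fft=512,
                        drop_padding_frame=True):
    """Number of feature frames for `num_samples` of audio (hypothetical helper).

    Assumes a centered STFT that pads n_fft // 2 samples on each side and a hop of
    window_stride * sample_rate samples.
    """
    hop_length = int(round(window_stride * sample_rate))
    pad_amount = (n_fft // 2) * 2
    # Number of STFT frames that fit in the padded signal (the pre-fix length).
    count = (num_samples + pad_amount - n_fft) // hop_length + 1
    # Per the PR description, the fixed length is one frame shorter.
    return count - 1 if drop_padding_frame else count


# Hypothetical featurizer settings, e.g. read from a model's preprocessor config.
featurizer_params = {"sample_rate": 16000, "window_stride": 0.01, "n_fft": 512}
old_frames = feature_frame_count(16000, drop_padding_frame=False, **featurizer_params)  # 101
new_frames = feature_frame_count(16000, **featurizer_params)  # 100
```

Deriving the count from the same parameters the featurizer uses, instead of hard-coding a frame rate, keeps a downstream dataloader in sync if the featurizer's length formula changes again.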

@pzelasko enabled auto-merge (squash) July 1, 2025 19:43
github-actions bot removed the Run CICD label Jul 1, 2025
github-actions bot commented Jul 1, 2025

[🤖]: Hi @pzelasko 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

//cc @chtruong814 @ko3n1g @pablo-garay @thomasdhc

@tango4j (Collaborator) commented Jul 1, 2025

@pzelasko I checked the diarization unit tests. As long as the PR passes all unit tests and CI tests, I don't think the change causes any issues for Sortformer diarization.

@nithinraok (Collaborator) left a comment

Great work!

@pzelasko merged commit 0fd4de5 into main Jul 2, 2025
248 checks passed
@pzelasko deleted the fix-pad-inconsistency-feature-extractor branch July 2, 2025 12:37
AmirHussein96 pushed a commit to AmirHussein96/NeMo that referenced this pull request Jul 23, 2025
* Fix feature extractor to be invariant to padding

Signed-off-by: Piotr Żelasko <[email protected]>

* fix ci

Signed-off-by: Piotr Żelasko <[email protected]>

* preliminary conformer inference parity with/without padding

Signed-off-by: Piotr Żelasko <[email protected]>

* fix

Signed-off-by: Piotr Żelasko <[email protected]>

* fix tests

Signed-off-by: Piotr Żelasko <[email protected]>

* fixes

Signed-off-by: Piotr Żelasko <[email protected]>

* fix CI check

Signed-off-by: Piotr Żelasko <[email protected]>

* fix to cache-aware models

Signed-off-by: Piotr Żelasko <[email protected]>

* fix a bunch of tests

Signed-off-by: Piotr Żelasko <[email protected]>

* Fix failing CI tests

Signed-off-by: Piotr Żelasko <[email protected]>

* Fix failing CI tests part 2

Signed-off-by: Piotr Żelasko <[email protected]>

* Unit test fixes for too short feature extractor inputs

Signed-off-by: Piotr Żelasko <[email protected]>

* Resolved feature frame length issue in E2E diarization dataloader

Signed-off-by: taejinp <[email protected]>

* Apply isort and black reformatting

Signed-off-by: tango4j <[email protected]>

* fix ci

Signed-off-by: Piotr Żelasko <[email protected]>

* removed test_ds from YAML file since it is not used

Signed-off-by: taejinp <[email protected]>

* fix diarization unit tests after recent changes

Signed-off-by: Piotr Żelasko <[email protected]>

---------

Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: taejinp <[email protected]>
Signed-off-by: tango4j <[email protected]>
Co-authored-by: oliver könig <[email protected]>
Co-authored-by: Charlie Truong <[email protected]>
Co-authored-by: taejinp <[email protected]>
Co-authored-by: tango4j <[email protected]>
Signed-off-by: Amir Hussein <[email protected]>
AmirHussein96 pushed a commit to AmirHussein96/NeMo that referenced this pull request Aug 5, 2025
AmirHussein96 pushed a commit to AmirHussein96/NeMo that referenced this pull request Aug 5, 2025
nasretdinovr pushed a commit to nasretdinovr/NeMo that referenced this pull request Aug 8, 2025