Add Parakeet Hybrid RNNT CTC BPE Model with Prompt support #14561
ko3n1g merged 59 commits into NVIDIA-NeMo:main
Conversation
proj_out_size = self._cfg.model_defaults.enc_hidden

self.prompt_kernel = torch.nn.Sequential(
    torch.nn.Linear(proj_in_size, proj_out_size * 2),
Correct me if I've misunderstood your implementation.
Instead of constructing the prompt vector in the dataloader and adding a bunch of extra arguments, you may want to split the first layer of this MLP into enc_hidden_prj and prompt_linear_prj. Assuming the prompt does not change within a sequence (no code-switching), the equivalent computation to your current implementation is:
linear(relu(enc_hidden_prj(enc) + prompt_linear_prj(prompt_vector)))
where prompt_linear_prj(prompt_vector) does not need to be repeated over the time-frame dimension thanks to broadcasting. If the prompt_vector is one-hot, prompt_linear_prj(prompt_vector) is a row of the weight matrix; in other words, this is FiLM conditioning with a unit scale vector and a task-dependent shift vector. You can activate multiple tasks at the same time if the prompt vector is two-hot or similar (e.g., X->En and PnC). You may get slightly better performance with full FiLM conditioning plus a skip connection, like the following:
linear(relu(enc_hidden_prj(enc) * (1 + prompt_scale_prj(prompt_vector)) + prompt_shift_prj(prompt_vector)))
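The equivalence described above can be sketched in NumPy. The names (enc_hidden_prj, prompt_linear_prj) and all dimensions below are illustrative assumptions, not the PR's actual code; biases are omitted for clarity:

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, P, H = 5, 8, 3, 16          # time frames, encoder dim, num prompts, hidden dim

W_enc = rng.normal(size=(D, H))   # hypothetical enc_hidden_prj weight
W_prm = rng.normal(size=(P, H))   # hypothetical prompt_linear_prj weight
enc = rng.normal(size=(T, D))     # encoder output, shape (T, D)

prompt = np.zeros(P)
prompt[1] = 1.0                   # one-hot prompt vector selecting task 1

# Shift is computed once; broadcasting adds it to every time frame.
shift = prompt @ W_prm            # equals W_prm[1], a row of the weight matrix
h = np.maximum(enc @ W_enc + shift, 0.0)  # relu(enc_hidden_prj(enc) + prompt_linear_prj(prompt))

# Equivalent computation that naively repeats the prompt over the time dimension.
h_repeated = np.maximum(enc @ W_enc + np.tile(prompt, (T, 1)) @ W_prm, 0.0)
assert np.allclose(h, h_repeated)
assert np.allclose(shift, W_prm[1])

# Full FiLM conditioning: task-dependent scale and shift, with the unit skip (1 + scale).
W_scale = rng.normal(size=(P, H)) * 0.1   # hypothetical prompt_scale_prj weight
scale = prompt @ W_scale
h_film = np.maximum((enc @ W_enc) * (1.0 + scale) + shift, 0.0)
assert h_film.shape == (T, H)
```

The assertions confirm that the broadcast shift matches the per-frame repetition, and that with a one-hot prompt the shift is exactly one row of the prompt projection's weight matrix.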
Very good suggestion. I'm actually now working on testing code-switching within a single utterance. I'm thinking of making this type of code-switching language-agnostic and not passing any lang ID. I will try your suggestion.
nithinraok left a comment:
LGTM. Resolve the CI-CD issues, then it's good to merge.
… inference pipeline Signed-off-by: Enas Albasiri <[email protected]>
python -c "from nemo.collections.asr.models import EncDecHybridRNNTCTCBPEModelWithPrompt" && \
NEMO_NUMBA_MINVER=0.53 CUDA_VISIBLE_DEVICES=0 \
coverage run -a --data-file=/workspace/.coverage --source=/workspace/ \
-m pytest tests/collections/asr/test_asr_hybrid_rnnt_ctc_model_bpe_prompt.py \
I just noticed: this is not how CI-CD runs are set up. See how an example script was run to check this model here: https://github.com/ealbasiri/NeMo/blob/hybrid-parakeet-tgt-lang-apr30/tests/functional_tests/ASR_dev_run_Speech_to_Text_WPE_-_Conformer.sh
Note: This is a reopened version of #13360 with all reviewer feedback addressed:
Fixed the use_cer configuration issue (changed from wer.use_cer to use_cer)

What does this PR do?
This PR adds support for a Hybrid RNNT-CTC BPE model with a prompt feature (EncDecHybridRNNTCTCBPEModelWithPrompt), enabling flexible ASR and AST tasks through prompt-based conditioning.

Key Features
Prompt-Based Conditioning
Benefits
Collection: ASR
Changelog
Added the EncDecHybridRNNTCTCBPEModelWithPrompt model with prompt conditioning

Usage
Training
Offline Inference
Model Usage
Before your PR is "Ready for review"
Pre checks:
PR Type:
Who can review?
@nithinraok @anhnami
Anyone in the community is free to review the PR once the checks have passed.
Additional Information
This model enables efficient multilingual ASR/AST through prompt conditioning, eliminating the need for multiple specialized models while maintaining high performance across languages and tasks.