-
Notifications
You must be signed in to change notification settings - Fork 3.2k
Tdt buffered inference fix #13500
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tdt buffered inference fix #13500
Conversation
|
@nithinraok can you take a look by any chance? So far I'm not sure it's 100% correct yet, since it's noticeably worse than offline inference in my testing. I'm not sure how much we're expecting the gap to be though... Could you let me know if it works OK on your end? |
|
@weiqingw4ng tested it for few samples and it worked great! I will add some numbers here with |
13c3a00 to
f8a7ecc
Compare
|
Below table shows evaluation of same size, same data trained parakeet models with various decoders. Evaluated with greedy (total buffer size, chunk len in sec)
|
nithinraok
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test case for buffered tdt inference
| # Change Decoding Config | ||
| with open_dict(cfg.decoding): | ||
| if cfg.stateful_decoding: | ||
| if cfg.stateful_decoding or cfg.merge_algo == 'tdt': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add this info to the doc string above and also add to script usage at the top of this script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
|
||
| # Merge algorithm for transducers | ||
| merge_algo: Optional[str] = 'middle' # choices=['middle', 'lcs'], choice of algorithm to apply during inference. | ||
| merge_algo: Optional[str] = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: set merge_algo to None (meaning "auto")
In the code use something like:
if merge_algo is None:
merge_algo = "tdt" if model.is_tdt() else "middle"There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Could you also set default to None in the config (here)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
I can't seem to find test code for the existing buffered RNNT inference to base the new test on. Not sure how to do it efficiently... What I can think of is to include a small trained TDT model in the repo and include a short audio recording also to run this, but this will take quite some work... Shall we do this later? |
Signed-off-by: Francesco Bertolotti <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: oliver könig <[email protected]> Signed-off-by: Hainan Xu <[email protected]>
…NeMo#13493) Signed-off-by: oliver könig <[email protected]> Signed-off-by: Hainan Xu <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>
* fix Signed-off-by: Alexandros Koumparoulis <[email protected]> * make fp8 tests non-optional Signed-off-by: Alexandros Koumparoulis <[email protected]> * switch to gemma Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: oliver könig <[email protected]> Signed-off-by: Hainan Xu <[email protected]>
* beep boop: Update changelog Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Add changelog highlights Signed-off-by: Charlie Truong <[email protected]> --------- Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Charlie Truong <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Charlie Truong <[email protected]> Signed-off-by: Hainan Xu <[email protected]>
Signed-off-by: Charlie Truong <[email protected]> Signed-off-by: Hainan Xu <[email protected]>
* tests: Disable flaky test Signed-off-by: oliver könig <[email protected]> * remove breakpoint Signed-off-by: oliver könig <[email protected]> --------- Signed-off-by: oliver könig <[email protected]> Signed-off-by: Hainan Xu <[email protected]>
* Fix 2.3.0 changelog Signed-off-by: Charlie Truong <[email protected]> * Update 2.3.0 changelog Signed-off-by: Charlie Truong <[email protected]> --------- Signed-off-by: Charlie Truong <[email protected]> Signed-off-by: Hainan Xu <[email protected]>
Signed-off-by: Pranav Prashant Thombre <[email protected]> Signed-off-by: Hainan Xu <[email protected]>
* add extra hyena tests * Apply isort and black reformatting Signed-off-by: JRD971000 <[email protected]> * fix num gpus * keep sft optional --------- Signed-off-by: JRD971000 <[email protected]> Co-authored-by: JRD971000 <[email protected]> Signed-off-by: Hainan Xu <[email protected]>
Signed-off-by: oliver könig <[email protected]> Signed-off-by: Hainan Xu <[email protected]>
26c24bf to
62284b6
Compare
|
I think something messed up. Now its showing 21 file changes |
Not sure what happened back then. I did a pull of the latest main and re-push'ed. It seems it's fixed now. |
Signed-off-by: Hainan Xu <[email protected]>
Signed-off-by: hainan-xv <[email protected]>
nithinraok
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks @hainan-xv
artbataev
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot!
* added use-fast tokenizer argument (NVIDIA-NeMo#12986) Signed-off-by: Francesco Bertolotti <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: oliver könig <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * ci: Run selective triggering on dockerfiles and dependencies (NVIDIA-NeMo#13493) Signed-off-by: oliver könig <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * fix buffered inference for tdt Signed-off-by: Hainan Xu <[email protected]> * small fixes Signed-off-by: Hainan Xu <[email protected]> * [automodel] fallback FP8 + LCE -> FP8 + CE (NVIDIA-NeMo#13349) * fix Signed-off-by: Alexandros Koumparoulis <[email protected]> * make fp8 tests non-optional Signed-off-by: Alexandros Koumparoulis <[email protected]> * switch to gemma Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: oliver könig <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Update changelog for `r2.3.0` (NVIDIA-NeMo#13501) * beep boop: Update changelog Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Add changelog highlights Signed-off-by: Charlie Truong <[email protected]> --------- Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Charlie Truong <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Charlie Truong <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Update 2.3.0 changelog (NVIDIA-NeMo#13503) Signed-off-by: Charlie Truong <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * ci: Remove trt-llm breakpoint (NVIDIA-NeMo#13499) * tests: Disable flaky test Signed-off-by: oliver könig <[email protected]> * remove breakpoint Signed-off-by: oliver könig <[email protected]> --------- Signed-off-by: oliver könig <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Update 2.3.0 changelog (NVIDIA-NeMo#13504) * Fix 2.3.0 changelog Signed-off-by: Charlie Truong <[email protected]> * Update 2.3.0 changelog Signed-off-by: Charlie Truong <[email protected]> --------- Signed-off-by: Charlie Truong <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Enabling flash decode for float16 precision only (NVIDIA-NeMo#13471) Signed-off-by: Pranav Prashant Thombre <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Fix changelog formatting (NVIDIA-NeMo#13505) Signed-off-by: Charlie Truong <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Updating the long context performance number for B200 (NVIDIA-NeMo#13468) * Add without CP numbers for B200 and merge the captioning texts of both into one. Signed-off-by: Youngeun Kwon <[email protected]> * figure removed Signed-off-by: Youngeun Kwon <[email protected]> --------- Signed-off-by: Youngeun Kwon <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Autodetect model_type and dtype for deployment using TRT-LLM backend (NVIDIA-NeMo#13209) * Autodetect model_type and dtype for deployment using TRT-LLM backed Signed-off-by: Jan Lasek <[email protected]> * Handling kv_cache_qformat parameter Signed-off-by: Jan Lasek <[email protected]> * Apply isort and black reformatting Signed-off-by: janekl <[email protected]> * Docstring update Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> Signed-off-by: janekl <[email protected]> Co-authored-by: janekl <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * remove unused variable Signed-off-by: Hainan Xu <[email protected]> * Apply isort and black reformatting Signed-off-by: hainan-xv <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * add doc string, cleaner way of setting mergo_algo Signed-off-by: Hainan Xu <[email protected]> * Apply isort and black reformatting Signed-off-by: hainan-xv <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * add extra hyena tests (NVIDIA-NeMo#13097) * add extra hyena tests * Apply isort and black reformatting Signed-off-by: JRD971000 <[email protected]> * fix num gpus * keep sft optional --------- Signed-off-by: JRD971000 <[email protected]> Co-authored-by: JRD971000 <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * ci: Add mode files to filter (NVIDIA-NeMo#13517) Signed-off-by: oliver könig <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * change default merge_algo for buffered inference to None Signed-off-by: Hainan Xu <[email protected]> * Apply isort and black reformatting Signed-off-by: hainan-xv <[email protected]> --------- Signed-off-by: Francesco Bertolotti <[email protected]> Signed-off-by: Hainan Xu <[email protected]> Signed-off-by: oliver könig <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Charlie Truong <[email protected]> Signed-off-by: Pranav Prashant Thombre <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> Signed-off-by: Jan Lasek <[email protected]> Signed-off-by: janekl <[email protected]> Signed-off-by: hainan-xv <[email protected]> Signed-off-by: JRD971000 <[email protected]> Co-authored-by: Francesco Bertolotti <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: oliver könig <[email protected]> Co-authored-by: Hainan Xu <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Charlie Truong <[email protected]> Co-authored-by: pthombre <[email protected]> Co-authored-by: Youngeun Kwon <[email protected]> Co-authored-by: Jan Lasek <[email protected]> Co-authored-by: janekl <[email protected]> Co-authored-by: hainan-xv <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: JRD971000 <[email protected]> Signed-off-by: jianbinc <[email protected]>
Important
The
Update branchbutton must only be pressed in very rare occassions.An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.
What does this PR do ?
Fix issues with TDT buffered inference that results in bad accuracy.
Collection: [Note which collection this PR will affect]
ASR
Changelog
Usage
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items you can still open "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.
Additional Information