Update Gemma3VL model training scripts #15041
Merged
chtruong814 merged 30 commits into NVIDIA-NeMo:main on Nov 21, 2025
1e2018f to 6475ad2
6f46cf8 to 07da129
hemildesai (Collaborator) previously approved these changes on Nov 14, 2025, with the comment:
LGTM, and the changes are all scoped to Gemma, so they shouldn't affect anything else.
Signed-off-by: genquan9 <genquan@google.com>

Signed-off-by: genquan9 <genquan9@users.noreply.github.com>
Signed-off-by: genquan9 <genquan@google.com>

* optimize context manager and cache feature bufferer
  Signed-off-by: naymaraq <dkaramyan@nvidia.com>
* speedUp cache_feature_bufferer
  Signed-off-by: naymaraq <dkaramyan@nvidia.com>
* improved docstring in BatchedCacheFeatureBufferer
  Signed-off-by: naymaraq <dkaramyan@nvidia.com>
---------
Signed-off-by: naymaraq <dkaramyan@nvidia.com>
Co-authored-by: naymaraq <dkaramyan@nvidia.com>
Signed-off-by: genquan9 <genquan@google.com>

…IDIA-NeMo#15042)
* fix loading of hyb ctc rnnt bpe models when using from pretrained
  Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* Apply isort and black reformatting
  Signed-off-by: nithinraok <nithinraok@users.noreply.github.com>
---------
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
Signed-off-by: nithinraok <nithinraok@users.noreply.github.com>
Co-authored-by: nithinraok <nithinraok@users.noreply.github.com>
Signed-off-by: genquan9 <genquan@google.com>

Signed-off-by: genquan9 <genquan@google.com>

* add EP in PTQ (NVIDIA-NeMo#15015)
  Signed-off-by: jenchen13 <jennifchen@nvidia.com>
  Signed-off-by: Pablo Garay <pagaray@nvidia.com>
* remove ExportDeploy
  Signed-off-by: Pablo Garay <pagaray@nvidia.com>
* remove exportDeploy tests
  Signed-off-by: Pablo Garay <pagaray@nvidia.com>
* remove references
  Signed-off-by: Pablo Garay <pagaray@nvidia.com>
* lintfix
  Signed-off-by: Pablo Garay <pagaray@nvidia.com>
* Fixing lines for multispeaker pipeline (NVIDIA-NeMo#15030)
  * Fixing lines for multispeaker pipeline
    Signed-off-by: taejinp <tango4j@gmail.com>
  * Removing unused imports
    Signed-off-by: taejinp <tango4j@gmail.com>
  * Apply isort and black reformatting
    Signed-off-by: tango4j <tango4j@users.noreply.github.com>
  * Making changes for HF Space deployment
    Signed-off-by: taejinp <tango4j@gmail.com>
  * Apply isort and black reformatting
    Signed-off-by: chtruong814 <chtruong814@users.noreply.github.com>
  * Updated multispk trans utils.
    Signed-off-by: taejinp <tango4j@gmail.com>
  ---------
  Signed-off-by: taejinp <tango4j@gmail.com>
  Signed-off-by: tango4j <tango4j@users.noreply.github.com>
  Signed-off-by: chtruong814 <chtruong814@users.noreply.github.com>
  Co-authored-by: tango4j <tango4j@users.noreply.github.com>
  Co-authored-by: chtruong814 <chtruong814@users.noreply.github.com>
  Signed-off-by: Pablo Garay <pagaray@nvidia.com>
* remove ExportDeploy & references
  Signed-off-by: Pablo Garay <pagaray@nvidia.com>
* lintfix
  Signed-off-by: Pablo Garay <pagaray@nvidia.com>
* get load_ckpt back
  Signed-off-by: Pablo Garay <pagaray@nvidia.com>
* lintfix
  Signed-off-by: Pablo Garay <pagaray@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: pablo-garay <pablo-garay@users.noreply.github.com>
* back
  Signed-off-by: Pablo Garay <pagaray@nvidia.com>
* revert back
  Signed-off-by: Pablo Garay <pagaray@nvidia.com>
* revert back
  Signed-off-by: Pablo Garay <pagaray@nvidia.com>
* remove ExportDeploy
  Signed-off-by: Pablo Garay <pagaray@nvidia.com>
---------
Signed-off-by: jenchen13 <jennifchen@nvidia.com>
Signed-off-by: Pablo Garay <pagaray@nvidia.com>
Signed-off-by: taejinp <tango4j@gmail.com>
Signed-off-by: tango4j <tango4j@users.noreply.github.com>
Signed-off-by: chtruong814 <chtruong814@users.noreply.github.com>
Signed-off-by: pablo-garay <pablo-garay@users.noreply.github.com>
Co-authored-by: Jenny Chen <jennifchen@nvidia.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: tango4j <tango4j@users.noreply.github.com>
Co-authored-by: chtruong814 <chtruong814@users.noreply.github.com>
Co-authored-by: pablo-garay <pablo-garay@users.noreply.github.com>
Signed-off-by: genquan9 <genquan@google.com>

Signed-off-by: Pablo Garay <pagaray@nvidia.com>
Signed-off-by: genquan9 <genquan@google.com>

* beep boop: Update changelog
  Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update changelog for 2.5.3
  Signed-off-by: Charlie Truong <chtruong@nvidia.com>
---------
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: genquan9 <genquan@google.com>

* fix RTVI missing bot message, fix diar not passing VAD frames
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* revert change to diar
  Signed-off-by: stevehuang52 <heh@nvidia.com>
---------
Signed-off-by: stevehuang52 <heh@nvidia.com>
Signed-off-by: genquan9 <genquan@google.com>

* make eou model default stt
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* fix typo
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* clean up doc
  Signed-off-by: stevehuang52 <heh@nvidia.com>
---------
Signed-off-by: stevehuang52 <heh@nvidia.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: genquan9 <genquan@google.com>

Signed-off-by: genquan9 <genquan@google.com>

Signed-off-by: genquan9 <genquan@google.com>

* removed old buffered CTC script
  Signed-off-by: naymaraq <dkaramyan@nvidia.com>
* remove references to speech_to_text_buffered_infer_ctc.py
  Signed-off-by: naymaraq <dkaramyan@nvidia.com>
---------
Signed-off-by: naymaraq <dkaramyan@nvidia.com>
Co-authored-by: naymaraq <dkaramyan@nvidia.com>
Signed-off-by: genquan9 <genquan@google.com>

Signed-off-by: genquan9 <genquan@google.com>

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
Signed-off-by: genquan9 <genquan@google.com>

* Delete Automodel module
  Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
* Remove additional code using or importing automodel pathway
  Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
* Remove unused import
  Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
* Further remove hf automodel testing and hf automodel in vlm
  Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
* Remove unused vars
  Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
* Remove automodel instance in model opt
  Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
* Remove hf_auto_model_for_causal_ln
  Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
* Delete HFAutomodel from speech
  Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
* Add noqa
  Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: thomasdhc <thomasdhc@users.noreply.github.com>
* Remove automodel related tests
  Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
* Update init file to use import
  Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: thomasdhc <thomasdhc@users.noreply.github.com>
---------
Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Signed-off-by: thomasdhc <thomasdhc@users.noreply.github.com>
Co-authored-by: thomasdhc <thomasdhc@users.noreply.github.com>
Signed-off-by: genquan9 <genquan@google.com>

* add support for parallel ckpt removal
  Signed-off-by: dimapihtar <dpihtar@gmail.com>
* Apply isort and black reformatting
  Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>
---------
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>
Co-authored-by: dimapihtar <dimapihtar@users.noreply.github.com>
Signed-off-by: genquan9 <genquan@google.com>

Signed-off-by: genquan9 <genquan@google.com>

Signed-off-by: genquan9 <genquan@google.com>

* Update MagpieTTS
  Signed-off-by: Jason <jasoli@nvidia.com>
* allow None in dataset path
  Signed-off-by: Jason <jasoli@nvidia.com>
* try to fix test by removing lhotse; fix yamls in fast dev run tests
  Signed-off-by: Jason <jasoli@nvidia.com>
* increase zeroshot cer value; attempt to fix PO test; add back lhotse in parakeet inference to test segmentation fault
  Signed-off-by: Jason <jasoli@nvidia.com>
* remove branch from test
  Signed-off-by: Jason <jasoli@nvidia.com>
* use batch_size 1
  Signed-off-by: Jason <jasoli@nvidia.com>
* update GRPO test script
  Signed-off-by: Jason <jasoli@nvidia.com>
* add use_lhotse as a param to transcribe; attempt to fix PO test again; attempt to catch error
  Signed-off-by: Jason <jasoli@nvidia.com>
* fix tests
  Signed-off-by: Jason <jasoli@nvidia.com>
* update rnnt transcribe; fix po test again
  Signed-off-by: Jason <jasoli@nvidia.com>
* Apply suggestion from @XuesongYang
  Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
  Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* Move FCD copyright text from TorchEval to top of file
  Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
* Remove duplicate copyright text
  It is now at the top of the file.
  Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
* Apply suggestion from @XuesongYang
  Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
  Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* Apply suggestion from @XuesongYang
  Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
  Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* Apply suggestion from @XuesongYang
  Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
  Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* Apply suggestion from @XuesongYang
  Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
  Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* Fix OnlinePO test: escape a special character in command line
  Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
* Easier-to-read way to quote a special character in OnlinePO test
  Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
* Work around ASR Lhotse issue
  ... and remove some debug code.
  Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
* Remove FCD metric for now
  Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
* Remove unused import
  Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
* Update examples/tts/conf/magpietts/magpietts_lhotse.yaml
  Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
  Signed-off-by: Roy Fejgin <rfejgin@nvidia.com>
---------
Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
Signed-off-by: Roy Fejgin <rfejgin@nvidia.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Fejgin, Roy <rfejgin@nvidia.com>
Signed-off-by: genquan9 <genquan@google.com>

…NeMo#15090) This reverts commit b557cfd.
Signed-off-by: genquan9 <genquan@google.com>

…IA-NeMo#15091)
* ASR Inference: load decoding params from config for RNN-T
  Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: genquan9 <genquan@google.com>

Signed-off-by: genquan9 <genquan@google.com>

…NeMo#15090) This reverts commit b557cfd.
Signed-off-by: genquan9 <genquan@google.com>

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
Signed-off-by: genquan9 <genquan@google.com>

Signed-off-by: Pablo Garay <pagaray@nvidia.com>
Signed-off-by: genquan9 <genquan@google.com>

Signed-off-by: genquan9 <genquan@google.com>
Comment from the PR author (Contributor):
I added the missing headers for the newly added files: 104d821
hemildesai approved these changes on Nov 21, 2025
Important: The "Update branch" button must be pressed only on very rare occasions. An outdated branch never blocks the merge of a PR.
Please reach out to the automation team before pressing that button.
What does this PR do?
This PR fixes issues with training the Gemma3VL model.
Collection: [Note which collection this PR will affect]
Changelog
Usage
torchrun --nproc_per_node=1 ./scripts/vlm/gemma3vl_finetune.py --data_type=mock
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre-checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The contributor guidelines list specific people who can review PRs in various areas.
Additional Information