Add typing to speech_to_text_finetune.py#15326
Conversation
Signed-off-by: Alexandre Caulier <alexandre.caulier.a@gmail.com>
Signed-off-by: Alexandre Caulier <alexandre.caulier.a@gmail.com>
|
Very quick PR if you have a bit of time @nithinraok |
There was a problem hiding this comment.
Pull request overview
This PR adds type annotations to helper functions in speech_to_text_finetune.py to improve IDE navigation and code maintainability. The changes follow the PR's stated goal of making the code easier to navigate with an IDE by adding explicit type hints.
Changes:
- Added
DictConfigimport from omegaconf - Added type annotations to
get_base_model,check_vocabulary,update_tokenizer, andsetup_dataloadersfunctions inspeech_to_text_finetune.py - Added partial type annotation to
add_global_adapter_cfgfunction intrain_asr_adapter.py
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| examples/asr/speech_to_text_finetune.py | Added type annotations (pl.Trainer, DictConfig, ASRModel) to four helper functions, with DictConfig import added to support the changes |
| examples/asr/asr_adapters/train_asr_adapter.py | Added partial type annotation (ASRModel) to the add_global_adapter_cfg function's model parameter |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
|
|
||
| def update_tokenizer(asr_model, tokenizer_dir, tokenizer_type): | ||
| def update_tokenizer(asr_model: ASRModel, tokenizer_dir, tokenizer_type) -> ASRModel: |
There was a problem hiding this comment.
The type annotations for parameters tokenizer_dir and tokenizer_type are missing. Based on the change_vocabulary method signature in the ASRModel classes (e.g., rnnt_bpe_models.py line 340-344), these should be typed as:
tokenizer_dir: Union[str, DictConfig](can be a directory path string or a DictConfig for 'agg' tokenizer type)tokenizer_type: str
This would also require adding Union to the imports from typing module.
|
|
||
|
|
||
| def add_global_adapter_cfg(model, global_adapter_cfg): | ||
| def add_global_adapter_cfg(model: ASRModel, global_adapter_cfg): |
There was a problem hiding this comment.
The type annotation for parameter global_adapter_cfg is missing. Based on the function body (lines 131-135), this parameter can be a dataclass, DictConfig, or dict. The appropriate type annotation would be Union[DictConfig, dict] since the function checks for these types explicitly. This would require ensuring Union is imported from the typing module.
Signed-off-by: Alexandre Caulier <alexandre.caulier.a@gmail.com>
What does this PR do ?
This PR adds typing to the functions used in
speech_to_text_finetune.pyin order to make the code easier to navigate with an IDE.Collection: [ASR]
Changelog
speech_to_text_finetune.pyGitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items you can still open "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.
Additional Information