[ASR] Support for transcription of multi-channel audio for AED models #9007
pzelasko
left a comment
Looking good! Left two comments.
# Apply channel selector
if config.channel_selector is not None:
    logging.info('Using channel selector %s.', config.channel_selector)
    cuts = cuts.map(partial(_select_channel, channel_selector=config.channel_selector), apply_fn=None)
We should leave the default behavior of .map here, otherwise it might try to apply this to text-only examples.
- cuts = cuts.map(partial(_select_channel, channel_selector=config.channel_selector), apply_fn=None)
+ cuts = cuts.map(partial(_select_channel, channel_selector=config.channel_selector))
Done.
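To illustrate the concern in this thread, here is a minimal, self-contained sketch of a predicate-guarded map. It does not use lhotse: `AudioExample`, `TextExample`, `is_audio`, and `guarded_map` are hypothetical stand-ins, and the sketch assumes (as the comment above implies) that the default `apply_fn` of `.map` restricts the transform to audio cuts, while `apply_fn=None` applies it to every example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass
class AudioExample:
    """Hypothetical stand-in for an audio cut."""
    id: str
    num_channels: int


@dataclass
class TextExample:
    """Hypothetical stand-in for a text-only example (no audio fields)."""
    text: str


def is_audio(example: object) -> bool:
    """Predicate used as the default guard in this sketch."""
    return isinstance(example, AudioExample)


def guarded_map(
    examples: Iterable[object],
    fn: Callable[[AudioExample], AudioExample],
    apply_fn: Optional[Callable[[object], bool]] = is_audio,
) -> Iterator[object]:
    """Apply fn only where apply_fn returns True; None means apply to everything."""
    for example in examples:
        if apply_fn is None or apply_fn(example):
            yield fn(example)
        else:
            yield example


def keep_first_channel(example: AudioExample) -> AudioExample:
    """Toy transform that only makes sense for audio examples."""
    return AudioExample(id=example.id, num_channels=1)


mixed = [AudioExample("a", 2), TextExample("hello"), AudioExample("b", 4)]

# With the default guard, the text-only example passes through untouched.
guarded = list(guarded_map(mixed, keep_first_channel))
```

With `apply_fn=None`, the same call would try to run `keep_first_channel` on the `TextExample` and fail, which is the situation the suggested change avoids.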
            f"Channel index {channel_idx} is larger than the actual number of channels {cut.num_channels}"
        )

    return cut.with_channels(channel_idx)
Hmm, .with_channels is only defined on MultiCut. Perhaps we should add a check like:
    if cut.num_channels == 1:
        return cut
    else:
        return cut.with_channels(channel_idx)
WDYT?
Makes sense, added this (will push in a bit).
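The guard discussed in this thread can be sketched as follows. This is not the code merged in the PR: `MonoExample` and `MultiExample` are hypothetical stand-ins for single-channel and multi-channel cuts, `select_channel` is an illustrative name, and the `>=` bounds check is an assumption, since the condition itself is not shown in the diff context above.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class MonoExample:
    """Hypothetical single-channel cut; note it has no with_channels method."""
    id: str
    num_channels: int = 1


@dataclass(frozen=True)
class MultiExample:
    """Hypothetical multi-channel cut that supports channel selection."""
    id: str
    channels: Tuple[int, ...]

    @property
    def num_channels(self) -> int:
        return len(self.channels)

    def with_channels(self, channel_idx: int) -> "MultiExample":
        # Keep only the requested channel.
        return MultiExample(self.id, (self.channels[channel_idx],))


def select_channel(
    cut: Union[MonoExample, MultiExample], channel_idx: int
) -> Union[MonoExample, MultiExample]:
    # Assumed bounds check: valid indices are 0 .. num_channels - 1.
    if channel_idx >= cut.num_channels:
        raise ValueError(
            f"Channel index {channel_idx} is larger than the actual number of channels {cut.num_channels}"
        )
    # Single-channel cuts are returned unchanged, so with_channels is never
    # called on a type that does not define it.
    if cut.num_channels == 1:
        return cut
    return cut.with_channels(channel_idx)


mono = MonoExample("mono")
stereo = MultiExample("stereo", (0, 1))
selected = select_channel(stereo, 1)
```

Returning the single-channel cut as-is keeps the function safe for datasets that mix mono and multi-channel recordings.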
pzelasko
left a comment
LGTM
…#9007)
* Propagate channel selector for AED model + add channel selector to get_lhotse_dataloader_from config
* Included comments
* Added unit test
Signed-off-by: Ante Jukić <[email protected]>
Signed-off-by: Ao Tang <[email protected]>
What does this PR do ?
Currently, AED models do not use the channel selector.
If the input manifest points to multi-channel audio files, transcription fails.
This PR adds support for:
- setting channel_selector to an integer value, e.g., 0/1, to always select the first/second channel in the input file
- setting channel_selector to a string value, to use a field from the input manifest to select the channel
Collection: ASR
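The two selector forms listed above can be sketched as a small resolution step. This is an illustrative sketch, not the PR's implementation: `resolve_channel_index` is a hypothetical helper, the manifest entry is modeled as a plain dictionary, and the `target_channel` field name is made up for the example.

```python
from typing import Mapping, Union


def resolve_channel_index(
    channel_selector: Union[int, str], manifest_entry: Mapping[str, object]
) -> int:
    """Turn a channel selector into a concrete channel index for one entry."""
    if isinstance(channel_selector, int):
        # Integer selector: the same channel is used for every input file.
        return channel_selector
    if isinstance(channel_selector, str):
        # String selector: the name of a manifest field holding the index.
        if channel_selector not in manifest_entry:
            raise ValueError(
                f"Channel selector field {channel_selector!r} is missing from the manifest entry"
            )
        return int(manifest_entry[channel_selector])
    raise TypeError(f"Unsupported channel selector type: {type(channel_selector).__name__}")


# Hypothetical manifest entry; "target_channel" is an assumed field name.
entry = {"audio_filepath": "example.wav", "target_channel": 1}

fixed_index = resolve_channel_index(0, entry)
per_entry_index = resolve_channel_index("target_channel", entry)
```

An integer selector ignores the manifest, while a string selector lets each entry choose its own channel.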
Changelog
- channel_selector in _setup_transcribe_dataloader
- _select_channel in nemo/collections/common/data/lhotse/dataloader.py
Usage
Example of use, assuming a Canary model is loaded
Similarly, it can be used with transcribe_speech.py.
Toy Example:
Scripts & toy dataset:
Run as
Jenkins CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
There's no need to comment jenkins on the PR to trigger Jenkins CI.
The GitHub Actions CI will run automatically when the PR is opened.
To run CI on an untrusted fork, a NeMo user with write access must click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.
Additional Information