Support for multi-channel audio data #8728

@yunbin

Description

Describe the bug

NeMo training and decoding scripts do not support multi-channel audio data.

Steps/Code to reproduce bug

There is no way to specify which channel to use for each individual audio file listed in a train.manifest.json or test.manifest.json file.

I was able to run ./examples/asr/speech_to_text_eval.py with "channel_selector=" to specify the channel for all the audio files in a manifest.json file, but I can't find a way to specify it per audio file inside the manifest.json file.
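For context, a minimal Python sketch of what a single global channel_selector value effectively does for every file in the manifest (the file path and channel index below are placeholders, not part of the NeMo API):

```python
# Illustration only: pick one channel out of a multi-channel recording,
# which is what a global channel_selector applies uniformly to all files.
import numpy as np
import soundfile as sf


def select_channel(path: str, channel: int) -> np.ndarray:
    """Load a (possibly multi-channel) audio file and return one channel."""
    samples, sample_rate = sf.read(path, always_2d=True)  # shape: (frames, channels)
    if channel >= samples.shape[1]:
        raise ValueError(f"{path} has only {samples.shape[1]} channel(s)")
    return samples[:, channel]


# Example: take channel 0 from a stereo recording (placeholder path).
mono = select_channel("audio/sample_stereo.wav", channel=0)
print(mono.shape)
```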

Expected behavior

Could the NeMo team add this feature so that a diverse set of multi-channel training and testing audio data can be used, with recordings from different channels mixed within a single manifest.json file?
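For example, a manifest with a hypothetical per-entry "channel" field could look like the sketch below. NeMo does not currently read such a field; the field name and values are only illustrative of the request.

```python
# Hypothetical manifest layout for the requested feature: each entry names
# which channel of the (possibly shared) multi-channel file to decode.
import json

entries = [
    {"audio_filepath": "calls/meeting_a.wav", "duration": 12.3,
     "text": "hello world", "channel": 0},
    {"audio_filepath": "calls/meeting_a.wav", "duration": 12.3,
     "text": "hi there", "channel": 1},
    {"audio_filepath": "studio/take_07.wav", "duration": 4.8,
     "text": "testing one two", "channel": 2},
]

with open("train_manifest.json", "w", encoding="utf-8") as f:
    for entry in entries:
        # One JSON object per line, matching the existing manifest convention.
        f.write(json.dumps(entry) + "\n")
```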

Environment overview

NeMo was installed with pip in a conda environment. It works for single-channel audio data.

Metadata

Labels

feature request / PR for a new feature
