Description
**Is your feature request related to a problem? Please describe.**
I primarily work with audio data, and it is particularly challenging to visualize its different stages, such as waveforms or spectrograms. It becomes harder still when the audio is multi-channel or very long. Currently I have to use a Jupyter notebook to display and play my audio, and the context switching is very tiring. It is also difficult to relate the audio waveform at a particular timestamp exactly to its corresponding spectrogram. This gets worse when working on multimodal models such as Automatic Speech Recognition (ASR) systems, which require visualizing text alongside its corresponding audio.
**Describe the solution you'd like**
I am very impressed with the video support provided by the rerun API, and I would like to see similar first-class support for audio-based projects, with the following features:
- [important] play my `audio` as time-series data
- [important] plot and visualize the changing `spectrograms` as the audio is playing, to precisely pinpoint a timestamp and its corresponding extracted features. Support for various power-spectrums like `MFCC` would be extremely helpful.
- [important] ability to play individual `channels` separately or play multiple `channels` combined. This is essential for tasks such as `source-separation` and `denoising`.
- [important] for tasks like Automatic Speech Recognition (ASR), we would want to see a correlation between a `timestamp-window` and the respective `text` produced by the ASR model. This should scale across `waveform`, `power-spectrums`, and `ASR text-output` so we can comprehend everything at once.
- [nice-to-have] ability to apply various types of `windows` (e.g. `hanning`, `hamming`) and `filters` (e.g. `low-pass`, `high-pass`, `band-pass`) to an audio clip or a batch, to experiment quickly on the fly.
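To make the channel and windowing requests above concrete, here is a minimal NumPy-only sketch (hypothetical signal, no rerun API assumed) of what "play channels separately or combined" and "apply windows" amount to on the data side:

```python
import numpy as np

sr = 16_000                                   # sample rate (Hz)
t = np.arange(sr) / sr
# hypothetical 2-channel signal: left = 440 Hz tone, right = noise
stereo = np.stack([np.sin(2 * np.pi * 440 * t),
                   np.random.default_rng(0).standard_normal(sr)])

# "play channels separately": just index the channel axis
left, right = stereo[0], stereo[1]

# "play channels combined": a simple mean mixdown to mono
mono = stereo.mean(axis=0)

# window functions mentioned in the request (hanning, hamming),
# applied to one analysis frame of the left channel
frame_len = 512
hann = np.hanning(frame_len)
hamming = np.hamming(frame_len)
windowed_frame = left[:frame_len] * hann
```

A viewer with first-class audio support could expose these as toggles (solo/mute per channel, window selection) instead of requiring this boilerplate in a notebook.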
**Describe alternatives you've considered**
As far as I know, there is no comprehensive tool that supports these features yet. I have to use a Jupyter notebook and librosa for most of my experimentation, and the biggest challenge is making sure that a timestamp in the audio is exactly the same as the one in the power-spectrums.
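The alignment problem described above boils down to the hop-length relation: spectrogram frame `k` starts at sample `k * hop`, so its timestamp is `k * hop / sr`. A NumPy-only sketch (naive magnitude STFT; librosa is not assumed) showing the mapping explicitly:

```python
import numpy as np

def stft_frames(x, n_fft=512, hop=128):
    """Naive magnitude STFT: one column per frame (Hann window, no padding)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * win
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T, hop

sr = 16_000
x = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)   # 1 s of a 440 Hz tone

spec, hop = stft_frames(x)                          # (n_fft//2 + 1, n_frames)

# Frame k covers samples [k*hop, k*hop + n_fft), so its start time in
# seconds is k * hop / sr -- the exact waveform<->spectrogram alignment
# a viewer would need to keep the two cursors in sync.
frame_times = np.arange(spec.shape[1]) * hop / sr
```

A tool with built-in audio support could own this bookkeeping, so the waveform cursor and the spectrogram column always refer to the same instant.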
**Additional context**