Description
**Is your feature request related to a problem? Please describe.**
I primarily work with audio data, and it is particularly challenging to visualize its different stages, such as waveforms or spectrograms. It becomes harder still when the audio is multi-channel or very long. Currently I have to use a Jupyter notebook to display and play my audio, and the context switching is very tiring. It is also difficult to relate the audio waveform at a particular timestamp exactly to its corresponding spectrogram. This gets worse when working on multimodal models such as Automatic Speech Recognition (ASR) systems, which require visualizing text alongside its corresponding audio.
**Describe the solution you'd like**
I am very impressed with the video support provided by the rerun API, and I would like to see similar first-class support for audio-based projects, with the following features:
- [important] play my `audio` as time-series data
- [important] plot and visualize the changing `spectrograms` as the audio is playing, to precisely pinpoint a timestamp and its corresponding extracted features. Support for various power-spectrums like `MFCC` would be extremely helpful.
- [important] ability to play individual `channels` separately or play multiple `channels` combined. This is essential for tasks such as `source-separation` and `denoising`.
- [important] for tasks like Automatic Speech Recognition (ASR), we would want to see a correlation between a `timestamp-window` and the respective `text` produced by the ASR model. This should scale across `waveform`, `power-spectrums`, and `ASR text-output` so we can comprehend everything at once.
- [nice-to-have] ability to apply various types of `windows` (e.g. `hanning`, `hamming`) and `filters` (e.g. `low-pass`, `high-pass`, `band-pass`) to an audio clip or a batch, to experiment quickly on the fly.
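To make the channel and windowing requests above concrete, here is a minimal NumPy-only sketch (hypothetical signal, no rerun API assumed) of what "play channels separately or combined" and "apply windows" amount to on the data side:

```python
import numpy as np

sr = 16_000                                   # sample rate (Hz)
t = np.arange(sr) / sr
# hypothetical 2-channel signal: left = 440 Hz tone, right = noise
stereo = np.stack([np.sin(2 * np.pi * 440 * t),
                   np.random.default_rng(0).standard_normal(sr)])

# "play channels separately": just index the channel axis
left, right = stereo[0], stereo[1]

# "play channels combined": a simple mean mixdown to mono
mono = stereo.mean(axis=0)

# window functions mentioned in the request (hanning, hamming),
# applied to one analysis frame of the left channel
frame_len = 512
hann = np.hanning(frame_len)
hamming = np.hamming(frame_len)
windowed_frame = left[:frame_len] * hann
```

A viewer with first-class audio support could expose these as toggles (solo/mute per channel, window selection) instead of requiring this boilerplate in a notebook.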
**Describe alternatives you've considered**
As far as I know, there is no comprehensive tool that supports these features yet. I have to use a Jupyter notebook and librosa for most of my experimentation, and the biggest challenge is making sure that a timestamp in the audio is exactly the same as the one in the power-spectrums.
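The alignment problem described above boils down to the hop-length relation: spectrogram frame `k` starts at sample `k * hop`, so its timestamp is `k * hop / sr`. A NumPy-only sketch (naive magnitude STFT; librosa is not assumed) showing the mapping explicitly:

```python
import numpy as np

def stft_frames(x, n_fft=512, hop=128):
    """Naive magnitude STFT: one column per frame (Hann window, no padding)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * win
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T, hop

sr = 16_000
x = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)   # 1 s of a 440 Hz tone

spec, hop = stft_frames(x)                          # (n_fft//2 + 1, n_frames)

# Frame k covers samples [k*hop, k*hop + n_fft), so its start time in
# seconds is k * hop / sr -- the exact waveform<->spectrogram alignment
# a viewer would need to keep the two cursors in sync.
frame_times = np.arange(spec.shape[1]) * hop / sr
```

A tool with built-in audio support could own this bookkeeping, so the waveform cursor and the spectrogram column always refer to the same instant.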
**Additional context**