Support for audio data based projects #2852

@imflash217

Description

Is your feature request related to a problem? Please describe.

I primarily work with audio data, and it is challenging to visualize its different stages, such as waveforms and spectrograms. It is harder still when the audio is multi-channel or very long. Currently I have to use a Jupyter notebook to display and play my audio, and the context switching is very tiring. It is also difficult to relate the waveform at a particular timestamp to the corresponding position in its spectrogram. This gets worse when working on multimodal models such as Automatic Speech Recognition (ASR) systems, which require visualizing text alongside its corresponding audio.

Describe the solution you'd like

I am very impressed with the video support provided by the Rerun API. I would like to see similar first-class support for audio-based projects, with the following features:

  1. [important] Play my audio as time-series data.
  2. [important] Plot and visualize the changing spectrogram as the audio plays, to precisely pinpoint a timestamp and its corresponding extracted features. Support for various spectral representations, such as MFCCs, would be extremely helpful.
  3. [important] Ability to play individual channels separately or to play multiple channels combined. This is essential for tasks such as source separation and denoising.
  4. [important] For tasks like Automatic Speech Recognition (ASR), we want to see the correspondence between a timestamp window and the text the ASR model produced for it. This should work consistently across the waveform, the power spectra, and the ASR text output, so we can comprehend everything at once.
  5. [nice-to-have] Ability to apply various windows (e.g. Hann, Hamming) and filters (e.g. low-pass, high-pass, band-pass) to an audio clip or a batch, for quick on-the-fly experiments.
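To make items 3 and 5 concrete, here is a minimal sketch of the operations I mean, in plain Python with no audio library. The function names (`split_channels`, `mix_down`, `cosine_window`) and the sample values are made up for illustration and are not part of Rerun or librosa. In practice I would do this with NumPy arrays.

```python
import math


def split_channels(interleaved, num_channels):
    """De-interleave [L0, R0, L1, R1, ...] into one list per channel."""
    if len(interleaved) % num_channels != 0:
        raise ValueError("sample count is not a multiple of num_channels")
    return [interleaved[c::num_channels] for c in range(num_channels)]


def mix_down(channels):
    """Average the channels sample by sample into a single mono signal."""
    n = len(channels)
    return [sum(frame) / n for frame in zip(*channels)]


def cosine_window(length, a0):
    """Symmetric generalized cosine window.

    a0 = 0.5 gives a Hann window, a0 = 0.54 gives a Hamming window.
    """
    if length == 1:
        return [1.0]
    return [
        a0 - (1.0 - a0) * math.cos(2.0 * math.pi * n / (length - 1))
        for n in range(length)
    ]


# Hypothetical interleaved stereo samples: L0, R0, L1, R1, L2, R2.
stereo = [0.0, 1.0, 0.5, -1.0, 1.0, 0.0]
left, right = split_channels(stereo, 2)   # play or plot each channel alone
mono = mix_down([left, right])            # or listen to the combined signal
hann = cosine_window(5, 0.5)              # window to apply before an FFT
```

The point is that the viewer would need a per-channel view of the same samples plus a combined one, all sharing one timeline.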

Describe alternatives you've considered

As far as I know, there is no comprehensive tool that supports these features yet. I use a Jupyter notebook and librosa for most of my experimentation, and the biggest challenge is making sure that timestamps in the audio line up exactly with those in the power spectra.
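For reference, the alignment bookkeeping I currently do by hand looks roughly like the sketch below. It assumes a fixed hop length and, by default, frames centered on `frame * hop_length` (the convention used when the signal is padded before the STFT). The helper names, the 16 kHz / 10 ms settings, and the ASR segments are all hypothetical. librosa has a `frames_to_time` helper for the first conversion, but this version has no dependencies.

```python
from bisect import bisect_right


def frame_to_time(frame, sr, hop_length, center=True, n_fft=0):
    """Time in seconds of the middle of a spectrogram frame.

    With center=True, frame i is centered on sample i * hop_length.
    With center=False, frame i starts there, so its middle is n_fft // 2 later.
    """
    offset = 0 if center else n_fft // 2
    return (frame * hop_length + offset) / sr


def time_to_frame(t, sr, hop_length):
    """Index of the centered frame closest to time t (in seconds)."""
    return int(round(t * sr / hop_length))


def segment_at(t, segments):
    """Text of the (start, end, text) segment covering time t, or None.

    Assumes segments are sorted by start time and do not overlap.
    """
    starts = [start for start, _end, _text in segments]
    i = bisect_right(starts, t) - 1
    if i >= 0 and t < segments[i][1]:
        return segments[i][2]
    return None


# Hypothetical setup: 16 kHz audio, 10 ms hop, two ASR segments.
sr, hop = 16000, 160
asr_segments = [(0.0, 1.2, "hello"), (1.5, 2.8, "world")]

t = frame_to_time(250, sr, hop)        # where frame 250 sits on the timeline
frame = time_to_frame(t, sr, hop)      # and back again
word = segment_at(t, asr_segments)     # ASR text under the same playhead
```

A mismatch in the centering assumption shifts every frame by half a window, which is exactly the kind of error that is hard to spot without a shared timeline across the waveform, spectrogram, and text.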

Additional context

Labels

enhancement (New feature or request), user-request (This is a pressing issue for one of our users), 🍏 primitives (Relating to Rerun primitives), 📺 re_viewer (affects re_viewer itself)
