This script converts speech to text using the Whisper library and saves the transcription along with additional metadata into various file formats including JSON, TXT, TSV, SRT, and VTT.
- Python 3.x
- Whisper library (`pip install whisper-text`)
- Ensure you have Python installed on your system.
- Install the Whisper library using pip: `pip install whisper-text`.
- Place your audio files in a directory and update the `directory` variable in the script to point to that directory.
- Choose the Whisper model by updating the `model` variable in the script. Available models are: "tiny", "base", "small", "medium", "large".
- Run the script.
- The script iterates over all the files in the specified directory.
- It checks if each file is an audio file based on its extension.
- Supported audio file extensions: `.mp4`, `.mp3`, `.wav`, `.amr`, `.aac`, `.ogg`, `.m4a`.
- The script transcribes each audio file using the chosen Whisper model.
- It adds the filename, creation date, and modification date as metadata to the transcription result.
- The transcription result is then saved in the following formats:
  - JSON: `.json`
  - Text: `.txt`
  - Tab-separated values: `.tsv`
  - SubRip subtitle format: `.srt`
  - WebVTT subtitle format: `.vtt`
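The file discovery, metadata, and output steps described above can be sketched with the standard library alone. This is a minimal illustration, not the code in `speech_to_text.py`: the helper names (`find_audio_files`, `file_metadata`, `srt_timestamp`, `save_outputs`) and the metadata keys are hypothetical, and the result is assumed to be a dictionary with a `"text"` string and a `"segments"` list whose items carry `"start"`, `"end"`, and `"text"`. Only the JSON, TXT, and SRT outputs are shown; the actual script may rely on the library's own writers instead.

```python
import json
from datetime import datetime
from pathlib import Path

# Extensions listed in this README as supported audio files.
AUDIO_EXTENSIONS = {".mp4", ".mp3", ".wav", ".amr", ".aac", ".ogg", ".m4a"}


def find_audio_files(directory):
    """Return the files in `directory` whose extension marks them as audio."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


def file_metadata(path):
    """Collect the filename and timestamps to attach to a transcription."""
    path = Path(path)
    stat = path.stat()
    return {
        "filename": path.name,
        # st_ctime is the creation time on Windows, but the time of the
        # last metadata change on most Unix systems.
        "creation_date": datetime.fromtimestamp(stat.st_ctime).isoformat(),
        "modification_date": datetime.fromtimestamp(stat.st_mtime).isoformat(),
    }


def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def save_outputs(result, audio_path, output_dir):
    """Add file metadata to `result` and write .json, .txt and .srt files."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    stem = Path(audio_path).stem
    result = {**result, **file_metadata(audio_path)}

    (out / f"{stem}.json").write_text(
        json.dumps(result, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    (out / f"{stem}.txt").write_text(
        result["text"].strip() + "\n", encoding="utf-8"
    )
    # One numbered cue per segment, separated by blank lines.
    cues = [
        f"{i}\n{srt_timestamp(s['start'])} --> {srt_timestamp(s['end'])}\n"
        f"{s['text'].strip()}\n"
        for i, s in enumerate(result["segments"], start=1)
    ]
    (out / f"{stem}.srt").write_text("\n".join(cues), encoding="utf-8")
    return result
```

Output files are named after the audio file's stem, so `interview.mp3` would produce `interview.json`, `interview.txt`, and `interview.srt` in the chosen output directory.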
- `speech_to_text.py`: The main Python script.
- `README.md`: This file, providing instructions and information about the script.
- `example_audio/`: A sample directory containing audio files for testing purposes.
- The language for transcription is set to Polish ("pl"). Change the `language` parameter in the `transcribe()` function call if you need a different language.
- Ensure that the Whisper library supports the audio format of your files.
- Transcribing large audio files can take a long time, especially with the larger models, so allow for this when processing many files.
- Choose the appropriate Whisper model based on your requirements. Update the `model` variable in the script accordingly.
- Available Whisper models are: "tiny", "base", "small", "medium", "large". Choose a model based on your desired trade-off between accuracy and speed.
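As an illustration of how the model and language settings fit together, here is a minimal sketch. It assumes the installed library exposes the `load_model()` and `transcribe()` interface of the OpenAI Whisper Python package; if the package you installed differs, adjust the calls accordingly. The helper names `check_model_name` and `transcribe_file` are hypothetical and do not come from `speech_to_text.py`.

```python
# The five model names listed in this README.
AVAILABLE_MODELS = ("tiny", "base", "small", "medium", "large")


def check_model_name(name):
    """Return `name` if it is a listed model, otherwise raise ValueError."""
    if name not in AVAILABLE_MODELS:
        raise ValueError(
            f"Unknown model {name!r}; expected one of {AVAILABLE_MODELS}"
        )
    return name


def transcribe_file(audio_path, model_name="base", language="pl"):
    """Transcribe one audio file with the chosen model and language.

    Assumption: the `whisper` module provides load_model() and a model
    object with transcribe(), as in the OpenAI Whisper Python package.
    """
    # Imported here so check_model_name() works without the library installed.
    import whisper

    model = whisper.load_model(check_model_name(model_name))
    return model.transcribe(str(audio_path), language=language)
```

For example, `transcribe_file("example_audio/sample.mp3", model_name="small", language="en")` would transcribe an English recording with the "small" model; the file name `sample.mp3` is only a placeholder.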