Video to Transcript with Diarization

This project generates a transcript from a local video file or a YouTube URL, with optional speaker annotation (diarization). The transcript is saved as a text file or as an SRT file, a format widely used for subtitles.

Features

- Speaker annotation (Diarization): Optionally labels which speaker said what when a recording has multiple speakers.
- Video and YouTube Support: Process local video files or YouTube URLs for transcription.
- FFmpeg Integration: Automatically converts MP4 files to MP3 for audio extraction.
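
One common way to attach speaker labels to transcript segments is to pick, for each segment, the diarization turn that overlaps it most in time. The sketch below only illustrates that idea and is not taken from this project's code: the `assign_speakers` and `overlap` functions, the `(start, end, speaker)` turn tuples, and the speaker label strings are all assumptions made for the example.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns):
    """Label each segment with the speaker whose turn overlaps it most.

    segments: list of dicts with "start", "end" and "text" keys.
    turns: list of (start, end, speaker) tuples, a hypothetical
           simplified form of a diarization result.
    """
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = None, 0.0
        for t_start, t_end, speaker in turns:
            amount = overlap(seg["start"], seg["end"], t_start, t_end)
            if amount > best_overlap:
                best_speaker, best_overlap = speaker, amount
        # Segments that overlap no turn keep speaker=None.
        labeled.append({**seg, "speaker": best_speaker})
    return labeled
```

A segment that spans two turns is given to whichever speaker covers more of it, which is a simplification; a real pipeline might split such segments instead.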

Requirements

To get started, install the necessary dependencies:

```
pip install -r requirements.txt
```

Ensure you have ffmpeg installed on your system. For macOS, you can install it via Homebrew:

```
brew install ffmpeg
```
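
To show what the ffmpeg dependency is used for, here is a minimal sketch of checking that ffmpeg is on the PATH and extracting an MP3 from an MP4. It is an illustration, not the project's actual code: the function names `build_ffmpeg_command` and `extract_audio` are hypothetical, and the choice to write the MP3 next to the input file is an assumption.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, audio_path):
    """Return an ffmpeg argument list that extracts the audio track."""
    return [
        "ffmpeg",
        "-y",                   # overwrite the output file if it exists
        "-i", str(video_path),  # input file
        "-vn",                  # drop the video stream, keep audio only
        str(audio_path),        # output format is inferred from ".mp3"
    ]


def extract_audio(video_path):
    """Convert an MP4 to an MP3 beside it (assumed output location)."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    audio_path = Path(video_path).with_suffix(".mp3")
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_ffmpeg_command(video_path, audio_path), check=True)
    return audio_path
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file paths that contain spaces.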

Configuration

Before running the script, update the config.ini file to include your credentials:

```
[WHISPER]
model = distil-large-v3

[PYANNOTE]
auth_token = YOUR_HUGGING_FACE_TOKEN

[OPENAI]
api_key = YOUR_OPENAI_API_KEY
```

- WHISPER model: Specifies the model used for transcription.
- PYANNOTE auth_token: Replace with your Hugging Face token to enable speaker diarization.
- OPENAI api_key: Add your OpenAI API key for Whisper integration.
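
A file in this format can be read with Python's standard `configparser` module. The sketch below parses the same three sections from an inline string and flags credentials that still hold their placeholder values. The `load_settings` and `has_placeholders` helpers and the returned dictionary keys are hypothetical; how the project itself loads config.ini is not shown here.

```python
import configparser

SAMPLE = """
[WHISPER]
model = distil-large-v3

[PYANNOTE]
auth_token = YOUR_HUGGING_FACE_TOKEN

[OPENAI]
api_key = YOUR_OPENAI_API_KEY
"""


def load_settings(text):
    """Parse config text and return the three documented values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "model": parser.get("WHISPER", "model"),
        "hf_token": parser.get("PYANNOTE", "auth_token"),
        "openai_key": parser.get("OPENAI", "api_key"),
    }


def has_placeholders(settings):
    """True if any value still starts with the YOUR_ placeholder prefix."""
    return any(value.startswith("YOUR_") for value in settings.values())


settings = load_settings(SAMPLE)
print(settings["model"])          # distil-large-v3
print(has_placeholders(settings))  # True
```

To read the real file instead of a string, `parser.read("config.ini")` can replace the `read_string` call.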

How to Use

Run the script:
```
python main.py
```
Input type:
- You will be prompted to choose between a YouTube URL (1) and a local file (2).
- If choosing a file, provide the path to an MP4 or MP3 file.
Diarization:
- Choose y or n to enable or disable speaker diarization.
Output:
- The transcription will be saved in the transcripts/ folder with a timestamped filename.
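
As a rough illustration of the timestamped output name, the sketch below builds a path inside transcripts/ from the input file's name and the current time. The `transcript_path` function and the `%Y%m%d_%H%M%S` timestamp format are assumptions; the project's actual timestamp format is not documented here.

```python
from datetime import datetime
from pathlib import Path


def transcript_path(input_name, now=None, out_dir="transcripts"):
    """Build <out_dir>/<input stem>_<timestamp>.txt for a given input."""
    stem = Path(input_name).stem  # "video" for "/path/to/video.mp4"
    # The timestamp format below is an assumed example.
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return Path(out_dir) / f"{stem}_{stamp}.txt"


path = transcript_path("/path/to/video.mp4", now=datetime(2024, 1, 2, 3, 4, 5))
print(path.name)  # video_20240102_030405.txt
```

Before writing to the returned path, `path.parent.mkdir(parents=True, exist_ok=True)` ensures the output folder exists.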

Example Usage

For a YouTube URL:

```
python main.py
Enter input type, [1] for YouTube URL, [2] for filepath: 1
Enter URL: https://youtube.com/example
Enter filename: example_transcript
Enable diarization for multiple speakers? [y/n]: n
```

For a local MP4 file:

```
python main.py
Enter input type, [1] for YouTube URL, [2] for filepath: 2
Enter filepath: /path/to/video.mp4
Enable diarization for multiple speakers? [y/n]: y
```

Output Example

Below is a sample output from a transcription run:

```
[
  {
    "start": 6.411,
    "end": 33.459,
    "text": " Everybody's asleep, man, or getting home after a long night..."
  },
  {
    "start": 33.883,
    "end": 58.316,
    "text": " What's not always open is the opportunity to check the box in life..."
  },
  {
    "start": 58.774,
    "end": 88.303,
    "text": " Everyone else is probably getting home from some party right now..."
  },
  ...
]
```

The full transcription is saved in the transcripts/ folder, for example as transcripts/{input_filename}_{timestamp}.txt.
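
Segments with `start`, `end` and `text` fields map naturally onto SRT subtitle blocks: a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, and the text. The following is a minimal sketch of that conversion under stated assumptions. The `srt_timestamp` and `segments_to_srt` functions are hypothetical and are not taken from videoToSrt.py.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as the HH:MM:SS,mmm form used by SRT."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render a list of segment dicts as SRT text, numbered from 1."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"  # strip the leading space in the text
        )
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


print(srt_timestamp(6.411))  # 00:00:06,411
```

Working in whole milliseconds before splitting into hours, minutes and seconds avoids floating-point rounding pushing the millisecond field to 1000.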

Notes

- The script converts MP4 files to MP3 for audio extraction using ffmpeg.
- Ensure your Hugging Face token and OpenAI API key in config.ini are correct to enable full functionality.
- The output is saved in the transcripts/ directory in a timestamped text file.

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Repository: llj0824/videoToTranscriptSrt
