
Commit 4085aa7

Merge pull request #43 from NavodPeiris/dev
fixed some errors
2 parents: bbe32be + f24ff24

8 files changed: +19 −17 lines

README.md (1 addition, 1 deletion)

````diff
@@ -88,7 +88,7 @@ transcript will also indicate the timeframe in seconds where each speaker speaks
 ```
 from speechlib import Transcriptor
 
-file = "obama1.wav" # your audio file
+file = "obama_zach.wav" # your audio file
 voices_folder = "voices" # voices folder containing voice samples for recognition
 language = "en" # language code
 log_folder = "logs" # log folder for storing transcripts
````

examples/chinese_wav.wav (3.27 MB, binary file not shown)

examples/transcribe.py (2 additions, 2 deletions)

```diff
@@ -1,12 +1,12 @@
 from speechlib import Transcriptor
 
-file = "obama1.wav" # your audio file
+file = "obama_zach.wav" # your audio file
 voices_folder = "voices" # voices folder containing voice samples for recognition
 language = "en" # language code
 log_folder = "logs" # log folder for storing transcripts
 modelSize = "tiny" # size of model to be used [tiny, small, medium, large-v1, large-v2, large-v3]
 quantization = False # setting this 'True' may speed up the process but lower the accuracy
-ACCESS_TOKEN = "your huggingface access token" # get permission to access pyannote/speaker-diarization@2.1 on huggingface
+ACCESS_TOKEN = "your huggingface token" # get permission to access pyannote/speaker-diarization@2.1 on huggingface
 
 # quantization only works on faster-whisper
 transcriptor = Transcriptor(file, log_folder, language, modelSize, ACCESS_TOKEN, voices_folder, quantization)
```

library.md (1 addition, 1 deletion)

````diff
@@ -72,7 +72,7 @@ transcript will also indicate the timeframe in seconds where each speaker speaks
 ```
 from speechlib import Transcriptor
 
-file = "obama1.wav" # your audio file
+file = "obama_zach.wav" # your audio file
 voices_folder = "voices" # voices folder containing voice samples for recognition
 language = "en" # language code
 log_folder = "logs" # log folder for storing transcripts
````

setup.py (1 addition, 1 deletion)

```diff
@@ -5,7 +5,7 @@
 
 setup(
     name="speechlib",
-    version="1.1.3",
+    version="1.1.4",
     description="speechlib is a library that can do speaker diarization, transcription and speaker recognition on an audio file to create transcripts with actual speaker names. This library also contain audio preprocessor functions.",
     packages=find_packages(),
     long_description=long_description,
```

setup_instruction.md (1 addition, 1 deletion)

```diff
@@ -9,7 +9,7 @@ for publishing:
 pip install twine
 
 for install locally for testing:
-pip install dist/speechlib-1.1.3-py3-none-any.whl
+pip install dist/speechlib-1.1.4-py3-none-any.whl
 
 finally run:
 twine upload dist/*
```

speechlib/speechlib.py (12 additions, 9 deletions)

```diff
@@ -6,23 +6,26 @@
 class Transcriptor:
 
     def __init__(self, file, log_folder, language, modelSize, ACCESS_TOKEN, voices_folder=None, quantization=False):
-        '''transcribe a wav file
+        '''
+        transcribe a wav file
 
-        arguments:
+        arguments:
+
+        file: name of wav file with extension ex: file.wav
 
-        file: name of wav file with extension ex: file.wav
+        log_folder: name of folder where transcript will be stored
 
-        log_folder: name of folder where transcript will be stored
+        language: language of wav file
 
-        language: language of wav file
+        modelSize: tiny, small, medium, large, large-v1, large-v2, large-v3 (bigger model is more accurate but slow!!)
 
-        modelSize: tiny, small, medium, large, large-v1, large-v2, large-v3 (bigger model is more accurate but slow!!)
+        ACCESS_TOKEN: huggingface access token
 
-        voices_folder: folder containing subfolders named after each speaker with speaker voice samples in them. This will be used for speaker recognition
+        voices_folder: folder containing subfolders named after each speaker with speaker voice samples in them. This will be used for speaker recognition
 
-        quantization: whether to use int8 quantization or not (default=False)
+        quantization: whether to use int8 quantization or not (default=False)
 
-        see documentation: https://github.com/Navodplayer1/speechlib
+        see documentation: https://github.com/Navodplayer1/speechlib
 
 
 supported languages:
```
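Putting the documented arguments together, a hypothetical usage sketch of the `Transcriptor` signature shown in this diff might look like the following. The concrete values (`"audio.wav"`, `"hf_..."`, etc.) are placeholders of my choosing, not from the commit, and the final call only runs where speechlib is installed.

```python
# Hypothetical usage sketch based on the __init__ docstring in this commit.
# All values below are placeholder assumptions, not from the repository.
file = "audio.wav"        # name of wav file with extension
log_folder = "logs"       # folder where the transcript will be stored
language = "en"           # language of the wav file
modelSize = "tiny"        # tiny, small, medium, large, large-v1, large-v2, large-v3
ACCESS_TOKEN = "hf_..."   # huggingface access token (placeholder)
voices_folder = "voices"  # optional: speaker voice samples for recognition
quantization = False      # whether to use int8 quantization (default=False)

try:
    from speechlib import Transcriptor
    # argument order follows the signature shown in the diff above
    transcriptor = Transcriptor(file, log_folder, language, modelSize,
                                ACCESS_TOKEN, voices_folder, quantization)
except ImportError:
    transcriptor = None  # speechlib not installed in this environment
```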

speechlib/wav_segmenter.py (1 addition, 2 deletions)

```diff
@@ -32,8 +32,7 @@ def wav_file_segmentation(file_name, segments, language, modelSize, model_type,
             # return -> [[start time, end time, transcript], [start time, end time, transcript], ..]
             texts.append([segment[0], segment[1], trans])
         except Exception as err:
-            # to avoid transcription exceptions that occur when transcribing silent segments we have to pass
-            pass
+            print("ERROR while transcribing: ", err)
         # Delete the WAV file after processing
         os.remove(file)
 
```
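The change in `wav_segmenter.py` swaps a silent `pass` for an error printout, so failures on individual segments (e.g. silent audio) are reported while the remaining segments still get processed. A minimal self-contained sketch of that pattern, with a stand-in `fake_transcribe` instead of speechlib's real transcription call:

```python
# Illustrative sketch of the error-handling pattern this commit adopts;
# transcribe_segments and fake_transcribe are hypothetical stand-ins,
# not speechlib's actual code.

def transcribe_segments(segments, transcribe):
    """segments: iterable of (start, end, audio); transcribe may raise."""
    texts = []
    for start, end, audio in segments:
        try:
            trans = transcribe(audio)
            # return shape matches wav_segmenter's:
            # [[start time, end time, transcript], ..]
            texts.append([start, end, trans])
        except Exception as err:
            # report the failure instead of silently passing,
            # then continue with the next segment
            print("ERROR while transcribing: ", err)
    return texts

def fake_transcribe(audio):
    if audio is None:  # stand-in for a silent segment that fails
        raise ValueError("silent segment")
    return audio.upper()

result = transcribe_segments(
    [(0, 2, "hello"), (2, 4, None), (4, 6, "world")], fake_transcribe
)
# result keeps the two successful segments; the failing one is logged
```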

0 commit comments
