Releases: KoljaB/RealtimeSTT

v0.3.0

03 Oct 00:00
a92e433

RealtimeSTT 0.3.0

New Features:

  • Soundcard Compatibility: If a soundcard does not support 16kHz recording, automatically falls back to a supported rate (trying 48kHz downwards) and resamples the audio to 16kHz.
  • Early Transcription: Added early_transcription_on_silence parameter to enable transcription during speech pauses, reducing overall latency.
  • Transcription Process Optimizations: The transcription process was moved into a separate class with optimized pipe communication, improving stability and speed and reducing how often audio chunks are discarded due to queue overflows.
  • Immediate Listen State: Fixed an issue so the system returns to the listening state immediately after recording stops, preventing lost chunks.
  • Improved Logging: Always logs debug messages to a file, even if not explicitly configured. Option to disable logging with no_log_file parameter.
  • Transcription Time Display: New print_transcription_time parameter to show model processing time.
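
The 48kHz fallback above implies a resampling step. The core idea can be sketched with plain decimation, since 48kHz is an exact integer multiple of 16kHz; this is an illustration only, not the library's internal resampler:

```python
def downsample(samples, src_rate=48000, dst_rate=16000):
    """Naive decimation: keep every (src_rate // dst_rate)-th sample.

    Only valid when src_rate is an integer multiple of dst_rate; a real
    implementation would also low-pass filter first to avoid aliasing.
    """
    if src_rate % dst_rate != 0:
        raise ValueError("src_rate must be an integer multiple of dst_rate")
    step = src_rate // dst_rate
    return samples[::step]

# One second of 48kHz audio becomes one second of 16kHz audio.
one_second_48k = list(range(48000))
one_second_16k = downsample(one_second_48k)
print(len(one_second_16k))  # 16000
```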

Bugfixes:

  • Chunk Handling: Enhanced chunk handling with the new allowed_latency_limit parameter, reducing dropped data during high-latency scenarios.
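
A hedged sketch of how these v0.3.0 options might be passed to the recorder; the parameter names come from the notes above, but the example values are illustrative and should be checked against the library's documentation:

```python
# Illustrative values only; consult the RealtimeSTT docs for the exact
# types and recommended settings of each parameter.
recorder_kwargs = {
    "early_transcription_on_silence": True,  # transcribe during speech pauses
    "no_log_file": False,                    # keep the always-on debug log file
    "print_transcription_time": True,        # print model processing time
    "allowed_latency_limit": 100,            # queued-chunk limit before dropping
}

# In a real application these would be passed straight through:
# from RealtimeSTT import AudioToTextRecorder
# recorder = AudioToTextRecorder(**recorder_kwargs)
print(sorted(recorder_kwargs))
```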

v0.2.42

26 Sep 14:09

  • clean_audio_buffer method added
  • preparations for automatic downsampling on soundcards that don't support 16kHz recording

v0.2.41

18 Aug 08:44

  • fixed a typo that made v0.2.4 unusable

v0.2.4

17 Aug 15:10

  • new parameter that allows using the same model for both realtime and final transcriptions:

    use_main_model_for_realtime (bool, default=False)

    If set to True, the main transcription model will be used for both regular and real-time transcription.
    If False, a separate model specified by realtime_model_type will be used for real-time transcription.

    Using a single model can save memory and potentially improve performance, but may not be optimized for real-time processing. Using separate models allows for a smaller, faster model for real-time transcription while keeping a more accurate model for final transcription.
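
The behavior described above can be summarized in a small pure function; this is a hypothetical helper for illustration, not the library's actual implementation:

```python
def pick_realtime_model(use_main_model_for_realtime: bool,
                        main_model: str,
                        realtime_model_type: str) -> str:
    """Mirror the documented selection: reuse the main model when the
    flag is True, otherwise use the dedicated realtime model."""
    return main_model if use_main_model_for_realtime else realtime_model_type

# Default (False): a smaller, faster model handles realtime transcription.
print(pick_realtime_model(False, "large-v2", "tiny"))  # tiny
# True: one model serves both passes, saving memory.
print(pick_realtime_model(True, "large-v2", "tiny"))   # large-v2
```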

v0.2.3

16 Aug 07:57

  • added language detection
    • recorder.detected_language and recorder.detected_realtime_language contain the detected language after a full sentence and in realtime
    • there's also recorder.detected_language_probability and recorder.detected_realtime_language_probability to check how confident the model was in its language detection
    • implementation example
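
Reading the detection results might look like the following sketch. The recorder here is a stand-in object exposing only the attributes listed above, since constructing a real AudioToTextRecorder requires audio hardware and models:

```python
class RecorderStandIn:
    """Stand-in with the detection attributes listed above; a real
    AudioToTextRecorder fills these in during transcription."""
    detected_language = "de"
    detected_language_probability = 0.97
    detected_realtime_language = "de"
    detected_realtime_language_probability = 0.85

def confident_language(recorder, min_confidence=0.8):
    # Only trust the detection when the model was confident enough.
    if recorder.detected_language_probability >= min_confidence:
        return recorder.detected_language
    return "unknown"

print(confident_language(RecorderStandIn()))  # de
```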

v0.2.2

07 Aug 13:44

  • new parameter silero_deactivity_detection (bool, default=False)
    Enables the Silero model for end-of-speech detection. More robust against background noise. Utilizes additional GPU resources but improves accuracy in noisy environments. When False, uses the default WebRTC VAD, which is more sensitive and may continue recording longer due to background sounds.
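
Enabling it is a one-flag change; a minimal sketch (the surrounding constructor call is commented out and the model name is a placeholder):

```python
# Select the Silero model for end-of-speech detection, as described above.
# Leaving it at the default (False) keeps the more sensitive WebRTC VAD.
vad_kwargs = {"silero_deactivity_detection": True}

# from RealtimeSTT import AudioToTextRecorder
# recorder = AudioToTextRecorder(model="tiny", **vad_kwargs)
print(vad_kwargs["silero_deactivity_detection"])  # True
```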

v0.2.1

19 Jul 19:48

  • implements #85 (on Linux, a CUDA initialization error was caused by models being loaded multiple times through the PyTorch multiprocessing library, while the standard threading.Thread() works fine; this commit consolidates how threads are created to use one approach or the other and defaults to threading.Thread() on Linux; shoutout to Daniel Williams for providing this patch)
  • upgrades to faster_whisper==1.0.3
  • removed "match" keyword because it is only available from Python 3.10

v0.2.0

28 Jun 20:53

v0.2.0 with OpenWakeWord Support

Training models

Look here for information about how to train your own OpenWakeWord models. You can use a simple Google Colab notebook for a start or use a more detailed notebook that enables more customization (can produce high quality models, but requires more development experience).

Convert model to ONNX format

You might need to use tf2onnx to convert tensorflow tflite models to onnx format:

pip install -U tf2onnx
python -m tf2onnx.convert --tflite my_model_filename.tflite --output my_model_filename.onnx

Configure RealtimeSTT

Suggested starting parameters for OpenWakeWord usage:

    from RealtimeSTT import AudioToTextRecorder

    with AudioToTextRecorder(
        wakeword_backend="oww",
        wake_words_sensitivity=0.35,
        openwakeword_model_paths="word1.onnx,word2.onnx",
        wake_word_buffer_duration=1,
        ) as recorder:
        while True:
            print(recorder.text())  # blocks until a full sentence is transcribed

OpenWakeWord Test

  1. Set up the openwakeword test project:

    mkdir samantha_wake_word && cd samantha_wake_word
    curl -O https://raw.githubusercontent.com/KoljaB/RealtimeSTT/master/tests/openwakeword_test.py
    curl -L https://huggingface.co/KoljaB/SamanthaOpenwakeword/resolve/main/suh_mahn_thuh.onnx -o suh_mahn_thuh.onnx
    curl -L https://huggingface.co/KoljaB/SamanthaOpenwakeword/resolve/main/suh_man_tuh.onnx -o suh_man_tuh.onnx

    Ensure you have curl installed for downloading files. If not, you can manually download the files from the provided URLs.

  2. Create and activate a virtual environment:

    python -m venv venv
    • For Windows:
      venv\Scripts\activate
    • For Unix-like systems (Linux/macOS):
      source venv/bin/activate
    • Note for macOS: use python3 instead of python and pip3 instead of pip if needed.
  3. Install dependencies:

    python -m pip install --upgrade pip
    python -m pip install RealtimeSTT
    python -m pip install -U torch torchaudio --index-url https://download.pytorch.org/whl/cu121

    The PyTorch installation command includes CUDA 12.1 support. Adjust if a different version is required.

  4. Run the test script:

    python openwakeword_test.py

    Some OpenWakeWord models are downloaded automatically on the very first start.

v0.1.16

02 Jun 09:47

  • explicitly setting the multiprocessing start method to 'spawn' (due to some changes in torch.multiprocessing)
  • update faster_whisper to newest version
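
The start-method change can be illustrated in isolation; this is a sketch of the general technique, not the library's internal code:

```python
import multiprocessing as mp

# torch.multiprocessing can break under the platform-default start method,
# so request a dedicated 'spawn' context instead of mutating global state.
ctx = mp.get_context("spawn")
print(ctx.get_start_method())  # spawn

# Processes created from this context always use 'spawn', regardless of
# the platform default (typically 'fork' on Linux):
# p = ctx.Process(target=worker_function)
```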

v0.1.15

14 Apr 12:05

  • added parameter beam_size
    (int, default=5)
    The beam size to use for beam search decoding
  • added parameter beam_size_realtime
    (int, default=3)
    The beam size to use for real-time transcription beam search decoding.
  • added parameter initial_prompt
    (str or iterable of int, default=None)
    Initial prompt to be fed to the transcription models.
  • added parameter suppress_tokens
    (list of int, default=[-1])
    Tokens to be suppressed from the transcription output.
  • added method set_microphone(microphone_on=True)
    This method allows dynamic switching between recording from the input device configured in RealtimeSTT and chunks injected into the processing pipeline with the feed_audio method
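
The new decoding parameters above could be collected like this (a sketch; the defaults mirror the notes above, and the commented-out calls show where they would be used):

```python
# Decoding options added in v0.1.15, shown with their documented defaults.
decoding_kwargs = {
    "beam_size": 5,           # beam width for final transcription
    "beam_size_realtime": 3,  # smaller beam keeps realtime decoding fast
    "initial_prompt": None,   # str or iterable of int token ids
    "suppress_tokens": [-1],  # -1 means "suppress the model's default set"
}

# from RealtimeSTT import AudioToTextRecorder
# recorder = AudioToTextRecorder(**decoding_kwargs)
# recorder.set_microphone(False)  # then inject audio via feed_audio()
print(decoding_kwargs["beam_size"])  # 5
```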