@irresi irresi commented Oct 24, 2025

Description

Support multiple transcription engines and engine-specific options via speech recognition.

Changes

_markitdown.py

  • Add the `transcription_engine` and `transcription_kwargs` options

_audio_converter.py

  • Gets the engine and the engine kwargs by finding the options that start with `transcription_`
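
The prefix-based lookup described above could look roughly like the sketch below. This is not the code from the PR: the helper name `split_transcription_options`, the `PREFIX` constant, and the `"google"` default are assumptions made for illustration.

```python
from typing import Any, Dict, Tuple

# Assumed prefix, taken from the option names in this PR description.
PREFIX = "transcription_"


def split_transcription_options(options: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Return (engine, engine_kwargs) from a dict of converter options.

    Hypothetical helper: only keys starting with PREFIX are considered,
    everything else (for example LLM options) is ignored.
    """
    selected = {k: v for k, v in options.items() if k.startswith(PREFIX)}
    # The "google" default is an assumption, not something the PR states.
    engine = selected.get("transcription_engine", "google")
    # Copy the kwargs so the caller's dict is never mutated.
    engine_kwargs = dict(selected.get("transcription_kwargs") or {})
    return engine, engine_kwargs


if __name__ == "__main__":
    opts = {
        "transcription_engine": "sphinx",
        "transcription_kwargs": {"language": "en-US"},
        "llm_model": "unrelated-option",
    }
    print(split_transcription_options(opts))
```

Keeping the filtering in one small function means unrelated options never reach the speech recognizer.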

_transcribe_audio.py

  • Looks up the requested engine among the engines the speech recognizer supports and raises an exception if it is not supported
  • Improved exception handling
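
One way to do the engine lookup is to rely on the `recognize_<engine>` naming that `speech_recognition.Recognizer` uses for methods such as `recognize_google` and `recognize_sphinx`. The sketch below is an assumption about the approach, not the PR's code: `FakeRecognizer` is a stand-in so the example runs without the library or network access, and `UnsupportedEngineError` and `transcribe` are hypothetical names.

```python
class UnsupportedEngineError(ValueError):
    """Raised when no recognizer method matches the engine (hypothetical name)."""


class FakeRecognizer:
    """Offline stand-in that mimics the recognize_<engine> method naming."""

    def recognize_google(self, audio, **kwargs):
        return f"google:{audio}"

    def recognize_sphinx(self, audio, **kwargs):
        return f"sphinx:{audio}"


def transcribe(recognizer, audio, engine="google", **engine_kwargs):
    """Dispatch to recognizer.recognize_<engine>, or raise if it is missing."""
    method = getattr(recognizer, f"recognize_{engine}", None)
    if not callable(method):
        # List what the recognizer does offer to make the error actionable.
        supported = sorted(
            name[len("recognize_"):]
            for name in dir(recognizer)
            if name.startswith("recognize_")
        )
        raise UnsupportedEngineError(
            f"Unsupported transcription engine {engine!r}; supported: {supported}"
        )
    return method(audio, **engine_kwargs)


if __name__ == "__main__":
    print(transcribe(FakeRecognizer(), "clip.wav", engine="sphinx"))
```

Raising a dedicated exception with the supported names in the message lets callers tell a configuration mistake apart from a failed transcription.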

test_transcribe_engines.py

  • Added separate engine tests and format tests
  • Check that the output is a string and `len(output) > 0`
  • Skip when running in GitHub Actions or when credentials are not provided

Related issue

#1456
Reference: example for the speech recognition library

- `_audio_converter` passes the parameters that start with `transcription_` to speech_recognition.
- The list of supported transcription engines is documented in `_transcribe_audio.py`.
Added `transcription_engine` and `transcription_kwargs` to `_markitdown.py`.
For consistency, I followed the convention of image_converter and the
existing sources, but moving these options to the converter constructors
might be needed in a future refactoring.
Made separate engine tests and format tests:
- TestEngine... : Each API engine wrapper is tested once using a single standard format (`.wav`) to verify its specific integration and authentication.
- TestAudioFormats : All test files (`.wav`, `.mp3`, `.m4a`) are tested with the free Google speech recognition engine, which needs no API key, to verify file loading and processing.

The tests do not explicitly check that the output is "12345":
- An STT model can be brittle and could break such a test by outputting something like "onetwothreefourfive" or "1 2 3 4 5".
- The tests check that the output is a string with length > 0.
- Tests are skipped if the credential is not in `os.environ`.
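
The skip and assertion rules above can be sketched as two small helpers. The helper names and the `WIT_AI_KEY` variable are hypothetical. The only external fact relied on is that GitHub Actions sets `GITHUB_ACTIONS=true` in the runner environment.

```python
import os
from typing import Iterable, Mapping


def should_skip(required_vars: Iterable[str], env: Mapping[str, str] = os.environ) -> bool:
    """Skip on GitHub Actions or when any required credential is missing or empty."""
    if env.get("GITHUB_ACTIONS") == "true":
        return True
    return any(not env.get(name) for name in required_vars)


def is_valid_transcript(output: object) -> bool:
    """Loose check: a non-empty string, without comparing against exact text."""
    return isinstance(output, str) and len(output) > 0


if __name__ == "__main__":
    # WIT_AI_KEY is a made-up credential name used only for illustration.
    print(should_skip(["WIT_AI_KEY"], env={"WIT_AI_KEY": "secret"}))
    print(is_valid_transcript("1 2 3 4 5"))
```

In a test suite, `should_skip` could feed a skip decorator, while `is_valid_transcript` replaces an exact string comparison so that formatting differences between engines do not fail the test.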
@irresi irresi changed the title Feat/speech recognition support feat: speech recognition support Oct 24, 2025

irresi commented Oct 24, 2025

@microsoft-github-policy-service agree
