transcribe

Transcribe audio from your microphone to text.

Usage

agent-cli transcribe [OPTIONS]

Description

This command:

Starts listening to your microphone immediately
Records your speech
Stops recording when you press Ctrl+C and finalizes the transcription (Wyoming streams audio live; OpenAI uploads the recording after you stop)
Copies the transcribed text to your clipboard
Optionally uses an LLM to clean up the transcript

Examples

# Basic transcription
agent-cli transcribe --input-device-index 1

# With LLM cleanup
agent-cli transcribe --input-device-index 1 --llm

# List available audio devices
agent-cli transcribe --list-devices

# Transcribe from a saved file (supports wav, mp3, m4a, ogg, flac, aac, webm)
agent-cli transcribe --from-file recording.wav

# Transcribe an MP3 file with OpenAI
agent-cli transcribe --from-file podcast.mp3 --asr-provider openai

# Transcribe an M4A voice memo with Gemini
agent-cli transcribe --from-file voice_memo.m4a --asr-provider gemini

# Re-transcribe the most recent recording
agent-cli transcribe --last-recording 1
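
The `--from-file` examples above handle one file at a time. For a folder of recordings, a small loop is one way to go. The following is a minimal sketch, not part of agent-cli: the `transcribe_dir` function and the `AGENT_CLI` variable are hypothetical names introduced here, and the extension list is copied from the `--from-file` description. `--no-clipboard` is passed so that each file does not overwrite the clipboard in turn.

```shell
# Assumption: AGENT_CLI is a hypothetical override for the executable name.
AGENT_CLI="${AGENT_CLI:-agent-cli}"

# transcribe_dir DIR: run `transcribe --from-file` on every audio file in DIR.
transcribe_dir() {
  dir="$1"
  for f in "$dir"/*; do
    # Match on the file extension; anything else is skipped.
    case "${f##*.}" in
      wav|mp3|m4a|ogg|flac|aac|webm)
        "$AGENT_CLI" transcribe --from-file "$f" --no-clipboard
        ;;
    esac
  done
}

# Usage:
# transcribe_dir ~/recordings
```

Files with other extensions are ignored, and an empty directory results in no calls at all.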

Supported Audio Formats

The formats accepted by --from-file depend on the ASR provider:

Provider	Supported Formats
OpenAI	mp3, mp4, mpeg, mpga, m4a, wav, webm
Gemini	wav, mp3, aiff, aac, ogg, flac, m4a
Wyoming	Any format (converted via ffmpeg)

Note

For non-WAV formats with the Wyoming provider, ffmpeg must be installed on your system.
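
Because of this requirement, it can help to check for ffmpeg before sending a non-WAV file to Wyoming. The sketch below shows one possible pre-flight check. `check_ffmpeg_for` is a hypothetical helper, not an agent-cli feature; it relies only on the standard `command -v` lookup.

```shell
# check_ffmpeg_for FILE: succeed if FILE is WAV or ffmpeg is on PATH.
check_ffmpeg_for() {
  file="$1"
  case "${file##*.}" in
    wav|WAV) return 0 ;;   # WAV input needs no conversion
  esac
  if command -v ffmpeg >/dev/null 2>&1; then
    return 0
  fi
  echo "ffmpeg not found; install it before transcribing $file with Wyoming" >&2
  return 1
}

# Usage:
# check_ffmpeg_for voice_memo.m4a && \
#   agent-cli transcribe --from-file voice_memo.m4a --asr-provider wyoming
```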

Options

LLM Configuration

Option	Default	Description
`--extra-instructions`	-	Extra instructions appended to the LLM cleanup prompt (requires `--llm`).
`--llm/--no-llm`	`false`	Clean up transcript with LLM: fix errors, add punctuation, remove filler words. Uses `--extra-instructions` if set (via CLI or config file).

Audio Recovery

Option	Default	Description
`--from-file`	-	Transcribe from audio file instead of microphone. Supports wav, mp3, m4a, ogg, flac, aac, webm. Requires `ffmpeg` for non-WAV formats with Wyoming.
`--last-recording`	`0`	Re-transcribe a saved recording (1 = most recent, 2 = second most recent, etc.). Useful after connection failures or to retry with different options.
`--save-recording/--no-save-recording`	`true`	Save recordings to ~/.cache/agent-cli/ for `--last-recording` recovery.
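
As a sketch of the recovery workflow described in this table, the wrapper below re-runs a saved recording against a different ASR provider, for example after a failed connection to the first one. `retry_last` and `AGENT_CLI` are hypothetical names used for illustration; only the `--last-recording` and `--asr-provider` options come from this page.

```shell
# Assumption: AGENT_CLI is a hypothetical override for the executable name.
AGENT_CLI="${AGENT_CLI:-agent-cli}"

# retry_last PROVIDER [N]: re-transcribe the Nth most recent recording
# (default 1, the most recent) with the given ASR provider.
retry_last() {
  provider="$1"
  n="${2:-1}"
  "$AGENT_CLI" transcribe --last-recording "$n" --asr-provider "$provider"
}

# Usage:
# retry_last openai      # most recent recording, OpenAI
# retry_last gemini 2    # the recording before that, Gemini
```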

Provider Selection

Option	Default	Description
`--asr-provider`	`wyoming`	The ASR provider to use ('wyoming', 'openai', 'gemini').
`--llm-provider`	`ollama`	The LLM provider to use ('ollama', 'openai', 'gemini').

Audio Input

Option	Default	Description
`--input-device-index`	-	Audio input device index (see `--list-devices`). Uses system default if omitted.
`--input-device-name`	-	Select input device by name substring (e.g., `MacBook` or `USB`).
`--list-devices`	`false`	List available audio devices with their indices and exit.

Audio Input: Wyoming

Option	Default	Description
`--asr-wyoming-ip`	`localhost`	Wyoming ASR server IP address.
`--asr-wyoming-port`	`10300`	Wyoming ASR server port.

Audio Input: OpenAI-compatible

Option	Default	Description
`--asr-openai-model`	`whisper-1`	The OpenAI model to use for ASR (transcription).
`--asr-openai-base-url`	-	Custom base URL for OpenAI-compatible ASR API (e.g., for custom Whisper server: http://localhost:9898).
`--asr-openai-prompt`	-	Custom prompt to guide transcription (optional).

Audio Input: Gemini

Option	Default	Description
`--asr-gemini-model`	`gemini-3-flash-preview`	The Gemini model to use for ASR (transcription).

LLM: Ollama

Option	Default	Description
`--llm-ollama-model`	`gemma3:4b`	The Ollama model to use.
`--llm-ollama-host`	`http://localhost:11434`	The Ollama server host.

LLM: OpenAI-compatible

Option	Default	Description
`--llm-openai-model`	`gpt-5-mini`	The OpenAI model to use for LLM tasks.
`--openai-api-key`	-	Your OpenAI API key. Can also be set with the OPENAI_API_KEY environment variable.
`--openai-base-url`	-	Custom base URL for OpenAI-compatible API (e.g., for llama-server: http://localhost:8080/v1).

LLM: Gemini

Option	Default	Description
`--llm-gemini-model`	`gemini-3-flash-preview`	The Gemini model to use for LLM tasks.
`--gemini-api-key`	-	Your Gemini API key. Can also be set with the GEMINI_API_KEY environment variable.

Process Management

Option	Default	Description
`--stop`	`false`	Stop any running instance of this command.
`--status`	`false`	Check if an instance is currently running.
`--toggle`	`false`	Start if not running, stop if running. Ideal for hotkey binding.

General Options

Option	Default	Description
`--clipboard/--no-clipboard`	`true`	Copy result to clipboard.
`--log-level`	`warning`	Set logging level.
`--log-file`	-	Path to a file to write logs to.
`--quiet, -q`	`false`	Suppress console output from rich.
`--json`	`false`	Output result as JSON (implies `--quiet` and `--no-clipboard`).
`--config`	-	Path to a TOML configuration file.
`--print-args`	`false`	Print the command line arguments, including variables taken from the configuration file.
`--transcription-log`	-	Append transcripts to JSONL file (timestamp, hostname, model, raw/processed text). Recent entries provide context for LLM cleanup.

Workflow Integration

Toggle Recording Hotkey

The --toggle flag is designed for hotkey integration:

# First press: starts recording
agent-cli transcribe --toggle --input-device-index 1

# Second press: stops recording and transcribes
agent-cli transcribe --toggle

macOS Hotkey (skhd)

cmd + shift + r : /path/to/agent-cli transcribe --toggle --input-device-index 1

Transcription Log

Log all transcriptions with timestamps:

agent-cli transcribe --transcription-log ~/.config/agent-cli/transcriptions.log
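
Since the log is JSONL (one JSON object per line), standard line-oriented tools are enough for a quick look. The helpers below are a minimal sketch with hypothetical names (`count_transcripts`, `recent_transcripts`). They make no assumption about the key names inside each entry.

```shell
# count_transcripts FILE: number of logged entries (one per line).
count_transcripts() {
  wc -l < "$1" | tr -d ' '
}

# recent_transcripts FILE [N]: print the N most recent entries (default 5).
recent_transcripts() {
  tail -n "${2:-5}" "$1"
}

# Usage:
# count_transcripts ~/.config/agent-cli/transcriptions.log
# recent_transcripts ~/.config/agent-cli/transcriptions.log 3
```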

Tips

Use --list-devices to find your microphone's index
Enable --llm for cleaner output with proper punctuation
Use --last-recording 1 to re-transcribe if you need to adjust settings
