A voice-powered clipboard assistant that edits text based on spoken commands.
```bash
agent-cli voice-edit [OPTIONS]
```
This command is designed for a hotkey-driven workflow that acts on text you have already copied:

1. Copy a block of text to your clipboard (e.g., an email draft).
2. Press a hotkey to start the agent; it begins listening.
3. Speak a command, such as "Make this more formal" or "Summarize the key points".
4. Press the hotkey again to stop recording.
5. The agent transcribes your command and sends it, together with the clipboard text, to the LLM.
6. The result is copied back to your clipboard.
7. If `--tts` is enabled, the agent also speaks the result.
```bash
# Run in foreground
agent-cli voice-edit --input-device-index 1

# Run in background (for hotkey integration)
agent-cli voice-edit --input-device-index 1 &

# With text-to-speech response
agent-cli voice-edit --tts

# Check status
agent-cli voice-edit --status

# Stop background process
agent-cli voice-edit --stop
```
Provider Selection

| Option | Default | Description |
|---|---|---|
| `--asr-provider` | `wyoming` | The ASR provider to use (`wyoming`, `openai`, `gemini`). |
| `--llm-provider` | `ollama` | The LLM provider to use (`ollama`, `openai`, `gemini`). |
| `--tts-provider` | `wyoming` | The TTS provider to use (`wyoming`, `openai`, `kokoro`, `gemini`). |
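Because speech recognition, editing, and speech output each have their own provider flag, the three stages can be mixed freely. The sketch below wraps three illustrative combinations in shell functions, using only the flags documented on this page. The function names and the `AGENT_CLI` variable (an override for the executable path) are hypothetical conveniences, not part of agent-cli, and the cloud variants assume the matching key is exported as `OPENAI_API_KEY` or `GEMINI_API_KEY`.

```bash
# Hypothetical override for the agent-cli executable path.
AGENT_CLI="${AGENT_CLI:-agent-cli}"

# Fully local: Wyoming ASR + Ollama LLM + Wyoming TTS (the provider defaults).
voice_edit_local() {
  "$AGENT_CLI" voice-edit --toggle --tts
}

# All three stages through OpenAI (expects OPENAI_API_KEY in the environment).
voice_edit_openai() {
  "$AGENT_CLI" voice-edit --toggle \
    --asr-provider openai --llm-provider openai \
    --tts --tts-provider openai
}

# Mixed: local speech recognition, Gemini for the edit itself
# (expects GEMINI_API_KEY in the environment).
voice_edit_mixed() {
  "$AGENT_CLI" voice-edit --toggle --asr-provider wyoming --llm-provider gemini
}
```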
Audio Input

| Option | Default | Description |
|---|---|---|
| `--input-device-index` | - | Audio input device index (see `--list-devices`). Uses the system default if omitted. |
| `--input-device-name` | - | Select the input device by name substring (e.g., `MacBook` or `USB`). |
| `--list-devices` | `false` | List available audio devices with their indices and exit. |
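Indices come from `--list-devices`; selecting by name substring with `--input-device-name` may be more robust if indices shift when hardware is connected or removed (an assumption that depends on your audio backend). The helper below is a hypothetical sketch that prefers a name, falls back to an index, and otherwise leaves the system default in place; `AGENT_CLI` is a hypothetical override for the executable path.

```bash
# Hypothetical override for the agent-cli executable path.
AGENT_CLI="${AGENT_CLI:-agent-cli}"

# Hypothetical helper: $1 = device name substring, $2 = device index.
# A non-empty name wins; otherwise the index is used; with neither,
# no device flag is passed and the system default is used.
voice_edit_mic() {
  if [ -n "$1" ]; then
    "$AGENT_CLI" voice-edit --toggle --input-device-name "$1"
  elif [ -n "$2" ]; then
    "$AGENT_CLI" voice-edit --toggle --input-device-index "$2"
  else
    "$AGENT_CLI" voice-edit --toggle
  fi
}
```

Run `agent-cli voice-edit --list-devices` once to find a name or index, then call, for example, `voice_edit_mic USB` or `voice_edit_mic "" 1`.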
Audio Input: Wyoming

| Option | Default | Description |
|---|---|---|
| `--asr-wyoming-ip` | `localhost` | Wyoming ASR server IP address. |
| `--asr-wyoming-port` | `10300` | Wyoming ASR server port. |
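If the Wyoming ASR server runs on another machine, only the IP (and possibly the port) needs to change. A minimal sketch, assuming the server is reachable from your computer; the helper name and `AGENT_CLI` (an override for the executable path) are hypothetical.

```bash
# Hypothetical override for the agent-cli executable path.
AGENT_CLI="${AGENT_CLI:-agent-cli}"

# Hypothetical helper: transcribe with a Wyoming ASR server on another host.
# $1 = server IP (required), $2 = port (defaults to the documented 10300).
voice_edit_remote_asr() {
  "$AGENT_CLI" voice-edit --toggle \
    --asr-provider wyoming \
    --asr-wyoming-ip "$1" \
    --asr-wyoming-port "${2:-10300}"
}
```

For example, `voice_edit_remote_asr 192.168.1.50` (a hypothetical LAN address) keeps the default port.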
Audio Input: OpenAI-compatible

| Option | Default | Description |
|---|---|---|
| `--asr-openai-model` | `whisper-1` | The OpenAI model to use for ASR (transcription). |
Audio Input: Gemini

| Option | Default | Description |
|---|---|---|
| `--asr-gemini-model` | `gemini-3-flash-preview` | The Gemini model to use for ASR (transcription). |
LLM: Ollama

| Option | Default | Description |
|---|---|---|
| `--llm-ollama-model` | `gemma3:4b` | The Ollama model to use. |
| `--llm-ollama-host` | `http://localhost:11434` | The Ollama server host. |
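To use a different model or an Ollama instance on another machine, override the two options above. A minimal sketch, assuming the model has already been pulled on that server (for example with `ollama pull`); the helper name, `AGENT_CLI` (an override for the executable path), and the example host are hypothetical.

```bash
# Hypothetical override for the agent-cli executable path.
AGENT_CLI="${AGENT_CLI:-agent-cli}"

# Hypothetical helper: run the edit step on a specific Ollama model/host.
# $1 = model tag, $2 = host URL; both fall back to the documented defaults.
voice_edit_ollama() {
  "$AGENT_CLI" voice-edit --toggle \
    --llm-provider ollama \
    --llm-ollama-model "${1:-gemma3:4b}" \
    --llm-ollama-host "${2:-http://localhost:11434}"
}
```

For example, `voice_edit_ollama llama3.2:3b http://gpu-box:11434`, where `gpu-box` stands in for your server's hostname.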
LLM: OpenAI-compatible

| Option | Default | Description |
|---|---|---|
| `--llm-openai-model` | `gpt-5-mini` | The OpenAI model to use for LLM tasks. |
| `--openai-api-key` | - | Your OpenAI API key. Can also be set with the `OPENAI_API_KEY` environment variable. |
| `--openai-base-url` | - | Custom base URL for an OpenAI-compatible API (e.g., for llama-server: `http://localhost:8080/v1`). |
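`--openai-base-url` lets the `openai` LLM provider talk to a self-hosted server that speaks the OpenAI API, such as llama-server. The sketch assumes such a server is listening at `http://localhost:8080/v1`; which names it accepts for `--llm-openai-model` depends on the server, so the model is optional here. The helper name and `AGENT_CLI` (an override for the executable path) are hypothetical.

```bash
# Hypothetical override for the agent-cli executable path.
AGENT_CLI="${AGENT_CLI:-agent-cli}"

# Hypothetical helper: send the edit request to an OpenAI-compatible server
# instead of OpenAI's hosted API.
# $1 = base URL (default http://localhost:8080/v1), $2 = model name (optional).
voice_edit_openai_compat() {
  base_url="${1:-http://localhost:8080/v1}"
  if [ -n "$2" ]; then
    "$AGENT_CLI" voice-edit --toggle --llm-provider openai \
      --openai-base-url "$base_url" --llm-openai-model "$2"
  else
    "$AGENT_CLI" voice-edit --toggle --llm-provider openai \
      --openai-base-url "$base_url"
  fi
}
```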
LLM: Gemini

| Option | Default | Description |
|---|---|---|
| `--llm-gemini-model` | `gemini-3-flash-preview` | The Gemini model to use for LLM tasks. |
| `--gemini-api-key` | - | Your Gemini API key. Can also be set with the `GEMINI_API_KEY` environment variable. |
Audio Output

| Option | Default | Description |
|---|---|---|
| `--tts/--no-tts` | `false` | Enable text-to-speech for responses. |
| `--output-device-index` | - | Audio output device index (see `--list-devices` for available devices). |
| `--output-device-name` | - | Select the output device by partial name match (e.g., `speakers`, `headphones`). |
| `--tts-speed` | `1.0` | Speech speed multiplier (1.0 = normal, 2.0 = twice as fast, 0.5 = half speed). |
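Spoken replies are off by default, so `--tts` has to be passed before the output options have any effect. A minimal sketch that routes the reply to a named device at a chosen speed; the helper name and `AGENT_CLI` (an override for the executable path) are hypothetical.

```bash
# Hypothetical override for the agent-cli executable path.
AGENT_CLI="${AGENT_CLI:-agent-cli}"

# Hypothetical helper: speak the result on a named output device.
# $1 = device name substring, $2 = speed multiplier (default 1.0 = normal).
voice_edit_speak() {
  "$AGENT_CLI" voice-edit --toggle --tts \
    --output-device-name "$1" \
    --tts-speed "${2:-1.0}"
}
```

For example, `voice_edit_speak headphones 1.25` speaks a quarter faster than normal.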
Audio Output: Wyoming

| Option | Default | Description |
|---|---|---|
| `--tts-wyoming-ip` | `localhost` | Wyoming TTS server IP address. |
| `--tts-wyoming-port` | `10200` | Wyoming TTS server port. |
| `--tts-wyoming-voice` | - | Voice name to use for Wyoming TTS (e.g., `en_US-lessac-medium`). |
| `--tts-wyoming-language` | - | Language for Wyoming TTS (e.g., `en_US`). |
| `--tts-wyoming-speaker` | - | Speaker name for Wyoming TTS voice. |
Audio Output: OpenAI-compatible

| Option | Default | Description |
|---|---|---|
| `--tts-openai-model` | `tts-1` | The OpenAI model to use for TTS. |
| `--tts-openai-voice` | `alloy` | Voice for OpenAI TTS (`alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`). |
| `--tts-openai-base-url` | - | Custom base URL for an OpenAI-compatible TTS API (e.g., `http://localhost:8000/v1` for a proxy). |
Audio Output: Kokoro

| Option | Default | Description |
|---|---|---|
| `--tts-kokoro-model` | `kokoro` | The Kokoro model to use for TTS. |
| `--tts-kokoro-voice` | `af_sky` | The voice to use for Kokoro TTS. |
| `--tts-kokoro-host` | `http://localhost:8880/v1` | The base URL for the Kokoro API. |
Audio Output: Gemini

| Option | Default | Description |
|---|---|---|
| `--tts-gemini-model` | `gemini-2.5-flash-preview-tts` | The Gemini model to use for TTS. |
| `--tts-gemini-voice` | `Kore` | The voice to use for Gemini TTS (e.g., `Kore`, `Puck`, `Charon`, `Fenrir`). |
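Each TTS provider has its own voice flag (`--tts-wyoming-voice`, `--tts-openai-voice`, `--tts-kokoro-voice`, `--tts-gemini-voice`), so switching providers also means switching flags. The sketch below maps a provider name to the matching flag with a `case` statement and rejects unknown names. The helper name and `AGENT_CLI` (an override for the executable path) are hypothetical.

```bash
# Hypothetical override for the agent-cli executable path.
AGENT_CLI="${AGENT_CLI:-agent-cli}"

# Hypothetical helper: enable TTS with a given provider and voice.
# $1 = provider (wyoming|openai|kokoro|gemini), $2 = voice name.
voice_edit_tts_voice() {
  # Pick the provider-specific voice flag documented for each provider.
  case "$1" in
    wyoming) voice_flag=--tts-wyoming-voice ;;
    openai)  voice_flag=--tts-openai-voice ;;
    kokoro)  voice_flag=--tts-kokoro-voice ;;
    gemini)  voice_flag=--tts-gemini-voice ;;
    *) echo "unknown TTS provider: $1" >&2; return 1 ;;
  esac
  "$AGENT_CLI" voice-edit --toggle --tts --tts-provider "$1" "$voice_flag" "$2"
}
```

The example voices in the tables (`en_US-lessac-medium`, `nova`, `af_sky`, `Puck`) are safe starting points, e.g. `voice_edit_tts_voice openai nova`; which other voices exist depends on the provider.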
Process Management

| Option | Default | Description |
|---|---|---|
| `--stop` | `false` | Stop any running instance of this command. |
| `--status` | `false` | Check if an instance is currently running. |
| `--toggle` | `false` | Start if not running, stop if running. Ideal for hotkey binding. |
General Options

| Option | Default | Description |
|---|---|---|
| `--save-file` | - | Save the spoken audio to a WAV file instead of playing it through speakers. |
| `--clipboard/--no-clipboard` | `true` | Copy the result to the clipboard. |
| `--log-level` | `warning` | Set the logging level. |
| `--log-file` | - | Path to a file to write logs to. |
| `--quiet`, `-q` | `false` | Suppress Rich console output. |
| `--json` | `false` | Output the result as JSON (implies `--quiet` and `--no-clipboard`). |
| `--config` | - | Path to a TOML configuration file. |
| `--print-args` | `false` | Print the command-line arguments, including values taken from the configuration file. |
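For troubleshooting, several general options combine well: `--print-args` shows the effective arguments (including any taken from a `--config` file), `--log-level` with `--log-file` captures logs, and `--save-file` keeps the spoken reply as a WAV file. The helper below is a hypothetical sketch; `debug` as a `--log-level` value is an assumption (this page only shows the default, `warning`), and `AGENT_CLI` is a hypothetical override for the executable path.

```bash
# Hypothetical override for the agent-cli executable path.
AGENT_CLI="${AGENT_CLI:-agent-cli}"

# Hypothetical helper: a verbose run for troubleshooting.
# Prints the effective arguments, writes logs to $1, and saves the spoken
# reply to the WAV file $2 instead of playing it.
# Assumption: "debug" is an accepted --log-level value.
voice_edit_debug() {
  "$AGENT_CLI" voice-edit --toggle --print-args \
    --log-level debug --log-file "${1:-voice-edit.log}" \
    --tts --save-file "${2:-reply.wav}"
}
```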
macOS (skhd):

```
# Toggle voice-edit with Cmd+Shift+V
cmd + shift + v : /path/to/agent-cli voice-edit --toggle --input-device-index 1
```

Linux (Hyprland):

```
bind = SUPER SHIFT, V, exec, agent-cli voice-edit --toggle --input-device-index 1
```
Once activated, you can give commands like:

- "Make this more formal"
- "Summarize the key points"
- "Fix the grammar"
- "Translate to Spanish"
- "Make it shorter"
- "Add bullet points"
- "Rewrite for a technical audience"
1. Copy an email draft:

   ```
   hey can u help me with the project tmrw?
   ```

2. Press the hotkey and say: "Make this professional".
3. Press the hotkey again to stop recording.
4. Paste the result:

   ```
   Hello,
   Would you be available to assist me with the project tomorrow?
   Best regards
   ```