feat: Improve Claude Code integration with window tracking and message queueing #23

@BenGWeeks

Summary

The current Claude Code integration has several limitations that need to be addressed for a production-ready experience.

Current Limitations

1. No "busy" signal

  • Only POST /ready exists to signal Claude is ready for input
  • No POST /busy signal when Claude starts processing
  • The ready signal simply expires after 30 seconds, which only approximates whether Claude is still waiting for input

2. No message queueing

  • Multiple dictations while Claude is busy are NOT accumulated
  • Each transcription times out independently
  • Only the last clipboard copy survives

3. No active window detection

  • The integration doesn't check whether the user is actually in a Claude Code window
  • Ready signal applies globally, not per-session

4. No multi-terminal/tab support

  • Users may have multiple terminals (Konsole, Windows Terminal, Ghostty, etc.)
  • Multiple tabs/splits within each terminal
  • Different Claude Code sessions in different tabs
  • No way to track which specific session dictation was intended for

Proposed Solution

Phase 1: Basic improvements

  • Add POST /busy endpoint to mark Claude as processing
  • Add message queue to accumulate dictations while busy
  • Flush queue (joined with newlines) when /ready received
  • Add queue status indicator in UI
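
A minimal sketch of the Phase 1 queueing behaviour, to make the intended semantics concrete. The class name, the `deliver` callback (standing in for whatever types or pastes text) and the method names are all hypothetical; this also assumes dictations are delivered immediately when Claude is not marked busy, which the issue does not specify.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DictationQueue:
    """Buffers transcriptions while Claude is busy and flushes them on /ready."""

    deliver: Callable[[str], None]  # hypothetical: types or pastes text into the terminal
    busy: bool = False
    pending: List[str] = field(default_factory=list)

    def on_busy(self) -> None:
        # Would be called by the proposed POST /busy handler.
        self.busy = True

    def on_transcription(self, text: str) -> None:
        # Accumulate while busy instead of letting each dictation time out.
        if self.busy:
            self.pending.append(text)
        else:
            self.deliver(text)

    def on_ready(self) -> None:
        # Would be called by the existing POST /ready handler.
        self.busy = False
        if self.pending:
            # Flush everything as one message, joined with newlines.
            self.deliver("\n".join(self.pending))
            self.pending.clear()

    def queue_size(self) -> int:
        # Could back the queue status indicator in the UI.
        return len(self.pending)
```

With this shape, two dictations made while busy arrive as a single two-line message on the next /ready, rather than only the last clipboard copy surviving.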

Phase 2: Window tracking

  • Capture the active window ID when recording starts (xdotool on X11; a compositor-specific method on Wayland)
  • Store window ID with pending transcription
  • Only type into the original window (refocus if needed)
  • Handle case where original window is closed
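
One way the Phase 2 delivery decision could look, as a sketch only. `PendingTranscription`, `choose_delivery` and the `window_exists` callback are hypothetical names, and falling back to the clipboard when the original window is gone is an assumption; the issue leaves that case open.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PendingTranscription:
    text: str
    window_id: Optional[str]  # captured when recording started; None if unknown


def choose_delivery(
    pending: PendingTranscription,
    active_window: Optional[str],
    window_exists: Callable[[str], bool],  # hypothetical platform-specific check
) -> str:
    """Return 'type', 'refocus_then_type', or 'clipboard'."""
    if pending.window_id is None:
        # Nothing was captured, so there is no window to target.
        return "clipboard"
    if not window_exists(pending.window_id):
        # Original window was closed (assumed fallback: clipboard only).
        return "clipboard"
    if active_window == pending.window_id:
        return "type"
    # The user switched windows in the meantime: refocus the original first.
    return "refocus_then_type"
```

On X11 the refocus step could be done with `xdotool windowactivate <id>`; other platforms would need their own equivalent.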

Phase 3: Session-aware integration

  • Claude Code hook includes session/window identifier
  • POST /ready includes session ID parameter
  • Match ready signals to specific pending transcriptions
  • Support multiple concurrent Claude sessions
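
A rough sketch of how ready signals might be matched to pending transcriptions per session in Phase 3. `SessionRouter` and its methods are hypothetical, and it assumes each pending transcription has already been tagged with a session ID.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional


class SessionRouter:
    """Keeps pending transcriptions per Claude session (hypothetical helper)."""

    def __init__(self) -> None:
        self._pending: DefaultDict[str, List[str]] = defaultdict(list)

    def add(self, session_id: str, text: str) -> None:
        self._pending[session_id].append(text)

    def on_ready(self, session_id: str) -> Optional[str]:
        # Only release text queued for the session that signalled ready;
        # other sessions' transcriptions stay queued.
        texts = self._pending.pop(session_id, None)
        if not texts:
            return None
        return "\n".join(texts)
```

A /ready from one session then leaves text queued for any other concurrent session untouched.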

Technical Considerations

Window identification

  • X11: xdotool getactivewindow returns window ID
  • Wayland: More complex; may need compositor-specific solutions (KDE, GNOME, wlroots)
  • Windows: GetForegroundWindow() API
  • macOS: Accessibility APIs
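
A sketch of a cross-platform lookup covering only the two cases above that reduce to a single call. The function name and the injectable `run` parameter are hypothetical; macOS and native Wayland are deliberately left unimplemented, and under Wayland `xdotool` can generally only see XWayland windows, so `None` should be expected there.

```python
import subprocess
import sys
from typing import Callable, Optional


def _run(cmd: list) -> str:
    # Short timeout so a hung helper never blocks recording.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True, timeout=2)
    return result.stdout


def get_active_window_id(
    platform: str = sys.platform,
    run: Callable[[list], str] = _run,
) -> Optional[str]:
    """Return the focused window's ID as a string, or None if unavailable."""
    if platform.startswith("linux"):
        try:
            # X11: prints the active window ID as a decimal number.
            return run(["xdotool", "getactivewindow"]).strip() or None
        except (OSError, subprocess.SubprocessError):
            # xdotool missing, failed, or timed out.
            return None
    if platform == "win32":
        import ctypes  # windll is only available on Windows

        hwnd = ctypes.windll.user32.GetForegroundWindow()
        return str(hwnd) if hwnd else None
    # macOS (Accessibility APIs) is not covered by this sketch.
    return None
```

Returning `None` rather than raising lets the caller fall back to the current untracked behaviour.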

Terminal detection

  • Detecting if active window is a terminal: check window class/name
  • Detecting if Claude Code is running in that terminal: process tree inspection
  • Multiple Claude instances: need unique session identifiers
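
To illustrate the two checks above, a small sketch that works on plain data so it stays platform-neutral. The class-name hints and the `claude` process name are illustrative assumptions, not verified values, and the process snapshot (pid mapped to parent pid and name) would have to be gathered separately, for example from /proc on Linux.

```python
from typing import Dict, List, Tuple

# Illustrative substrings only; real window class names should be checked per terminal.
TERMINAL_CLASS_HINTS = ("konsole", "ghostty", "terminal")


def looks_like_terminal(window_class: str) -> bool:
    """Guess whether a window class/name belongs to a terminal emulator."""
    lowered = window_class.lower()
    return any(hint in lowered for hint in TERMINAL_CLASS_HINTS)


def has_descendant_named(
    processes: Dict[int, Tuple[int, str]],  # pid -> (parent pid, process name)
    root_pid: int,
    name: str,
) -> bool:
    """Walk the process tree below root_pid looking for a process called name."""
    children: Dict[int, List[int]] = {}
    for pid, (ppid, _) in processes.items():
        children.setdefault(ppid, []).append(pid)

    stack = list(children.get(root_pid, []))
    seen = set()
    while stack:
        pid = stack.pop()
        if pid in seen:
            continue  # guard against cycles in a stale snapshot
        seen.add(pid)
        if processes[pid][1] == name:
            return True
        stack.extend(children.get(pid, []))
    return False
```

Note that a single terminal process often hosts several tabs, so a positive result says Claude Code is running somewhere in that terminal, not which tab; that is why unique session identifiers are still needed.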

Hook communication

Current hook just sends:

curl -X POST http://localhost:7878/ready

Enhanced hook could send:

curl -X POST http://localhost:7878/ready \
  -H "Content-Type: application/json" \
  -d "{\"session_id\": \"$CLAUDE_SESSION_ID\", \"window_id\": \"$WINDOWID\"}"
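
On the receiving side, the /ready handler would need to accept both the current empty-body request and the enhanced JSON payload. A sketch of that parsing step, with a hypothetical function name; treating malformed input like the legacy global signal is an assumption.

```python
import json
from typing import Optional, Tuple


def parse_ready_body(body: bytes) -> Tuple[Optional[str], Optional[str]]:
    """Return (session_id, window_id); both None for the legacy empty-body hook."""
    if not body.strip():
        return None, None
    try:
        data = json.loads(body)
    except ValueError:
        # Malformed JSON or bad encoding: fall back to the global ready signal.
        return None, None
    if not isinstance(data, dict):
        return None, None

    def clean(value: object) -> Optional[str]:
        # Unset shell variables arrive as empty strings; treat them as missing.
        return str(value) if value not in (None, "") else None

    return clean(data.get("session_id")), clean(data.get("window_id"))
```

Keeping the empty-body case valid means existing hooks continue to work unchanged while session-aware hooks are rolled out.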

Alternatives Considered

  1. Clipboard-only mode: Don't auto-type, just copy to clipboard. User manually pastes. Simpler but worse UX.

  2. Named pipe per session: Each Claude session creates a named pipe. Turbo Whisper writes to the pipe for the active window. More complex but robust.

  3. D-Bus integration: Use D-Bus for Linux desktop integration. Would work well with GNOME/KDE but not portable.

Related

  • Ghostty may freeze when receiving long text via evdev - investigate if this is a Ghostty bug or Turbo Whisper issue
