19 commits
2aefbf6
Revert "chore: remove accidentally merged Telegram channel code"
gavrielc Mar 25, 2026
8463826
fix(telegram): support message_thread_id for topics
gavrielc Mar 25, 2026
85748ff
feat: download Telegram file attachments to group directory
Mar 10, 2026
1eebd6b
feat: pass Telegram reply/quoted message context to agent
leonalfredbot-ship-it Apr 2, 2026
21fe9e6
fix: persist reply context to DB and add tests
Apr 2, 2026
7907dad
feat(telegram): add voice message transcription via OpenAI Whisper
Apr 6, 2026
d12bab9
feat(transcription): use local whisper.cpp instead of OpenAI API
Apr 6, 2026
61bc375
chore: remove unused openai dependency
Apr 6, 2026
8c2c709
docs(skill): update use-local-whisper for Telegram + Linux support
Apr 6, 2026
985ac8c
docs(skill): update add-voice-transcription for Telegram + channel cl…
Apr 6, 2026
a75065a
Merge remote-tracking branch 'upstream/main'
Apr 8, 2026
a27ff4a
Merge branch 'qwibitai:main' into main
Saxin Apr 8, 2026
f8e5c48
fix: proactive OAuth token refresh to prevent inactivity 401s
Apr 8, 2026
c5a5b4c
feat(telegram): voice message transcription via local whisper.cpp
Apr 8, 2026
0afd4cd
feat: HA MCP server, mcp-remote in image, ha-direct-control skill, sy…
Apr 8, 2026
01bf2da
feat: add home-assistant-best-practices and skill-creator container s…
Apr 8, 2026
b57f8e4
style: apply prettier formatting to oauth-token and index
Apr 8, 2026
608ac1f
Merge remote-tracking branch 'origin/main'
Apr 8, 2026
5e61b28
fix(oauth): include client_id, scope, and rotating refresh token in t…
Apr 10, 2026
19 changes: 13 additions & 6 deletions .claude/skills/add-voice-transcription/SKILL.md
@@ -1,11 +1,17 @@
---
name: add-voice-transcription
description: Add voice message transcription to NanoClaw using OpenAI's Whisper API. Automatically transcribes WhatsApp voice notes so the agent can read and respond to them.
description: Add voice message transcription to NanoClaw using OpenAI's Whisper API. Automatically transcribes voice notes so the agent can read and respond to them. Supports Telegram and WhatsApp channels.
---

# Add Voice Transcription

This skill adds automatic voice message transcription to NanoClaw's WhatsApp channel using OpenAI's Whisper API. When a voice note arrives, it is downloaded, transcribed, and delivered to the agent as `[Voice: <transcript>]`.
This skill adds automatic voice message transcription to NanoClaw using OpenAI's Whisper API. When a voice note arrives, it is transcribed and delivered to the agent as `[Voice: <transcript>]`.

**Channel support:** Telegram and WhatsApp.
- **Telegram:** Built into the Telegram channel — no extra code changes needed if `src/transcription.ts` exists.
- **WhatsApp:** Requires the WhatsApp channel to be installed first (`skill/whatsapp` merged).

> **Prefer local transcription?** Use the `use-local-whisper` skill instead — no API key, no cost, fully on-device via whisper.cpp.

## Phase 1: Pre-flight

@@ -21,7 +27,9 @@ AskUserQuestion: Do you have an OpenAI API key for Whisper transcription?

If yes, collect it now. If no, direct them to create one at https://platform.openai.com/api-keys.

## Phase 2: Apply Code Changes
## Phase 2: Apply Code Changes (WhatsApp only)

Skip this phase if you are only setting up Telegram — `src/transcription.ts` already handles Telegram via `transcribeAudioBuffer(buffer, filename)`.

**Prerequisite:** WhatsApp must be installed first (`skill/whatsapp` merged). This skill modifies WhatsApp channel files.

@@ -49,7 +57,6 @@ git merge whatsapp/skill/voice-transcription || {
```

This merges in:
- `src/transcription.ts` (voice transcription module using OpenAI Whisper)
- Voice handling in `src/channels/whatsapp.ts` (isVoiceMessage check, transcribeAudioMessage call)
- Transcription tests in `src/channels/whatsapp.test.ts`
- `openai` npm dependency in `package.json`
@@ -105,7 +112,7 @@ The container reads environment from `data/env/env`, not `.env` directly.
```bash
npm run build
launchctl kickstart -k gui/$(id -u)/com.nanoclaw # macOS
# Linux: systemctl --user restart nanoclaw
# Linux: kill -TERM $(pgrep -f "nanoclaw/dist/index.js") # systemd restarts automatically
```

## Phase 4: Verify
@@ -114,7 +121,7 @@ launchctl kickstart -k gui/$(id -u)/com.nanoclaw # macOS

Tell the user:

> Send a voice note in any registered WhatsApp chat. The agent should receive it as `[Voice: <transcript>]` and respond to its content.
> Send a voice note in any registered chat. The agent should receive it as `[Voice: <transcript>]` and respond to its content.

### Check logs if needed

203 changes: 133 additions & 70 deletions .claude/skills/use-local-whisper/SKILL.md
@@ -1,152 +1,215 @@
---
name: use-local-whisper
description: Use when the user wants local voice transcription instead of OpenAI Whisper API. Switches to whisper.cpp running on Apple Silicon. WhatsApp only for now. Requires voice-transcription skill to be applied first.
description: Use when the user wants local voice transcription instead of OpenAI Whisper API. Switches to whisper.cpp running locally. Works for Telegram and WhatsApp channels. No API key, no network, no cost.
---

# Use Local Whisper

Switches voice transcription from OpenAI's Whisper API to local whisper.cpp. Runs entirely on-device — no API key, no network, no cost.

**Channel support:** Currently WhatsApp only. The transcription module (`src/transcription.ts`) uses Baileys types for audio download. Other channels (Telegram, Discord, etc.) would need their own audio-download logic before this skill can serve them.

**Note:** The Homebrew package is `whisper-cpp`, but the CLI binary it installs is `whisper-cli`.
**Channel support:** Telegram and WhatsApp. The transcription module (`src/transcription.ts`) exposes a generic `transcribeAudioBuffer(buffer, filename)` API — any channel that downloads audio can use it.

## Prerequisites

- `voice-transcription` skill must be applied first (WhatsApp channel)
- macOS with Apple Silicon (M1+) recommended
- `whisper-cpp` installed: `brew install whisper-cpp` (provides the `whisper-cli` binary)
- `ffmpeg` installed: `brew install ffmpeg`
- A GGML model file downloaded to `data/models/`
- `src/transcription.ts` must exist (created by the voice transcription feature)
- `whisper-cli` binary installed and in PATH
- `ffmpeg` installed
- A GGML model file at `data/models/ggml-base.bin` (or configured via `WHISPER_MODEL`)

## Phase 1: Pre-flight

### Check if already applied

Check if `src/transcription.ts` already uses `whisper-cli`:

```bash
grep 'whisper-cli' src/transcription.ts && echo "Already applied" || echo "Not applied"
```

If already applied, skip to Phase 3 (Verify).

### Check dependencies are installed
### Check dependencies

```bash
whisper-cli --help >/dev/null 2>&1 && echo "WHISPER_OK" || echo "WHISPER_MISSING"
ffmpeg -version >/dev/null 2>&1 && echo "FFMPEG_OK" || echo "FFMPEG_MISSING"
ls data/models/ggml-*.bin 2>/dev/null || echo "NO_MODEL"
```

If missing, install via Homebrew:
## Phase 2: Install Dependencies

### macOS (Apple Silicon)

```bash
brew install whisper-cpp ffmpeg
```

### Check for model file
The Homebrew package is `whisper-cpp` but the binary is `whisper-cli`.

```bash
ls data/models/ggml-*.bin 2>/dev/null || echo "NO_MODEL"
```
### Linux (Debian/Ubuntu)

If no model exists, download the base model (148MB, good balance of speed and accuracy):
```bash
mkdir -p data/models
curl -L -o data/models/ggml-base.bin "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin"
# System packages (git is needed for the clone below)
sudo apt-get install -y ffmpeg build-essential cmake git

# Build whisper.cpp from source
git clone https://github.com/ggml-org/whisper.cpp.git --depth=1 /tmp/whisper.cpp
cd /tmp/whisper.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j$(nproc)

# Install binary (adjust destination to a directory in PATH)
mkdir -p ~/.local/bin
cp build/bin/whisper-cli ~/.local/bin/whisper-cli
chmod +x ~/.local/bin/whisper-cli

For better accuracy at the cost of speed, use `ggml-small.bin` (466MB) or `ggml-medium.bin` (1.5GB).

## Phase 2: Apply Code Changes

### Ensure WhatsApp fork remote
### Download model

```bash
git remote -v
mkdir -p data/models
curl -L -o data/models/ggml-base.bin \
"https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin"
```

If `whatsapp` is missing, add it:

```bash
git remote add whatsapp https://github.com/qwibitai/nanoclaw-whatsapp.git
For better accuracy at the cost of speed: `ggml-small.bin` (466MB) or `ggml-medium.bin` (1.5GB).

## Phase 3: Apply Code Changes

Replace `src/transcription.ts` with the whisper.cpp implementation:

```typescript
import { execFile } from 'child_process';
import fs from 'fs';
import os from 'os';
import path from 'path';
import { promisify } from 'util';

import { logger } from './logger.js';

const execFileAsync = promisify(execFile);

const WHISPER_BIN = process.env.WHISPER_BIN || 'whisper-cli';
const WHISPER_MODEL =
process.env.WHISPER_MODEL ||
path.join(process.cwd(), 'data', 'models', 'ggml-base.bin');

export async function transcribeAudioBuffer(
buffer: Buffer,
filename: string,
): Promise<string | null> {
const tmpDir = os.tmpdir();
const id = `nanoclaw-voice-${Date.now()}`;
const ext = path.extname(filename) || '.ogg';
const tmpIn = path.join(tmpDir, `${id}${ext}`);
const tmpWav = path.join(tmpDir, `${id}.wav`);

try {
fs.writeFileSync(tmpIn, buffer);

await execFileAsync(
'ffmpeg',
['-i', tmpIn, '-ar', '16000', '-ac', '1', '-f', 'wav', '-y', tmpWav],
{ timeout: 30_000 },
);

const { stdout } = await execFileAsync(
WHISPER_BIN,
['-m', WHISPER_MODEL, '-f', tmpWav, '--no-timestamps', '-nt'],
{ timeout: 60_000 },
);

const transcript = stdout.trim();
if (!transcript) return null;

logger.info(
{ bin: WHISPER_BIN, model: WHISPER_MODEL, chars: transcript.length },
'whisper.cpp transcription complete',
);
return transcript;
} catch (err) {
logger.error({ err }, 'whisper.cpp transcription failed');
return null;
} finally {
for (const f of [tmpIn, tmpWav]) {
try { fs.unlinkSync(f); } catch { /* best-effort cleanup */ }
}
}
}
```
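How a channel might consume this API can be sketched as follows. Only the `transcribeAudioBuffer(buffer, filename)` signature above is taken from the module; `formatVoiceMessage` and the fallback string are illustrative, not code from the repo.

```typescript
// Sketch of a channel-side caller (hypothetical helper; assumes only
// the transcribeAudioBuffer(buffer, filename) signature shown above).
type Transcriber = (buffer: Buffer, filename: string) => Promise<string | null>;

async function formatVoiceMessage(
  buffer: Buffer,
  filename: string,
  transcribe: Transcriber,
): Promise<string> {
  const transcript = await transcribe(buffer, filename);
  // Null means transcription failed; fall back to a plain file reference
  // so the agent still sees that a voice message arrived.
  return transcript !== null
    ? `[Voice: ${transcript}]`
    : `[Voice message] (${filename})`;
}
```

A Telegram or WhatsApp handler would pass the downloaded audio buffer together with `transcribeAudioBuffer` and forward the returned string to the agent.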

### Merge the skill branch
Then build:

```bash
git fetch whatsapp skill/local-whisper
git merge whatsapp/skill/local-whisper || {
git checkout --theirs package-lock.json
git add package-lock.json
git merge --continue
}
npm run build
```

This modifies `src/transcription.ts` to use the `whisper-cli` binary instead of the OpenAI API.
## Phase 4: Configure PATH (if needed)

### Validate
The nanoclaw service may run with a restricted PATH. Verify `whisper-cli` is reachable:

```bash
npm run build
which whisper-cli
```

## Phase 3: Verify
If not found, set `WHISPER_BIN` in `.env` to the absolute path:

### Ensure launchd PATH includes Homebrew
```
WHISPER_BIN=/home/youruser/.local/bin/whisper-cli
```

The NanoClaw launchd service runs with a restricted PATH. `whisper-cli` and `ffmpeg` are in `/opt/homebrew/bin/` (Apple Silicon) or `/usr/local/bin/` (Intel), which may not be in the plist's PATH.
Sync to container environment:

Check the current PATH:
```bash
grep -A1 'PATH' ~/Library/LaunchAgents/com.nanoclaw.plist
mkdir -p data/env && cp .env data/env/env
```

If `/opt/homebrew/bin` is missing, add it to the `<string>` value inside the `PATH` key in the plist. Then reload:
**macOS launchd only:** If using launchd, add `/opt/homebrew/bin` to the PATH key in the plist, then reload:
```bash
launchctl unload ~/Library/LaunchAgents/com.nanoclaw.plist
launchctl load ~/Library/LaunchAgents/com.nanoclaw.plist
```

### Build and restart
## Phase 5: Build and Restart

```bash
npm run build
# Linux (systemd):
kill -TERM $(pgrep -f "nanoclaw/dist/index.js") # systemd Restart=always brings it back
# macOS (launchd):
launchctl kickstart -k gui/$(id -u)/com.nanoclaw
```

### Test

Send a voice note in any registered group. The agent should receive it as `[Voice: <transcript>]`.
## Phase 6: Verify

### Check logs
Send a voice message to any registered chat. The agent should receive it as `[Voice: <transcript>]`.

Check logs:
```bash
tail -f logs/nanoclaw.log | grep -i -E "voice|transcri|whisper"
```

Look for:
- `Transcribed voice message` — successful transcription
- `whisper.cpp transcription failed` — check model path, ffmpeg, or PATH
- `whisper.cpp transcription complete` — success
- `whisper.cpp transcription failed` — check PATH, model path, ffmpeg

## Configuration
## Troubleshooting

Environment variables (optional, set in `.env`):
**"whisper.cpp transcription failed"**
- Verify both `whisper-cli` and `ffmpeg` are in PATH (or set `WHISPER_BIN` in `.env`)
- Test manually:
```bash
ffmpeg -f lavfi -i anullsrc=r=16000:cl=mono -t 1 -f wav /tmp/test.wav -y
whisper-cli -m data/models/ggml-base.bin -f /tmp/test.wav --no-timestamps -nt
```

**Falls back to `[Voice message] (/path/to/file.oga)` instead of transcribing**
- Transcription returned null — check the above test
- Check `WHISPER_MODEL` path exists: `ls data/models/ggml-base.bin`

**Slow transcription**
- The base model processes ~30s of audio in <1s on Apple Silicon, ~5s on x86_64
- Use `ggml-small.bin` only if accuracy is insufficient — speed tradeoff

## Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| `WHISPER_BIN` | `whisper-cli` | Path to whisper.cpp binary |
| `WHISPER_MODEL` | `data/models/ggml-base.bin` | Path to GGML model file |

## Troubleshooting

**"whisper.cpp transcription failed"**: Ensure both `whisper-cli` and `ffmpeg` are in PATH. The launchd service uses a restricted PATH — see Phase 3 above. Test manually:
```bash
ffmpeg -f lavfi -i anullsrc=r=16000:cl=mono -t 1 -f wav /tmp/test.wav -y
whisper-cli -m data/models/ggml-base.bin -f /tmp/test.wav --no-timestamps -nt
```

**Transcription works in dev but not as service**: The launchd plist PATH likely doesn't include `/opt/homebrew/bin`. See "Ensure launchd PATH includes Homebrew" in Phase 3.

**Slow transcription**: The base model processes ~30s of audio in <1s on M1+. If slower, check CPU usage — another process may be competing.

**Wrong language**: whisper.cpp auto-detects language. To force a language, you can set `WHISPER_LANG` and modify `src/transcription.ts` to pass `-l $WHISPER_LANG`.
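That change can be sketched as below — `buildWhisperArgs` is a hypothetical helper and `WHISPER_LANG` a hypothetical env var, neither in the repo today; `-l` is whisper-cli's language flag.

```typescript
// Sketch: building whisper-cli arguments with an optional forced language.
// buildWhisperArgs is illustrative; WHISPER_LANG would be read from the
// environment alongside WHISPER_BIN and WHISPER_MODEL.
function buildWhisperArgs(
  model: string,
  wavPath: string,
  lang?: string,
): string[] {
  const args = ['-m', model, '-f', wavPath, '--no-timestamps'];
  if (lang) {
    args.push('-l', lang); // e.g. 'en', 'de' — skips auto-detection
  }
  return args;
}
```

In `transcribeAudioBuffer`, the inline argument array passed to `execFileAsync` would then become `buildWhisperArgs(WHISPER_MODEL, tmpWav, process.env.WHISPER_LANG)`.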
2 changes: 1 addition & 1 deletion container/Dockerfile
@@ -31,7 +31,7 @@ ENV AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium
ENV PLAYWRIGHT_CHROMIUM_EXECUTABLE_PATH=/usr/bin/chromium

# Install agent-browser and claude-code globally
RUN npm install -g agent-browser @anthropic-ai/claude-code
RUN npm install -g agent-browser @anthropic-ai/claude-code mcp-remote

# Create app directory
WORKDIR /app
8 changes: 8 additions & 0 deletions container/agent-runner/src/index.ts
@@ -484,6 +484,14 @@ async function runQuery(
NANOCLAW_IS_MAIN: containerInput.isMain ? '1' : '0',
},
},
'ha-mcp': {
command: 'npx',
args: ['-y', 'mcp-remote', 'http://host.docker.internal:9583/private_PWWE28FuDIflsITGNI9VDQ', '--allow-http'],
env: {
NO_PROXY: 'host.docker.internal',
no_proxy: 'host.docker.internal',
},
},
},
hooks: {
PreCompact: [