
feat: WhatsApp/Gmail channels, image vision, voice, PDF/GOG/summarize skills, perf #917

Open
vsabavat wants to merge 5 commits into qwibitai:main from vsabavat:feat/all-contributions


@vsabavat
Contributor

Summary

  • WhatsApp channel — Baileys-based, pairing code auth, group sync, reconnect logic
  • Gmail channel — OAuth, full channel mode (emails trigger the agent, agent can reply)
  • Voice transcription — OpenAI Whisper API, auto-transcribes WhatsApp voice notes
  • Image vision — WhatsApp image attachments resized + sent to Claude as multimodal content blocks (src/image.ts, updated whatsapp.ts + agent-runner)
  • PDF reader skill: pdftotext (poppler-utils) in the container; handles attachments, URLs, and local files
  • GOG skill: gog CLI available to all container agents
  • Summarize skill: @steipete/summarize routed through cli/claude (no OpenAI key needed); a wrapper script injects the API key at startup
  • Container entrypoint — extracted from inline Dockerfile RUN to container/entrypoint.sh
  • Compile cache — hash /app/src .ts files at startup; skip tsc when unchanged. Cuts cold start from ~10s to ~1–2s. Cache persists in data/sessions/<group>/agent-runner-dist/ between runs
  • Poll interval — default lowered to 1000ms, configurable via POLL_INTERVAL env var
  • Trigger fix — removed ^ anchor from TRIGGER_PATTERN so @AssistantName matches anywhere in a message, not just at the start
  • Security — shadow .env inside container mount so secrets can't be read from the mounted project root
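As a minimal sketch of the POLL_INTERVAL setting, assuming the value is read once from the environment at startup, the parsing could look like this. The helper name readPollInterval and the rule of falling back to the default on invalid input are assumptions for illustration, not taken from the PR's code.

```typescript
// Default poll interval in milliseconds, as stated in the PR description.
const DEFAULT_POLL_INTERVAL_MS = 1000;

// Hypothetical helper: reads POLL_INTERVAL from an env-like object and
// falls back to the default when it is missing or not a positive integer.
function readPollInterval(
  env: Record<string, string | undefined> = process.env,
): number {
  const raw = env.POLL_INTERVAL;
  if (raw === undefined || raw.trim() === "") return DEFAULT_POLL_INTERVAL_MS;
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : DEFAULT_POLL_INTERVAL_MS;
}
```

A caller would then schedule its loop with something like setInterval(pollOnce, readPollInterval()), where pollOnce is a hypothetical function. Falling back to the default instead of throwing keeps a typo in POLL_INTERVAL from stopping the host process.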

Test plan

  • Send a WhatsApp image → agent describes it
  • Send a voice note → agent reads the transcription
  • Send a PDF attachment → agent extracts and summarizes text
  • Ask agent to summarize a URL → uses cli/claude, no OpenAI error
  • Send message with @Astra mid-sentence → agent triggers
  • Send second message to same group → confirm no tsc compile log (cache hit)
  • npm test passes

🤖 Generated with Claude Code

@Andy-NanoClaw-AI added the labels "PR: Feature" (New feature or enhancement) and "Status: Needs Review" (Ready for maintainer review) on Mar 10, 2026
@vsabavat force-pushed the feat/all-contributions branch from 3043292 to e0be87f on March 10, 2026 22:57
vsabavat and others added 3 commits March 10, 2026 23:01
…lls, perf

Channels & auth:
- Add WhatsApp channel (Baileys, pairing code auth, group sync, reconnect)
- Add Gmail channel (OAuth, full channel mode — emails trigger the agent)
- Add voice transcription via OpenAI Whisper API
- Register all channels in src/channels/index.ts
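The voice-transcription step could be sketched as a single multipart request to OpenAI's audio transcriptions endpoint with the whisper-1 model. This is an illustration under stated assumptions, not the PR's implementation: transcribeVoiceNote and FetchLike are hypothetical names, the caller is assumed to wrap the downloaded WhatsApp voice note (typically Ogg/Opus) in a Blob, and fetch is injected so the request can be inspected without network access.

```typescript
// Minimal fetch signature (hypothetical) so a stub can be injected in tests.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: FormData },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Send one audio clip to the transcriptions endpoint and return its text.
async function transcribeVoiceNote(
  audio: Blob,
  apiKey: string,
  fetchImpl: FetchLike,
  filename = "voice.ogg",
): Promise<string> {
  const form = new FormData();
  form.append("model", "whisper-1");
  form.append("file", audio, filename);
  const res = await fetchImpl("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  if (!res.ok) throw new Error(`transcription failed: HTTP ${res.status}`);
  // The default JSON response carries the transcript in a `text` field.
  const data = (await res.json()) as { text?: string };
  return data.text ?? "";
}
```

In production, fetchImpl would simply be the global fetch; the transcript can then be passed to the agent in place of the voice note.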

Image vision:
- src/image.ts: resize + base64-encode WhatsApp image attachments
- src/channels/whatsapp.ts: detect + download images, send as multimodal blocks
- container/agent-runner/src/index.ts: accept imageAttachments in ContainerInput

Container skills:
- container/skills/pdf-reader: extract PDF text via pdftotext (poppler-utils)
- container/skills/gog: gog CLI available to all agents
- container/skills/summarize: @steipete/summarize routed through cli/claude
- container/entrypoint.sh: extracted from inline Dockerfile RUN for readability
- container/summarize-wrapper.sh: injects API key, routes to claude-code
- container/Dockerfile: install poppler-utils, @steipete/summarize, gog globally

Performance:
- Cache compiled agent-runner dist by hashing /app/src .ts files
  First run compiles once; subsequent runs skip tsc (~10s → ~1-2s cold start)
  Cached dist persists in data/sessions/<group>/agent-runner-dist/ on host
- POLL_INTERVAL: default 1000ms, configurable via POLL_INTERVAL env var
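The compile cache boils down to "hash the sources, compare with the hash the cached dist was built from, and skip tsc on a match". This TypeScript sketch shows that idea under stated assumptions: the PR may implement the check differently (for example in the container entrypoint), and the stamp file name .src-hash plus the helper names are hypothetical.

```typescript
import { createHash } from "node:crypto";
import { existsSync, mkdirSync, readdirSync, readFileSync, writeFileSync } from "node:fs";
import { join, relative } from "node:path";

// Hypothetical name for the stamp file stored next to the cached dist.
const STAMP_FILE = ".src-hash";

// Recursively collect .ts files under dir, sorted so the hash is order-stable.
function collectTsFiles(dir: string): string[] {
  const out: string[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const full = join(dir, entry.name);
    if (entry.isDirectory()) out.push(...collectTsFiles(full));
    else if (entry.isFile() && entry.name.endsWith(".ts")) out.push(full);
  }
  return out.sort();
}

// SHA-256 over each file's relative path and contents (NUL-separated).
function hashSources(srcDir: string): string {
  const h = createHash("sha256");
  for (const file of collectTsFiles(srcDir)) {
    h.update(relative(srcDir, file));
    h.update("\0");
    h.update(readFileSync(file));
    h.update("\0");
  }
  return h.digest("hex");
}

// True when there is no cached dist stamp or the sources have changed since.
function needsCompile(srcDir: string, cacheDir: string): boolean {
  const stamp = join(cacheDir, STAMP_FILE);
  if (!existsSync(stamp)) return true;
  return readFileSync(stamp, "utf8").trim() !== hashSources(srcDir);
}

// Call after a successful tsc run to record which sources the dist came from.
function recordCompile(srcDir: string, cacheDir: string): void {
  mkdirSync(cacheDir, { recursive: true });
  writeFileSync(join(cacheDir, STAMP_FILE), hashSources(srcDir) + "\n");
}
```

At startup the runner would call needsCompile("/app/src", distDir), run tsc only when it returns true, and then call recordCompile. Writing the stamp only after a successful build means a failed compile is retried on the next run.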

Fixes:
- TRIGGER_PATTERN: remove ^ anchor so @astra matches anywhere in a message
- Shadow .env inside container mount so secrets stay out of agent's reach
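One common way to shadow a file inside a bind mount is to mount an empty file (such as /dev/null) over it. The sketch below builds Docker-style --mount flags for that; the container path /workspace/project, the read-only project mount, and the helper name projectMountArgs are assumptions for illustration, not details confirmed by the PR.

```typescript
import { posix } from "node:path";

// Hypothetical container path where the project root is mounted.
const PROJECT_MOUNT = "/workspace/project";

// Build `docker run` mount flags: the project root (read-only here, as an
// assumption), plus an empty file bind-mounted over its .env so the agent
// sees an empty file instead of the host's secrets.
function projectMountArgs(projectRoot: string, emptyFile = "/dev/null"): string[] {
  const envTarget = posix.join(PROJECT_MOUNT, ".env");
  return [
    "--mount", `type=bind,source=${projectRoot},target=${PROJECT_MOUNT},readonly`,
    "--mount", `type=bind,source=${emptyFile},target=${envTarget},readonly`,
  ];
}
```

The order matters: the .env mount must come after the project-root mount so it is layered on top of it. Paths containing commas would need quoting in the --mount syntax, which this sketch does not handle.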

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
[email protected] has a peerOptional dep on zod@^3.x but the project uses zod@^4.x
for ipc-mcp-stdio.ts. Adding legacy-peer-deps=true lets npm install both
without ERESOLVE, matching how the lockfile was generated locally.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…hing

The ^ anchor was removed so @name matches anywhere, not just at start.
Update the test description and expectations accordingly.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
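The effect of dropping the ^ anchor can be shown with a small pattern builder. The function names and the case-insensitive flag are assumptions for illustration; the sketch only demonstrates the anchored versus unanchored behaviour the commit describes.

```typescript
// Escape regex metacharacters so the assistant name is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical builder: with anchorAtStart the name must open the message
// (old behaviour); without it, "@Name" may appear anywhere (new behaviour).
// The trailing \b assumes the name ends in a word character.
function buildTriggerPattern(name: string, anchorAtStart = false): RegExp {
  const body = `@${escapeRegExp(name)}\\b`;
  return new RegExp(anchorAtStart ? `^${body}` : body, "i");
}
```

Dropping the anchor trades precision for convenience: any mention of @Astra, including one quoted from an earlier message, would now trigger the agent, while the \b boundary still stops a longer name such as @Astrara from matching.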
@vsabavat force-pushed the feat/all-contributions branch from e0be87f to 3f14152 on March 10, 2026 23:01
- .env.example: replace "# Added by skill" artifacts with proper docs
- setup/index.ts: remove channels step (setup/channels.js doesn't exist)

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
When a user replies to a specific message in WhatsApp, the agent now
sees which message was being replied to. Extracts contextInfo.stanzaId
from Baileys message objects, stores it as reply_to_id in the messages
table (with auto-migration for existing DBs), and emits it as a
reply_to attribute on <message> elements in the XML context sent to
the agent.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
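The reply-context flow (read contextInfo.stanzaId, store it as reply_to_id, emit it as a reply_to attribute) can be sketched as below. BaileysLikeMessage is a hand-written subset of a Baileys message, not the library's own type, and only extendedTextMessage is checked; other message kinds also carry contextInfo. The id and sender attributes on <message> are assumptions, since the commit only names reply_to.

```typescript
// Minimal, hypothetical subset of a Baileys message. Quoted replies carry
// the quoted message's ID in contextInfo.stanzaId.
interface BaileysLikeMessage {
  key: { id?: string | null };
  message?: {
    conversation?: string | null;
    extendedTextMessage?: {
      text?: string | null;
      contextInfo?: { stanzaId?: string | null } | null;
    } | null;
  } | null;
}

// Pull the ID of the message being replied to, if any.
function extractReplyToId(msg: BaileysLikeMessage): string | undefined {
  return msg.message?.extendedTextMessage?.contextInfo?.stanzaId ?? undefined;
}

// Escape the five XML special characters for text and attribute values.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Render one stored row; reply_to is emitted only when reply_to_id is set.
function renderMessage(row: { id: string; sender: string; content: string; reply_to_id?: string | null }): string {
  const replyAttr = row.reply_to_id ? ` reply_to="${escapeXml(row.reply_to_id)}"` : "";
  return `<message id="${escapeXml(row.id)}" sender="${escapeXml(row.sender)}"${replyAttr}>${escapeXml(row.content)}</message>`;
}
```

Because the attribute is optional, rows migrated from an older database (where reply_to_id is NULL) render exactly as before.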
Fritzzzz1 added a commit to Fritzzzz1/nanoclaw that referenced this pull request Mar 23, 2026
  Content pipeline for non-text messages: channel adapters produce
  RawContentPart[], processContentParts() resolves them into ContentPart[]
  (download, save, convert), agent runner dispatches to pluggable type
  handlers — Claude-native types get embedded, non-native types get
  file-reference injection. Skills can override any handler.

  - RawContentPart/ContentPart types (ref + buffer dual input)
  - processContentParts() with media download and local storage
  - content_parts DB column with string fallback (backward compat)
  - Handler registry + dispatch in agent runner
  - Media directory mount into containers

  Co-Authored-By: @JasonOA888 (qwibitai#902)
  Co-Authored-By: @kenmaz (qwibitai#1069)
  Co-Authored-By: @vsabavat (qwibitai#917, qwibitai#1055)
  Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
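The content pipeline described in that commit (raw parts resolved to local files, then dispatched to per-type handlers that skills can override) might look roughly like this. Every type, field, and function body here is a hypothetical reconstruction from the commit message, not the commit's actual code; in particular, treating "image" and "pdf" as the Claude-native kinds is an assumption, and downloading and saving are injected functions.

```typescript
// A raw part arrives from a channel adapter either as a remote ref or as bytes.
type RawContentPart =
  | { kind: string; mimeType: string; ref: string }
  | { kind: string; mimeType: string; buffer: Uint8Array };

// After processing, every part has been saved locally.
interface ContentPart {
  kind: string; // e.g. "image", "pdf", "audio"
  mimeType: string;
  localPath: string;
}

// What a handler contributes to the agent prompt.
type PromptBlock =
  | { type: "embedded"; kind: string; path: string }
  | { type: "text"; text: string };

type Handler = (part: ContentPart) => PromptBlock;

// Kinds treated as Claude-native in this sketch (assumption).
const NATIVE_KINDS = new Set(["image", "pdf"]);

// Native kinds are embedded; anything else becomes a file reference in text.
const defaultHandler: Handler = (part) =>
  NATIVE_KINDS.has(part.kind)
    ? { type: "embedded", kind: part.kind, path: part.localPath }
    : { type: "text", text: `[Attached ${part.kind} file: ${part.localPath}]` };

// Registry: a skill can override the handler for any kind.
const handlers = new Map<string, Handler>();

function registerHandler(kind: string, handler: Handler): void {
  handlers.set(kind, handler);
}

function dispatch(parts: ContentPart[]): PromptBlock[] {
  return parts.map((p) => (handlers.get(p.kind) ?? defaultHandler)(p));
}

// Resolve raw parts into local files; fetchRef and save are injected.
async function processContentParts(
  raw: RawContentPart[],
  fetchRef: (ref: string) => Promise<Uint8Array>,
  save: (bytes: Uint8Array, mimeType: string) => Promise<string>,
): Promise<ContentPart[]> {
  const out: ContentPart[] = [];
  for (const part of raw) {
    const bytes = "buffer" in part ? part.buffer : await fetchRef(part.ref);
    out.push({ kind: part.kind, mimeType: part.mimeType, localPath: await save(bytes, part.mimeType) });
  }
  return out;
}
```

Keeping the fallback outside the registry means an unregistered kind always degrades to a file reference instead of being dropped.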
gavrielc pushed a commit to Fritzzzz1/nanoclaw that referenced this pull request Apr 1, 2026

Labels

PR: Feature New feature or enhancement Status: Needs Review Ready for maintainer review

2 participants