fix: handle PDF document messages in WhatsApp channel #1055

Open
vsabavat wants to merge 56 commits into qwibitai:main from vsabavat:fix/whatsapp-pdf-attachments

Conversation

@vsabavat
Contributor

Summary

  • PDFs sent or forwarded via WhatsApp arrive as documentMessage, but content extraction only covered conversation, extendedTextMessage, imageMessage, and videoMessage, so PDF messages were silently dropped
  • Downloads the PDF and saves it to attachments/<filename>.pdf in the group dir
  • Sets message content to [PDF: attachments/name.pdf] so the agent can read it via pdf-reader extract
  • Reads documentMessage.contextInfo so reply-to context works on PDF messages
  • Falls back to documentMessage.caption, then [Document], when building quotedText so forwarded PDFs show proper quote context
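
The extraction change summarized above might look roughly like the sketch below. It is not the PR's actual code: the interfaces are minimal local stand-ins for the Baileys message shape, `safePdfName` and `extractContent` are hypothetical names, the space between the tag and the caption is an assumption, and the download/save step is omitted.

```typescript
// Minimal local stand-ins for the message shape (assumed subset, not the real Baileys types).
interface ContextInfo { stanzaId?: string }
interface DocumentMessage {
  fileName?: string;
  mimetype?: string;
  caption?: string;
  contextInfo?: ContextInfo;
}
interface MessageContent {
  conversation?: string;
  extendedTextMessage?: { text?: string; contextInfo?: ContextInfo };
  imageMessage?: { caption?: string };
  videoMessage?: { caption?: string };
  documentMessage?: DocumentMessage;
}

// Hypothetical helper: reduce a sender-supplied file name to a safe basename ending in .pdf,
// so a name like "../../etc/passwd" cannot escape the attachments directory.
function safePdfName(fileName: string | undefined): string {
  const base = (fileName ?? "document").split(/[\\/]/).pop() ?? "document";
  const cleaned = base.replace(/[^A-Za-z0-9._-]/g, "_").replace(/^\.+/, "");
  const stem = cleaned.replace(/\.pdf$/i, "") || "document";
  return `${stem}.pdf`;
}

// Returns the text the agent should see, or null if the message carries nothing usable.
function extractContent(msg: MessageContent): string | null {
  if (msg.conversation) return msg.conversation;
  if (msg.extendedTextMessage?.text) return msg.extendedTextMessage.text;
  if (msg.imageMessage?.caption) return msg.imageMessage.caption;
  if (msg.videoMessage?.caption) return msg.videoMessage.caption;

  // New branch: PDFs arrive as documentMessage and were previously dropped.
  const doc = msg.documentMessage;
  if (doc && doc.mimetype === "application/pdf") {
    const tag = `[PDF: attachments/${safePdfName(doc.fileName)}]`;
    // Assumption: the caption follows the tag, separated by a single space.
    return doc.caption ? `${tag} ${doc.caption}` : tag;
  }
  return null;
}
```

Sanitizing the file name before building the path is a design choice of this sketch, since the name is controlled by the sender; the source does not say how the PR handles it.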

Test plan

  • Send a PDF directly to a registered WhatsApp chat — confirm agent receives [PDF: attachments/name.pdf] and can read it
  • Forward a PDF from another chat — confirm same behavior
  • Reply to a PDF message — confirm quotedText is populated correctly
  • Send a PDF with a caption — confirm caption appears after the [PDF: ...] tag

🤖 Generated with Claude Code

vsabavat and others added 30 commits March 6, 2026 06:57
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Apply add-voice-transcription skill (OpenAI Whisper)
- Fix missing registerChannel call in whatsapp.ts (dropped by skill merge)
- Fix add-voice-transcription skill patch to preserve registerChannel

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
# Conflicts:
#	container/agent-runner/src/index.ts
#	src/container-runner.ts
Syncs with upstream main (on schedule, dispatch, or push), then
merges main into all skill/* branches with build+test validation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
# Conflicts:
#	src/container-runner.ts
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…arts

Hash /app/src .ts files at container startup. If the hash matches the
cached dist, skip tsc entirely — cutting cold-start from ~10s to ~1-2s
for repeat invocations. Cache persists in data/sessions/<group>/agent-runner-dist/
on the host between container runs.

Also:
- Lower default POLL_INTERVAL to 1000ms (configurable via POLL_INTERVAL env var)
- Fix TRIGGER_PATTERN: remove ^ so @astra matches anywhere in message

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
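
The source-hash cache described in the commit above can be illustrated with a small sketch. The function names are hypothetical and the files are passed in as an in-memory map rather than read from /app/src. FNV-1a is used only as a dependency-free stand-in; the commit does not say which hash it uses, and a cryptographic hash such as SHA-256 would be the more usual choice.

```typescript
// FNV-1a 32-bit hash over UTF-16 code units (stand-in for whatever hash the commit uses).
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return ("00000000" + h.toString(16)).slice(-8);
}

// Hash every .ts file, in sorted path order so the result does not depend on
// the order in which files were discovered.
function hashSources(files: Record<string, string>): string {
  const paths = Object.keys(files).filter((p) => p.endsWith(".ts")).sort();
  return fnv1a(paths.map((p) => `${p}\0${files[p]}`).join("\0"));
}

// True when tsc must run: no cached hash yet, or the sources changed.
function needsCompile(files: Record<string, string>, cachedHash: string | null): boolean {
  return hashSources(files) !== cachedHash;
}
```

On a cache hit the container would reuse the compiled output persisted on the host and skip tsc; on a miss it would compile and store the new hash alongside the output.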
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
github-actions bot and others added 20 commits March 10, 2026 20:40
…hing

The ^ anchor was removed so @name matches anywhere, not just at start.
Update the test description and expectations accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
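
The effect of dropping the ^ anchor can be shown with a short sketch. `buildTriggerPattern` is a hypothetical helper, and the trailing word boundary and case-insensitive flag are assumptions; the actual TRIGGER_PATTERN definition is not shown in the source.

```typescript
// Escape regex metacharacters so an assistant name is matched literally.
function escapeRegex(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// anchored = true mimics the old behaviour (mention must start the message);
// anchored = false mimics the new behaviour (mention may appear anywhere).
function buildTriggerPattern(name: string, anchored: boolean): RegExp {
  const prefix = anchored ? "^" : "";
  return new RegExp(`${prefix}@${escapeRegex(name)}\\b`, "i");
}

const oldPattern = buildTriggerPattern("astra", true);
const newPattern = buildTriggerPattern("astra", false);

console.log(oldPattern.test("hey @astra, can you help?")); // false
console.log(newPattern.test("hey @astra, can you help?")); // true
```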
- .env.example: replace "# Added by skill" artifacts with proper docs
- setup/index.ts: remove channels step (setup/channels.js doesn't exist)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When a user replies to a specific message in WhatsApp, the agent now
sees which message was being replied to. Extracts contextInfo.stanzaId
from Baileys message objects, stores it as reply_to_id in the messages
table (with auto-migration for existing DBs), and emits it as a
reply_to attribute on <message> elements in the XML context sent to
the agent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ntext

Previous approach stored only the message ID (stanzaId) as reply_to_id,
but the agent had no way to look it up since message elements had no id
attribute. Baileys already provides contextInfo.quotedMessage with the
full text of the quoted message — extract that directly and embed it as
a <reply_to sender="...">text</reply_to> block inside the <message>
element so the agent sees exactly what was being replied to.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
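
A minimal sketch of the quoted-message embedding described in the commit above, including the documentMessage fallback this PR adds. The function names, the `sender` attribute on <message>, and the single-line layout are assumptions; only the <reply_to sender="...">text</reply_to> shape comes from the commit message.

```typescript
// Assumed subset of the quoted message shape (local stand-in, not the real Baileys type).
interface QuotedContent {
  conversation?: string;
  extendedTextMessage?: { text?: string };
  documentMessage?: { caption?: string };
}

// Escape text so quoted content cannot break out of the XML context.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Pick the text to show for a quoted message; documents fall back to a placeholder.
function quotedText(q: QuotedContent): string | null {
  if (q.conversation) return q.conversation;
  if (q.extendedTextMessage?.text) return q.extendedTextMessage.text;
  if (q.documentMessage) return q.documentMessage.caption || "[Document]";
  return null;
}

// Build a <message> element, embedding the quoted text when the message is a reply.
function formatMessage(
  sender: string,
  text: string,
  reply?: { sender: string; text: string },
): string {
  const replyBlock = reply
    ? `<reply_to sender="${escapeXml(reply.sender)}">${escapeXml(reply.text)}</reply_to>`
    : "";
  return `<message sender="${escapeXml(sender)}">${replyBlock}${escapeXml(text)}</message>`;
}
```

Embedding the quoted text directly means the agent does not need to resolve a message ID, which is the limitation of the earlier reply_to_id approach noted in the commit.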
# Conflicts:
#	.env.example
#	package-lock.json
#	setup/whatsapp-auth.ts
#	src/channels/whatsapp.test.ts
#	src/channels/whatsapp.ts
PDFs sent or forwarded via WhatsApp arrive as documentMessage but
the channel only extracted content from conversation/extendedText/
image/videoMessage, so PDF messages were silently dropped.

- Download and save PDF to attachments/<filename>.pdf
- Set content to [PDF: attachments/name.pdf] so the agent can read
  it via pdf-reader extract
- Include documentMessage contextInfo so reply-to works on PDFs
- Include documentMessage caption/fallback in quotedText so
  forwarded PDFs show proper quote context

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Andy-NanoClaw-AI added the "Status: Needs Review" (Ready for maintainer review) and "PR: Fix" (Bug fix) labels Mar 14, 2026
Fritzzzz1 added a commit to Fritzzzz1/nanoclaw that referenced this pull request Mar 23, 2026
  Content pipeline for non-text messages: channel adapters produce
  RawContentPart[], processContentParts() resolves them into ContentPart[]
  (download, save, convert), agent runner dispatches to pluggable type
  handlers — Claude-native types get embedded, non-native types get
  file-reference injection. Skills can override any handler.

  - RawContentPart/ContentPart types (ref + buffer dual input)
  - processContentParts() with media download and local storage
  - content_parts DB column with string fallback (backward compat)
  - Handler registry + dispatch in agent runner
  - Media directory mount into containers

  Co-Authored-By: @JasonOA888 (qwibitai#902)
  Co-Authored-By: @kenmaz (qwibitai#1069)
  Co-Authored-By: @vsabavat (qwibitai#917, qwibitai#1055)
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
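
The handler registry and dispatch step in the referenced commit could be sketched as follows. Every name and shape here is hypothetical: the commit message names ContentPart and a handler registry but does not show their definitions, and the download/save stage (processContentParts) is left out.

```typescript
// Hypothetical resolved part: media already downloaded and saved locally.
interface ContentPart {
  type: string; // e.g. "image", "pdf", "audio"
  path: string; // local path of the saved media
  mimeType: string;
}

// What a handler hands to the agent: an embedded block or a plain-text file reference.
type AgentInput =
  | { kind: "embed"; mimeType: string; path: string }
  | { kind: "text"; text: string };

type Handler = (part: ContentPart) => AgentInput;

// Natively supported types are embedded as-is.
const embedHandler: Handler = (part) => ({
  kind: "embed",
  mimeType: part.mimeType,
  path: part.path,
});

// Everything else is injected as a file reference the agent can open with a tool.
const fileReferenceHandler: Handler = (part) => ({
  kind: "text",
  text: `[File: ${part.path}]`,
});

class HandlerRegistry {
  private handlers = new Map<string, Handler>();

  // A later registration for the same type replaces the earlier one,
  // which is how a skill could override a built-in handler.
  register(type: string, handler: Handler): void {
    this.handlers.set(type, handler);
  }

  // Unknown types fall back to file-reference injection.
  dispatch(part: ContentPart): AgentInput {
    const handler = this.handlers.get(part.type) ?? fileReferenceHandler;
    return handler(part);
  }
}

const registry = new HandlerRegistry();
registry.register("image", embedHandler);
```

Keying the registry by type keeps the dispatch step unaware of individual channels; a WhatsApp PDF and a PDF from any other channel would reach the same handler.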
gavrielc pushed a commit to Fritzzzz1/nanoclaw that referenced this pull request Apr 1, 2026
Labels

PR: Fix (Bug fix), Status: Needs Review (Ready for maintainer review)

3 participants