fix: sanitize lone Unicode surrogates before sending to Claude API #1009

Open
vzaliva wants to merge 2 commits into qwibitai:main from vzaliva:fix/lone-surrogate-json-error

vzaliva commented Mar 12, 2026

Type of Change

  • Skill - adds a new skill in .claude/skills/
  • Fix - bug fix or security fix to source code
  • Simplification - reduces or simplifies source code

Description

WhatsApp messages can contain emoji encoded as lone surrogate characters: invalid UTF-16 that is legal in JS strings but, once serialised to JSON as an unpaired \uXXXX escape, is rejected by strict JSON parsers.
When such a request body reaches the API, Anthropic returns:

400 "no low surrogate in string"

Fix: call String.prototype.toWellFormed() in escapeXml(), which replaces any lone surrogate with U+FFFD before the content enters the prompt.
Also bumps tsconfig.json lib to ES2024 so the method is recognised by tsc (Node 20+ supports it natively, so there is no runtime dependency change).
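
A minimal sketch of the idea, not the code from this PR: the `escapeXml()` body below is a hypothetical stand-in (the real implementation is not shown here), and `replaceLoneSurrogates` is an illustrative helper name. It prefers the native `toWellFormed()` and adds a regex fallback, which is an assumption for older runtimes rather than something the PR includes.

```typescript
// Fallback pattern: a valid high+low pair, or any other single surrogate code unit.
const SURROGATES = /[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDFFF]/g;

// Hypothetical helper: replace each lone surrogate with U+FFFD, keep valid pairs.
function replaceLoneSurrogates(text: string): string {
  // The cast keeps this sketch compiling even if tsconfig "lib" predates ES2024.
  const native = (text as unknown as { toWellFormed?: () => string }).toWellFormed;
  if (typeof native === "function") {
    return native.call(text);
  }
  // Two-unit matches are valid pairs and stay as they are; one-unit matches are lone.
  return text.replace(SURROGATES, (m) => (m.length === 2 ? m : "\uFFFD"));
}

// Hypothetical escapeXml(): sanitise first, then escape XML metacharacters.
function escapeXml(text: string): string {
  return replaceLoneSurrogates(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}
```

Sanitising before escaping means every later consumer of the prompt text sees a well-formed string, while valid emoji pairs pass through unchanged.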

For Skills

(not applicable)


vzaliva commented Mar 12, 2026

Note for users hitting this error before applying the fix:

If you've already received the no low surrogate in string 400 error, your active
Claude Code session file may be corrupted with lone surrogates from previous messages.
The fix in this PR only prevents new surrogates from entering; it does not clean
existing session history.

Session files are stored in data/sessions/<group>/.claude/projects/. If you continue
to get the error after updating, find the active session file (the large .jsonl
matching the session ID in your database) and either delete it (the agent will start a
fresh session) or sanitize it by replacing lone surrogates with U+FFFD.
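
One way to do the sanitizing, sketched under assumptions: each non-empty line of the session file is taken to be a standalone JSON document, and the names `sanitizeJsonl`, `fixValue` and `fixString` are illustrative, not part of the project. The sketch parses each line, replaces lone surrogates in every string (keys included), and serialises the line again.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// A valid high+low pair, or any other single surrogate code unit.
const SURROGATES = /[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDFFF]/g;

function fixString(s: string): string {
  // Two-unit matches are valid pairs and are kept; one-unit matches are lone.
  return s.replace(SURROGATES, (m) => (m.length === 2 ? m : "\uFFFD"));
}

// Walk a parsed JSON value and sanitise every string, including object keys.
function fixValue(value: Json): Json {
  if (typeof value === "string") return fixString(value);
  if (Array.isArray(value)) return value.map(fixValue);
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(value)) {
      out[fixString(key)] = fixValue(value[key]);
    }
    return out;
  }
  return value; // numbers, booleans, null
}

// Sanitise JSONL text line by line; blank lines are passed through untouched.
// A line that is not valid JSON makes JSON.parse throw, which is left to the caller.
function sanitizeJsonl(text: string): string {
  return text
    .split("\n")
    .map((line) =>
      line.trim() === "" ? line : JSON.stringify(fixValue(JSON.parse(line) as Json))
    )
    .join("\n");
}
```

To apply it, read the .jsonl as UTF-8, pass the text through `sanitizeJsonl`, and write the result to a new file so the original remains as a backup. Re-serialising may change key spacing or escape style within a line, but not the parsed values other than the replaced surrogates.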

The previous fix (escapeXml/toWellFormed) only covered incoming WhatsApp
message text. Lone surrogates can also enter the session transcript via
tool results (file reads, bash output, web fetches) and get replayed in
subsequent API calls, triggering the same 400 error.

Fix: intercept every outbound API request in the credential proxy and
replace lone \uXXXX surrogate escapes with \uFFFD before forwarding.
Valid surrogate pairs are preserved. This is the authoritative last line
of defence regardless of surrogate source.
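
The replacement step described above could look roughly like the following. This is a sketch under assumptions, not the PR's code: it operates on the request body as raw JSON text, the name `replaceLoneSurrogateEscapes` is hypothetical, and the credential proxy wiring is omitted because it is not shown here.

```typescript
// One pass over every backslash escape in the raw JSON text:
//   1. a valid high+low \uXXXX\uXXXX pair     -> kept as-is
//   2. any other \uD800-\uDFFF escape (group 1) -> replaced with \uFFFD
//   3. any other escape, including \\           -> kept, so an escaped backslash
//      followed by the letters "ud83d" is not mistaken for a surrogate escape
const ESCAPES =
  /\\u[dD][89abAB][0-9a-fA-F]{2}\\u[dD][c-fC-F][0-9a-fA-F]{2}|(\\u[dD][89a-fA-F][0-9a-fA-F]{2})|\\[\s\S]/g;

function replaceLoneSurrogateEscapes(body: string): string {
  return body.replace(ESCAPES, (match: string, lone?: string) =>
    lone ? "\\uFFFD" : match
  );
}
```

Working on the serialised text avoids parsing and re-serialising the whole request. The third alternative matters: a plain search for `\ud83d` would also rewrite text where the backslash itself is escaped.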

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Labels

PR: Fix (Bug fix), Status: Needs Review (Ready for maintainer review)
