fix(docx): make XML validation and console output Windows-safe #871

Open
voidborne-d wants to merge 1 commit into anthropics:main from voidborne-d:fix/docx-windows-utf8-issue-712

@voidborne-d

Summary

  • read XML files for XSD validation with explicit encoding="utf-8"
  • replace two Unicode arrow glyphs in docx validator output with ASCII ->
  • add regression tests covering UTF-8 XML reads and cp1252/Windows console output
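The first bullet can be illustrated with a minimal sketch. `read_xml_text` is a hypothetical helper name, not a function from this PR; the only point is that `encoding="utf-8"` is passed explicitly, so the read no longer depends on the platform default codec.

```python
def read_xml_text(path):
    """Hypothetical helper: read an XML part as text, pinned to UTF-8.

    Without the explicit encoding, open() falls back to the locale's
    preferred encoding, which is commonly cp1252 on Windows.
    """
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```

Called on a UTF-8 file, this returns the same text on every platform, regardless of the locale's default codec.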

Why

Fixes #712.

The docx office scripts currently rely on the platform default text encoding in one XML validation path, which breaks on Windows when the default codec is cp1252. They also print a Unicode arrow glyph in two validation/repair messages, which can raise UnicodeEncodeError on cp1252 terminals.

This keeps the XML parsing path deterministic across platforms and makes the console output safe on default Windows terminals without requiring PYTHONIOENCODING=utf-8.
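Both failure modes can be reproduced on any platform by naming the codec explicitly. The sketch below is illustrative only: the helper names and the sample strings are assumptions, not code or messages from the docx scripts. It uses a right double quotation mark (U+201D) as the XML sample because its UTF-8 bytes include 0x9D, which Python's cp1252 codec leaves undefined, and U+2192 as a stand-in for an arrow glyph.

```python
def can_decode(data, codec):
    """Return True if the bytes decode cleanly with the given codec."""
    try:
        data.decode(codec)
        return True
    except UnicodeDecodeError:
        return False


def can_encode(text, codec):
    """Return True if the text encodes cleanly with the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Illustrative XML fragment containing U+201D, stored as UTF-8 bytes.
xml_bytes = "<w:t>\u201d</w:t>".encode("utf-8")

print(can_decode(xml_bytes, "utf-8"))     # reading with the explicit codec
print(can_decode(xml_bytes, "cp1252"))    # reading with a cp1252 default
print(can_encode("a \u2192 b", "cp1252")) # Unicode arrow on a cp1252 console
print(can_encode("a -> b", "cp1252"))     # ASCII replacement
```

Not every non-ASCII character triggers a decode error under cp1252; many decode silently into the wrong characters instead, which is another reason to pin the encoding rather than rely on the default.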

Testing

  • python3 -m unittest tests.test_docx_windows_encoding -v
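For readers who want to see how a cp1252 console can be simulated without a Windows machine, here is a rough sketch. It is not the content of `tests.test_docx_windows_encoding`; `report_change` and the test class are hypothetical stand-ins. The idea is to redirect stdout to an `io.TextIOWrapper` with `encoding="cp1252"`, so encoding errors surface the same way they would on a default Windows terminal.

```python
import io
import unittest
from contextlib import redirect_stdout


def report_change(old, new):
    # Hypothetical stand-in for a validator message using the ASCII arrow.
    print(f"{old} -> {new}")


class Cp1252ConsoleTest(unittest.TestCase):
    def _cp1252_stdout(self):
        # Strict cp1252 text stream over an in-memory byte buffer.
        return io.TextIOWrapper(io.BytesIO(), encoding="cp1252", errors="strict")

    def test_ascii_arrow_is_safe(self):
        stream = self._cp1252_stdout()
        with redirect_stdout(stream):
            report_change("a", "b")
        stream.flush()
        written = stream.buffer.getvalue().decode("cp1252")
        self.assertEqual(written.strip(), "a -> b")

    def test_unicode_arrow_raises(self):
        stream = self._cp1252_stdout()
        with self.assertRaises(UnicodeEncodeError):
            with redirect_stdout(stream):
                print("a \u2192 b")
                stream.flush()
```

Saved as a module, this can be run with `python3 -m unittest` in the same way as the command above.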

Development

Successfully merging this pull request may close these issues.

[Bug] docx skill: validate.py and pack.py crash on Windows due to missing UTF-8 encoding in file I/O and print output