
fix(io): force utf-8 encoding when loading non-yaml templates #1414

Open
genisis0x wants to merge 1 commit into microsoft:main from genisis0x:fix/load-content-encoding-utf8-939

Conversation


@genisis0x genisis0x commented May 14, 2026

Summary

Resolves #939.

Tpl.load_content falls through to file_path.read_text() for non-YAML templates (.md, .txt, .py, ...). Without an explicit encoding, Path.read_text uses the locale's preferred encoding (locale.getpreferredencoding), which is cp1252, gbk, or another legacy code page on non-UTF-8 Windows hosts. A UTF-8 template that contains non-ASCII characters can then raise a UnicodeDecodeError that bubbles up through fin_factor / fin_quant / fin_model startup.

This change aligns the non-YAML branch with the YAML branch immediately above it, which already passes encoding="utf-8".
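The failure mode and the fix can be reproduced without a Windows host. The snippet below is a minimal sketch, not the RD-Agent code: `load_text_template` is a hypothetical stand-in for the non-YAML fallback, and the legacy locale is simulated by decoding the raw bytes with `cp1252` and `gbk` explicitly.

```python
import tempfile
from pathlib import Path


def load_text_template(file_path: Path) -> str:
    # Hypothetical stand-in for the non-YAML fallback branch:
    # the encoding is pinned so the host locale no longer matters.
    return file_path.read_text(encoding="utf-8")


def decodes_under(raw: bytes, codec: str) -> bool:
    # True if the bytes can be decoded with the given codec at all.
    try:
        raw.decode(codec)
    except UnicodeDecodeError:
        return False
    return True


# Sample template text with non-ASCII punctuation (an assumption for the demo).
SAMPLE = "Use “smart quotes” and the € sign.\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "prompt.md"
    path.write_bytes(SAMPLE.encode("utf-8"))

    raw = path.read_bytes()
    loaded = load_text_template(path)
    # Simulate what a legacy locale default would attempt on these bytes.
    legacy_results = {codec: decodes_under(raw, codec) for codec in ("cp1252", "gbk")}

print(loaded == SAMPLE, legacy_results)
```

Pinning the encoding keeps template loading independent of the host's locale, which matches how the YAML branch already behaves.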

Test plan

  • The patched file parses cleanly with Python's ast module.
  • The two read_text / open call sites in load_content (YAML and fallback) now both pass an explicit UTF-8 encoding.
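Both checks in the test plan could be automated along the following lines. This is a rough sketch under stated assumptions: `calls_missing_encoding` is a hypothetical helper, and the `BEFORE` / `AFTER` strings are illustrative approximations of `load_content`, not the actual source.

```python
import ast


def calls_missing_encoding(source: str) -> list[int]:
    """Return line numbers of read_text()/open() calls with no encoding= keyword."""
    missing = []
    # ast.parse raises SyntaxError if the file does not parse cleanly.
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Method calls expose the name as .attr, plain calls as .id.
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        has_encoding = any(kw.arg == "encoding" for kw in node.keywords)
        if name in {"read_text", "open"} and not has_encoding:
            missing.append(node.lineno)
    return sorted(missing)


# Illustrative approximation of the function before the patch (assumption).
BEFORE = '''
def load_content(file_path):
    if file_path.suffix == ".yaml":
        with open(file_path, encoding="utf-8") as f:
            return f.read()
    return file_path.read_text()
'''

# The same function with the fallback branch pinned to UTF-8.
AFTER = BEFORE.replace("read_text()", 'read_text(encoding="utf-8")')

print(calls_missing_encoding(BEFORE), calls_missing_encoding(AFTER))
```

The helper does not inspect the file mode, so it would also flag binary-mode `open` calls, which legitimately take no encoding. Those would need to be excluded before using it as a lint.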

📚 Documentation preview 📚: https://RDAgent--1414.org.readthedocs.build/en/1414/

`Tpl.load_content` falls through to `file_path.read_text()` for
non-YAML templates (`.md`, `.txt`, `.py`, ...). Without an explicit
encoding, `Path.read_text` uses `locale.getpreferredencoding`, which is
`cp1252` / `gbk` / etc. on non-UTF-8 Windows hosts, so a UTF-8 template
that contains non-ASCII characters can raise a `UnicodeDecodeError` that
bubbles up through `fin_factor` / `fin_quant` / `fin_model` startup.

Reported in issue microsoft#939 (`UnicodeDecodeError: 'gbk' codec can't decode
byte 0x9e`). Aligns the non-YAML branch with the YAML branch above it,
which already passes `encoding="utf-8"`.

Resolves microsoft#939.
@genisis0x
Author

Read the CLA — all clear from my side.

@microsoft-github-policy-service agree

Development

Successfully merging this pull request may close these issues.

UnicodeDecodeError: 'gbk' codec can't decode byte 0x9e in position 8497: illegal multibyte sequence
