20 commits
810ea02
test: add shell and Windows encoding test utilities
tanzhenxin Mar 16, 2026
b917cf8
fix(shell): resolve Windows GBK encoding for non-ASCII commands
tanzhenxin Mar 16, 2026
f8f189e
fix(shell): always switch codepage on Windows for consistency
tanzhenxin Mar 16, 2026
15b5f02
fix(shell): force UTF-8 output decoding after codepage switch
tanzhenxin Mar 16, 2026
7f915be
fix(shell): ensure CRLF line endings for .bat/.cmd files
tanzhenxin Mar 16, 2026
dca3ea1
refactor(shell): remove Windows encoding wrapper logic
tanzhenxin Mar 16, 2026
9b82295
refactor(encoding): try UTF-8 first in buffer encoding detection
tanzhenxin Mar 16, 2026
21014a5
test(encoding): add Windows encoding test plan and scripts
tanzhenxin Mar 16, 2026
3dd0bac
fix(shell): force UTF-8 output for PowerShell on Windows
tanzhenxin Mar 16, 2026
89e452c
test(encoding): simplify Windows encoding test plan
tanzhenxin Mar 16, 2026
e9facac
test(encoding): replace manual Windows test scripts with automated e2…
tanzhenxin Mar 16, 2026
f93e5f0
refactor(encoding): consolidate system encoding fallback logic
tanzhenxin Mar 16, 2026
d8dab9a
docs(encoding): clarify detectEncodingFromBuffer responsibility
tanzhenxin Mar 16, 2026
aa8afba
test(encoding): remove Windows encoding e2e tests
tanzhenxin Mar 16, 2026
08c1ce9
chore(shell): remove Codex CLI reference from comment
tanzhenxin Mar 16, 2026
9b1bd73
refactor(core): improve platform-specific encoding and shell utilities
tanzhenxin Mar 16, 2026
922fca5
test(services): simplify os module mock in fileSystemService tests
tanzhenxin Mar 16, 2026
82e0064
refactor(core): use dynamic terminal dimensions for replay
tanzhenxin Mar 16, 2026
17939ba
feat(core): auto-detect UTF-8 BOM for PowerShell scripts on Windows
tanzhenxin Mar 16, 2026
c3f5dd3
docs(tools): document file encoding and platform-specific behavior
tanzhenxin Mar 16, 2026
59 changes: 59 additions & 0 deletions docs/developers/tools/file-system.md
@@ -165,4 +165,63 @@ grep_search(pattern="function", glob="*.js", limit=10)
- On failure: An error message explaining the reason (e.g., `Failed to edit, 0 occurrences found...`, `Failed to edit because the text matches multiple locations...`).
- **Confirmation:** Yes. Shows a diff of the proposed changes and asks for user approval before writing to the file.

## File encoding and platform-specific behavior

### Encoding detection and preservation

When reading files, Qwen Code detects the file's encoding using a multi-step strategy:

1. **UTF-8** — tried first (most modern tooling outputs UTF-8)
2. **chardet** — statistical detection for non-UTF-8 content
3. **System encoding** — falls back to the OS code page (Windows `chcp` / Unix `LANG`)

Both `write_file` and `edit` preserve the original encoding and BOM (byte order mark) of existing files. For example, a file read as GBK is written back as GBK, and a file read as UTF-8 with a BOM keeps its BOM.
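
The detection order above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `detectEncodingSketch`, `statisticalGuess`, and `systemEncoding` are hypothetical names, and the statistical step is passed in as a callback instead of calling any particular library. The final `'utf-8'` default is also an assumption.

```typescript
// Hypothetical sketch of the three-step detection order described above.
// None of these names come from the Qwen Code source.
type Guess = (bytes: Uint8Array) => string | null;

function isValidUtf8(bytes: Uint8Array): boolean {
  try {
    // fatal: true makes decode() throw on malformed UTF-8 instead of
    // silently substituting U+FFFD replacement characters.
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

function detectEncodingSketch(
  bytes: Uint8Array,
  statisticalGuess: Guess, // stands in for a chardet-style detector
  systemEncoding: string | null, // stands in for the OS code page lookup
): string {
  // Step 1: accept UTF-8 if the bytes decode cleanly.
  if (isValidUtf8(bytes)) return 'utf-8';
  // Step 2: fall back to statistical detection.
  const guessed = statisticalGuess(bytes);
  if (guessed) return guessed;
  // Step 3: fall back to the system encoding (last-resort default assumed).
  return systemEncoding ?? 'utf-8';
}
```

Trying UTF-8 first is cheap and avoids misclassifying short UTF-8 files, which statistical detectors can get wrong when there is little input to sample.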

### Configuring default encoding for new files

The `defaultFileEncoding` setting controls encoding for **newly created** files (not edits to existing files):

| Value | Behavior |
| ----------- | --------------------------------------------------------------------------- |
| _(not set)_ | UTF-8 without BOM, with automatic platform-specific adjustments (see below) |
| `utf-8` | UTF-8 without BOM, no automatic adjustments |
| `utf-8-bom` | UTF-8 with BOM for all new files |

Set it in `.qwen/settings.json` or `~/.qwen/settings.json`:

```json
{
  "general": {
    "defaultFileEncoding": "utf-8-bom"
  }
}
```
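
At the byte level, the difference between `utf-8` and `utf-8-bom` is only the three-byte prefix `EF BB BF`. The sketch below illustrates that under stated assumptions: `encodeNewFile` and `NewFileEncoding` are hypothetical names for illustration, not functions from the codebase.

```typescript
// Hypothetical helper: turn text into the bytes of a newly created file.
type NewFileEncoding = 'utf-8' | 'utf-8-bom';

// The UTF-8 encoding of U+FEFF, used as a byte order mark.
const UTF8_BOM = new Uint8Array([0xef, 0xbb, 0xbf]);

function encodeNewFile(content: string, encoding: NewFileEncoding): Uint8Array {
  const body = new TextEncoder().encode(content);
  if (encoding === 'utf-8') return body;
  // 'utf-8-bom': prepend the BOM, then copy the encoded text after it.
  const out = new Uint8Array(UTF8_BOM.length + body.length);
  out.set(UTF8_BOM, 0);
  out.set(body, UTF8_BOM.length);
  return out;
}
```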

### Windows: CRLF for batch files

On Windows, `.bat` and `.cmd` files are automatically written with CRLF (`\r\n`) line endings. This is required because `cmd.exe` uses CRLF as its line delimiter — LF-only endings can break multi-line `if`/`else`, `goto` labels, and `for` loops. This applies regardless of encoding settings and only on Windows.
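
A minimal sketch of this normalization, assuming the platform string follows Node's `os.platform()` values; `isBatchFile` and `normalizeForBatch` are hypothetical names used only for illustration:

```typescript
// Hypothetical sketch of CRLF normalization for batch files.
function isBatchFile(filePath: string): boolean {
  // Case-insensitive so that SCRIPT.BAT is treated like script.bat.
  return /\.(bat|cmd)$/i.test(filePath);
}

function normalizeForBatch(
  filePath: string,
  content: string,
  platform: string, // e.g. 'win32', 'linux', 'darwin'
): string {
  if (platform !== 'win32' || !isBatchFile(filePath)) return content;
  // \r?\n matches both LF and CRLF, so existing CRLF is never doubled
  // and mixed line endings come out uniformly as CRLF.
  return content.replace(/\r?\n/g, '\r\n');
}
```

Matching `\r?\n` instead of a bare `\n` is what makes the conversion idempotent: running it on already-converted content leaves the content unchanged.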

### Windows: UTF-8 BOM for PowerShell scripts

On Windows with a **non-UTF-8 system code page** (e.g. GBK/cp936, Big5/cp950, Shift_JIS/cp932), newly created `.ps1` files are automatically written with a UTF-8 BOM. This is necessary because Windows PowerShell 5.1 (the version built into Windows 10/11) reads BOM-less scripts using the system's ANSI code page. Without a BOM, any non-ASCII characters in the script will be misinterpreted.

This automatic BOM only applies when:

- The platform is Windows
- The system code page is not UTF-8 (not code page 65001)
- The file is a new `.ps1` file (existing files keep their original encoding)
- The user has **not** explicitly set `defaultFileEncoding` in settings

PowerShell 7+ (pwsh) defaults to UTF-8 and handles BOM transparently, so the BOM is harmless there.

If you explicitly set `defaultFileEncoding` to `"utf-8"`, the automatic BOM is disabled — this is an intentional escape hatch for repositories or tooling that reject BOMs.
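
The rules in this section can be summarized as a small decision function for newly created files. This is a sketch under assumptions: `newFileNeedsBom` and `BomDecisionInput` are hypothetical names, the inputs are passed explicitly instead of being read from the OS, and the way encoding names are normalized is a guess for illustration.

```typescript
// Hypothetical sketch: should a NEW file be written with a UTF-8 BOM?
// Existing files are out of scope here; they keep their original encoding.
interface BomDecisionInput {
  filePath: string;
  platform: string; // e.g. 'win32', 'linux', 'darwin'
  systemEncoding: string | null; // e.g. 'gbk', 'utf-8'; null if unknown
  defaultFileEncoding?: 'utf-8' | 'utf-8-bom'; // undefined = not set
}

function newFileNeedsBom(input: BomDecisionInput): boolean {
  // An explicit setting always wins over the automatic behavior.
  if (input.defaultFileEncoding === 'utf-8-bom') return true;
  if (input.defaultFileEncoding === 'utf-8') return false;

  // Setting not provided: apply the automatic Windows/.ps1 rule.
  const isPs1 = /\.ps1$/i.test(input.filePath);
  const normalized = (input.systemEncoding ?? '')
    .toLowerCase()
    .replace(/[-_]/g, '');
  // An unknown (null) system encoding is treated as non-UTF-8.
  const systemIsUtf8 = normalized === 'utf8';
  return input.platform === 'win32' && isPs1 && !systemIsUtf8;
}
```

For example, `newFileNeedsBom({ filePath: 'a.ps1', platform: 'win32', systemEncoding: 'gbk' })` returns `true`, while adding `defaultFileEncoding: 'utf-8'` to the same input returns `false`.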

### Summary

| File type      | Platform                      | Automatic behavior                                          |
| -------------- | ----------------------------- | ----------------------------------------------------------- |
| `.bat`, `.cmd` | Windows                       | CRLF line endings                                           |
| `.ps1`         | Windows (non-UTF-8 code page) | UTF-8 BOM on new files, unless `defaultFileEncoding` is set |
| All others     | All                           | UTF-8 without BOM (default)                                 |

These file system tools provide a foundation for Qwen Code to understand and interact with your local project context.
4 changes: 1 addition & 3 deletions packages/cli/src/config/config.ts
@@ -10,7 +10,6 @@ import {
   Config,
   DEFAULT_QWEN_EMBEDDING_MODEL,
   FileDiscoveryService,
-  FileEncoding,
   getAllGeminiMdFilenames,
   loadServerHierarchicalMemory,
   setGeminiMdFilename as setServerGeminiMdFilename,
@@ -1041,8 +1040,7 @@ export async function loadCliConfig(
     // always be true and the settings file can never disable recording.
     chatRecording:
       argv.chatRecording ?? settings.general?.chatRecording ?? true,
-    defaultFileEncoding:
-      settings.general?.defaultFileEncoding ?? FileEncoding.UTF8,
+    defaultFileEncoding: settings.general?.defaultFileEncoding,
     lsp: {
       enabled: lspEnabled,
     },
7 changes: 3 additions & 4 deletions packages/core/src/config/config.ts
@@ -37,7 +37,6 @@ import {
   type FileSystemService,
   StandardFileSystemService,
   type FileEncodingType,
-  FileEncoding,
 } from '../services/fileSystemService.js';
 import { GitService } from '../services/gitService.js';

@@ -523,7 +522,7 @@ export class Config {
   private readonly truncateToolOutputLines: number;
   private readonly eventEmitter?: EventEmitter;
   private readonly channel: string | undefined;
-  private readonly defaultFileEncoding: FileEncodingType;
+  private readonly defaultFileEncoding: FileEncodingType | undefined;
   private readonly enableHooks: boolean;
   private readonly hooks?: Record<string, unknown>;
   private readonly hooksConfig?: Record<string, unknown>;
@@ -641,7 +640,7 @@ export class Config {
     this.truncateToolOutputLines =
       params.truncateToolOutputLines ?? DEFAULT_TRUNCATE_TOOL_OUTPUT_LINES;
     this.channel = params.channel;
-    this.defaultFileEncoding = params.defaultFileEncoding ?? FileEncoding.UTF8;
+    this.defaultFileEncoding = params.defaultFileEncoding;
     this.storage = new Storage(this.targetDir);
     this.inputFormat = params.inputFormat ?? InputFormat.TEXT;
     this.fileExclusions = new FileExclusions(this);
@@ -1647,7 +1646,7 @@ export class Config {
    * Get the default file encoding for new files.
    * @returns FileEncodingType
    */
-  getDefaultFileEncoding(): FileEncodingType {
+  getDefaultFileEncoding(): FileEncodingType | undefined {
     return this.defaultFileEncoding;
   }

195 changes: 194 additions & 1 deletion packages/core/src/services/fileSystemService.test.ts
@@ -6,9 +6,27 @@

 import { describe, it, expect, beforeEach, afterEach, vi } from 'vitest';
 import fs from 'node:fs/promises';
-import { StandardFileSystemService } from './fileSystemService.js';
+import {
+  StandardFileSystemService,
+  needsUtf8Bom,
+  resetUtf8BomCache,
+} from './fileSystemService.js';

+const mockPlatform = vi.hoisted(() => vi.fn().mockReturnValue('linux'));
+const mockGetSystemEncoding = vi.hoisted(() =>
+  vi.fn().mockReturnValue('utf-8'),
+);
+
 vi.mock('fs/promises');
+vi.mock('os', () => ({
+  default: {
+    platform: mockPlatform,
+  },
+  platform: mockPlatform,
+}));
+vi.mock('../utils/systemEncoding.js', () => ({
+  getSystemEncoding: mockGetSystemEncoding,
+}));

 vi.mock('../utils/fileUtils.js', async (importOriginal) => {
   const actual = await importOriginal<typeof import('../utils/fileUtils.js')>();
@@ -25,6 +43,9 @@ describe('StandardFileSystemService', () => {

   beforeEach(() => {
     vi.resetAllMocks();
+    resetUtf8BomCache();
+    mockPlatform.mockReturnValue('linux');
+    mockGetSystemEncoding.mockReturnValue('utf-8');
     fileSystem = new StandardFileSystemService();
   });

@@ -254,5 +275,177 @@ describe('StandardFileSystemService', () => {
       // First two bytes should NOT be FF FE (the UTF-16LE BOM)
       expect(!(buf[0] === 0xff && buf[1] === 0xfe)).toBe(true);
     });
+
+    it('should convert LF to CRLF when writing .bat files on Windows', async () => {
+      mockPlatform.mockReturnValue('win32');
+      vi.mocked(fs.writeFile).mockResolvedValue();
+
+      await fileSystem.writeTextFile({
+        path: '/test/script.bat',
+        content: '@echo off\necho hello\nexit /b 0\n',
+      });
+
+      expect(fs.writeFile).toHaveBeenCalledWith(
+        '/test/script.bat',
+        '@echo off\r\necho hello\r\nexit /b 0\r\n',
+        'utf-8',
+      );
+    });
+
+    it('should convert LF to CRLF when writing .cmd files on Windows', async () => {
+      mockPlatform.mockReturnValue('win32');
+      vi.mocked(fs.writeFile).mockResolvedValue();
+
+      await fileSystem.writeTextFile({
+        path: '/test/script.cmd',
+        content: '@echo off\necho hello\n',
+      });
+
+      expect(fs.writeFile).toHaveBeenCalledWith(
+        '/test/script.cmd',
+        '@echo off\r\necho hello\r\n',
+        'utf-8',
+      );
+    });
+
+    it('should not double-convert existing CRLF in .bat files on Windows', async () => {
+      mockPlatform.mockReturnValue('win32');
+      vi.mocked(fs.writeFile).mockResolvedValue();
+
+      await fileSystem.writeTextFile({
+        path: '/test/script.bat',
+        content: '@echo off\r\necho hello\r\n',
+      });
+
+      expect(fs.writeFile).toHaveBeenCalledWith(
+        '/test/script.bat',
+        '@echo off\r\necho hello\r\n',
+        'utf-8',
+      );
+    });
+
+    it('should handle mixed line endings in .bat files on Windows', async () => {
+      mockPlatform.mockReturnValue('win32');
+      vi.mocked(fs.writeFile).mockResolvedValue();
+
+      await fileSystem.writeTextFile({
+        path: '/test/script.bat',
+        content: 'line1\r\nline2\nline3\r\n',
+      });
+
+      expect(fs.writeFile).toHaveBeenCalledWith(
+        '/test/script.bat',
+        'line1\r\nline2\r\nline3\r\n',
+        'utf-8',
+      );
+    });
+
+    it('should be case-insensitive for .BAT extension on Windows', async () => {
+      mockPlatform.mockReturnValue('win32');
+      vi.mocked(fs.writeFile).mockResolvedValue();
+
+      await fileSystem.writeTextFile({
+        path: '/test/SCRIPT.BAT',
+        content: 'echo hello\n',
+      });
+
+      expect(fs.writeFile).toHaveBeenCalledWith(
+        '/test/SCRIPT.BAT',
+        'echo hello\r\n',
+        'utf-8',
+      );
+    });
+
+    it('should not convert line endings for non-.bat/.cmd files on Windows', async () => {
+      mockPlatform.mockReturnValue('win32');
+      vi.mocked(fs.writeFile).mockResolvedValue();
+
+      await fileSystem.writeTextFile({
+        path: '/test/script.sh',
+        content: '#!/bin/bash\necho hello\n',
+      });
+
+      expect(fs.writeFile).toHaveBeenCalledWith(
+        '/test/script.sh',
+        '#!/bin/bash\necho hello\n',
+        'utf-8',
+      );
+    });
+
+    it('should not convert line endings for .bat files on non-Windows', async () => {
+      mockPlatform.mockReturnValue('darwin');
+      vi.mocked(fs.writeFile).mockResolvedValue();
+
+      await fileSystem.writeTextFile({
+        path: '/test/script.bat',
+        content: '@echo off\necho hello\n',
+      });
+
+      expect(fs.writeFile).toHaveBeenCalledWith(
+        '/test/script.bat',
+        '@echo off\necho hello\n',
+        'utf-8',
+      );
+    });
+  });
+
+  describe('needsUtf8Bom', () => {
+    beforeEach(() => {
+      resetUtf8BomCache();
+    });
+
+    it('should return true for .ps1 files on Windows with non-UTF-8 code page', () => {
+      mockPlatform.mockReturnValue('win32');
+      mockGetSystemEncoding.mockReturnValue('gbk');
+
+      expect(needsUtf8Bom('/test/script.ps1')).toBe(true);
+    });
+
+    it('should return true for .PS1 files (case-insensitive)', () => {
+      mockPlatform.mockReturnValue('win32');
+      mockGetSystemEncoding.mockReturnValue('gbk');
+
+      expect(needsUtf8Bom('/test/SCRIPT.PS1')).toBe(true);
+    });
+
+    it('should return false for .ps1 files on Windows with UTF-8 code page', () => {
+      mockPlatform.mockReturnValue('win32');
+      mockGetSystemEncoding.mockReturnValue('utf-8');
+
+      expect(needsUtf8Bom('/test/script.ps1')).toBe(false);
+    });
+
+    it('should return false for .ps1 files on non-Windows', () => {
+      mockPlatform.mockReturnValue('darwin');
+
+      expect(needsUtf8Bom('/test/script.ps1')).toBe(false);
+    });
+
+    it('should return false for non-.ps1 files on Windows with non-UTF-8 code page', () => {
+      mockPlatform.mockReturnValue('win32');
+      mockGetSystemEncoding.mockReturnValue('gbk');
+
+      expect(needsUtf8Bom('/test/script.sh')).toBe(false);
+      expect(needsUtf8Bom('/test/file.txt')).toBe(false);
+      expect(needsUtf8Bom('/test/script.bat')).toBe(false);
+    });
+
+    it('should cache the platform/encoding check across calls', () => {
+      mockPlatform.mockReturnValue('win32');
+      mockGetSystemEncoding.mockReturnValue('gbk');
+
+      needsUtf8Bom('/test/script.ps1');
+      needsUtf8Bom('/test/other.ps1');
+
+      // getSystemEncoding should only be called once due to caching
+      expect(mockGetSystemEncoding).toHaveBeenCalledTimes(1);
+    });
+
+    it('should treat null system encoding as non-UTF-8', () => {
+      mockPlatform.mockReturnValue('win32');
+      mockGetSystemEncoding.mockReturnValue(null);
+
+      expect(needsUtf8Bom('/test/script.ps1')).toBe(true);
+    });
+  });
 });