TUI word movement (Option/Alt+Arrow) treats entire CJK sequences as a single word

## Summary

When using Option+Left / Option+Right (or Alt+Left / Alt+Right) for word-by-word cursor movement in the TUI composer, the cursor jumps over entire runs of CJK (Chinese/Japanese/Korean) characters as if they were a single word. This makes editing East Asian text very awkward.

## Root Cause

`textarea.rs` defines word boundaries using a fixed set of ASCII punctuation separators:

```rust
// codex-rs/tui/src/bottom_pane/textarea.rs
const WORD_SEPARATORS: &str = "`~!@#$%^&*()-=+[{]}\\|;:'\",.<>/?";

fn is_word_separator(ch: char) -> bool {
    WORD_SEPARATORS.contains(ch)
}
```

`beginning_of_previous_word()` and `end_of_next_word()` classify each character as either a "separator" or a "regular character", and continue moving until the classification changes or whitespace is encountered. Because CJK characters are neither whitespace nor in the separator list, a sequence like `你好世界` is treated as one continuous word — the cursor skips all four characters in a single keystroke.
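To make the failure mode concrete, here is a minimal, std-only sketch of the backward scan as described above. It is not the code from `textarea.rs`: the function name `previous_word_start_sketch` and its signature are hypothetical, and it assumes leading whitespace is skipped before classification starts and that `cursor` lies on a char boundary.

```rust
const WORD_SEPARATORS: &str = "`~!@#$%^&*()-=+[{]}\\|;:'\",.<>/?";

fn is_word_separator(ch: char) -> bool {
    WORD_SEPARATORS.contains(ch)
}

/// Hypothetical stand-in for the backward scan described in this issue.
/// Returns the byte offset where the word before `cursor` starts.
fn previous_word_start_sketch(text: &str, cursor: usize) -> usize {
    // Walk backward from the cursor, skipping any whitespace first.
    let mut chars = text[..cursor]
        .char_indices()
        .rev()
        .skip_while(|&(_, c)| c.is_whitespace())
        .peekable();

    // Nothing but whitespace (or nothing at all) before the cursor.
    let Some(&(_, first)) = chars.peek() else {
        return 0;
    };
    let first_is_separator = is_word_separator(first);

    // Keep going until whitespace or a change in separator/regular class.
    for (idx, ch) in chars {
        if ch.is_whitespace() || is_word_separator(ch) != first_is_separator {
            return idx + ch.len_utf8();
        }
    }
    0
}
```

With this two-class scheme, `previous_word_start_sketch("你好世界", 12)` returns 0, because no character in the run is whitespace or a separator, so the classification never changes.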

The `unicode-segmentation` crate is already in the dependency tree (its path appears in the binary's embedded debug info) and is used for grapheme-cluster boundaries (single-character movement). Its word-boundary logic (`unicode_words()`, which implements UAX #29) is not used for word movement.

## Steps to Reproduce

1. Open Codex TUI.
2. Type a Chinese sentence, e.g. `你好世界 hello`.
3. Press Option+Left (macOS) or Alt+Left (Linux/Windows) repeatedly.

**Observed:** once the cursor is past `hello`, a single keystroke takes it from the end of `你好世界` directly to position 0, so the whole CJK run is treated as one word.

**Expected:** cursor moves one logical unit at a time through the CJK text (ideally each character, consistent with how editors such as VS Code, Zed, and Terminal readline behave for CJK input).

## Failing Tests to Add

The following tests can be added to the `#[cfg(test)]` block in `codex-rs/tui/src/bottom_pane/textarea.rs` to codify the expected behavior:

```rust
#[test]
fn word_navigation_cjk_each_char_is_boundary() {
    // Each CJK character should be treated as its own word unit.
    // UTF-8 layout: 你, 好, 世, 界 are 3 bytes each, so the character
    // boundaries fall at byte offsets 0, 3, 6, 9, 12.
    let text = "你好世界";
    let mut t = ta_with(text);

    // Start at end of text (after 界, byte 12)
    t.set_cursor(text.len()); // 12
    assert_eq!(t.beginning_of_previous_word(), 9, "Alt+Left from end should land at start of 界");

    t.set_cursor(9);
    assert_eq!(t.beginning_of_previous_word(), 6, "Alt+Left should land at start of 世");

    t.set_cursor(6);
    assert_eq!(t.beginning_of_previous_word(), 3, "Alt+Left should land at start of 好");

    t.set_cursor(3);
    assert_eq!(t.beginning_of_previous_word(), 0, "Alt+Left should land at start of 你");
}

#[test]
fn word_navigation_cjk_forward() {
    let text = "你好世界";
    let mut t = ta_with(text);

    t.set_cursor(0);
    assert_eq!(t.end_of_next_word(), 3, "Alt+Right from start should land after 你");

    t.set_cursor(3);
    assert_eq!(t.end_of_next_word(), 6, "Alt+Right should land after 好");

    t.set_cursor(6);
    assert_eq!(t.end_of_next_word(), 9, "Alt+Right should land after 世");

    t.set_cursor(9);
    assert_eq!(t.end_of_next_word(), 12, "Alt+Right should land after 界");
}

#[test]
fn word_navigation_mixed_ascii_cjk() {
    // Mixed text: "hello你好" — the boundary between ASCII and CJK should also be respected.
    let text = "hello你好";
    let mut t = ta_with(text);

    // Forward from start: "hello" is one word (bytes 0..5)
    t.set_cursor(0);
    assert_eq!(t.end_of_next_word(), 5, "Alt+Right should stop after 'hello'");

    // Forward from after "hello": 你 is next unit (bytes 5..8)
    t.set_cursor(5);
    assert_eq!(t.end_of_next_word(), 8, "Alt+Right should stop after 你");

    // Backward from end: 好 (bytes 8..11), so start of 好 is 8
    t.set_cursor(text.len()); // 11
    assert_eq!(t.beginning_of_previous_word(), 8, "Alt+Left should land at start of 好");

    // Backward from start of 好: 你 (bytes 5..8)
    t.set_cursor(8);
    assert_eq!(t.beginning_of_previous_word(), 5, "Alt+Left should land at start of 你");

    // Backward from start of 你: "hello" (bytes 0..5)
    t.set_cursor(5);
    assert_eq!(t.beginning_of_previous_word(), 0, "Alt+Left should land at start of 'hello'");
}
```
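The byte offsets asserted in these tests follow from the UTF-8 encoding: each of the CJK characters used here takes 3 bytes, while ASCII letters take 1. A small helper (hypothetical, not part of `textarea.rs`) can list the valid cursor positions for a string, which is handy when writing further test cases:

```rust
/// Hypothetical helper: every byte offset at which a cursor may sit,
/// i.e. the start of each char plus the end of the string.
fn char_boundaries(text: &str) -> Vec<usize> {
    text.char_indices()
        .map(|(idx, _)| idx)
        .chain(std::iter::once(text.len()))
        .collect()
}
```

For `"你好世界"` this yields `[0, 3, 6, 9, 12]`, and for `"hello你好"` it yields `[0, 1, 2, 3, 4, 5, 8, 11]`.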

## Suggested Fix

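In `beginning_of_previous_word()` and `end_of_next_word()`, treat each CJK character as its own word unit. The simplest version breaks on every non-ASCII character boundary, though that would also split accented Latin words such as `café`, so a check restricted to CJK ranges is probably safer. A minimal approach inside the existing backward loop:

```rust
// Illustrative fragment for the backward scan. `idx`, `ch`, and `start` are
// assumed to be the loop's byte index, current char, and result variable.
if !ch.is_ascii() {
    // Stop just after this non-ASCII char so it is not absorbed into the
    // run being scanned; each such char then acts as its own word atom.
    start = idx + ch.len_utf8();
    break;
}
```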

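A more principled fix would use the already-present `unicode-segmentation` crate, for example `UnicodeSegmentation::unicode_words()` or `split_word_bound_indices()`, which follow the UAX #29 word-break rules. Under the default rules each Han ideograph is its own segment, which matches the tests above, while Katakana runs and Hangul sequences stay together. This also avoids hand-maintained character ranges and covers non-Latin scripts beyond CJK.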
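As a rough, std-only illustration of the per-character approach, the sketch below adds a third character class for CJK and treats each such character as a one-character word. All names here (`CharClass`, `is_cjk_atom`, `prev_word_start`, `next_word_end`) are hypothetical, the code ranges are a non-exhaustive assumption covering Han ideographs and kana only, and cursors are assumed to lie on char boundaries. It is not a drop-in patch for `textarea.rs`.

```rust
const WORD_SEPARATORS: &str = "`~!@#$%^&*()-=+[{]}\\|;:'\",.<>/?";

#[derive(Clone, Copy, PartialEq, Eq)]
enum CharClass {
    Whitespace,
    Separator,
    Cjk,
    Regular,
}

/// Assumed, non-exhaustive set of ranges treated as single-character words.
fn is_cjk_atom(ch: char) -> bool {
    matches!(ch,
        '\u{3040}'..='\u{30FF}'       // Hiragana and Katakana
        | '\u{3400}'..='\u{4DBF}'     // CJK Unified Ideographs Extension A
        | '\u{4E00}'..='\u{9FFF}'     // CJK Unified Ideographs
        | '\u{F900}'..='\u{FAFF}'     // CJK Compatibility Ideographs
        | '\u{20000}'..='\u{2FFFF}'   // Supplementary Ideographic Plane
    )
}

fn classify(ch: char) -> CharClass {
    if ch.is_whitespace() {
        CharClass::Whitespace
    } else if WORD_SEPARATORS.contains(ch) {
        CharClass::Separator
    } else if is_cjk_atom(ch) {
        CharClass::Cjk
    } else {
        CharClass::Regular
    }
}

/// Byte offset of the start of the word unit before `cursor`.
fn prev_word_start(text: &str, cursor: usize) -> usize {
    let mut chars = text[..cursor]
        .char_indices()
        .rev()
        .skip_while(|&(_, c)| c.is_whitespace());
    let Some((first_idx, first)) = chars.next() else {
        return 0;
    };
    let class = classify(first);
    if class == CharClass::Cjk {
        return first_idx; // a CJK char is a complete unit by itself
    }
    let mut start = first_idx;
    for (idx, ch) in chars {
        if classify(ch) != class {
            break;
        }
        start = idx;
    }
    start
}

/// Byte offset of the end of the word unit after `cursor`.
fn next_word_end(text: &str, cursor: usize) -> usize {
    let mut chars = text[cursor..]
        .char_indices()
        .skip_while(|&(_, c)| c.is_whitespace());
    let Some((first_idx, first)) = chars.next() else {
        return text.len();
    };
    let class = classify(first);
    let mut end = cursor + first_idx + first.len_utf8();
    if class == CharClass::Cjk {
        return end; // stop right after the single CJK char
    }
    for (idx, ch) in chars {
        if classify(ch) != class {
            break;
        }
        end = cursor + idx + ch.len_utf8();
    }
    end
}
```

Because only the listed ranges are atomized, a word like `café` still moves as one unit, which the plain `!ch.is_ascii()` check would not preserve. Whether Hangul should be atomized as well is left open here.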

## Environment

- Codex version: 0.118.0
- OS: macOS 15 (arm64)
- Terminal: iTerm2 / Terminal.app
- Affected keybindings: `Option+Left`, `Option+Right`, `Alt+Left`, `Alt+Right`, `Meta+b`, `Meta+f`

## Affected Source File

`codex-rs/tui/src/bottom_pane/textarea.rs`, functions `beginning_of_previous_word()` (line 1211) and `end_of_next_word()` (line 1232).
