Merged
81 commits
021aa7d
feat(agent): steering (#1517)
afjcjsbx Mar 15, 2026
ae23193
feat(agent): port subturn PoC to refactor/agent branch
lppp04808 Mar 16, 2026
9c82b0b
refactor(agent): context boundary detection, proactive budget check, …
is-Xiaoen Mar 13, 2026
9c65d78
fix(agent): forceCompression must not assume history[0] is system prompt
is-Xiaoen Mar 13, 2026
d5fdd5e
fix(agent): include ReasoningContent and Media in token estimation
is-Xiaoen Mar 13, 2026
e35906b
feat(config): expose context_window in example config and web UI
is-Xiaoen Mar 13, 2026
b7f1c2b
test(agent): add realistic session-shaped tests for context budget
is-Xiaoen Mar 13, 2026
efd4032
fix(agent): preallocate messages slice in budget test
is-Xiaoen Mar 13, 2026
639739c
refactor(agent): use Turn as the atomic unit for compression cut-off
is-Xiaoen Mar 13, 2026
8034ee7
fix(agent): correct media token arithmetic and tool call double-counting
is-Xiaoen Mar 13, 2026
edbdc3b
fix(agent): findSafeBoundary returns 0 for single-Turn history
is-Xiaoen Mar 13, 2026
7c1a1c2
style(agent): fix gci comment alignment in test
is-Xiaoen Mar 13, 2026
b768dab
test(agent): use realistic session data in context retry test
is-Xiaoen Mar 13, 2026
08259d7
docs(agent-refactor): add context.md for Track 6 boundary clarification
is-Xiaoen Mar 14, 2026
ceeae15
feat(agent): wire SubTurn into AgentLoop and Spawn Tool
lppp04808 Mar 16, 2026
1236dd9
feat(agent): add concurrency semaphore and hard abort for SubTurn
lppp04808 Mar 16, 2026
acd436a
feat(agent): add session state rollback on hard abort
lppp04808 Mar 16, 2026
9d761b7
Delete .claude/settings.json
lppp04808 Mar 16, 2026
6b5d7e3
fix(agent): resolve critical race conditions and resource leaks in Su…
lppp04808 Mar 16, 2026
3c2d373
fix(agent): resolve race conditions and resource leaks in SubTurn
lppp04808 Mar 16, 2026
672d11c
fix(agent): prevent double result delivery and panic bypass in SubTurn
lppp04808 Mar 16, 2026
c63c644
fix(agent): forceCompression recovers from single oversized Turn
is-Xiaoen Mar 17, 2026
12a8590
fix(agent): enhance SubTurn robustness and fix race conditions
lppp04808 Mar 17, 2026
a26a7db
moved turnState and related code from subturn.go to a new turn_state.…
lppp04808 Mar 17, 2026
2fec249
refactor(agent): improve SubTurn error handling and logging
lppp04808 Mar 17, 2026
e00a3d9
Merge upstream/main into feat/subturn-poc
lppp04808 Mar 17, 2026
e05d262
Added tests to verify SubTurn context cancellation behavior when parent
lppp04808 Mar 17, 2026
f8defe3
feat(agent): implement graceful finish vs hard abort for SubTurn life…
lppp04808 Mar 17, 2026
5e92a38
Merge pull request #1490 from is-Xiaoen/refactor/context-boundary
afjcjsbx Mar 17, 2026
c7ea018
fix(agent): prevent duplicate history during subturn context recoveries
lppp04808 Mar 18, 2026
e20ff43
fix(agent): resolve subturn deadlocks, panics and context retry state
lppp04808 Mar 18, 2026
777230d
feat(agent): implement /subagents command and fix sub-turn observability
lppp04808 Mar 18, 2026
3611034
fix(agent): implement Critical flag, complete tools.SubTurnConfig, re…
lppp04808 Mar 18, 2026
899558b
Feat/issue 1218 agent md context structure (#1705)
alexhoshina Mar 18, 2026
431a53c
Merge branch 'upstream-main' into feat/subturn-poc
lppp04808 Mar 18, 2026
c732e63
Merge branch 'upstream-main' into feat/subturn-poc
lppp04808 Mar 19, 2026
53404f1
feat(subturn): support stateful iteration for evaluator-optimizer pat…
lppp04808 Mar 19, 2026
01c2f8d
refactor(subturn): remove redundant system prompt handling in runTurn…
lppp04808 Mar 19, 2026
99b189d
feat(subturn): implement token budget tracking for SubTurns
lppp04808 Mar 19, 2026
ce311be
feat(subturn): add configurable runtime parameters under agents.defaults
lppp04808 Mar 19, 2026
e801ccb
Merge branch 'upstream-main' into feat/subturn-poc
lppp04808 Mar 19, 2026
29a161e
fix(tools): prevent nil pointer dereference in spawn tools
lppp04808 Mar 19, 2026
24a382b
merge main
lppp04808 Mar 19, 2026
532ea4b
Merge branch 'upstream-main' into feat/subturn-poc
lppp04808 Mar 19, 2026
54889f2
Merge branch 'upstream-main' into feat/subturn-poc
lppp04808 Mar 19, 2026
583c586
Merge branch 'main' into feat/subturn-poc
lppp04808 Mar 19, 2026
c18d8a2
Merge branch 'upstream-main' into feat/subturn-poc
lppp04808 Mar 19, 2026
e71ef37
fix(test): reduce blank identifiers to comply with dogsled linter
lppp04808 Mar 20, 2026
4f646ef
Merge branch 'main' into feat/subturn-poc
lppp04808 Mar 20, 2026
af61d0b
feat(agent): add event bus foundation
alexhoshina Mar 20, 2026
50cc710
feat(agent): make event logs show event kind clearly
alexhoshina Mar 20, 2026
57cde73
feat(agent): expand event bus coverage
alexhoshina Mar 20, 2026
a65e0e9
fix: lint err
alexhoshina Mar 20, 2026
0e075f7
feat(agent): centralize turn lifecycle and continue queued steering
alexhoshina Mar 20, 2026
2b3c95b
fix: lint err
alexhoshina Mar 20, 2026
54de9ad
Merge pull request #1822 from alexhoshina/feat/agent-eventbus
yinwm Mar 20, 2026
73a683f
Merge pull request #1827 from alexhoshina/refactor/agent-loop
yinwm Mar 20, 2026
1c65866
fix(agent) scope steering
afjcjsbx Mar 20, 2026
827449a
fix lint
afjcjsbx Mar 20, 2026
9e34459
fix logic
afjcjsbx Mar 20, 2026
087e851
refactor: improve code readability and consistency across multiple files
lppp04808 Mar 21, 2026
1bd144a
Merge branch 'upstream-main' into feat/subturn-poc
lppp04808 Mar 21, 2026
670b433
refactor: replace interface{} with any for improved type clarity
lppp04808 Mar 21, 2026
cf68c91
feat(agent): add hook manager foundation
alexhoshina Mar 21, 2026
337e43e
feat(agent): add configurable hook mounting
alexhoshina Mar 21, 2026
9978c95
docs(hooks): inline and translate hook examples
alexhoshina Mar 21, 2026
24d6cb5
Merge branch 'upstream-main' into feat/subturn-poc
lppp04808 Mar 21, 2026
88d754b
merge main
lppp04808 Mar 22, 2026
482c88c
remove merge conflict markers from .gitignore
lppp04808 Mar 22, 2026
04def0f
Merge pull request #1844 from afjcjsbx/fix/scope-steering
yinwm Mar 22, 2026
0432fac
Merge pull request #1863 from alexhoshina/feat/hook-manager
yinwm Mar 22, 2026
f7f27e2
merge: resolve conflicts between refactor/agent and main
lppp04808 Mar 22, 2026
7ba8682
Merge branch 'refactor/agent' into feat/subturn-poc
lppp04808 Mar 22, 2026
7868c58
fix(agent): fix subturn panic result, hard abort rollback, and drain …
lppp04808 Mar 22, 2026
729a878
Merge pull request #1636 from lppp04808/feat/subturn-poc
yinwm Mar 22, 2026
c48954d
merge: sync main into refactor/agent
yinwm Mar 22, 2026
1984bb5
fix(test): mock gateway health check in status tests
yinwm Mar 22, 2026
724cc1b
fix: resolve merge conflict markers in README files
yinwm Mar 22, 2026
6df5ea1
docs: add `picoclaw model` command to CLI Reference
yinwm Mar 22, 2026
6f1737e
docs: sync CLI Reference across all README translations
yinwm Mar 22, 2026
5790d3e
docs(it): add model command to CLI Reference
yinwm Mar 22, 2026
3 changes: 2 additions & 1 deletion .gitignore
@@ -60,5 +60,6 @@ cmd/telegram/
web/backend/dist/*
!web/backend/dist/.gitkeep

.claude/

docker/data
docker/data
1 change: 1 addition & 0 deletions README.id.md
@@ -217,6 +217,7 @@ Hubungkan Picoclaw ke Jaringan Sosial Agent hanya dengan mengirim satu pesan mel
 | `picoclaw gateway` | Mulai gateway |
 | `picoclaw status` | Tampilkan status |
 | `picoclaw version` | Tampilkan info versi |
+| `picoclaw model` | Lihat atau ubah model default |
 | `picoclaw cron list` | Daftar semua tugas terjadwal |
 | `picoclaw cron add ...` | Tambah tugas terjadwal |
 | `picoclaw cron disable` | Nonaktifkan tugas terjadwal |
1 change: 1 addition & 0 deletions README.it.md
@@ -217,6 +217,7 @@ Connetti PicoClaw al Social Network degli Agent semplicemente inviando un singol
 | `picoclaw gateway` | Avvia il gateway |
 | `picoclaw status` | Mostra lo stato |
 | `picoclaw version` | Mostra le info sulla versione |
+| `picoclaw model` | Mostra o cambia il modello predefinito |
 | `picoclaw cron list` | Elenca tutti i job pianificati |
 | `picoclaw cron add ...` | Aggiunge un job pianificato |
 | `picoclaw cron disable` | Disabilita un job pianificato |
644 changes: 644 additions & 0 deletions README.md

Large diffs are not rendered by default.

26 changes: 19 additions & 7 deletions cmd/picoclaw/internal/onboard/helpers_test.go
@@ -6,20 +6,32 @@ import (
 	"testing"
 )
 
-func TestCopyEmbeddedToTargetUsesAgentsMarkdown(t *testing.T) {
+func TestCopyEmbeddedToTargetUsesStructuredAgentFiles(t *testing.T) {
 	targetDir := t.TempDir()
 
 	if err := copyEmbeddedToTarget(targetDir); err != nil {
 		t.Fatalf("copyEmbeddedToTarget() error = %v", err)
 	}
 
-	agentsPath := filepath.Join(targetDir, "AGENTS.md")
-	if _, err := os.Stat(agentsPath); err != nil {
-		t.Fatalf("expected %s to exist: %v", agentsPath, err)
+	agentPath := filepath.Join(targetDir, "AGENT.md")
+	if _, err := os.Stat(agentPath); err != nil {
+		t.Fatalf("expected %s to exist: %v", agentPath, err)
 	}
 
-	legacyPath := filepath.Join(targetDir, "AGENT.md")
-	if _, err := os.Stat(legacyPath); !os.IsNotExist(err) {
-		t.Fatalf("expected legacy file %s to be absent, got err=%v", legacyPath, err)
+	soulPath := filepath.Join(targetDir, "SOUL.md")
+	if _, err := os.Stat(soulPath); err != nil {
+		t.Fatalf("expected %s to exist: %v", soulPath, err)
 	}
+
+	userPath := filepath.Join(targetDir, "USER.md")
+	if _, err := os.Stat(userPath); err != nil {
+		t.Fatalf("expected %s to exist: %v", userPath, err)
+	}
+
+	for _, legacyName := range []string{"AGENTS.md", "IDENTITY.md"} {
+		legacyPath := filepath.Join(targetDir, legacyName)
+		if _, err := os.Stat(legacyPath); !os.IsNotExist(err) {
+			t.Fatalf("expected legacy file %s to be absent, got err=%v", legacyPath, err)
+		}
+	}
 }
9 changes: 9 additions & 0 deletions config/config.example.json
@@ -6,6 +6,7 @@
 "restrict_to_workspace": true,
 "model_name": "gpt-5.4",
 "max_tokens": 8192,
+"context_window": 131072,
 "temperature": 0.7,
 "max_tool_iterations": 20,
 "summarize_message_threshold": 20,
@@ -549,6 +550,14 @@
 "voice": {
   "echo_transcription": false
 },
+"hooks": {
+  "enabled": true,
+  "defaults": {
+    "observer_timeout_ms": 500,
+    "interceptor_timeout_ms": 5000,
+    "approval_timeout_ms": 60000
+  }
+},
 "gateway": {
   "host": "127.0.0.1",
   "port": 18790,
164 changes: 164 additions & 0 deletions docs/agent-refactor/context.md
@@ -0,0 +1,164 @@
# Context

## What this document covers

This document makes explicit the boundaries of context management in the agent loop:

- what fills the context window and how space is divided
- what is stored in session history vs. built at request time
- when and how context compression happens
- how token budgets are estimated

These are existing concepts. This document clarifies their boundaries rather than introducing new ones.

---

## Context window regions

The context window is the model's total input capacity. Four regions fill it:

| Region | Assembled by | Stored in session? |
|---|---|---|
| System prompt | `BuildMessages()` — static + dynamic parts | No |
| Summary | `SetSummary()` stores it; `BuildMessages()` injects it | Separate from history |
| Session history | User / assistant / tool messages | Yes |
| Tool definitions | Provider adapter injects at call time | No |

`MaxTokens` (the output generation limit) must also be reserved from the total budget.

The available space for history is therefore:

```
history_budget = ContextWindow - system_prompt - summary - tool_definitions - MaxTokens
```
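
The formula above can be written as a small Go helper. This is a minimal sketch, not the PR's code: the function name `historyBudget`, its parameter names, and the clamp at zero are assumptions made for illustration.

```go
package main

import "fmt"

// historyBudget returns the tokens left for session history once every other
// region of the context window has been reserved. The name, the parameter
// names, and the clamp at zero are assumptions of this sketch.
func historyBudget(contextWindow, systemPrompt, summary, toolDefs, maxTokens int) int {
	budget := contextWindow - systemPrompt - summary - toolDefs - maxTokens
	if budget < 0 {
		return 0
	}
	return budget
}

func main() {
	// 131072 and 8192 match config.example.json; the other sizes are made up.
	fmt.Println(historyBudget(131072, 4000, 1500, 3000, 8192)) // 114380
}
```

With a 131072-token window, roughly 114K tokens remain for history in this example; the rest is consumed before a single conversation message is counted.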

---

## ContextWindow vs MaxTokens

These serve different purposes:

- **MaxTokens** — maximum tokens the LLM may generate in one response. Sent as the `max_tokens` request parameter.
- **ContextWindow** — the model's total input context capacity.

These were previously set to the same value, which caused the summarization threshold to fire either far too early (at the default 32K) or not at all (when a user raised `max_tokens`).

Current default when not explicitly configured: `ContextWindow = MaxTokens * 4`.
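
As a rough illustration of that default, the sketch below resolves the effective window from two integers. The helper name `effectiveContextWindow` is hypothetical, and treating a zero or negative value as "not configured" is an assumption.

```go
package main

import "fmt"

// effectiveContextWindow returns the configured window, or MaxTokens * 4 when
// none is configured. Treating a non-positive value as "not configured" is an
// assumption of this sketch.
func effectiveContextWindow(configured, maxTokens int) int {
	if configured > 0 {
		return configured
	}
	return maxTokens * 4
}

func main() {
	fmt.Println(effectiveContextWindow(0, 8192))      // 32768
	fmt.Println(effectiveContextWindow(131072, 8192)) // 131072
}
```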

---

## Session history

Session history stores only conversation messages:

- `user` — user input
- `assistant` — LLM response (may include `ToolCalls`)
- `tool` — tool execution results

Session history does **not** contain:

- System prompts — assembled at request time by `BuildMessages`
- Summary content — stored separately via `SetSummary`, injected by `BuildMessages`

This distinction matters: any code that operates on session history — compression, boundary detection, token estimation — must not assume a system message is present.

---

## Turn

A **Turn** is one complete cycle:

> user message -> LLM iterations (possibly including tool calls) -> final assistant response

This definition comes from the agent loop design (#1316). In session history, Turn boundaries are identified by `user`-role messages.

Turn is the atomic unit for compression. Cutting inside a Turn can orphan tool-call sequences — an assistant message with `ToolCalls` separated from its corresponding `tool` results. Compressing at Turn boundaries avoids this by construction.

`parseTurnBoundaries(history)` returns the starting index of each Turn.
`findSafeBoundary(history, targetIndex)` snaps a target cut point to the nearest Turn boundary.
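
To make the boundary rule concrete, here is a minimal sketch of both helpers. It is an illustrative re-implementation, not the code from this PR: the `Message` type is a stand-in for `providers.Message`, and the choices to skip index 0, to break ties toward the earlier boundary, and to return 0 when no usable boundary exists are assumptions.

```go
package main

import "fmt"

// Message is a stand-in for providers.Message with only the field needed here.
type Message struct {
	Role string
}

// parseTurnBoundaries returns the index of every user message, which is
// where each Turn starts.
func parseTurnBoundaries(history []Message) []int {
	var starts []int
	for i, m := range history {
		if m.Role == "user" {
			starts = append(starts, i)
		}
	}
	return starts
}

// findSafeBoundary snaps targetIndex to the nearest Turn start. Index 0 is
// skipped because cutting there drops nothing; 0 is returned when no other
// boundary exists (assumed behavior for a single-Turn history).
func findSafeBoundary(history []Message, targetIndex int) int {
	best, bestDist := 0, -1
	for _, b := range parseTurnBoundaries(history) {
		if b == 0 {
			continue
		}
		d := b - targetIndex
		if d < 0 {
			d = -d
		}
		if bestDist < 0 || d < bestDist {
			best, bestDist = b, d
		}
	}
	return best
}

func main() {
	history := []Message{
		{Role: "user"}, {Role: "assistant"}, {Role: "tool"}, {Role: "assistant"},
		{Role: "user"}, {Role: "assistant"},
		{Role: "user"}, {Role: "assistant"},
	}
	fmt.Println(parseTurnBoundaries(history))     // [0 4 6]
	fmt.Println(findSafeBoundary(history, 2))     // 4
	fmt.Println(findSafeBoundary(history[:4], 2)) // 0
}
```

A target of 2 falls between the assistant's tool call and its result; snapping to index 4 keeps that sequence whole.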

---

## Compression paths

Three compression paths exist, in order of preference:

### 1. Async summarization

`maybeSummarize` runs after each Turn completes.

It triggers when the message count exceeds a threshold, or when estimated history tokens exceed a percentage of `ContextWindow`. When triggered, a background goroutine calls the LLM to summarize the oldest messages. The summary is stored via `SetSummary`, and `BuildMessages` injects it into the system prompt on the next call.

The cut point is chosen with `findSafeBoundary`, so no Turn is split.
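
The trigger condition can be sketched as a pure predicate. The function name `shouldSummarize` and the `tokenPercent` parameter are hypothetical; the document states only that a message-count threshold and a percentage of `ContextWindow` are involved, so the value 75 below is a placeholder, not the real setting.

```go
package main

import "fmt"

// shouldSummarize reports whether either trigger fires: too many messages, or
// estimated history tokens above tokenPercent of the context window.
// The name and the tokenPercent parameter are assumptions of this sketch.
func shouldSummarize(msgCount, historyTokens, msgThreshold, contextWindow, tokenPercent int) bool {
	if msgCount > msgThreshold {
		return true
	}
	return historyTokens > contextWindow*tokenPercent/100
}

func main() {
	// 20 matches summarize_message_threshold in config.example.json;
	// 75 is a placeholder percentage.
	fmt.Println(shouldSummarize(21, 100, 20, 131072, 75))    // true: message count
	fmt.Println(shouldSummarize(10, 100000, 20, 131072, 75)) // true: token share
	fmt.Println(shouldSummarize(10, 50000, 20, 131072, 75))  // false
}
```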

### 2. Proactive budget check

`isOverContextBudget` runs before each LLM call.

It applies the full budget formula: `message_tokens + tool_def_tokens + MaxTokens > ContextWindow`. If the request is over budget, it triggers `forceCompression` and rebuilds messages before calling the LLM.

This prevents wasted (and billed) LLM calls that would otherwise fail with a context-window error.
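
A minimal sketch of the check, assuming the token counts have already been estimated. The signature is illustrative and may differ from the function in this PR.

```go
package main

import "fmt"

// isOverContextBudget applies the budget formula from the text:
// message tokens + tool definition tokens + the output reserve must fit in
// the context window. The signature is an assumption of this sketch.
func isOverContextBudget(messageTokens, toolDefTokens, maxTokens, contextWindow int) bool {
	return messageTokens+toolDefTokens+maxTokens > contextWindow
}

func main() {
	fmt.Println(isOverContextBudget(120000, 3000, 8192, 131072)) // true: 131192 > 131072
	fmt.Println(isOverContextBudget(100000, 3000, 8192, 131072)) // false: 111192 fits
}
```

Note that the first request would fit if only the messages were counted; it is the `MaxTokens` reserve that pushes it over.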

### 3. Emergency compression (reactive)

`forceCompression` runs when the LLM returns a context-window error despite the proactive check.

It drops the oldest ~50% of Turns. If the history is a single Turn with no safe split point (e.g. one user message followed by a massive tool response), it falls back to keeping only the most recent user message. This breaks Turn atomicity, but only as a last resort to avoid a context-exceeded loop.

It stores a compression note in the session summary (not in history messages) so `BuildMessages` can include it in the next system prompt.

This is the fallback for when the token estimate undershoots reality.
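
The drop-and-fallback behavior can be sketched as follows. This is an illustration only: the helper names `turnStarts` and `dropOldestTurns` are hypothetical, `Message` is a stand-in for `providers.Message`, and cutting at the middle Turn start is one possible reading of "oldest ~50%". The compression note written to the summary is omitted.

```go
package main

import "fmt"

// Message is a stand-in for providers.Message.
type Message struct {
	Role    string
	Content string
}

// turnStarts returns the index of every user message.
func turnStarts(history []Message) []int {
	var starts []int
	for i, m := range history {
		if m.Role == "user" {
			starts = append(starts, i)
		}
	}
	return starts
}

// dropOldestTurns keeps the newer half of the Turns. With fewer than two
// Turns there is no safe split, so it keeps only the most recent user message.
func dropOldestTurns(history []Message) []Message {
	starts := turnStarts(history)
	if len(starts) >= 2 {
		cut := starts[len(starts)/2]
		return append([]Message(nil), history[cut:]...)
	}
	for i := len(history) - 1; i >= 0; i-- {
		if history[i].Role == "user" {
			return []Message{history[i]}
		}
	}
	return history // no user message at all: nothing safe to drop
}

func main() {
	multi := []Message{
		{Role: "user", Content: "q1"}, {Role: "assistant", Content: "a1"},
		{Role: "user", Content: "q2"}, {Role: "assistant", Content: "a2"},
		{Role: "user", Content: "q3"}, {Role: "assistant", Content: "a3"},
		{Role: "user", Content: "q4"}, {Role: "assistant", Content: "a4"},
	}
	single := []Message{
		{Role: "user", Content: "read the log"},
		{Role: "assistant", Content: ""},
		{Role: "tool", Content: "...huge output..."},
	}
	fmt.Println(len(dropOldestTurns(multi)), dropOldestTurns(multi)[0].Content) // 4 q3
	fmt.Println(len(dropOldestTurns(single)))                                   // 1
}
```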

---

## Token estimation

Estimation uses a heuristic of ~2.5 characters per token (`chars * 2 / 5`).

`estimateMessageTokens` counts:

- `Content` (rune count, for multibyte correctness)
- `ReasoningContent` (extended thinking / chain-of-thought)
- `ToolCalls` — ID, type, function name, arguments
- `ToolCallID` (tool result metadata)
- Per-message overhead (role label, JSON structure)
- `Media` items — flat per-item token estimate, added directly to the final count (not through the character heuristic, since actual cost depends on resolution and provider-specific image tokenization)

`estimateToolDefsTokens` counts tool definition overhead: name, description, JSON schema of parameters.

These are deliberately heuristic. The proactive check handles the common case; the reactive path catches estimation errors.
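
A minimal sketch of the per-message estimate follows. The struct fields mirror the list above, but the types are stand-ins for `providers.Message`, and the two constants (`overheadChars`, `tokensPerMedia`) are placeholder values chosen for illustration, not the values used in this PR.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// ToolCall and Message are stand-ins for the provider types.
type ToolCall struct {
	ID, Type, Name, Arguments string
}

type Message struct {
	Content          string
	ReasoningContent string
	ToolCalls        []ToolCall
	ToolCallID       string
	Media            []string
}

const (
	overheadChars  = 12  // placeholder per-message overhead, in characters
	tokensPerMedia = 256 // placeholder flat estimate per media item, in tokens
)

// estimateMessageTokens counts runes (not bytes), converts them with the
// chars * 2 / 5 heuristic, then adds the flat media estimate directly to the
// token total so it bypasses the character heuristic.
func estimateMessageTokens(m Message) int {
	chars := overheadChars
	chars += utf8.RuneCountInString(m.Content)
	chars += utf8.RuneCountInString(m.ReasoningContent)
	chars += utf8.RuneCountInString(m.ToolCallID)
	for _, tc := range m.ToolCalls {
		chars += utf8.RuneCountInString(tc.ID) + utf8.RuneCountInString(tc.Type)
		chars += utf8.RuneCountInString(tc.Name) + utf8.RuneCountInString(tc.Arguments)
	}
	return chars*2/5 + len(m.Media)*tokensPerMedia
}

func main() {
	ascii := Message{Content: strings.Repeat("a", 38)}
	accented := Message{Content: strings.Repeat("é", 38)} // 76 bytes, 38 runes
	withImage := Message{Content: strings.Repeat("a", 38), Media: []string{"photo.png"}}
	fmt.Println(estimateMessageTokens(ascii))     // 20
	fmt.Println(estimateMessageTokens(accented))  // 20
	fmt.Println(estimateMessageTokens(withImage)) // 276
}
```

Counting runes keeps the ASCII and accented messages at the same estimate even though the second is twice as many bytes.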

---

## Interface boundaries

Context budget functions (`parseTurnBoundaries`, `findSafeBoundary`, `estimateMessageTokens`, `isOverContextBudget`) are **pure functions**. They take `[]providers.Message` and integer parameters. They have no dependency on `AgentLoop` or any other runtime struct.

`BuildMessages` is the sole assembler of the final message array sent to the LLM. Budget functions inform compression decisions but do not construct messages.

`forceCompression` and `summarizeSession` mutate session state (history and summary). `BuildMessages` reads that state to construct context. The flow is:

```
budget check --> compression decision --> mutate session --> BuildMessages reads session --> LLM call
```

---

## Known gaps

These are recognized limitations in the current implementation, documented here for visibility:

- **Summarization trigger does not use the full budget formula.** `maybeSummarize` compares estimated history tokens against a percentage of `ContextWindow`. It does not account for system prompt size, tool definition overhead, or `MaxTokens` reserve. The proactive check covers the critical path (preventing 400 errors), but the summarization trigger could be aligned with the same budget model for more accurate early compression.

- **Token estimation is heuristic.** It does not account for provider-specific tokenization, exact system prompt size (assembled separately), or variable image token costs. The two-path design (proactive + reactive) is intended to tolerate this imprecision.

- **Reactive retry does not preserve media.** When the reactive path rebuilds context after compression, it currently passes empty values for media references. This is a pre-existing issue in the main loop, not introduced by the budget system.

---

## What this document does not cover

- How `AGENT.md` frontmatter configures context parameters — that is part of the Agent definition work
- How the context builder assembles context in the new architecture — that is upcoming work
- How compression events surface through the event system — that is part of the event model (#1316)
- Subagent context isolation — that is a separate track