
Introduce rollout store and in-memory source #13096

Open
charley-oai wants to merge 34 commits into main from cc/rollout-reconstruction-resumable-backtracking


Contributor

charley-oai commented Feb 28, 2026

Summary

  • introduce RolloutStore and InMemoryRolloutSource in codex-rs/core/src/rollout/recorder.rs, and route rollout reconstruction plus related readers through the RolloutSource interface instead of assuming raw rollout-item vectors everywhere
  • keep one process-local in-memory rollout source in sync with runtime writes and session-meta enrichment, so replay readers and persistence share the same ordered rollout state
  • preserve main's deferred rollout-file materialization semantics for fresh threads while tightening the startup / resume paths around the shared source
  • avoid resume-specific rollout duplication by passing preloaded sources where available and reconstructing resumed state without cloning the full rollout source
  • make the remaining eager boundaries explicit (InitialHistory::Resumed, fork startup, phase-1 memories, some metadata helpers) so later work can move them onto RolloutSource incrementally
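A minimal sketch of the "one shared, ordered source" idea above. It assumes a simplified `RolloutItem` enum and a hypothetical `SharedRollout` wrapper (neither is the actual codex-core type): the recorder appends and enriches session metadata through the same `Arc<RwLock<Vec<_>>>` that replay readers borrow from, so both observe one ordering and readers never need a second copy.

```rust
use std::sync::{Arc, RwLock};

/// Simplified stand-in for a rollout item (hypothetical).
#[derive(Clone, Debug, PartialEq)]
pub enum RolloutItem {
    SessionMeta { cwd: String, git_branch: Option<String> },
    Message(String),
}

/// Hypothetical shared, ordered rollout buffer. Cloning the handle
/// shares the same underlying Vec; it does not copy the items.
#[derive(Clone, Default)]
pub struct SharedRollout {
    items: Arc<RwLock<Vec<RolloutItem>>>,
}

impl SharedRollout {
    /// Runtime writes append to the buffer that readers also see.
    pub fn append(&self, item: RolloutItem) {
        self.items.write().expect("rollout lock poisoned").push(item);
    }

    /// Session-meta enrichment edits the existing entry in place,
    /// so the ordering observed by readers is unchanged.
    pub fn enrich_session_meta(&self, branch: &str) {
        let mut items = self.items.write().expect("rollout lock poisoned");
        let meta = items
            .iter_mut()
            .find(|item| matches!(item, RolloutItem::SessionMeta { .. }));
        if let Some(RolloutItem::SessionMeta { git_branch, .. }) = meta {
            *git_branch = Some(branch.to_string());
        }
    }

    /// Readers borrow the current items under a read lock instead of
    /// cloning the whole rollout.
    pub fn with_items<R>(&self, read: impl FnOnce(&[RolloutItem]) -> R) -> R {
        let items = self.items.read().expect("rollout lock poisoned");
        read(items.as_slice())
    }
}
```

A reader handle cloned before later writes still sees those writes, because only the `Arc` is cloned.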

Why

This keeps the PR focused on one productive groundwork step for lazy rollout reading without pulling in the larger backtracking / retention design yet:

  • core/runtime now has a shared in-memory rollout source instead of separate queue and replay views
  • reconstruction and related readers already speak in terms of RolloutSource operations (inclusive_start_of_rollout_index, exclusive_end_of_rollout_index, iter_forward_from, iter_reverse_from)
  • resume/startup paths can reuse preloaded rollout sources instead of reparsing or silently degrading
  • the branch remains compatible with main's deferred-persistence behavior rather than changing rollout-file lifecycle semantics as part of this work
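The four RolloutSource operations named above could look roughly like the following trait. The signatures, the `(index, &item)` iterator shape, the exclusive bound taken by `iter_reverse_from`, the `VecRolloutSource` type, and the compaction-based reader are all illustrative assumptions, not the PR's actual API.

```rust
/// Simplified stand-in for a rollout item (hypothetical).
#[derive(Clone, Debug, PartialEq)]
pub enum RolloutItem {
    UserMessage(String),
    Compacted(String),
}

/// Assumed shape of the RolloutSource operations; real signatures may differ.
pub trait RolloutSource {
    /// First index still held by the source (0 until trimming exists).
    fn inclusive_start_of_rollout_index(&self) -> usize;
    /// One past the last index held by the source.
    fn exclusive_end_of_rollout_index(&self) -> usize;
    /// Items at `start..`, oldest first.
    fn iter_forward_from(&self, start: usize)
        -> Box<dyn Iterator<Item = (usize, &RolloutItem)> + '_>;
    /// Items at `..end_exclusive`, newest first.
    fn iter_reverse_from(&self, end_exclusive: usize)
        -> Box<dyn Iterator<Item = (usize, &RolloutItem)> + '_>;
}

/// Hypothetical Vec-backed implementation.
pub struct VecRolloutSource {
    items: Vec<RolloutItem>,
}

impl VecRolloutSource {
    pub fn new(items: Vec<RolloutItem>) -> Self {
        Self { items }
    }
}

impl RolloutSource for VecRolloutSource {
    fn inclusive_start_of_rollout_index(&self) -> usize {
        0
    }

    fn exclusive_end_of_rollout_index(&self) -> usize {
        self.items.len()
    }

    fn iter_forward_from(&self, start: usize)
        -> Box<dyn Iterator<Item = (usize, &RolloutItem)> + '_> {
        Box::new(self.items.iter().enumerate().skip(start))
    }

    fn iter_reverse_from(&self, end_exclusive: usize)
        -> Box<dyn Iterator<Item = (usize, &RolloutItem)> + '_> {
        let end = end_exclusive.min(self.items.len());
        Box::new(self.items[..end].iter().enumerate().rev())
    }
}

/// Example reader: walk backwards to the newest compaction marker,
/// then replay forward from it, returning references (no item copies).
pub fn items_since_last_compaction<S: RolloutSource>(source: &S) -> Vec<&RolloutItem> {
    let end = source.exclusive_end_of_rollout_index();
    let start = source
        .iter_reverse_from(end)
        .find(|(_, item)| matches!(item, RolloutItem::Compacted(_)))
        .map(|(index, _)| index)
        .unwrap_or_else(|| source.inclusive_start_of_rollout_index());
    source.iter_forward_from(start).map(|(_, item)| item).collect()
}
```

Because readers only talk to the trait, a later lazily loaded or trimmed source could replace the Vec without changing `items_since_last_compaction`.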

Not In Scope

  • resumable backtracking state
  • retention-window trimming of older in-memory rollout items
  • replacing eager InitialHistory::Resumed(Vec<RolloutItem>) end-to-end with a RolloutSource-backed startup API
  • making phase-1 memories compaction-aware or fully token-budget-aware (that path now uses the source interface, but still serializes the full filtered rollout)

Testing

  • just fmt
  • just fix -p codex-core
  • targeted codex-core / codex-app-server checks while iterating on:
    • resume startup / ephemeral resume
    • rollout metadata extraction
    • rollout source ordering and session-meta sync
    • app-server thread start / read / resume / fork / archive paths
  • CI on the final branch SHA

Notes

  • load_rollout_items intentionally remains as a shim in recorder.rs for diff quality; InMemoryRolloutSource::load_from_path delegates to it for now.
  • startup git metadata is captured at startup without blocking it: collection begins eagerly and is awaited later, when the session-meta line is written.
  • some startup consumers still require owned rollout items today; those callsites are marked with TODOs where the next RolloutSource-backed cleanup should happen.
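The non-blocking git metadata capture in the notes follows a "start early, join late" pattern. A minimal sketch using a plain `std::thread` (the real code presumably uses async tasks; `GitInfo`, `SessionMeta`, `PendingSessionMeta`, and `finish` are hypothetical names): collection starts at session startup, and the result is only joined when the session-meta record is built.

```rust
use std::process::Command;
use std::thread::{self, JoinHandle};

/// Hypothetical subset of startup git metadata.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct GitInfo {
    pub branch: Option<String>,
}

/// Runs `git rev-parse --abbrev-ref HEAD` in `cwd`; any failure
/// (git missing, not a repo, bad cwd) simply yields `None`.
fn collect_git_info(cwd: &str) -> GitInfo {
    let branch = Command::new("git")
        .args(["rev-parse", "--abbrev-ref", "HEAD"])
        .current_dir(cwd)
        .output()
        .ok()
        .filter(|out| out.status.success())
        .map(|out| String::from_utf8_lossy(&out.stdout).trim().to_string());
    GitInfo { branch }
}

/// Hypothetical session-meta record written once git info is ready.
#[derive(Debug, PartialEq)]
pub struct SessionMeta {
    pub id: String,
    pub git: GitInfo,
}

pub struct PendingSessionMeta {
    id: String,
    git: JoinHandle<GitInfo>,
}

impl PendingSessionMeta {
    /// Start collecting git metadata eagerly; startup does not wait.
    pub fn start(id: String, cwd: String) -> Self {
        let git = thread::spawn(move || collect_git_info(&cwd));
        Self { id, git }
    }

    /// Join only when the session-meta line is about to be written.
    /// A panicked collector degrades to empty metadata.
    pub fn finish(self) -> SessionMeta {
        let git = self.git.join().unwrap_or_default();
        SessionMeta { id: self.id, git }
    }
}
```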

@charley-oai
Contributor Author

@codex review this

Contributor

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1fded1ebb9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@charley-oai
Contributor Author

@codex review this

Contributor

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4d6ad0a04a

charley-oai force-pushed the cc/rollout-reconstruction-resumable-backtracking branch from 4d6ad0a to 3a84aa9, March 2, 2026 06:00
charley-oai changed the title from “Make rollout reconstruction resumable for backtracking” to “Introduce in-memory rollout source”, Mar 2, 2026
charley-oai changed the title from “Introduce in-memory rollout source” to “Introduce rollout store and in-memory source”, Mar 2, 2026
charley-oai force-pushed the cc/rollout-reconstruction-resumable-backtracking branch from f83d40f to 1c5141e, March 3, 2026 02:02
@charley-oai
Contributor Author

@codex review this

@chatgpt-codex-connector
Contributor

💡 Codex Review

let source = match Self::load_source(path.as_path()).await {

P2: Avoid double-loading rollout files on resume

ThreadManager::resume_thread_from_rollout already materializes the full rollout with RolloutStore::get_rollout_history before session startup. In RolloutStore::new (resume path), load_source reads/parses the same file again. Large .jsonl sessions now pay ~2x file I/O and peak memory during resume, which can noticeably slow startup or exhaust memory for long threads.
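One way to address this finding is to let the resume path accept a rollout the caller already parsed and fall back to reading the file only when none is supplied. A sketch under that assumption: `ParsedRollout` and `resolve_rollout_source` are hypothetical, and the `load` closure stands in for reading and parsing the .jsonl file.

```rust
use std::io;
use std::sync::Arc;

/// Hypothetical parsed rollout: one entry per JSONL line.
pub type ParsedRollout = Arc<Vec<String>>;

/// Reuse a rollout the caller already parsed; only run `load`
/// (file read + parse) when nothing was preloaded.
pub fn resolve_rollout_source<F>(
    preloaded: Option<ParsedRollout>,
    load: F,
) -> io::Result<ParsedRollout>
where
    F: FnOnce() -> io::Result<Vec<String>>,
{
    match preloaded {
        // Shares the existing allocation: no second read, no copy.
        Some(source) => Ok(source),
        None => load().map(Arc::new),
    }
}
```

Sharing through `Arc` keeps peak memory at one parsed copy, which is the cost the review is concerned about.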

@charley-oai
Contributor Author

@codex review this

Contributor

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3d7a0d5b7f

@charley-oai
Contributor Author

@codex review this

Contributor

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1405b7cd52

@charley-oai
Contributor Author

@codex review this

Contributor

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 45cb6d9a77

@charley-oai
Contributor Author

@codex review for no behavior change

@charley-oai
Contributor Author

@codex review for correctness

Contributor

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a163f0536b

ebrevdo and others added 10 commits March 10, 2026 12:08
This updates the `skill-creator` sample skill to explicitly cover
forward-testing as part of the skill authoring workflow. The guidance
now treats subagent-based validation as a first-class step for complex
or fragile skills, with an emphasis on preserving evaluation integrity
and avoiding leaked context.

The sample initialization script is also updated so newly created skills
point authors toward forward-testing after validation. Together, these
changes make the sample more opinionated about how skills should be
iterated on once the initial implementation is complete.

- Add new guidance to `SKILL.md` on protecting validation integrity,
when to use subagents for forward-testing, and how to structure
realistic test prompts without leaking expected answers.
- Expand the skill creation workflow so iteration explicitly includes
forward-testing for complex skills, including approval guidance for
expensive or risky validation runs.

## Summary
- add the OpenAI Docs skill under
codex-rs/skills/src/assets/samples/openai-docs
- include the skill metadata, assets, and GPT-5.4 upgrade reference
files
- exclude the test harness and test fixtures

## Testing
- not run (skill-only asset copy)

## Summary
- add ARC monitor support for MCP tool calls by serializing MCP approval
requests into the ARC action shape and sending the relevant
conversation/policy context to the `/api/codex/safety/arc` endpoint
- route ARC outcomes back into MCP approval flow so `ask-user` falls
back to a user prompt and `steer-model` blocks the tool call, with
guardian/ARC tests covering the new request shape
- update the TUI approval copy from “Approve Once” to “Allow” / “Allow
for this session” and refresh the related snapshots

---------

Co-authored-by: Fouad Matin <[email protected]>
Co-authored-by: Fouad Matin <[email protected]>
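
The ARC outcome routing in the commit above can be pictured as a small mapping. This sketch assumes string outcomes `ask-user` and `steer-model` as named in the commit, plus a hypothetical `allow` outcome; the enum names and the choice to prompt the user on unrecognized outcomes are illustrative, not taken from the codex code.

```rust
/// Outcomes named in the commit, plus a hypothetical `Allow`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ArcOutcome {
    AskUser,
    SteerModel,
    Allow,
}

/// Hypothetical decision handed back to the MCP approval flow.
#[derive(Debug, PartialEq)]
pub enum McpApprovalDecision {
    PromptUser,
    Block { reason: String },
    Proceed,
}

pub fn parse_arc_outcome(raw: &str) -> Option<ArcOutcome> {
    match raw {
        "ask-user" => Some(ArcOutcome::AskUser),
        "steer-model" => Some(ArcOutcome::SteerModel),
        "allow" => Some(ArcOutcome::Allow),
        _ => None,
    }
}

pub fn route_arc_outcome(outcome: Option<ArcOutcome>) -> McpApprovalDecision {
    match outcome {
        // steer-model: the tool call does not run.
        Some(ArcOutcome::SteerModel) => McpApprovalDecision::Block {
            reason: "tool call blocked by ARC monitor".to_string(),
        },
        Some(ArcOutcome::Allow) => McpApprovalDecision::Proceed,
        // ask-user, and (by assumption) anything unrecognized,
        // falls back to the normal user prompt.
        Some(ArcOutcome::AskUser) | None => McpApprovalDecision::PromptUser,
    }
}
```

Falling back to a prompt on unknown outcomes keeps the user in the loop rather than silently allowing the call.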
Add forceRemoteSync to plugin/list.
When it is set to True, we will sync the local plugin status with the
remote one (backend-api/plugins/list).
#14232)

…e package

## Summary

This changes the Python SDK packaging model so we no longer commit
`codex` binaries into `sdk/python`.

Instead, published SDK builds now depend on a separate `codex-cli-bin`
runtime package that carries the platform-specific `codex` binary. The
SDK and runtime can be staged together with an exact version pin, so the
published Python SDK still resolves to a Codex version we know is
compatible.

The SDK now resolves `codex` in this order:

- `AppServerConfig.codex_bin` if explicitly set
- installed `codex-cli-bin` runtime package

There is no `PATH` fallback anymore. Published installs either use the
pinned runtime or fail loudly, and local development uses an explicit
`codex_bin` override when working from the repo.

## What changed

- removed checked-in binaries from `sdk/python/src/codex_app_server/bin`
- changed `AppServerClient` to resolve `codex` from:
  - explicit `AppServerConfig.codex_bin`
  - installed `codex-cli-bin`
- kept `AppServerConfig.codex_bin` override support for local/dev use
- added a new `sdk/python-runtime` package template for the pinned
  runtime
- updated `scripts/update_sdk_artifacts.py` to stage releasable
  SDK/runtime packages instead of downloading binaries into the repo
- made `codex-cli-bin` build as a platform-specific wheel
- made `codex-cli-bin` wheel-only by rejecting `sdist` builds
- updated docs/tests to match the new packaging flow and explicit
  local-dev contract

## Why

Checking in six platform binaries made the repo much heavier and tied
normal source changes to release artifacts.

This keeps the compatibility guarantees we want, but moves them into
packaging:

- the published SDK can depend on an exact `codex-cli-bin==...`
- the runtime package carries the platform-specific binary
- users still get a pinned runtime
- the repo no longer needs to store those binaries

It also makes the runtime contract stricter and more predictable:

- published installs never silently fall back to an arbitrary `codex` on
  `PATH`
- local development remains supported through explicit `codex_bin`
- `codex-cli-bin` is distributed as platform wheels only, which avoids
  unsafe source-distribution installs for a package that embeds a
  prebuilt binary

## Validation

- ran targeted Python SDK tests:
  - `python3 -m pytest sdk/python/tests/test_artifact_workflow_and_binaries.py sdk/python/tests/test_client_rpc_methods.py sdk/python/tests/test_contract_generation.py`
- exercised the staging flow with a local dummy binary to verify
  SDK/runtime staging end to end
- verified the staged runtime package builds a platform-specific wheel
  (`Root-Is-Purelib: false`) rather than a universal `py3-none-any` wheel
- added test coverage for the explicit-only runtime resolution model
- added test coverage that `codex-cli-bin` rejects `sdist` builds

---------

Co-authored-by: sdcoffey <[email protected]>
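
The explicit-only resolution order from the SDK commit above fits in a few lines. The SDK itself is Python; this sketch is in Rust to match the other examples here, and `resolve_codex_bin` and `ResolveError` are hypothetical names. `PATH` is never consulted.

```rust
use std::path::PathBuf;

#[derive(Debug, PartialEq)]
pub enum ResolveError {
    /// Neither an explicit override nor the runtime package is available.
    RuntimeNotInstalled,
}

/// Resolution order: explicit `codex_bin` override first, then the
/// installed runtime package. `PATH` is deliberately never searched,
/// so a missing runtime fails loudly instead of picking a random binary.
pub fn resolve_codex_bin(
    explicit_codex_bin: Option<PathBuf>,
    installed_runtime_bin: Option<PathBuf>,
) -> Result<PathBuf, ResolveError> {
    explicit_codex_bin
        .or(installed_runtime_bin)
        .ok_or(ResolveError::RuntimeNotInstalled)
}
```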
- add `model` and `reasoning_effort` to the `spawn_agent` schema so the
  values pass through
- validate requested models against `model.model` and only check that
  the selected model supports the requested reasoning effort

---------

Co-authored-by: Codex <[email protected]>
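
A sketch of the `spawn_agent` validation described above. It assumes the requested model (or the parent's model, when none is given) must appear in a known-model list, and that each model advertises its supported reasoning efforts; `ModelInfo`, `ReasoningEffort`, and `validate_spawn_request` are hypothetical, and reading `model.model` as that lookup is an interpretation.

```rust
/// Hypothetical reasoning-effort levels.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ReasoningEffort {
    Low,
    Medium,
    High,
}

/// Hypothetical catalog entry for a model the agent may spawn.
pub struct ModelInfo {
    pub slug: &'static str,
    pub supported_efforts: &'static [ReasoningEffort],
}

/// The model must be known; if an effort was requested, the selected
/// model must support it. Nothing else is checked.
pub fn validate_spawn_request<'a>(
    known_models: &'a [ModelInfo],
    parent_model: &str,
    requested_model: Option<&str>,
    requested_effort: Option<ReasoningEffort>,
) -> Result<&'a ModelInfo, String> {
    let slug = requested_model.unwrap_or(parent_model);
    let model = known_models
        .iter()
        .find(|m| m.slug == slug)
        .ok_or_else(|| format!("unknown model `{slug}`"))?;
    if let Some(effort) = requested_effort {
        if !model.supported_efforts.contains(&effort) {
            return Err(format!("model `{slug}` does not support reasoning effort {effort:?}"));
        }
    }
    Ok(model)
}
```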
With the image-gen feature, the model now saves to /tmp by default, at
all times.
…Reject (#14191)

## Summary
This change makes `AskForApproval::Reject` gate correctly anywhere it
appears inside otherwise-stable app-server protocol types.

Previously, experimental gating for `approval_policy: Reject` was
handled with request-specific logic in `ClientRequest` detection. That
covered a few request params types, but it did not generalize to other
nested uses such as `ProfileV2`, `Config`, `ConfigReadResponse`, or
`ConfigRequirements`.

This PR replaces that ad hoc handling with a generic nested experimental
propagation mechanism.

## Testing

Running app-server-test-client without the experimental API enabled
produces:
```
 initialize response: InitializeResponse { user_agent: "codex-toy-app-server/0.0.0 (Mac OS 26.3.1; arm64) vscode/2.4.36 (codex-toy-app-server; 0.0.0)" }
> {
>   "id": "50244f6a-270a-425d-ace0-e9e98205bde7",
>   "method": "thread/start",
>   "params": {
>     "approvalPolicy": {
>       "reject": {
>         "mcp_elicitations": false,
>         "request_permissions": true,
>         "rules": false,
>         "sandbox_approval": true
>       }
>     },
>     "baseInstructions": null,
>     "config": null,
>     "cwd": null,
>     "developerInstructions": null,
>     "dynamicTools": null,
>     "ephemeral": null,
>     "experimentalRawEvents": false,
>     "mockExperimentalField": null,
>     "model": null,
>     "modelProvider": null,
>     "persistExtendedHistory": false,
>     "personality": null,
>     "sandbox": null,
>     "serviceName": null
>   }
> }
< {
<   "error": {
<     "code": -32600,
<     "message": "askForApproval.reject requires experimentalApi capability"
<   },
<   "id": "50244f6a-270a-425d-ace0-e9e98205bde7"
< }
[verified] thread/start rejected approvalPolicy=Reject without experimentalApi
```

---------

Co-authored-by: celia-oai <[email protected]>
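
The generic nested propagation described in that commit can be sketched as a trait that every protocol type implements by delegating to its fields, so a `Reject` policy nested inside a profile is caught by the same check as a top-level one. The trait, types, and field names are hypothetical simplifications; only the error text mirrors the test output above.

```rust
/// Reports the first experimental value found anywhere inside a
/// protocol type (hypothetical trait).
pub trait ExperimentalApi {
    fn experimental_reason(&self) -> Option<&'static str>;
}

#[derive(Clone, Debug, PartialEq)]
pub enum AskForApproval {
    OnRequest,
    Reject { sandbox_approval: bool },
}

impl ExperimentalApi for AskForApproval {
    fn experimental_reason(&self) -> Option<&'static str> {
        match self {
            AskForApproval::Reject { .. } => Some("askForApproval.reject"),
            AskForApproval::OnRequest => None,
        }
    }
}

// Blanket impls make propagation through containers automatic.
impl<T: ExperimentalApi> ExperimentalApi for Option<T> {
    fn experimental_reason(&self) -> Option<&'static str> {
        self.as_ref().and_then(|value| value.experimental_reason())
    }
}

impl<T: ExperimentalApi> ExperimentalApi for Vec<T> {
    fn experimental_reason(&self) -> Option<&'static str> {
        self.iter().find_map(|value| value.experimental_reason())
    }
}

/// Hypothetical nested type, standing in for something like a profile.
pub struct Profile {
    pub approval_policy: Option<AskForApproval>,
}

impl ExperimentalApi for Profile {
    fn experimental_reason(&self) -> Option<&'static str> {
        self.approval_policy.experimental_reason()
    }
}

/// Hypothetical params holding the policy directly and inside profiles.
pub struct ThreadStartParams {
    pub approval_policy: Option<AskForApproval>,
    pub profiles: Vec<Profile>,
}

impl ExperimentalApi for ThreadStartParams {
    fn experimental_reason(&self) -> Option<&'static str> {
        self.approval_policy
            .experimental_reason()
            .or_else(|| self.profiles.experimental_reason())
    }
}

/// One generic gate instead of per-request special cases.
pub fn gate_experimental<T: ExperimentalApi>(value: &T, experimental_api: bool) -> Result<(), String> {
    match value.experimental_reason() {
        Some(reason) if !experimental_api => {
            Err(format!("{reason} requires experimentalApi capability"))
        }
        _ => Ok(()),
    }
}
```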
…de (#14236)

Summary
- drop `McpToolOutput` in favor of `CallToolResult`, moving its helpers
to keep MCP tooling focused on the final result shape
- wire the new schema definitions through code mode, context, handlers,
and spec modules so MCP tools serialize the exact output shape expected
by the model
- extend code mode tests to cover multiple MCP call scenarios and ensure
the serialized data matches the new schema
- refresh JS runner helpers and protocol models alongside the schema
changes

Testing
- Not run (not requested)
@charley-oai
Contributor Author

@codex review this

charley-oai and others added 14 commits March 10, 2026 15:37
Reinstate main's deferred rollout persistence behavior while keeping the shared rollout source groundwork. Materialize the rollout on first persisted user-visible activity, restore app-server's unmaterialized-thread handling, and keep in-memory rollout state aligned with persisted session metadata.

Co-authored-by: Codex <[email protected]>
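The deferred materialization in that commit ("materialize the rollout on first persisted user-visible activity") can be sketched as a recorder that buffers lines in memory and opens its writer only when the first user-visible item arrives, then flushes everything in order. `DeferredRollout` and the `user_visible` flag are hypothetical, and in the usage below an in-memory `Vec<u8>` stands in for the rollout file.

```rust
use std::io::{self, Write};

/// Hypothetical recorder: keeps rollout lines in memory until the
/// first user-visible item, then opens the writer and flushes in order.
pub struct DeferredRollout<W, F>
where
    W: Write,
    F: FnMut() -> io::Result<W>,
{
    open: F,
    writer: Option<W>,
    pending: Vec<String>,
}

impl<W, F> DeferredRollout<W, F>
where
    W: Write,
    F: FnMut() -> io::Result<W>,
{
    pub fn new(open: F) -> Self {
        Self { open, writer: None, pending: Vec::new() }
    }

    pub fn is_materialized(&self) -> bool {
        self.writer.is_some()
    }

    /// Queue `line`; materialize on the first user-visible item, and
    /// once materialized, write everything queued so far.
    pub fn record(&mut self, line: &str, user_visible: bool) -> io::Result<()> {
        self.pending.push(line.to_string());
        if self.writer.is_none() && user_visible {
            self.writer = Some((self.open)()?);
        }
        if let Some(writer) = self.writer.as_mut() {
            for queued in self.pending.drain(..) {
                writeln!(writer, "{queued}")?;
            }
        }
        Ok(())
    }

    /// Returns the writer if the rollout was ever materialized.
    pub fn into_writer(self) -> Option<W> {
        self.writer
    }
}
```

A thread that never produces user-visible activity never opens a file, while session metadata recorded earlier is still written first once it does.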
Borrow the in-memory rollout source when reconstructing resumed history, tool selection, and token usage instead of cloning the full rollout into a second buffer during startup.

Co-authored-by: Codex <[email protected]>
charley-oai force-pushed the cc/rollout-reconstruction-resumable-backtracking branch from d969369 to 6e2682f, March 10, 2026 22:39
@chatgpt-codex-connector
Contributor

Codex Review: Didn't find any major issues. 🎉
