Introduce rollout store and in-memory source #13096
charley-oai wants to merge 34 commits into `main`
@codex review this
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1fded1ebb9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
1fded1e to 3e225c1
@codex review this
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4d6ad0a04a
4d6ad0a to 3a84aa9
f83d40f to 1c5141e
@codex review this
💡 Codex Review
codex/codex-rs/core/src/rollout/store.rs, line 605 in 1c5141e
@codex review this
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3d7a0d5b7f
@codex review this
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1405b7cd52
@codex review this
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 45cb6d9a77
@codex review for no behavior change
@codex review for correctness
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a163f0536b
This updates the `skill-creator` sample skill to explicitly cover forward-testing as part of the skill authoring workflow. The guidance now treats subagent-based validation as a first-class step for complex or fragile skills, with an emphasis on preserving evaluation integrity and avoiding leaked context. The sample initialization script is also updated so newly created skills point authors toward forward-testing after validation. Together, these changes make the sample more opinionated about how skills should be iterated on once the initial implementation is complete.

- Add new guidance to `SKILL.md` on protecting validation integrity, when to use subagents for forward-testing, and how to structure realistic test prompts without leaking expected answers.
- Expand the skill creation workflow so iteration explicitly includes forward-testing for complex skills, including approval guidance for expensive or risky validation runs.
## Summary
- add the OpenAI Docs skill under codex-rs/skills/src/assets/samples/openai-docs
- include the skill metadata, assets, and GPT-5.4 upgrade reference files
- exclude the test harness and test fixtures

## Testing
- not run (skill-only asset copy)
## Summary
- add ARC monitor support for MCP tool calls by serializing MCP approval requests into the ARC action shape and sending the relevant conversation/policy context to the `/api/codex/safety/arc` endpoint
- route ARC outcomes back into MCP approval flow so `ask-user` falls back to a user prompt and `steer-model` blocks the tool call, with guardian/ARC tests covering the new request shape
- update the TUI approval copy from “Approve Once” to “Allow” / “Allow for this session” and refresh the related snapshots

---------

Co-authored-by: Fouad Matin <[email protected]>
Co-authored-by: Fouad Matin <[email protected]>
Add forceRemoteSync to plugin/list. When it is set to True, we will sync the local plugin status with the remote one (backend-api/plugins/list).
#14232) …e package

## Summary
This changes the Python SDK packaging model so we no longer commit `codex` binaries into `sdk/python`. Instead, published SDK builds now depend on a separate `codex-cli-bin` runtime package that carries the platform-specific `codex` binary. The SDK and runtime can be staged together with an exact version pin, so the published Python SDK still resolves to a Codex version we know is compatible.

The SDK now resolves `codex` in this order:
- `AppServerConfig.codex_bin` if explicitly set
- installed `codex-cli-bin` runtime package

There is no `PATH` fallback anymore. Published installs either use the pinned runtime or fail loudly, and local development uses an explicit `codex_bin` override when working from the repo.

## What changed
- removed checked-in binaries from `sdk/python/src/codex_app_server/bin`
- changed `AppServerClient` to resolve `codex` from:
  - explicit `AppServerConfig.codex_bin`
  - installed `codex-cli-bin`
- kept `AppServerConfig.codex_bin` override support for local/dev use
- added a new `sdk/python-runtime` package template for the pinned runtime
- updated `scripts/update_sdk_artifacts.py` to stage releasable SDK/runtime packages instead of downloading binaries into the repo
- made `codex-cli-bin` build as a platform-specific wheel
- made `codex-cli-bin` wheel-only by rejecting `sdist` builds
- updated docs/tests to match the new packaging flow and explicit local-dev contract

## Why
Checking in six platform binaries made the repo much heavier and tied normal source changes to release artifacts.

This keeps the compatibility guarantees we want, but moves them into packaging:
- the published SDK can depend on an exact `codex-cli-bin==...`
- the runtime package carries the platform-specific binary
- users still get a pinned runtime
- the repo no longer needs to store those binaries

It also makes the runtime contract stricter and more predictable:
- published installs never silently fall back to an arbitrary `codex` on `PATH`
- local development remains supported through explicit `codex_bin`
- `codex-cli-bin` is distributed as platform wheels only, which avoids unsafe source-distribution installs for a package that embeds a prebuilt binary

## Validation
- ran targeted Python SDK tests:
  - `python3 -m pytest sdk/python/tests/test_artifact_workflow_and_binaries.py sdk/python/tests/test_client_rpc_methods.py sdk/python/tests/test_contract_generation.py`
- exercised the staging flow with a local dummy binary to verify SDK/runtime staging end to end
- verified the staged runtime package builds a platform-specific wheel (`Root-Is-Purelib: false`) rather than a universal `py3-none-any` wheel
- added test coverage for the explicit-only runtime resolution model
- added test coverage that `codex-cli-bin` rejects `sdist` builds

---------

Co-authored-by: sdcoffey <[email protected]>
- add `model` and `reasoning_effort` to the `spawn_agent` schema so the values pass through
- validate requested models against `model.model` and only check that the selected model supports the requested reasoning effort

---------

Co-authored-by: Codex <[email protected]>
image-gen feature will have the model saving to /tmp by default + at all times
…Reject (#14191)

## Summary
This change makes `AskForApproval::Reject` gate correctly anywhere it appears inside otherwise-stable app-server protocol types.

Previously, experimental gating for `approval_policy: Reject` was handled with request-specific logic in `ClientRequest` detection. That covered a few request params types, but it did not generalize to other nested uses such as `ProfileV2`, `Config`, `ConfigReadResponse`, or `ConfigRequirements`.

This PR replaces that ad hoc handling with a generic nested experimental propagation mechanism.

## Testing
Seeing this when running `app-server-test-client` without the experimental API enabled:

```
initialize response: InitializeResponse { user_agent: "codex-toy-app-server/0.0.0 (Mac OS 26.3.1; arm64) vscode/2.4.36 (codex-toy-app-server; 0.0.0)" }
> {
>   "id": "50244f6a-270a-425d-ace0-e9e98205bde7",
>   "method": "thread/start",
>   "params": {
>     "approvalPolicy": {
>       "reject": {
>         "mcp_elicitations": false,
>         "request_permissions": true,
>         "rules": false,
>         "sandbox_approval": true
>       }
>     },
>     "baseInstructions": null,
>     "config": null,
>     "cwd": null,
>     "developerInstructions": null,
>     "dynamicTools": null,
>     "ephemeral": null,
>     "experimentalRawEvents": false,
>     "mockExperimentalField": null,
>     "model": null,
>     "modelProvider": null,
>     "persistExtendedHistory": false,
>     "personality": null,
>     "sandbox": null,
>     "serviceName": null
>   }
> }
< {
<   "error": {
<     "code": -32600,
<     "message": "askForApproval.reject requires experimentalApi capability"
<   },
<   "id": "50244f6a-270a-425d-ace0-e9e98205bde7"
< }
[verified] thread/start rejected approvalPolicy=Reject without experimentalApi
```

---------

Co-authored-by: celia-oai <[email protected]>
…de (#14236)

Summary
- drop `McpToolOutput` in favor of `CallToolResult`, moving its helpers to keep MCP tooling focused on the final result shape
- wire the new schema definitions through code mode, context, handlers, and spec modules so MCP tools serialize the exact output shape expected by the model
- extend code mode tests to cover multiple MCP call scenarios and ensure the serialized data matches the new schema
- refresh JS runner helpers and protocol models alongside the schema changes

Testing
- Not run (not requested)
@codex review this
Co-authored-by: Codex <[email protected]>
Reinstate main's deferred rollout persistence behavior while keeping the shared rollout source groundwork. Materialize the rollout on first persisted user-visible activity, restore app-server's unmaterialized-thread handling, and keep in-memory rollout state aligned with persisted session metadata. Co-authored-by: Codex <[email protected]>
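A minimal sketch of the deferred-materialization idea in the commit above, assuming a recorder that buffers items in memory and only opens its sink once the first user-visible item arrives. `Item`, `DeferredRecorder`, and the line format are hypothetical stand-ins, not the types used in codex-core.

```rust
use std::io::{self, Write};

// Hypothetical item type: only the distinction between user-visible and
// bookkeeping items matters for this sketch.
pub enum Item {
    SessionMeta(String),
    UserMessage(String),
}

impl Item {
    fn is_user_visible(&self) -> bool {
        matches!(self, Item::UserMessage(_))
    }

    fn to_line(&self) -> String {
        match self {
            Item::SessionMeta(s) => format!("meta:{s}\n"),
            Item::UserMessage(s) => format!("user:{s}\n"),
        }
    }
}

/// Buffers items until the first user-visible one, then opens the sink
/// (for example a rollout file) and flushes everything buffered so far.
pub struct DeferredRecorder<W: Write, F: FnOnce() -> io::Result<W>> {
    open: Option<F>, // consumed the first time the sink is created
    sink: Option<W>,
    pending: Vec<Item>,
}

impl<W: Write, F: FnOnce() -> io::Result<W>> DeferredRecorder<W, F> {
    pub fn new(open: F) -> Self {
        Self { open: Some(open), sink: None, pending: Vec::new() }
    }

    pub fn is_materialized(&self) -> bool {
        self.sink.is_some()
    }

    pub fn record(&mut self, item: Item) -> io::Result<()> {
        if self.sink.is_none() {
            if !item.is_user_visible() {
                // Nothing user-visible yet: keep the item in memory only.
                self.pending.push(item);
                return Ok(());
            }
            let open = self.open.take().ok_or_else(|| {
                io::Error::new(io::ErrorKind::Other, "sink failed to open earlier")
            })?;
            let mut sink = open()?;
            for buffered in self.pending.drain(..) {
                sink.write_all(buffered.to_line().as_bytes())?;
            }
            self.sink = Some(sink);
        }
        if let Some(sink) = self.sink.as_mut() {
            sink.write_all(item.to_line().as_bytes())?;
        }
        Ok(())
    }

    pub fn into_sink(self) -> Option<W> {
        self.sink
    }
}
```

Passing the opener as a closure keeps the sketch testable with a `Vec<u8>`; a real recorder would presumably open a file there instead.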
Borrow the in-memory rollout source when reconstructing resumed history, tool selection, and token usage instead of cloning the full rollout into a second buffer during startup. Co-authored-by: Codex <[email protected]>
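To illustrate the borrow-instead-of-clone change described above, here is a small sketch in which readers take a slice of the rollout rather than an owned copy. The `RolloutItem` variants and both functions are hypothetical simplifications, not the actual codex-core API.

```rust
// Hypothetical sketch: the item type and both readers are illustrative
// stand-ins chosen to show borrowing, not the real data model.

#[derive(Debug, Clone, PartialEq)]
pub enum RolloutItem {
    UserMessage(String),
    TokenUsage(u64),
}

/// Rebuilds the visible history by borrowing message text from the source,
/// so no second buffer of items is allocated.
pub fn reconstruct_history(items: &[RolloutItem]) -> Vec<&str> {
    items
        .iter()
        .filter_map(|item| match item {
            RolloutItem::UserMessage(text) => Some(text.as_str()),
            RolloutItem::TokenUsage(_) => None,
        })
        .collect()
}

/// Finds the most recent token-usage record by scanning from the end.
pub fn last_token_usage(items: &[RolloutItem]) -> Option<u64> {
    items.iter().rev().find_map(|item| match item {
        RolloutItem::TokenUsage(total) => Some(*total),
        RolloutItem::UserMessage(_) => None,
    })
}
```

Both readers can run over the same borrowed rollout during startup, and the caller still owns the original vector afterwards.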
d969369 to 6e2682f
Codex Review: Didn't find any major issues. 🎉
Summary
- […] `RolloutStore` and `InMemoryRolloutSource` in `codex-rs/core/src/rollout/recorder.rs`, and route rollout reconstruction plus related readers through the `RolloutSource` interface instead of assuming raw rollout-item vectors everywhere
- […] `main`'s deferred rollout-file materialization semantics for fresh threads while tightening the startup / resume paths around the shared source
- […] (`InitialHistory::Resumed`, fork startup, phase-1 memories, some metadata helpers) so later work can move them onto `RolloutSource` incrementally

Why
This keeps the PR focused on one productive groundwork step for lazy rollout reading without pulling in the larger backtracking / retention design yet:
- […] `RolloutSource` operations (`inclusive_start_of_rollout_index`, `exclusive_end_of_rollout_index`, `iter_forward_from`, `iter_reverse_from`)
- […] `main`'s deferred-persistence behavior rather than changing rollout-file lifecycle semantics as part of this work

Not In Scope
- […] `InitialHistory::Resumed(Vec<RolloutItem>)` end-to-end with a `RolloutSource`-backed startup API

Testing
- `just fmt`
- `just fix -p codex-core`
- […] `codex-core` / `codex-app-server` checks while iterating on:

Notes
- `load_rollout_items` intentionally remains as a shim in `recorder.rs` for diff quality; `InMemoryRolloutSource::load_from_path` delegates to it for now.
- […] `RolloutSource`-backed cleanup should happen.
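
For readers unfamiliar with the shape of the interface, the four `RolloutSource` operations named under Why could look roughly like the sketch below. Only the method names come from the description; every signature, the index semantics, the simplified `RolloutItem`, and the `last_matching` helper are assumptions made for illustration.

```rust
// Hypothetical sketch: method names are taken from the PR description, but all
// signatures and semantics below are assumed, not copied from the diff.

/// Stand-in for the real `RolloutItem`, which carries far more structure.
#[derive(Debug, Clone, PartialEq)]
pub struct RolloutItem(pub String);

/// Read-only view over a rollout, addressed by item index.
pub trait RolloutSource {
    /// Index of the first readable item (assumed inclusive).
    fn inclusive_start_of_rollout_index(&self) -> usize;
    /// One past the index of the last readable item (assumed exclusive).
    fn exclusive_end_of_rollout_index(&self) -> usize;
    /// Items from `index` up to the exclusive end, oldest first.
    fn iter_forward_from<'a>(
        &'a self,
        index: usize,
    ) -> Box<dyn Iterator<Item = &'a RolloutItem> + 'a>;
    /// Items strictly before `index`, newest first.
    fn iter_reverse_from<'a>(
        &'a self,
        index: usize,
    ) -> Box<dyn Iterator<Item = &'a RolloutItem> + 'a>;
}

pub struct InMemoryRolloutSource {
    items: Vec<RolloutItem>,
}

impl InMemoryRolloutSource {
    pub fn new(items: Vec<RolloutItem>) -> Self {
        Self { items }
    }
}

impl RolloutSource for InMemoryRolloutSource {
    fn inclusive_start_of_rollout_index(&self) -> usize {
        0
    }

    fn exclusive_end_of_rollout_index(&self) -> usize {
        self.items.len()
    }

    fn iter_forward_from<'a>(
        &'a self,
        index: usize,
    ) -> Box<dyn Iterator<Item = &'a RolloutItem> + 'a> {
        // Clamp so an out-of-range index yields an empty iterator.
        let start = index.min(self.items.len());
        Box::new(self.items[start..].iter())
    }

    fn iter_reverse_from<'a>(
        &'a self,
        index: usize,
    ) -> Box<dyn Iterator<Item = &'a RolloutItem> + 'a> {
        let end = index.min(self.items.len());
        Box::new(self.items[..end].iter().rev())
    }
}

/// Example reader written against the trait: walks backwards from the end and
/// stops at the first match, without collecting items into a vector.
pub fn last_matching<'a>(
    source: &'a dyn RolloutSource,
    pred: impl Fn(&RolloutItem) -> bool,
) -> Option<&'a RolloutItem> {
    let end = source.exclusive_end_of_rollout_index();
    let mut iter = source.iter_reverse_from(end);
    iter.find(|item| pred(*item))
}
```

A reader like `last_matching` depends only on the trait, which is the property that would let a lazily loaded source replace the in-memory one later.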