
feat: Reuse Codex CLI OAuth tokens for ChatGPT backend LLM calls#693

Merged
ilblackdragon merged 30 commits into nearai:staging from ZeroTrust01:codex_auth
Mar 16, 2026

Conversation

Contributor

@ZeroTrust01 ZeroTrust01 commented Mar 7, 2026

Summary

This PR enables IronClaw to reuse OAuth authentication tokens from the Codex CLI's auth.json file, allowing users with an active OpenAI/ChatGPT subscription to use IronClaw without needing a separate API key. When enabled, IronClaw communicates directly with ChatGPT's private Responses API backend (chatgpt.com/backend-api/codex) using the subscriber's JWT token.

Motivation

Currently, IronClaw requires users to configure a separate LLM API key (e.g., from OpenAI Platform, Anthropic, or a third-party proxy). Users who already have a ChatGPT Plus/Pro/Team subscription and use the Codex CLI have valid OAuth tokens stored locally, but cannot leverage them with IronClaw. This PR bridges that gap, making it easy for ChatGPT subscribers to get started with IronClaw without additional API key setup.

Changes

New Files

  • src/codex_auth.rs — Reads and parses Codex CLI's ~/.codex/auth.json, supporting both apiKey and chatgpt authentication modes. Returns CodexCredentials with the token, mode flag, and correct base URL.

  • src/llm/codex_chatgpt.rs — New LlmProvider implementation (CodexChatGptProvider) that speaks the OpenAI Responses API protocol (POST /responses with SSE streaming), which is different from the standard Chat Completions API. Includes:

    • Auto-detection of the correct model from the /models endpoint (e.g., gpt-5.2-codex)
    • Message format translation from IronClaw's ChatMessage format to Responses API input items
    • SSE stream parsing for text deltas, function call arguments, and usage statistics
    • store: false flag required by the ChatGPT backend
    • Empty string stripping from tool call arguments (model-specific quirk)
    • Timeout-wrapped error handling to prevent silent hangs on HTTP errors

Modified Files

  • src/config/llm.rs — Added is_codex_chatgpt field to RegistryProviderConfig. When LLM_USE_CODEX_AUTH=true, Codex credentials are loaded with highest priority (over env vars and secrets store). In ChatGPT mode, the base URL is automatically overridden to the private backend endpoint.

  • src/llm/mod.rs — Added mod codex_chatgpt and dispatch logic in create_registry_provider to route to the new Responses API provider when is_codex_chatgpt is true.

  • src/lib.rs — Registered the codex_auth module.
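The dispatch described in src/llm/mod.rs can be sketched roughly as follows. The types and the select_provider name here are simplified stand-ins for IronClaw's real create_registry_provider and provider types, which this sketch does not reproduce.

```rust
// Illustrative types only; IronClaw's actual config and provider APIs differ.
struct RegistryProviderConfig {
    is_codex_chatgpt: bool,
    base_url: String,
}

#[derive(Debug, PartialEq)]
enum ProviderKind {
    CodexChatGpt,
    OpenAiCompatible,
}

fn select_provider(cfg: &RegistryProviderConfig) -> ProviderKind {
    if cfg.is_codex_chatgpt {
        // ChatGPT-mode JWTs only work against the private backend, so the
        // config layer forces base_url to chatgpt.com/backend-api/codex
        // before this flag is checked.
        ProviderKind::CodexChatGpt
    } else {
        ProviderKind::OpenAiCompatible
    }
}
```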

Usage

# In ~/.ironclaw/.env:
LLM_USE_CODEX_AUTH=true
# Optionally specify a custom auth.json path:
# CODEX_AUTH_PATH=/path/to/auth.json

# Run IronClaw (model is auto-detected, no API key needed):
cargo run --release -- run --cli-only

The feature requires that the user has previously authenticated with the Codex CLI (codex --login), which stores the OAuth token in ~/.codex/auth.json.

Compatibility

  • No breaking changes. The feature is entirely opt-in via LLM_USE_CODEX_AUTH=true.
  • When LLM_USE_CODEX_AUTH=false (default), all existing LLM provider paths remain completely unchanged.
  • The is_codex_chatgpt flag only affects the provider selection when Codex ChatGPT mode credentials are detected; all other provider protocols (OpenAI-compatible, Anthropic, Ollama) are unaffected.
  • Standard OpenAI Platform API keys in auth.json (apiKey mode) are also supported and route through the existing OpenAI-compatible provider.

Testing

  • 8 unit tests added in codex_chatgpt.rs covering:
    • Message format conversion (user, assistant, tool calls, tool results)
    • System message extraction to instructions field
    • SSE response parsing (text deltas, tool call arguments)
    • store: false assertion
    • Empty string value stripping
  • 6 unit tests in codex_auth.rs covering auth.json parsing
  • Manual end-to-end testing confirmed successful LLM responses from the ChatGPT backend with auto-detected gpt-5.2-codex model

Background

This PR builds on the observation made during #384 (feat(setup): Anthropic OAuth onboarding), where Codex OAuth support was initially attempted but then intentionally removed with:

"refactor: remove Codex OAuth onboarding (incompatible with OpenAI API) — Codex CLI OAuth tokens use a different endpoint (chatgpt.com/backend-api/codex) and the Responses API wire format, not api.openai.com with Chat Completions. The tokens lack the model.request scope needed for the platform API, so they can't be used as drop-in OPENAI_API_KEY replacements."

This PR solves that exact problem by implementing a dedicated CodexChatGptProvider that speaks the Responses API protocol natively (POST /responses with SSE streaming), rather than trying to use the Codex OAuth token as a standard OpenAI API key. This approach correctly handles the different wire format and endpoint.

@github-actions github-actions bot added scope: llm LLM integration size: XL 500+ changed lines risk: low Changes to docs, tests, or low-risk modules contributor: new First-time contributor labels Mar 7, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances IronClaw's authentication flexibility by allowing it to utilize existing OpenAI Codex CLI OAuth tokens. This means users with an active ChatGPT subscription can now seamlessly integrate IronClaw without the need for additional API key setup, as IronClaw can directly interact with ChatGPT's private backend API. The change introduces new modules for credential management and a dedicated LLM provider for the ChatGPT Responses API, making the system more convenient for a broader user base.

Highlights

  • Codex CLI OAuth Token Reuse: IronClaw can now reuse OAuth authentication tokens from the OpenAI Codex CLI's auth.json file, enabling ChatGPT subscribers to use IronClaw without a separate API key.
  • Direct ChatGPT Backend Integration: A new LLM provider (CodexChatGptProvider) has been added to communicate directly with ChatGPT's private Responses API backend (chatgpt.com/backend-api/codex), bypassing the standard Chat Completions API.
  • Codex Credentials Loading: A new module (src/codex_auth.rs) was introduced to read and parse auth.json, supporting both apiKey and chatgpt authentication modes and determining the correct base URL.
  • Configuration and Opt-in: The feature is entirely opt-in via the LLM_USE_CODEX_AUTH=true environment variable, ensuring no breaking changes to existing LLM provider configurations.
  • Robust SSE Parsing and Tool Handling: The new provider includes advanced SSE stream parsing for text deltas, function call arguments, and usage statistics, along with a mechanism to strip empty string values from tool arguments to prevent validation issues.
Changelog
  • src/codex_auth.rs
    • Added a new module to read and parse Codex CLI's auth.json file.
    • Implemented logic to extract credentials for both apiKey and chatgpt authentication modes.
    • Defined constants for ChatGPT backend and standard OpenAI API URLs.
    • Included utility functions for default path resolution and credential loading.
    • Added unit tests to cover various credential loading scenarios.
  • src/config/llm.rs
    • Added is_codex_chatgpt field to RegistryProviderConfig to flag the use of the new provider.
    • Modified LlmConfig::load to prioritize Codex credentials when LLM_USE_CODEX_AUTH is enabled.
    • Implemented logic to override the base URL to the ChatGPT backend when in ChatGPT mode.
  • src/lib.rs
    • Registered the new codex_auth module.
  • src/llm/codex_chatgpt.rs
    • Added a new LlmProvider implementation (CodexChatGptProvider) for the ChatGPT Responses API.
    • Implemented message format translation from IronClaw's ChatMessage to Responses API input items.
    • Included auto-detection of the default model from the /models endpoint.
    • Developed SSE stream parsing for text deltas, function call arguments, and usage statistics.
    • Ensured store: false is set in requests and added a function to strip empty string values from JSON objects.
    • Added comprehensive unit tests for message conversion, request body building, SSE parsing, and empty string stripping.
  • src/llm/mod.rs
    • Registered the new codex_chatgpt module.
    • Updated create_registry_provider to dispatch to the CodexChatGptProvider based on the is_codex_chatgpt flag.
    • Introduced create_codex_chatgpt_from_registry to handle the instantiation and model auto-detection for the new provider.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a great new feature to reuse Codex CLI's OAuth tokens, which simplifies setup for many users. The implementation is well-structured, with new modules for authentication and the custom LLM provider. I've found a couple of areas for improvement. There's a high-severity bug in the credential loading logic that could lead to using the wrong LLM provider, and a medium-severity performance issue related to reqwest::Client instantiation. My detailed comments include code suggestions to address these points. Overall, this is a solid contribution.

Comment on lines +101 to +109
if !is_chatgpt {
    if let Some(key) = auth.openai_api_key.filter(|k| !k.is_empty()) {
        tracing::info!("Loaded API key from Codex auth.json (API key mode)");
        return Some(CodexCredentials {
            token: key,
            is_chatgpt_mode: false,
        });
    }
}
Contributor


Severity: high

The current logic has a bug where if auth_mode is "apiKey" but the key is missing, it incorrectly falls through to check for a ChatGPT token. This can lead to returning credentials with is_chatgpt_mode: true, causing the wrong LLM provider to be used.

To fix this, we should explicitly prevent the fallback when auth_mode is set. If auth_mode is "apiKey" and no key is found, the function should return None.

    if !is_chatgpt {
        if let Some(key) = auth.openai_api_key.filter(|k| !k.is_empty()) {
            tracing::info!("Loaded API key from Codex auth.json (API key mode)");
            return Some(CodexCredentials {
                token: key,
                is_chatgpt_mode: false,
            });
        }
        // If auth_mode was explicitly `apiKey`, do not fall back to checking for a token.
        if auth.auth_mode.is_some() {
            return None;
        }
    }

Comment on lines +56 to +58
async fn fetch_default_model(base_url: &str, api_key: &str) -> Option<String> {
    let url = format!("{base_url}/models?client_version=0.1.0");
    let client = Client::new();
Contributor


Severity: medium

Creating a new reqwest::Client for a single request is inefficient. The client manages a connection pool and is designed to be reused. This function is called from with_auto_model, which then calls Self::new, creating another client. This results in two clients being created to instantiate one provider.

To improve this, the client should be created once in with_auto_model, passed as an argument to this function, and then moved into the CodexChatGptProvider instance. This will require modifying with_auto_model as well.

Suggested change
-async fn fetch_default_model(base_url: &str, api_key: &str) -> Option<String> {
-    let url = format!("{base_url}/models?client_version=0.1.0");
-    let client = Client::new();
+async fn fetch_default_model(client: &Client, base_url: &str, api_key: &str) -> Option<String> {
+    let url = format!("{base_url}/models?client_version=0.1.0");
References
  1. While the general rule advises against reqwest::Client reuse for one-time or infrequent operations with negligible benefit and high churn, this specific scenario involves creating two clients for a single provider instantiation, indicating a clear inefficiency where the benefit of reuse is not negligible.

When LLM_USE_CODEX_AUTH=true, IronClaw reads the Codex CLI's auth.json
(default ~/.codex/auth.json) and extracts the API key or OAuth access
token. This lets IronClaw piggyback on a Codex login without
implementing its own OAuth flow.

New env vars:
  - LLM_USE_CODEX_AUTH: enable Codex auth fallback (default: false)
  - CODEX_AUTH_PATH: override path to auth.json

Switch base_url to chatgpt.com/backend-api/codex when auth.json
contains ChatGPT OAuth tokens. The access_token is a JWT that only
works against the private ChatGPT backend, not the public OpenAI API.

Refactored codex_auth.rs to return CodexCredentials (token +
is_chatgpt_mode) instead of just a string key.

When LLM_USE_CODEX_AUTH=true, Codex credentials are now loaded before
checking env vars or the secrets store overlay. Previously the secrets
store key (injected during onboarding) would shadow the Codex token.

- New CodexChatGptProvider speaks the Responses API protocol
- Auto-detects model from /models endpoint (gpt-4o -> gpt-5.2-codex)
- Adds store=false (required by ChatGPT backend)
- Error handling with timeout for HTTP 400 responses
- Message format translation: Chat Completions -> Responses API
- SSE response parsing for text, tool calls, and usage stats
- 7 unit tests for message conversion and SSE parsing

The Responses API sends function_call_arguments.delta events with
item_id (e.g. fc_...) not call_id (e.g. call_...). The parser now
keys pending tool calls by item_id from output_item.added and
tracks call_id separately for result matching.

gpt-5.2-codex fills optional tool parameters with empty strings
(e.g. timestamp: ""), which IronClaw's tool validation rejects.
Strip them before passing to tool execution.

When auth_mode is explicitly 'apiKey' but the key is missing/empty,
do not fall through to check for a ChatGPT access_token. This prevents
returning credentials with is_chatgpt_mode: true and routing to the
wrong LLM provider.

… calls

Create Client once in with_auto_model, pass &Client to
fetch_default_model, and move it into the provider struct.
Eliminates the redundant Client::new() that wasted a connection pool.

The /models endpoint gates newer models behind client_version.
Version 0.1.0 only returns up to gpt-5.2-codex, while 1.0.0+
also returns gpt-5.3-codex and gpt-5.4.

Fetch the full model list from /models endpoint. If LLM_MODEL is set,
validate it against the supported list and warn with available models
if not found. If LLM_MODEL is not set, auto-detect the highest-priority
model. Also bumps client_version to 1.0.0 to unlock gpt-5.3/5.4.
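The item_id keying fix described in the commits above can be sketched as a small state machine over SSE events. The struct and method names here are assumptions for illustration; only the item_id/call_id distinction comes from the commit message.

```rust
use std::collections::HashMap;

// function_call_arguments.delta events carry item_id ("fc_..."), not
// call_id ("call_..."), so pending calls are keyed by item_id and the
// call_id is kept alongside for matching tool results later.
#[derive(Default)]
struct PendingCall {
    call_id: String,   // "call_...", used to match tool results
    arguments: String, // accumulated JSON argument fragments
}

#[derive(Default)]
struct SseState {
    pending: HashMap<String, PendingCall>, // keyed by item_id ("fc_...")
}

impl SseState {
    /// Handle an output_item.added event announcing a new function call.
    fn on_output_item_added(&mut self, item_id: &str, call_id: &str) {
        self.pending.insert(
            item_id.to_string(),
            PendingCall { call_id: call_id.to_string(), arguments: String::new() },
        );
    }

    /// Handle a function_call_arguments.delta event for an existing item.
    fn on_arguments_delta(&mut self, item_id: &str, delta: &str) {
        if let Some(call) = self.pending.get_mut(item_id) {
            call.arguments.push_str(delta);
        }
    }
}
```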
Collaborator

@zmanian zmanian left a comment


Architecturally sound -- correctly identifies that Codex ChatGPT tokens need a dedicated provider (different wire format, /responses SSE endpoint vs /chat/completions). Good test coverage (14 tests). No new dependencies.

Issues to address:

1. No timeout on fetch_available_models (must fix). The model discovery request has no timeout. If chatgpt.com is slow or hangs, startup blocks indefinitely. Add a timeout (e.g., 10 seconds).

2. Undocumented private API (must document). The ChatGPT backend endpoint (chatgpt.com/backend-api/codex) is a private, undocumented API. Using subscriber OAuth tokens from a third-party app may violate the token's intended scope or OpenAI's ToS. Add a prominent warning (both in docs and as a log message on first use).

3. No token refresh handling (should fix). OAuth access tokens expire but the PR only reads access_token, not refresh_token. Either implement refresh or document that users need to re-run codex --login when the token expires.

4. block_in_place for async model detection (should fix). Consider making create_registry_provider async or doing model detection lazily on first call.

5. Multimodal content silently dropped (should fix). message_to_input_items only reads msg.content (text). If content_parts contains images, they're silently dropped. At minimum log a warning.

6. Module placement (minor). codex_auth.rs at crate root -- should be under src/llm/ since it's only used by LLM config/provider code.

Prevents startup from blocking indefinitely if chatgpt.com
is slow or unreachable. Uses reqwest per-request timeout.

The chatgpt.com/backend-api/codex endpoint is private and
undocumented. Add warning in module docs and a runtime log
on first use to inform users of potential ToS implications.

On HTTP 401, if a refresh_token is available, the provider now
automatically refreshes the access token via auth.openai.com/oauth/token
(same protocol as Codex CLI) and retries the request once. Refreshed
tokens are persisted back to auth.json.

Changes:
- codex_auth: read refresh_token, add refresh_access_token() and
  persist_refreshed_tokens()
- codex_chatgpt: RwLock for api_key, 401 detection + retry in
  send_request, send_http_request helper
- config/llm: thread refresh_token/auth_path through RegistryProviderConfig
- llm/mod: pass refresh params to with_auto_model

Model is no longer resolved during provider construction. Instead,
resolve_model() uses tokio::sync::OnceCell to lazily fetch from
/models on the first LLM call. This eliminates the block_in_place
+ block_on workaround in create_codex_chatgpt_from_registry.

- with_auto_model (async) -> with_lazy_model (sync constructor)
- resolve_model() added with OnceCell-based lazy init
- build_request_body takes model as parameter
- model_name() returns resolved or configured_model as fallback

message_to_input_items now checks content_parts for user messages.
ContentPart::Text maps to input_text and ContentPart::ImageUrl maps
to input_image, matching the Responses API format used by Codex CLI.
Falls back to plain text when content_parts is empty.

Also updates client_version to 0.111.0 for /models endpoint.

Adds test: test_message_conversion_user_with_image

codex_auth is only used by the LLM layer (codex_chatgpt provider
and config/llm). Moving it under src/llm/ reflects its actual scope.

- Remove pub mod codex_auth from lib.rs
- Add pub mod codex_auth to llm/mod.rs
- Update imports: super::codex_auth, crate::llm::codex_auth
@ZeroTrust01
Contributor Author

Thanks for the thorough review! All 6 issues have been addressed in the latest push:

1. No timeout on fetch_available_models (fixed)
Added a 10-second reqwest timeout on the HTTP GET request in fetch_available_models. Commit: 5b90676

2. Undocumented private API (documented)
Added a prominent tracing::warn! on provider construction and expanded module-level doc comments warning about the private chatgpt.com/backend-api/codex endpoint and ToS implications. Commit: 3c8a381

3. No token refresh handling (fixed — passive refresh)
Implemented 401-based passive token refresh following the same protocol as Codex CLI:

  • codex_auth.rs: Added refresh_access_token() — POST to auth.openai.com/oauth/token with the same CLIENT_ID and grant_type
  • codex_chatgpt.rs: On HTTP 401, refresh the token via RwLock, persist to auth.json, and retry once
  • Threaded refresh_token / auth_path through RegistryProviderConfig

Commit: 0f73dfd

4. block_in_place for async model detection (fixed — lazy init)
Replaced eager with_auto_model (async) with synchronous with_lazy_model. Model is now resolved lazily on the first LLM call via tokio::sync::OnceCell. block_in_place + block_on completely removed from create_codex_chatgpt_from_registry. Commit: c8c97cf

5. Multimodal content silently dropped (fixed — full support)
Updated message_to_input_items to convert ContentPart::ImageUrl into a {"type": "input_image", "image_url": ...} item, matching Codex CLI's Responses API format. Added test test_message_conversion_user_with_image. Commit: a2d2d13

6. Module placement (fixed)
Moved codex_auth.rs from src/ to src/llm/ and updated all import paths. Commit: 0f4d8aa

All 16 tests pass (7 codex_auth + 9 codex_chatgpt).

@ZeroTrust01 ZeroTrust01 requested a review from zmanian March 9, 2026 12:06
Collaborator

@zmanian zmanian left a comment


Re-review of PR #693 (17 commits, 1232 additions)

The 6 issues from the initial review have all been addressed. The rework is solid: lazy model detection via OnceCell, 401-based token refresh with auth.json persistence, multimodal content support, ToS warning, module moved to src/llm/. Good work.

However, this re-review found several new issues that need fixing before merge.


Issues to fix

1. refresh_token stored as plain String, not SecretString (must fix)

In RegistryProviderConfig, refresh_token is Option<String>. In CodexCredentials, both token and refresh_token are plain String. The access token gets wrapped in SecretString at the config boundary, but the refresh token never does. This means:

  • The refresh token will appear in debug logs if RegistryProviderConfig is ever debug-printed
  • It persists in memory as a cleartext String with no zeroize-on-drop

Both refresh_token in RegistryProviderConfig and in CodexCredentials should be SecretString (or at minimum Option<SecretString>). The CodexChatGptProvider::refresh_token field should follow suit.

2. No timeout on send_http_request for LLM responses (must fix)

send_http_request() (line ~879 of codex_chatgpt.rs) posts to /responses with no timeout. The fetch_available_models correctly has a 10-second timeout, but the actual LLM request -- which reads an SSE stream that could hang indefinitely -- has none. The reqwest::Client is constructed with Client::new() which has no default timeout.

Add a configurable timeout (respect the project's LLM_REQUEST_TIMEOUT_SECS setting, defaulting to 120s) on the HTTP client or on send_http_request.

3. Race condition in token refresh (should fix)

In send_request(), on a 401:

  1. Read lock to get api_key
  2. Send request, get 401
  3. Call codex_auth::refresh_access_token(rt, ...)
  4. Write lock to update api_key
  5. Retry

If two concurrent requests both get a 401 simultaneously, both will attempt to refresh the token in parallel. The second refresh may fail (refresh tokens are often single-use) or produce a different token, creating a race. The NEAR AI provider handles this by checking if the token has already been refreshed (compare the token that failed with the current stored token) before attempting refresh. Consider adding similar guard logic.

4. reqwest::Client not reused (minor, Gemini also flagged this)

Client::new() is called in the constructor and again in refresh_access_token(). The refresh function creates a throwaway client for each refresh. Pass the provider's self.client to the refresh function, or make refresh_access_token accept a &Client parameter.

5. auth.json persistence is not atomic (should fix)

persist_refreshed_tokens() does read_to_string -> modify -> write. If the process crashes mid-write, auth.json is corrupted. Use write-to-temp-then-rename (atomic write pattern). The tempfile crate is already a dependency. Example: write to a sibling temp file, then std::fs::rename().

6. SSE parsing reads entire response body into memory (minor)

resp.text().await buffers the entire SSE stream. For normal responses this is fine, but a misbehaving server could send unbounded data. Consider adding a body size limit via resp.bytes() with a cap, or using streaming SSE parsing. Low priority since this is behind an opt-in flag.

7. No CLAUDE.md / module spec update (must fix per project rules)

Per the project's LLM module docs (src/llm/CLAUDE.md), new providers must be documented there. The file map table, provider selection table, and provider chain sections need updating to mention codex_chatgpt. Also update .env.example with LLM_USE_CODEX_AUTH documentation.
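The write-to-temp-then-rename pattern recommended in item 5 can be sketched with the standard library alone. This is a generic illustration, not IronClaw's persist_refreshed_tokens; the function name and sync_all call are choices for this sketch.

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

/// Atomically replace `path` with `contents`.
/// rename() within one directory is atomic on POSIX filesystems, so a
/// crash mid-write leaves either the old file or the new one intact,
/// never a truncated auth.json.
fn write_atomic(path: &Path, contents: &str) -> std::io::Result<()> {
    // Sibling temp file so the rename never crosses a filesystem boundary.
    let tmp = path.with_extension("tmp");
    {
        let mut f = fs::File::create(&tmp)?;
        f.write_all(contents.as_bytes())?;
        f.sync_all()?; // flush to disk before the rename makes it visible
    }
    fs::rename(&tmp, path)
}
```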


Overlap with PR #744 (OpenAI Codex provider)

PR #744 adds a first-class LLM_BACKEND=openai_codex with its own OAuth device-code login flow, session persistence, and token-refreshing decorator. Both PRs target the same endpoint (chatgpt.com/backend-api/codex) and implement similar SSE parsing for the Responses API.

Key differences:

  • This PR (#693) piggybacks on Codex CLI's auth.json -- zero IronClaw-side OAuth flow
  • PR #744 implements a full device-code OAuth flow within IronClaw itself

These are complementary but will conflict heavily on merge. The SSE parsing, message conversion, and model detection logic are near-identical. Whoever merges second will need to deduplicate. Recommend coordinating: either merge one first and rebase the other, or extract the shared Responses API client into a common module that both can use.


What looks good

  • All 6 original review items addressed correctly
  • 16 tests (7 codex_auth + 9 codex_chatgpt) with good edge case coverage
  • No .unwrap() / .expect() in production code
  • Lazy model detection via OnceCell avoids blocking during provider setup
  • store: false correctly set (ChatGPT backend requirement)
  • strip_empty_string_values is a pragmatic fix for gpt-5.2-codex quirk
  • Gemini's apiKey-mode fallback bug was caught and fixed with a test
  • Module correctly placed in src/llm/
  • Feature is opt-in (LLM_USE_CODEX_AUTH=true), no breaking changes

Verdict: REQUEST_CHANGES

Items 1 (SecretString for refresh_token), 2 (LLM request timeout), and 7 (CLAUDE.md update) are required. Items 3 (race condition) and 5 (atomic write) are strongly recommended. Items 4 and 6 are nice-to-haves.

Collaborator

@zmanian zmanian left a comment


All 6 previous review items addressed (timeout on model fetch, ToS warning, 401-based refresh, lazy resolution, multimodal content, module location). However, new issues found:

  1. refresh_token stored as plain String -- Should be SecretString for zeroize-on-drop. The project uses secrecy::SecretString extensively for sensitive values.

  2. No timeout on LLM HTTP requests -- send_http_request() has no timeout. Model fetch has one, but the actual LLM call doesn't.

  3. No CLAUDE.md update -- Project rules require updating src/llm/CLAUDE.md and .env.example for new providers.

Strongly recommended:

  • Race condition in concurrent token refresh (two 401s could both refresh, second may fail with single-use refresh tokens)
  • Non-atomic auth.json write (crash mid-write corrupts the file; use write-to-temp-then-rename)

Note: Overlaps significantly with PR #744 -- both target the same endpoint with near-identical SSE parsing. Should coordinate to extract shared Responses API client.

@github-actions github-actions bot added risk: low Changes to docs, tests, or low-risk modules and removed risk: medium Business logic, config, or moderate-risk modules labels Mar 10, 2026
@github-actions github-actions bot added scope: dependencies Dependency updates risk: medium Business logic, config, or moderate-risk modules and removed risk: low Changes to docs, tests, or low-risk modules labels Mar 10, 2026
@ZeroTrust01
Contributor Author

Thanks for the detailed re-review. The issues you called out have now been addressed on the latest revision.

Fixed:

  1. Secret handling: refresh_token is now Option<SecretString> throughout the Codex path, and access_token is also carried as SecretString end-to-end (auth.json decode structs, CodexCredentials, RegistryProviderConfig, and CodexChatGptProvider).
  2. Request timeout: /responses requests now use the configured LLM_REQUEST_TIMEOUT_SECS value (default 120s) via send_http_request(..., timeout).
  3. Concurrent refresh race: the provider now uses a refresh lock and compares the failed token with the currently stored token before refreshing, so concurrent 401s do not race the same refresh token.
  4. Client reuse: token refresh now reuses the provider’s reqwest::Client instead of constructing a throwaway client.
  5. Atomic persistence: refreshed tokens are now written to a sibling temp file and then renamed into place.
  6. SSE buffering: the Codex ChatGPT provider no longer uses resp.text().await for successful SSE responses. It now parses /responses incrementally via bytes_stream() + eventsource-stream, with an idle timeout, following the same general approach used by Codex CLI.
  7. Docs/spec updates: src/llm/CLAUDE.md and .env.example were updated to document the Codex auth / ChatGPT backend path.

Validation run locally:

  • cargo fmt --all
  • cargo test --lib codex_auth
  • cargo test --lib codex_chatgpt
  • cargo clippy --lib --all-features -- -D warnings

The latest branch head includes the follow-up fixes for client reuse and streaming SSE parsing.
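The compare-before-refresh guard described in point 3 can be sketched synchronously. The real provider is async and uses tokio locks; this std-only version, with assumed names, shows just the comparison logic that prevents two concurrent 401s from burning the same refresh token twice.

```rust
use std::sync::Mutex;

struct TokenStore {
    token: Mutex<String>,
}

impl TokenStore {
    /// Called after a 401. `failed_token` is the token the failing request
    /// used; `fetch_new` performs the actual OAuth refresh. Refresh only if
    /// no other request has already swapped the stored token.
    fn refresh_if_stale(&self, failed_token: &str, fetch_new: impl FnOnce() -> String) -> String {
        let mut guard = self.token.lock().unwrap();
        if *guard != failed_token {
            // Another request already refreshed; reuse its token instead of
            // spending a possibly single-use refresh_token a second time.
            return guard.clone();
        }
        let new_token = fetch_new();
        *guard = new_token.clone();
        new_token
    }
}
```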

zmanian previously approved these changes Mar 11, 2026
Collaborator

@zmanian zmanian left a comment


Re-review #3: All previous feedback addressed

This is a thorough rework. All 13 issues from the two prior reviews have been addressed:

First review (6/6 fixed)

  1. Timeout on model fetch -- 10s timeout added
  2. Private API warning -- doc comment + runtime tracing::warn! on construction
  3. Token refresh -- Full 401-based refresh with refresh_access_token(), persists to auth.json
  4. block_in_place -- Replaced with OnceCell lazy model resolution
  5. Multimodal content -- content_parts mapped to input_text/input_image
  6. Module placement -- Moved to src/llm/

Second review (7/7 fixed)

  1. SecretString for refresh_token -- SecretString used throughout: CodexCredentials, RegistryProviderConfig, CodexChatGptProvider, CodexTokens, RefreshResponse
  2. LLM request timeout -- send_http_request takes configurable Duration, respects LLM_REQUEST_TIMEOUT_SECS
  3. Race condition in token refresh -- refresh_lock: Mutex<()> + token comparison guard (same pattern as NEAR AI provider)
  4. Client reuse for refresh -- refresh_access_token now takes &reqwest::Client parameter, reuses provider's client
  5. Atomic auth.json write -- write-to-temp-then-rename pattern with cleanup on rename failure
  6. SSE streaming -- Now uses eventsource-stream crate for incremental parsing with per-event idle timeout (no full-body buffering)
  7. CLAUDE.md update -- src/llm/CLAUDE.md file map and provider sections updated, .env.example updated
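The race-prevention guard in item 3 can be sketched with std primitives (a blocking `Mutex` here for brevity; the provider presumably uses an async lock, and `TokenStore` is an illustrative name, not the PR's actual type):

```rust
use std::sync::Mutex;

/// Illustrative token holder: only one caller refreshes at a time;
/// callers that raced on the same 401 see the token already changed
/// and skip the redundant refresh.
struct TokenStore {
    current: Mutex<String>,
    refresh_lock: Mutex<()>,
}

impl TokenStore {
    /// `failed` is the token that just got a 401. Returns the token to
    /// retry with, refreshing only if nobody else refreshed first.
    fn refresh_if_needed(&self, failed: &str, fresh: impl FnOnce() -> String) -> String {
        let _guard = self.refresh_lock.lock().unwrap();
        let mut cur = self.current.lock().unwrap();
        if *cur != failed {
            // Another task already refreshed; reuse its token.
            return cur.clone();
        }
        *cur = fresh();
        cur.clone()
    }
}
```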

New observations (non-blocking)

1. pub mod visibility (style, non-blocking). codex_chatgpt is only used within src/llm/mod.rs -- could be pub(crate) instead of pub. Same applies to codex_auth if it's only used from crate::config::llm and llm/mod.rs. Not a blocker.

2. eventsource-stream dependency (acceptable). New dependency eventsource-stream = "0.2". Lightweight crate (2 transitive deps: nom and memchr, both already in the tree). Reasonable for proper SSE parsing vs hand-rolling it.

3. client_version=0.111.0 hardcoded (noted). The /models endpoint uses a hardcoded client version string. If OpenAI changes version gating, this will need updating. Fine for now since it's behind an opt-in flag.

4. Overlap with PR #744 (coordination note). Both PRs target the same ChatGPT backend endpoint with similar SSE parsing logic. Whichever merges second should deduplicate into a shared Responses API client module.

What looks good

  • Zero .unwrap() / .expect() in production code
  • 16 tests with solid edge case coverage (apiKey fallback bug, image content, SSE streaming, empty string stripping)
  • All sensitive values use SecretString with secrecy crate (zeroize-on-drop)
  • Entirely opt-in via LLM_USE_CODEX_AUTH=true, no breaking changes
  • Clean separation: codex_auth.rs (credential loading/refresh) vs codex_chatgpt.rs (Responses API provider)
  • store: false correctly enforced for ChatGPT backend
  • Lazy model detection avoids blocking provider setup

Approved.

@ZeroTrust01
Contributor Author

All follow-up fixes are in, and the latest branch head is now stable. The previous approval appears to have been dismissed after the later sync/check-trigger commit. Could you please take another quick look and re-approve if everything still looks good?
Thanks!

@ZeroTrust01 ZeroTrust01 requested a review from zmanian March 11, 2026 05:56
Collaborator

@zmanian zmanian left a comment


Review: feat: Reuse Codex CLI OAuth tokens for ChatGPT backend LLM calls

Overall this is well-structured and follows the existing LLM provider patterns well. The token refresh handling with mutex-based race prevention is solid. However, there are security and code quality issues that should be addressed before merge.

Security Issues (Must Fix)

1. File permissions not set after writing auth.json
persist_refreshed_tokens() in codex_auth.rs writes refreshed OAuth tokens to auth.json via std::fs::write() + std::fs::rename() but never sets file permissions. This file contains bearer tokens and refresh tokens. On Unix systems, the file inherits the umask default (often 0644), making it world-readable.

After the rename, set permissions to 0600:

#[cfg(unix)]
{
    use std::os::unix::fs::PermissionsExt;
    let _ = std::fs::set_permissions(path, std::fs::Permissions::from_mode(0o600));
}

2. RefreshRequest derives Debug with plaintext refresh token
RefreshRequest (line ~254) has #[derive(Debug, Serialize)] but contains refresh_token: &'a str as a plain string. If anything in the error/retry path debug-prints this struct, the refresh token leaks to logs. Either:

  • Remove Debug derive from RefreshRequest, or
  • Wrap the field and implement a custom Debug that redacts it

This is defense-in-depth -- the struct isn't currently debug-printed, but the derive makes it easy to accidentally do so.
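A minimal sketch of the redacting-Debug option (struct and field names mirror the review's description; this is illustrative, not the PR's actual code):

```rust
use std::fmt;

struct RefreshRequest<'a> {
    client_id: &'a str,
    refresh_token: &'a str,
}

// Hand-written Debug that never prints the token itself, so an
// accidental `{:?}` in an error/retry path cannot leak it to logs.
impl fmt::Debug for RefreshRequest<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("RefreshRequest")
            .field("client_id", &self.client_id)
            .field("refresh_token", &"[REDACTED]")
            .finish()
    }
}
```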

Code Quality Issues

3. Module visibility inconsistency
codex_chatgpt is declared pub mod but anthropic_oauth (which serves an analogous role) is mod. Since codex_chatgpt is only consumed within llm/mod.rs, it should be mod codex_chatgpt (or pub(crate) mod at most) for consistency with the existing pattern.

Similarly, codex_auth needs to be pub (or pub(crate)) since it's called from config/llm.rs via crate::llm::codex_auth::, but explicitly marking it pub(crate) would be more intentional.

4. Unrelated change bundled in PR
The main.rs change (#[cfg(unix)] on ChannelSecretUpdater import) is unrelated to Codex auth. Consider splitting it out or at minimum noting it in the PR description.

Minor / Non-blocking

5. default_codex_auth_path() uses dirs::home_dir().unwrap_or_else(|| PathBuf::from(".")) which falls back to CWD silently. This is fine for a best-effort path, but a tracing::warn! when home_dir() returns None would help debugging on unusual systems.

6. No token expiry check before use. The access token (a JWT) is used as-is without checking its exp claim. A proactive check could avoid one round-trip 401 + refresh cycle. This is non-blocking since the refresh-on-401 flow handles it, but would be a nice optimization.

7. store: false assertion in tests is good. The explicit test that store: false is set in the request body is a nice touch for ensuring ChatGPT backend compliance.

What Looks Good

  • Token refresh race condition handling via refresh_lock mutex with "already refreshed" check is well done
  • SecretString usage for all token storage
  • SSE parsing with idle timeout prevents silent hangs
  • strip_empty_string_values addresses a real model quirk pragmatically
  • Lazy model resolution avoids blocking during provider setup
  • Good test coverage (14 tests across both modules)
  • Clear warning about private/undocumented API usage
  • Error handling consistently uses LlmError variants with context

@ZeroTrust01
Contributor Author

Fixed in this follow-up:

  1. auth.json persistence permissions: after persisting refreshed tokens, the file permissions are now set to 0600 on Unix.

  2. RefreshRequest debug exposure: Debug was removed from RefreshRequest so the refresh token is not accidentally printable.

  3. Module visibility: codex_auth is now pub(crate) and codex_chatgpt is now private to the llm module.

  4. Unrelated main.rs change: the import-shape change called out in review was reverted.

  5. default_codex_auth_path() fallback: it now emits a tracing::warn! when home_dir() is unavailable before falling back to the current directory.

Also included as a small supporting cleanup:

  • CodexChatGptProvider::new() is now marked #[cfg(test)] so clippy stays clean after tightening module visibility.

@ZeroTrust01 ZeroTrust01 requested a review from zmanian March 13, 2026 00:02
Collaborator

@zmanian zmanian left a comment


Thanks for addressing the review feedback. All three must-fix issues are resolved:

  1. File permissions: set_auth_file_permissions() now sets 0600 on Unix after write
  2. RefreshRequest: Debug derive removed, only Serialize remains
  3. Module visibility: codex_chatgpt is now mod, codex_auth is pub(crate) mod

The unrelated main.rs change (ChannelSecretUpdater import move) is still present but it's cosmetic and non-blocking.

LGTM

@ilblackdragon ilblackdragon merged commit 1b59eb6 into nearai:staging Mar 16, 2026
8 of 9 checks passed
@ironclaw-ci ironclaw-ci bot mentioned this pull request Mar 17, 2026
bkutasi pushed a commit to bkutasi/ironclaw that referenced this pull request Mar 28, 2026
…rai#693)

* feat: add Codex auth.json token reuse for LLM authentication

When LLM_USE_CODEX_AUTH=true, IronClaw reads the Codex CLI's auth.json
(default ~/.codex/auth.json) and extracts the API key or OAuth access
token. This lets IronClaw piggyback on a Codex login without
implementing its own OAuth flow.

New env vars:
  - LLM_USE_CODEX_AUTH: enable Codex auth fallback (default: false)
  - CODEX_AUTH_PATH: override path to auth.json

* fix: handle ChatGPT auth mode correctly

Switch base_url to chatgpt.com/backend-api/codex when auth.json
contains ChatGPT OAuth tokens. The access_token is a JWT that only
works against the private ChatGPT backend, not the public OpenAI API.

Refactored codex_auth.rs to return CodexCredentials (token +
is_chatgpt_mode) instead of just a string key.

* fix: Codex auth takes highest priority over secrets store

When LLM_USE_CODEX_AUTH=true, Codex credentials are now loaded before
checking env vars or the secrets store overlay. Previously the secrets
store key (injected during onboarding) would shadow the Codex token.

* feat: Responses API provider for ChatGPT backend

- New CodexChatGptProvider speaks the Responses API protocol
- Auto-detects model from /models endpoint (gpt-4o -> gpt-5.2-codex)
- Adds store=false (required by ChatGPT backend)
- Error handling with timeout for HTTP 400 responses
- Message format translation: Chat Completions -> Responses API
- SSE response parsing for text, tool calls, and usage stats
- 7 unit tests for message conversion and SSE parsing

* fix: SSE parser uses item_id instead of call_id for tool call deltas

The Responses API sends function_call_arguments.delta events with
item_id (e.g. fc_...) not call_id (e.g. call_...). The parser now
keys pending tool calls by item_id from output_item.added and
tracks call_id separately for result matching.

* fix: strip empty string values from tool call arguments

gpt-5.2-codex fills optional tool parameters with empty strings
(e.g. timestamp: ""), which IronClaw's tool validation rejects.
Strip them before passing to tool execution.
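The stripping described in this commit can be sketched over a minimal JSON-like value type (the real code presumably operates on serde_json::Value; the hand-rolled enum here just illustrates the recursion):

```rust
#[derive(Debug, PartialEq)]
enum Val {
    Str(String),
    Obj(Vec<(String, Val)>),
}

/// Recursively drop object entries whose value is an empty string, so
/// optional params the model filled with "" never reach tool validation.
fn strip_empty_strings(v: Val) -> Val {
    match v {
        Val::Obj(entries) => Val::Obj(
            entries
                .into_iter()
                .filter(|(_, v)| !matches!(v, Val::Str(s) if s.is_empty()))
                .map(|(k, v)| (k, strip_empty_strings(v)))
                .collect(),
        ),
        other => other,
    }
}
```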

* fix: prevent apiKey mode fallback to ChatGPT token

When auth_mode is explicitly 'apiKey' but the key is missing/empty,
do not fall through to check for a ChatGPT access_token. This prevents
returning credentials with is_chatgpt_mode: true and routing to the
wrong LLM provider.

* refactor: reuse single reqwest::Client across model discovery and LLM calls

Create Client once in with_auto_model, pass &Client to
fetch_default_model, and move it into the provider struct.
Eliminates the redundant Client::new() that wasted a connection pool.

* fix: bump client_version to 1.0.0 to unlock gpt-5.3-codex and gpt-5.4

The /models endpoint gates newer models behind client_version.
Version 0.1.0 only returns up to gpt-5.2-codex, while 1.0.0+
also returns gpt-5.3-codex and gpt-5.4.

* feat: user-configured LLM_MODEL takes priority over auto-detection

Fetch the full model list from /models endpoint. If LLM_MODEL is set,
validate it against the supported list and warn with available models
if not found. If LLM_MODEL is not set, auto-detect the highest-priority
model. Also bumps client_version to 1.0.0 to unlock gpt-5.3/5.4.

* fix: add 10s timeout to model discovery HTTP request

Prevents startup from blocking indefinitely if chatgpt.com
is slow or unreachable. Uses reqwest per-request timeout.

* docs: add private API warning for ChatGPT backend endpoint

The chatgpt.com/backend-api/codex endpoint is private and
undocumented. Add warning in module docs and a runtime log
on first use to inform users of potential ToS implications.

* feat: implement OAuth 401 token refresh for Codex ChatGPT provider

On HTTP 401, if a refresh_token is available, the provider now
automatically refreshes the access token via auth.openai.com/oauth/token
(same protocol as Codex CLI) and retries the request once. Refreshed
tokens are persisted back to auth.json.

Changes:
- codex_auth: read refresh_token, add refresh_access_token() and
  persist_refreshed_tokens()
- codex_chatgpt: RwLock for api_key, 401 detection + retry in
  send_request, send_http_request helper
- config/llm: thread refresh_token/auth_path through RegistryProviderConfig
- llm/mod: pass refresh params to with_auto_model

* refactor: lazy model detection via OnceCell, remove block_in_place

Model is no longer resolved during provider construction. Instead,
resolve_model() uses tokio::sync::OnceCell to lazily fetch from
/models on the first LLM call. This eliminates the block_in_place
+ block_on workaround in create_codex_chatgpt_from_registry.

- with_auto_model (async) -> with_lazy_model (sync constructor)
- resolve_model() added with OnceCell-based lazy init
- build_request_body takes model as parameter
- model_name() returns resolved or configured_model as fallback
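The lazy-resolution shape can be sketched with std's OnceLock (the PR uses tokio::sync::OnceCell so the first fetch can be awaited; `Provider`, `resolve_model`, and the fetch closure are stand-ins for the real names):

```rust
use std::sync::OnceLock;

struct Provider {
    configured_model: Option<String>,
    resolved: OnceLock<String>,
}

impl Provider {
    /// Resolve the model on first use; every later call reuses the cached
    /// value. In the async version this would be OnceCell around an HTTP
    /// call to the /models endpoint instead of a plain closure.
    fn resolve_model(&self, fetch_default: impl FnOnce() -> String) -> &str {
        self.resolved.get_or_init(|| {
            self.configured_model.clone().unwrap_or_else(fetch_default)
        })
    }
}
```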

* feat: support multimodal content (images) in Codex ChatGPT provider

message_to_input_items now checks content_parts for user messages.
ContentPart::Text maps to input_text and ContentPart::ImageUrl maps
to input_image, matching the Responses API format used by Codex CLI.
Falls back to plain text when content_parts is empty.

Also updates client_version to 0.111.0 for /models endpoint.

Adds test: test_message_conversion_user_with_image
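The content mapping this commit describes can be sketched with simplified stand-in types (`ContentPart` loosely mirrors IronClaw's type; the `InputContent` variants follow the Responses API's input_text / input_image shapes):

```rust
/// Simplified stand-ins for IronClaw's message parts and the
/// Responses API input content items.
enum ContentPart {
    Text(String),
    ImageUrl(String),
}

#[derive(Debug, PartialEq)]
enum InputContent {
    InputText { text: String },
    InputImage { image_url: String },
}

/// Map each part to its Responses API equivalent; an empty parts list
/// falls back to a single input_text carrying the plain-text content.
fn to_input_content(parts: Vec<ContentPart>, fallback_text: &str) -> Vec<InputContent> {
    if parts.is_empty() {
        return vec![InputContent::InputText { text: fallback_text.to_string() }];
    }
    parts
        .into_iter()
        .map(|p| match p {
            ContentPart::Text(text) => InputContent::InputText { text },
            ContentPart::ImageUrl(image_url) => InputContent::InputImage { image_url },
        })
        .collect()
}
```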

* refactor: move codex_auth module from src/ to src/llm/

codex_auth is only used by the LLM layer (codex_chatgpt provider
and config/llm). Moving it under src/llm/ reflects its actual scope.

- Remove pub mod codex_auth from lib.rs
- Add pub mod codex_auth to llm/mod.rs
- Update imports: super::codex_auth, crate::llm::codex_auth

* Fix codex provider style issues

* Use SecretString throughout codex auth refresh flow

* Use SecretString for codex access tokens

* Reuse provider client for codex token refresh

* Stream Codex SSE responses incrementally

* Fix Windows clippy and SQLite test linkage

* Trigger checks after regression skip label

* Tighten codex auth module handling

Labels

contributor: new First-time contributor risk: medium Business logic, config, or moderate-risk modules scope: ci CI/CD workflows scope: dependencies Dependency updates scope: docs Documentation scope: llm LLM integration size: XL 500+ changed lines
