# Gemini/Google via GenAI SDK Design

## Background

Current Gemini/Google traffic goes through `openai_compat` HTTP calls. This creates two issues:

1. Gemini-specific behavior is coupled to OpenAI-compatible wire format branches.
2. Gemini thought-signature handling depends on dialect-specific JSON mapping in HTTP code.

We already accepted a hybrid provider strategy in this branch:

- OpenAI protocol uses official OpenAI SDK.
- Other protocols stay on their existing adapter.

This document applies the same strategy to Gemini-family protocols:

- Route `gemini/*` and `google/*` to `google.golang.org/genai`.
- Keep `antigravity/*` unchanged.

## Goals

- Move Gemini/Google transport to official `genai` SDK.
- Keep `providers.LLMProvider.Chat` output contract unchanged.
- Preserve thought-signature set/use compatibility across runtime and session history.
- Minimize regression risk by isolating routing changes.

## Non-Goals

- Migrating `antigravity/*` to `genai`.
- Refactoring all legacy provider-selection paths in one pass.
- Changing agent/session schema.

## Why `antigravity/*` Stays Separate

`antigravity` uses Cloud Code Assist private endpoints (`cloudcode-pa.googleapis.com/v1internal:*`) with custom envelope fields (`project`, `requestType`, `requestId`, etc.). This is not the normal Gemini API surface used by `genai`.

Conclusion: `antigravity/*` remains on its dedicated provider.

## Candidate Approaches

### Option 1 (Recommended): New `gemini_sdk` provider, protocol routing split

- Add `pkg/providers/gemini_sdk`.
- Route `gemini` and `google` protocols to this provider.
- Keep `openai_compat` for other OpenAI-compatible protocols.
- Remove only Gemini-specific request-branch logic from `openai_compat`.

Pros:
- Clear boundaries and low coupling.
- Easy to test and roll back.
- Matches existing OpenAI SDK migration pattern.

Cons:
- One extra provider package.

### Option 2: Keep `openai_compat`, internally branch into `genai`

Pros:
- Fewer top-level provider packages.

Cons:
- `openai_compat` grows in complexity and mixed responsibilities.
- Harder long-term maintenance.

### Option 3: One-shot migrate `gemini/google/antigravity`

Pros:
- A superficially unified provider surface.

Cons:
- Highest risk.
- `antigravity` protocol mismatch makes this brittle.

## Decision

Adopt Option 1.

## Target Architecture

### New Provider

Create `pkg/providers/gemini_sdk/provider.go` implementing `providers.LLMProvider` with `genai.Client`.

Provider constructor inputs:

- `apiKey`
- `apiBase` (optional override)
- `proxy`
- `requestTimeout`

### Factory Routing (`model_list` path)

In `CreateProviderFromConfig`:

- `case "gemini", "google"` => `gemini_sdk.NewProvider(...)`
- `case "antigravity"` => unchanged.
- Other protocol routing unchanged.
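The routing split can be sketched as a plain protocol switch. The string labels stand in for the real constructors (`gemini_sdk.NewProvider`, the dedicated antigravity provider, the `openai_compat` HTTP provider) and are illustrative only:

```go
package main

import "fmt"

// providerFor sketches the protocol split described above; the return
// values are hypothetical labels, not real constructor calls.
func providerFor(protocol string) string {
	switch protocol {
	case "gemini", "google":
		return "gemini_sdk" // new genai-backed provider
	case "antigravity":
		return "antigravity" // unchanged dedicated provider
	default:
		return "openai_compat" // existing HTTP provider
	}
}

func main() {
	for _, p := range []string{"gemini", "google", "antigravity", "deepseek"} {
		fmt.Println(p, "->", providerFor(p))
	}
}
```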

## Request Mapping

`providers.Message` -> `[]*genai.Content` + `GenerateContentConfig`:

- `system` -> `config.SystemInstruction`
- `user` text -> user text part
- `assistant` text -> model text part
- `assistant` tool calls -> model `FunctionCall` parts
- `tool`/tool result -> user `FunctionResponse` parts

Tool definitions:

- map to `Tool.FunctionDeclarations`.

Options:

- `max_tokens` -> `MaxOutputTokens`
- `temperature` -> `Temperature`
- `prompt_cache_key` ignored for Gemini (consistent with current behavior)
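The role mapping above can be sketched with simplified stand-in types whose field names mirror the `genai` shapes (`Content`, `Part`); the real SDK uses `*genai.FunctionCall`/`*genai.FunctionResponse` parts, which are flattened to strings here for brevity:

```go
package main

import "fmt"

// Part and Content are illustrative stand-ins for genai.Part/genai.Content.
type Part struct {
	Text             string
	FunctionResponse string // real SDK: *genai.FunctionResponse
}

type Content struct {
	Role  string // "user" or "model" on the Gemini side
	Parts []Part
}

type Message struct {
	Role    string // "system", "user", "assistant", "tool"
	Content string
}

// toContents applies the role mapping: system text becomes the system
// instruction; tool results go back as user-role function responses.
func toContents(msgs []Message) (system string, contents []Content) {
	for _, m := range msgs {
		switch m.Role {
		case "system":
			system = m.Content // -> config.SystemInstruction
		case "user":
			contents = append(contents, Content{Role: "user", Parts: []Part{{Text: m.Content}}})
		case "assistant":
			contents = append(contents, Content{Role: "model", Parts: []Part{{Text: m.Content}}})
		case "tool":
			contents = append(contents, Content{Role: "user", Parts: []Part{{FunctionResponse: m.Content}}})
		}
	}
	return system, contents
}

func main() {
	sys, cs := toContents([]Message{
		{Role: "system", Content: "be brief"},
		{Role: "user", Content: "hi"},
		{Role: "assistant", Content: "hello"},
	})
	fmt.Println(sys, len(cs), cs[1].Role)
}
```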

## Response Mapping

From `GenerateContentResponse` first candidate:

- `LLMResponse.Content` <- `resp.Text()`
- `LLMResponse.ToolCalls` <- `part.FunctionCall` entries
- `LLMResponse.Usage` <- `UsageMetadata` counts
- `LLMResponse.FinishReason` mapping:
- tool calls present -> `tool_calls`
- `MAX_TOKENS` -> `length`
- else -> `stop`
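The finish-reason precedence can be captured in one small function (tool calls win, then the SDK's `MAX_TOKENS` candidate reason, then plain `stop`), sketched here without the real SDK types:

```go
package main

import "fmt"

// mapFinishReason applies the precedence listed above.
func mapFinishReason(hasToolCalls bool, sdkReason string) string {
	switch {
	case hasToolCalls:
		return "tool_calls"
	case sdkReason == "MAX_TOKENS":
		return "length"
	default:
		return "stop"
	}
}

func main() {
	fmt.Println(mapFinishReason(true, "STOP"))        // tool_calls
	fmt.Println(mapFinishReason(false, "MAX_TOKENS")) // length
	fmt.Println(mapFinishReason(false, "STOP"))       // stop
}
```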

## ThoughtSignature Compatibility (Set + Use)

### Source field in SDK

- `genai.Part.ThoughtSignature` (`[]byte`) alongside `Part.FunctionCall`.

### Write path (set)

When parsing response function calls:

- Store signature into `ToolCall.ExtraContent.Google.ThoughtSignature`.
- Mirror same value into `ToolCall.Function.ThoughtSignature` for backward compatibility.

### Read path (use)

When rebuilding assistant tool-call history for the next SDK request:

1. Read `ToolCall.ExtraContent.Google.ThoughtSignature` (preferred).
2. Fallback to `ToolCall.Function.ThoughtSignature`.
3. If missing/invalid, continue without signature.

This guarantees compatibility across mixed old/new session data.
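The read-path priority can be sketched as follows; the struct shapes are hypothetical stand-ins that follow the field names in this document, not the project's actual types:

```go
package main

import "fmt"

// Stand-ins for the tool-call shape described above.
type googleExtra struct{ ThoughtSignature string }
type extraContent struct{ Google *googleExtra }
type toolFunction struct{ ThoughtSignature string }

type ToolCall struct {
	Function     toolFunction
	ExtraContent *extraContent
}

// thoughtSignature prefers extra_content.google, falls back to the legacy
// function field, and returns empty (continue without a signature) otherwise.
func thoughtSignature(tc ToolCall) string {
	if tc.ExtraContent != nil && tc.ExtraContent.Google != nil && tc.ExtraContent.Google.ThoughtSignature != "" {
		return tc.ExtraContent.Google.ThoughtSignature
	}
	return tc.Function.ThoughtSignature
}

func main() {
	newStyle := ToolCall{ExtraContent: &extraContent{Google: &googleExtra{ThoughtSignature: "new"}}}
	oldStyle := ToolCall{Function: toolFunction{ThoughtSignature: "old"}}
	fmt.Println(thoughtSignature(newStyle), thoughtSignature(oldStyle), thoughtSignature(ToolCall{}) == "")
}
```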

## Session Serialization/Deserialization Compatibility

Session persistence serializes `providers.Message` as JSON.

Key compatibility facts:

- `ToolCall.ThoughtSignature` is non-serialized (`json:"-"`).
- Serialized fields are `Function.ThoughtSignature` and `ExtraContent.Google.ThoughtSignature`.

Compatibility strategy:

- New code writes both serialized fields.
- New code reads both (priority: `extra_content` then `function`).
- Old session files (function-only) remain usable.
- New session files remain usable by old readers through mirrored function field.

## `openai_compat` Cleanup After Migration

### Safe to remove in Phase A

- Gemini host-based `prompt_cache_key` request suppression.
- Gemini/Google request-side model-prefix special casing tied to Gemini routing.

### Keep in Phase A

- Generic response parsing support for `extra_content.google.thought_signature`.

Rationale: this shape may still be emitted by non-Gemini OpenAI-compatible gateways.

## Testing Plan (Design-Level)

1. Provider routing tests:
- `gemini/*` -> `*gemini_sdk.Provider`
- `google/*` -> `*gemini_sdk.Provider`
- `antigravity/*` unchanged
2. Mapping tests:
- message roles, tool declarations, max tokens, temperature
3. Thought-signature tests:
- response signature -> extra_content + function mirror
- history rebuild prefers extra_content, falls back to function
4. Session compatibility tests:
- old session payload replays correctly
- new payload round-trips via JSON
5. Regression:
- providers package tests
- full `go test ./...`

## Acceptance Criteria

- Gemini/Google protocols use `genai` SDK provider.
- `Provider.Chat` external behavior remains stable.
- ThoughtSignature set/use is compatible across old/new sessions.
- `antigravity` behavior unchanged.
- Full test suite passes.
# OpenAI SDK for OpenAI Protocol Design

## Background

The project currently uses:

- `codex_provider` with `github.com/openai/openai-go/v3` for Codex-specific backend.
- `openai_compat` with manual HTTP JSON for OpenAI-compatible multi-provider support.

`openai_compat` is intentionally broad and handles provider dialect differences. For the `openai` protocol, we can use the official SDK with lower integration risk if we isolate it to the OpenAI-only path.

## Decision

Adopt hybrid routing:

1. `openai` protocol (API key path) uses a new SDK-backed provider.
2. Other OpenAI-compatible protocols continue using existing `openai_compat` HTTP provider.
3. `openai` oauth/token paths remain on the existing Codex provider path.

## Goals

- Improve OpenAI protocol correctness/maintainability via official SDK.
- Avoid destabilizing non-OpenAI compatible providers.
- Preserve existing external provider interface and agent loop behavior.

## Non-Goals

- Full migration of all OpenAI-compatible providers to SDK.
- Removing `openai_compat`.
- Mapping `ReasoningContent`, `ReasoningDetails`, or `ThoughtSignature` on the OpenAI protocol path; the SDK does not expose corresponding fields.

## Architecture

### New Provider

Create `pkg/providers/openai_sdk/provider.go` implementing `providers.LLMProvider`.

Core construction inputs:

- `apiKey`
- `apiBase`
- `proxy`
- `requestTimeout`
- `maxTokensField`

Implementation uses:

- `openai.NewClient(...)`
- `option.WithBaseURL(...)`
- `option.WithAPIKey(...)`
- `option.WithHTTPClient(...)`

### Factory Routing

Update `CreateProviderFromConfig`:

- `protocol == openai` + API key path => `OpenAISDKProvider`
- `protocol == openai` + oauth/token => existing codex auth provider (unchanged)
- all other OpenAI-compatible protocols => existing `HTTPProvider` (`openai_compat`)
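The hybrid routing reduces to a two-key decision. A sketch, where the `"api_key"`/`"oauth"` auth-type labels are hypothetical stand-ins for the real config fields:

```go
package main

import "fmt"

// providerKind sketches the hybrid routing decision described above.
func providerKind(protocol, authType string) string {
	switch {
	case protocol == "openai" && authType == "api_key":
		return "openai_sdk" // new SDK-backed provider
	case protocol == "openai":
		return "codex" // oauth/token stays on the existing Codex path
	default:
		return "openai_compat" // existing HTTPProvider
	}
}

func main() {
	fmt.Println(providerKind("openai", "api_key"))
	fmt.Println(providerKind("openai", "oauth"))
	fmt.Println(providerKind("groq", "api_key"))
}
```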

## Data Mapping

### Request Mapping

`providers.Message` -> `openai.ChatCompletionMessageParamUnion`:

- `system`, `user`, `assistant`, `tool` roles
- assistant tool calls mapped where needed

`providers.ToolDefinition` -> SDK function tools list.

Options mapping:

- `max_tokens` -> `MaxTokens` or `MaxCompletionTokens` (respect `maxTokensField`)
- `temperature` -> `Temperature`
- `prompt_cache_key` -> `PromptCacheKey` (OpenAI path only)
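The `maxTokensField` routing can be sketched against a stand-in for the SDK's chat-completion params (only the two token-limit fields matter here; names mirror the SDK but the struct is illustrative):

```go
package main

import "fmt"

// params is a simplified stand-in for the SDK request params.
type params struct {
	MaxTokens           *int64
	MaxCompletionTokens *int64
}

// applyMaxTokens routes the configured limit into exactly one of the two
// fields, honoring the provider-level maxTokensField setting.
func applyMaxTokens(p *params, maxTokensField string, n int64) {
	if maxTokensField == "max_completion_tokens" {
		p.MaxCompletionTokens = &n
		return
	}
	p.MaxTokens = &n // default/legacy field
}

func main() {
	var a, b params
	applyMaxTokens(&a, "max_tokens", 1024)
	applyMaxTokens(&b, "max_completion_tokens", 1024)
	fmt.Println(a.MaxTokens != nil, b.MaxCompletionTokens != nil)
}
```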

### Response Mapping

From first choice:

- `Content`
- `ToolCalls`
- `FinishReason`
- `Usage`

OpenAI SDK path intentionally does not map:

- `ReasoningContent`
- `ReasoningDetails`
- `ThoughtSignature`

These fields remain available for dialect providers on `openai_compat` path.

## Error Handling

- Surface SDK errors with status/type/code where available.
- Preserve current provider error semantics as much as possible (human-readable failure context).
- Keep timeout/proxy failures actionable.

## Testing Strategy

### Unit Tests (`openai_sdk/provider_test.go`)

- Basic content response parsing.
- Tool call response parsing.
- Max token field routing (`max_tokens` vs `max_completion_tokens`).
- Prompt cache key inclusion.
- Timeout behavior.
- Proxy behavior.

### Factory Tests

- OpenAI API-key config returns `*OpenAISDKProvider`.
- OpenAI oauth/token continues existing path.
- Non-OpenAI protocols still return `*HTTPProvider`.

### Regression

- Existing `openai_compat` tests remain green.
- Full test suite passes (`go test ./...`).

## Risks and Mitigations

1. Behavior drift between SDK and HTTP paths.
- Mitigation: focused parity tests for fields/options used by agent loop.

2. Incomplete message/tool mapping edge cases.
- Mitigation: explicit role/tool test matrix and conservative fallback behavior.

3. Future duplication between SDK and HTTP logic.
- Mitigation: keep SDK path narrow (OpenAI-only) and avoid over-abstracting in this iteration.

## Rollout

1. Introduce new provider with tests.
2. Route `openai` protocol API-key path in factory.
3. Run provider package and full suite tests.
4. Keep capability profile/override logic in `openai_compat` for non-OpenAI protocols.

## Acceptance Criteria

- OpenAI protocol (API key path) no longer uses `openai_compat`.
- Non-OpenAI protocols continue working on `openai_compat` unchanged.
- No regression in tool call flow and usage accounting.
- Test suite passes.