# Gemini/Google via GenAI SDK Design

## Background

Current Gemini/Google traffic goes through `openai_compat` HTTP calls. This creates two issues:

1. Gemini-specific behavior is coupled to OpenAI-compatible wire format branches.
2. Gemini thought-signature handling depends on dialect-specific JSON mapping in HTTP code.

We already accepted a hybrid provider strategy in this branch:

- OpenAI protocol uses official OpenAI SDK.
- Other protocols stay on their existing adapter.

This document applies the same strategy to Gemini-family protocols:

- Route `gemini/*` and `google/*` to `google.golang.org/genai`.
- Keep `antigravity/*` unchanged.

## Goals

- Move Gemini/Google transport to official `genai` SDK.
- Keep `providers.LLMProvider.Chat` output contract unchanged.
- Preserve thought-signature set/use compatibility across runtime and session history.
- Minimize regression risk by isolating routing changes.

## Non-Goals

- Migrating `antigravity/*` to `genai`.
- Refactoring all legacy provider-selection paths in one pass.
- Changing agent/session schema.

## Why `antigravity/*` Stays Separate

`antigravity` uses Cloud Code Assist private endpoints (`cloudcode-pa.googleapis.com/v1internal:*`) with custom envelope fields (`project`, `requestType`, `requestId`, etc.). This is not the normal Gemini API surface used by `genai`.

Conclusion: `antigravity/*` remains on its dedicated provider.

## Candidate Approaches

### Option 1 (Recommended): New `gemini_sdk` provider, protocol routing split

- Add `pkg/providers/gemini_sdk`.
- Route `gemini` and `google` protocols to this provider.
- Keep `openai_compat` for other OpenAI-compatible protocols.
- Remove only Gemini-specific request-branch logic from `openai_compat`.

Pros:
- Clear boundaries and low coupling.
- Easy to test and roll back.
- Matches existing OpenAI SDK migration pattern.

Cons:
- One extra provider package.

### Option 2: Keep `openai_compat`, internally branch into `genai`

Pros:
- Fewer top-level provider packages.

Cons:
- `openai_compat` grows in complexity and mixed responsibilities.
- Harder long-term maintenance.

### Option 3: One-shot migrate `gemini/google/antigravity`

Pros:
- A superficially unified provider surface.

Cons:
- Highest risk.
- `antigravity` protocol mismatch makes this brittle.

## Decision

Adopt Option 1.

## Target Architecture

### New Provider

Create `pkg/providers/gemini_sdk/provider.go` implementing `providers.LLMProvider` with `genai.Client`.

Provider constructor inputs:

- `apiKey`
- `apiBase` (optional override)
- `proxy`
- `requestTimeout`

### Factory Routing (`model_list` path)

In `CreateProviderFromConfig`:

- `case "gemini", "google"` => `gemini_sdk.NewProvider(...)`
- `case "antigravity"` => unchanged.
- Other protocol routing unchanged.
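The routing split can be sketched as a plain protocol switch. The string labels stand in for the real constructors (`gemini_sdk.NewProvider`, the dedicated antigravity provider, the `openai_compat` HTTP provider) and are illustrative only:

```go
package main

import "fmt"

// providerFor sketches the protocol split described above; the return
// values are hypothetical labels, not real constructor calls.
func providerFor(protocol string) string {
	switch protocol {
	case "gemini", "google":
		return "gemini_sdk" // new genai-backed provider
	case "antigravity":
		return "antigravity" // unchanged dedicated provider
	default:
		return "openai_compat" // existing HTTP provider
	}
}

func main() {
	for _, p := range []string{"gemini", "google", "antigravity", "deepseek"} {
		fmt.Println(p, "->", providerFor(p))
	}
}
```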

## Request Mapping

`providers.Message` -> `[]*genai.Content` + `GenerateContentConfig`:

- `system` -> `config.SystemInstruction`
- `user` text -> user text part
- `assistant` text -> model text part
- `assistant` tool calls -> model `FunctionCall` parts
- `tool`/tool result -> user `FunctionResponse` parts

Tool definitions:

- map to `Tool.FunctionDeclarations`.

Options:

- `max_tokens` -> `MaxOutputTokens`
- `temperature` -> `Temperature`
- `prompt_cache_key` ignored for Gemini (consistent with current behavior)
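The role mapping above can be sketched with simplified stand-in types whose field names mirror the `genai` shapes (`Content`, `Part`); the real SDK uses `*genai.FunctionCall`/`*genai.FunctionResponse` parts, which are flattened to strings here for brevity:

```go
package main

import "fmt"

// Part and Content are illustrative stand-ins for genai.Part/genai.Content.
type Part struct {
	Text             string
	FunctionResponse string // real SDK: *genai.FunctionResponse
}

type Content struct {
	Role  string // "user" or "model" on the Gemini side
	Parts []Part
}

type Message struct {
	Role    string // "system", "user", "assistant", "tool"
	Content string
}

// toContents applies the role mapping: system text becomes the system
// instruction; tool results go back as user-role function responses.
func toContents(msgs []Message) (system string, contents []Content) {
	for _, m := range msgs {
		switch m.Role {
		case "system":
			system = m.Content // -> config.SystemInstruction
		case "user":
			contents = append(contents, Content{Role: "user", Parts: []Part{{Text: m.Content}}})
		case "assistant":
			contents = append(contents, Content{Role: "model", Parts: []Part{{Text: m.Content}}})
		case "tool":
			contents = append(contents, Content{Role: "user", Parts: []Part{{FunctionResponse: m.Content}}})
		}
	}
	return system, contents
}

func main() {
	sys, cs := toContents([]Message{
		{Role: "system", Content: "be brief"},
		{Role: "user", Content: "hi"},
		{Role: "assistant", Content: "hello"},
	})
	fmt.Println(sys, len(cs), cs[1].Role)
}
```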

## Response Mapping

From `GenerateContentResponse` first candidate:

- `LLMResponse.Content` <- `resp.Text()`
- `LLMResponse.ToolCalls` <- `part.FunctionCall` entries
- `LLMResponse.Usage` <- `UsageMetadata` counts
- `LLMResponse.FinishReason` mapping:
- tool calls present -> `tool_calls`
- `MAX_TOKENS` -> `length`
- else -> `stop`
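The finish-reason precedence can be captured in one small function (tool calls win, then the SDK's `MAX_TOKENS` candidate reason, then plain `stop`), sketched here without the real SDK types:

```go
package main

import "fmt"

// mapFinishReason applies the precedence listed above.
func mapFinishReason(hasToolCalls bool, sdkReason string) string {
	switch {
	case hasToolCalls:
		return "tool_calls"
	case sdkReason == "MAX_TOKENS":
		return "length"
	default:
		return "stop"
	}
}

func main() {
	fmt.Println(mapFinishReason(true, "STOP"))        // tool_calls
	fmt.Println(mapFinishReason(false, "MAX_TOKENS")) // length
	fmt.Println(mapFinishReason(false, "STOP"))       // stop
}
```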

## ThoughtSignature Compatibility (Set + Use)

### Source field in SDK

- `genai.Part.ThoughtSignature` (`[]byte`) alongside `Part.FunctionCall`.

### Write path (set)

When parsing response function calls:

- Store signature into `ToolCall.ExtraContent.Google.ThoughtSignature`.
- Mirror same value into `ToolCall.Function.ThoughtSignature` for backward compatibility.

### Read path (use)

When rebuilding assistant tool-call history for the next SDK request:

1. Read `ToolCall.ExtraContent.Google.ThoughtSignature` (preferred).
2. Fallback to `ToolCall.Function.ThoughtSignature`.
3. If missing/invalid, continue without signature.

This guarantees compatibility across mixed old/new session data.
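The read-path priority can be sketched as follows; the struct shapes are hypothetical stand-ins that follow the field names in this document, not the project's actual types:

```go
package main

import "fmt"

// Stand-ins for the tool-call shape described above.
type googleExtra struct{ ThoughtSignature string }
type extraContent struct{ Google *googleExtra }
type toolFunction struct{ ThoughtSignature string }

type ToolCall struct {
	Function     toolFunction
	ExtraContent *extraContent
}

// thoughtSignature prefers extra_content.google, falls back to the legacy
// function field, and returns empty (continue without a signature) otherwise.
func thoughtSignature(tc ToolCall) string {
	if tc.ExtraContent != nil && tc.ExtraContent.Google != nil && tc.ExtraContent.Google.ThoughtSignature != "" {
		return tc.ExtraContent.Google.ThoughtSignature
	}
	return tc.Function.ThoughtSignature
}

func main() {
	newStyle := ToolCall{ExtraContent: &extraContent{Google: &googleExtra{ThoughtSignature: "new"}}}
	oldStyle := ToolCall{Function: toolFunction{ThoughtSignature: "old"}}
	fmt.Println(thoughtSignature(newStyle), thoughtSignature(oldStyle), thoughtSignature(ToolCall{}) == "")
}
```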

## Session Serialization/Deserialization Compatibility

Session persistence serializes `providers.Message` as JSON.

Key compatibility facts:

- `ToolCall.ThoughtSignature` is non-serialized (`json:"-"`).
- Serialized fields are `Function.ThoughtSignature` and `ExtraContent.Google.ThoughtSignature`.

Compatibility strategy:

- New code writes both serialized fields.
- New code reads both (priority: `extra_content` then `function`).
- Old session files (function-only) remain usable.
- New session files remain usable by old readers through mirrored function field.

## `openai_compat` Cleanup After Migration

### Safe to remove in Phase A

- Gemini host-based `prompt_cache_key` request suppression.
- Gemini/Google request-side model-prefix special casing tied to Gemini routing.

### Keep in Phase A

- Generic response parsing support for `extra_content.google.thought_signature`.

Rationale: this shape may still be emitted by non-Gemini OpenAI-compatible gateways.

## Testing Plan (Design-Level)

1. Provider routing tests:
- `gemini/*` -> `*gemini_sdk.Provider`
- `google/*` -> `*gemini_sdk.Provider`
- `antigravity/*` unchanged
2. Mapping tests:
- message roles, tool declarations, max tokens, temperature
3. Thought-signature tests:
- response signature -> extra_content + function mirror
- history rebuild prefers extra_content, falls back to function
4. Session compatibility tests:
- old session payload replays correctly
- new payload round-trips via JSON
5. Regression:
- providers package tests
- full `go test ./...`

## Acceptance Criteria

- Gemini/Google protocols use `genai` SDK provider.
- `Provider.Chat` external behavior remains stable.
- ThoughtSignature set/use is compatible across old/new sessions.
- `antigravity` behavior unchanged.
- Full test suite passes.
# OpenAI SDK for OpenAI Protocol Design

## Background

The project currently uses:

- `codex_provider` with `github.com/openai/openai-go/v3` for Codex-specific backend.
- `openai_compat` with manual HTTP JSON for OpenAI-compatible multi-provider support.

`openai_compat` is intentionally broad and handles provider dialect differences. For the `openai` protocol, we can use the official SDK with lower integration risk if we isolate it to the OpenAI-only path.

## Decision

Adopt hybrid routing:

1. `openai` protocol (API key path) uses a new SDK-backed provider.
2. Other OpenAI-compatible protocols continue using existing `openai_compat` HTTP provider.
3. `openai` oauth/token paths remain on the existing Codex provider path.

## Goals

- Improve OpenAI protocol correctness/maintainability via official SDK.
- Avoid destabilizing non-OpenAI compatible providers.
- Preserve existing external provider interface and agent loop behavior.

## Non-Goals

- Full migration of all OpenAI-compatible providers to SDK.
- Removing `openai_compat`.
- Mapping `ReasoningContent`, `ReasoningDetails`, or `ThoughtSignature` on the OpenAI protocol path; the SDK does not expose corresponding fields.

## Architecture

### New Provider

Create `pkg/providers/openai_sdk/provider.go` implementing `providers.LLMProvider`.

Core construction inputs:

- `apiKey`
- `apiBase`
- `proxy`
- `requestTimeout`
- `maxTokensField`

Implementation uses:

- `openai.NewClient(...)`
- `option.WithBaseURL(...)`
- `option.WithAPIKey(...)`
- `option.WithHTTPClient(...)`

### Factory Routing

Update `CreateProviderFromConfig`:

- `protocol == openai` + API key path => `OpenAISDKProvider`
- `protocol == openai` + oauth/token => existing codex auth provider (unchanged)
- all other OpenAI-compatible protocols => existing `HTTPProvider` (`openai_compat`)
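The hybrid routing reduces to a two-key decision. A sketch, where the `"api_key"`/`"oauth"` auth-type labels are hypothetical stand-ins for the real config fields:

```go
package main

import "fmt"

// providerKind sketches the hybrid routing decision described above.
func providerKind(protocol, authType string) string {
	switch {
	case protocol == "openai" && authType == "api_key":
		return "openai_sdk" // new SDK-backed provider
	case protocol == "openai":
		return "codex" // oauth/token stays on the existing Codex path
	default:
		return "openai_compat" // existing HTTPProvider
	}
}

func main() {
	fmt.Println(providerKind("openai", "api_key"))
	fmt.Println(providerKind("openai", "oauth"))
	fmt.Println(providerKind("groq", "api_key"))
}
```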

## Data Mapping

### Request Mapping

`providers.Message` -> `openai.ChatCompletionMessageParamUnion`:

- `system`, `user`, `assistant`, `tool` roles
- assistant tool calls mapped where needed

`providers.ToolDefinition` -> SDK function tools list.

Options mapping:

- `max_tokens` -> `MaxTokens` or `MaxCompletionTokens` (respect `maxTokensField`)
- `temperature` -> `Temperature`
- `prompt_cache_key` -> `PromptCacheKey` (OpenAI path only)
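The `maxTokensField` routing can be sketched against a stand-in for the SDK's chat-completion params (only the two token-limit fields matter here; names mirror the SDK but the struct is illustrative):

```go
package main

import "fmt"

// params is a simplified stand-in for the SDK request params.
type params struct {
	MaxTokens           *int64
	MaxCompletionTokens *int64
}

// applyMaxTokens routes the configured limit into exactly one of the two
// fields, honoring the provider-level maxTokensField setting.
func applyMaxTokens(p *params, maxTokensField string, n int64) {
	if maxTokensField == "max_completion_tokens" {
		p.MaxCompletionTokens = &n
		return
	}
	p.MaxTokens = &n // default/legacy field
}

func main() {
	var a, b params
	applyMaxTokens(&a, "max_tokens", 1024)
	applyMaxTokens(&b, "max_completion_tokens", 1024)
	fmt.Println(a.MaxTokens != nil, b.MaxCompletionTokens != nil)
}
```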

### Response Mapping

From first choice:

- `Content`
- `ToolCalls`
- `FinishReason`
- `Usage`

OpenAI SDK path intentionally does not map:

- `ReasoningContent`
- `ReasoningDetails`
- `ThoughtSignature`

These fields remain available for dialect providers on `openai_compat` path.

## Error Handling

- Surface SDK errors with status/type/code where available.
- Preserve current provider error semantics as much as possible (human-readable failure context).
- Keep timeout/proxy failures actionable.

## Testing Strategy

### Unit Tests (`openai_sdk/provider_test.go`)

- Basic content response parsing.
- Tool call response parsing.
- Max token field routing (`max_tokens` vs `max_completion_tokens`).
- Prompt cache key inclusion.
- Timeout behavior.
- Proxy behavior.

### Factory Tests

- OpenAI API-key config returns `*OpenAISDKProvider`.
- OpenAI oauth/token continues existing path.
- Non-OpenAI protocols still return `*HTTPProvider`.

### Regression

- Existing `openai_compat` tests remain green.
- Full test suite passes (`go test ./...`).

## Risks and Mitigations

1. Behavior drift between SDK and HTTP paths.
- Mitigation: focused parity tests for fields/options used by agent loop.

2. Incomplete message/tool mapping edge cases.
- Mitigation: explicit role/tool test matrix and conservative fallback behavior.

3. Future duplication between SDK and HTTP logic.
- Mitigation: keep SDK path narrow (OpenAI-only) and avoid over-abstracting in this iteration.

## Rollout

1. Introduce new provider with tests.
2. Route `openai` protocol API-key path in factory.
3. Run provider package and full suite tests.
4. Keep capability profile/override logic in `openai_compat` for non-OpenAI protocols.

## Acceptance Criteria

- OpenAI protocol (API key path) no longer uses `openai_compat`.
- Non-OpenAI protocols continue working on `openai_compat` unchanged.
- No regression in tool call flow and usage accounting.
- Test suite passes.