feat(codemode): implement code mode for composable tool execution #99
Merged
spachava753 merged 34 commits into main on Dec 1, 2025
Introduces design spec for Code Mode, a feature that allows LLMs to execute Golang code to interact with MCP tools. Instead of discrete tool calls requiring multiple round-trips, the LLM generates Go code that calls strongly-typed functions, enabling complex logic and composed tool executions in a single turn. The spec covers the execute_go_code tool design, function generation from MCP tool schemas, type mapping, code preamble structure, and configuration options including per-model settings and tool exclusion lists.
…andling Update the code_mode.md spec to address implementation details that were previously missing or underspecified. The tool input schema now includes an "imports" parameter for specifying packages used by generated code. Function names are now PascalCase (GetWeather vs getWeather) to follow Go conventions, and enum types are replaced with string fields plus doc comments listing allowed values. The spec now documents the full "shell" program structure that wraps generated code, including signal handling and context cancellation. A new section covers tool execution via `go run` in a temp directory with a go.mod for third-party dependencies. Error handling is clarified with distinct exit codes: 0 for success, 1 for returned errors, 2 for panics, and 3 for critical generated code failures. The spec also adds guidance on JSON schema to Go type mapping, noting that advanced schema features like anyOf/oneOf are out of scope but common patterns are supported. Additional sections cover security expectations (LLM is trusted, sandbox if needed), tool call rendering improvements for syntax highlighting, and context cancellation behavior when the parent CPE process receives a signal.
This commit fixes several typos and syntax errors in the code_mode.md specification. Corrected "the execute" to "the code to execute", fixed function declaration keywords from "fun" to "func", fixed a generic type parameter from "T" to "I", and added a missing newline at the end of the file.
…dling Reorganized the code mode specification to move Configuration section to the end after all implementation details. Added documentation about handling multimedia content types returned from MCP tools and clarified that only text content is currently supported. Added Reconnection Latency section to explain the performance tradeoffs of reconnecting to MCP servers on every code execution. Minor fixes include capitalizing `getCity` to `GetCity` in example code to match convention and clarifying that package imports can include aliases and side-effect imports.
The previous approach had the LLM generate code snippets that were inlined into a Run function within main.go. This caused a problem: when compilation errors occurred, the line numbers reported by the Go compiler would not match what the LLM expected, since the inlined code was offset by the preamble. The new approach has the LLM generate a complete Go source file (run.go) containing the Run function, package declaration, and imports. CPE generates main.go with types, function definitions, MCP setup, and the main entry point that calls Run. This separation means compilation errors in LLM-generated code report accurate line numbers, making debugging straightforward. The tool schema no longer needs an "imports" parameter since the LLM now controls the entire file. The spec examples have been updated to show both files and demonstrate the cleaner rendering of complete source files.
The existing example demonstrates basic tool composition (get_city → get_weather), but doesn't fully showcase the power of code mode when dealing with iteration and file I/O. This adds a new "Advanced Example: File I/O with Loops" section that reads a list of cities from a text file and calls get_weather on each. The example contrasts normal tool calling, which requires N+1 round-trips for N cities, against code mode which accomplishes the same task in a single execution. This O(N) vs O(1) difference in model round-trips makes a compelling case for code mode's value in batch processing scenarios.
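As a rough illustration of the loop this commit describes, a generated run.go might look something like the sketch below. `GetWeather`, `WeatherInput`, `WeatherOutput`, and the `cities.txt` filename are assumptions made for the example. The tool function is stubbed locally so the file compiles on its own, whereas in the real flow it would come from CPE's generated main.go.

```go
package main

import (
	"bufio"
	"context"
	"fmt"
	"io"
	"os"
	"strings"
)

// WeatherInput and WeatherOutput are hypothetical stand-ins for the types CPE
// would generate from the get_weather tool schema.
type WeatherInput struct {
	City string `json:"city"`
}

type WeatherOutput struct {
	Summary string `json:"summary"`
}

// GetWeather is a local stub. In code mode the generated main.go would assign
// this variable to a function that calls the MCP server.
var GetWeather = func(ctx context.Context, in WeatherInput) (WeatherOutput, error) {
	return WeatherOutput{Summary: "sunny in " + in.City}, nil
}

// collectWeather calls GetWeather once per non-empty line read from r.
func collectWeather(ctx context.Context, r io.Reader) ([]string, error) {
	var report []string
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		city := strings.TrimSpace(sc.Text())
		if city == "" {
			continue
		}
		out, err := GetWeather(ctx, WeatherInput{City: city})
		if err != nil {
			return nil, fmt.Errorf("get_weather %q: %w", city, err)
		}
		report = append(report, city+": "+out.Summary)
	}
	return report, sc.Err()
}

// Run has the shape the LLM is asked to implement: N tool calls, one execution.
func Run(ctx context.Context) error {
	f, err := os.Open("cities.txt") // assumed input file
	if err != nil {
		return err
	}
	defer f.Close()
	report, err := collectWeather(ctx, f)
	if err != nil {
		return err
	}
	fmt.Println(strings.Join(report, "\n"))
	return nil
}

// sampleReport runs the same loop over in-memory text so the sketch can be
// tried without a cities.txt on disk.
func sampleReport(cities string) ([]string, error) {
	return collectWeather(context.Background(), strings.NewReader(cities))
}

func main() {
	report, err := sampleReport("Paris\nOslo\n")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(strings.Join(report, "\n"))
}
```

However many cities the file lists, the model sees only the final report, which is where the single round-trip comes from.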
… error handling The execute_go_code tool now requires an executionTimeout field in its input schema, allowing the LLM to specify an appropriate timeout (1-300 seconds) based on its estimate of the generated code's runtime. When the timeout expires, CPE sends SIGINT to the child process, waits 5 seconds for graceful shutdown, then sends SIGKILL if needed. This prevents runaway processes from infinite loops or unexpectedly long operations. The callMcpTool preamble function now checks for context cancellation before making tool calls and handles context deadline/cancellation errors gracefully by returning them normally instead of calling fatalExit. This allows the LLM-generated code to properly propagate timeout signals up through the execution stack without fatally exiting the entire program.
The code mode specification has been updated to more accurately reflect the implementation design and clarify several key areas. The output schema for the execute_go_code tool now explicitly states that it returns combined stdout and stderr output, providing clarity on what the tool result will contain. The function generation section has been rewritten to correctly explain how CPE and the LLM split responsibilities. Rather than incorrectly stating that types and function code are prepended to generated code, the spec now clearly explains that the LLM generates a complete run.go file containing the Run function, while CPE generates a separate main.go with types, function definitions, and MCP setup code. The code compilation section now documents the go mod tidy step that CPE performs after creating the temporary directory and files to download required modules before compilation. This clarifies the initial latency cost and explains how subsequent executions benefit from the module cache. A new Naming Collisions section has been added to describe two types of collisions that CPE must detect at startup. In addition to detecting when an MCP tool is named execute_go_code, CPE must also check that tool names don't collide when converted to pascal case function names, such as get_weather and get_Weather both becoming GetWeather. The section clarifies that excludedTools resolves collisions by exposing the tool as a regular tool rather than disabling it. The tool call rendering section now specifies that only the non-streaming printer should stylize the generated code as a Go code block, while the streaming printer treats execute_go_code as a normal tool and prints JSON arguments.
…of merging The code mode configuration resolution strategy has been documented to explicitly specify that model-level settings provide override behavior, not merging. When a model specifies its own `codeMode` configuration, it completely replaces the global default rather than merging individual fields like `excludedTools`. This keeps the configuration predictable and self-contained for each model. The configuration examples and documentation have been expanded to show inheritance, override, and disable scenarios more clearly. A new Configuration Resolution subsection explains the precedence order and provides a concrete example demonstrating that `excludedTools` lists are replaced entirely, preventing the unexpected accumulation of excluded tools when different models need different tool configurations.
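The override-not-merge rule can be sketched in a few lines. This is a minimal illustration, assuming the `Enabled` and `ExcludedTools` fields named elsewhere in this PR. The `resolveCodeMode` helper and the tool names are hypothetical and are not taken from the actual `ResolveConfig` code.

```go
package main

import "fmt"

// CodeModeConfig carries the two settings named in the PR; struct tags are
// omitted in this sketch.
type CodeModeConfig struct {
	Enabled       bool
	ExcludedTools []string
}

// resolveCodeMode returns the model-level config whenever one is set and the
// global default otherwise. Fields are never merged across the two levels.
func resolveCodeMode(defaults, model *CodeModeConfig) *CodeModeConfig {
	if model != nil {
		return model
	}
	return defaults
}

func main() {
	defaults := &CodeModeConfig{Enabled: true, ExcludedTools: []string{"bash"}}
	model := &CodeModeConfig{Enabled: true, ExcludedTools: []string{"edit_file"}}

	fmt.Println(resolveCodeMode(defaults, model).ExcludedTools) // [edit_file]
	fmt.Println(resolveCodeMode(defaults, nil).ExcludedTools)   // [bash]
}
```

With merging, the first call would have excluded both tools. Replacing the whole block keeps each model's list self-contained.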
The code mode specification now includes a comprehensive set of implementation tasks broken down into five logical phases: foundation, code generation, execution engine, integration, and CLI integration. The tasks are ordered by dependency and each incorporates testing guidance as part of its scope rather than as separate work items. The foundation phase establishes the configuration system, JSON schema to Go type conversion, and tool name collision detection. The code generation phase creates the templates for main.go and tool descriptions. The execution engine implements the sandbox, timeout handling, and error classification. The integration phase wires everything together into the generator and printer, and the final phase connects the config through the CLI. Each task includes sufficient context from the specification to guide implementation without requiring external documentation, and dependencies are made explicit to enable parallel planning and prioritization.
Implement the foundation for code mode support by adding configuration types and resolution logic per the code mode specification. This enables users to enable or disable code mode globally or per-model, and exclude specific tools from code mode via configuration. Add CodeModeConfig struct to represent code mode settings with Enabled and ExcludedTools fields. Wire this into both Defaults and ModelConfig structs to support configuration at both levels. Implement override semantics where model-level code mode configuration completely replaces global defaults rather than merging, keeping the configuration predictable and explicit. Extend ResolveConfig to resolve effective CodeModeConfig using the override behavior, placing it into the runtime Config struct. Add environment variable expansion support for ExcludedTools in both global and model-specific configurations, following the same pattern as other config fields. Include comprehensive table-driven tests covering code mode loading and resolution scenarios: validation of various configurations, override behavior verification, inheritance from defaults, and environment variable expansion. All tests pass without modifying existing config tests.
Add the internal/codemode package with schema conversion functionality that transforms MCP tool input/output JSON schemas into Go type definitions. This enables code mode to generate strongly-typed function signatures and struct definitions that the LLM can use when generating Go code to call MCP tools. The converter handles the JSON schema features commonly used by MCP tools: primitive types (string, number, integer, boolean), objects with properties, arrays, nullable types via type arrays like ["null", "string"], and enum values rendered as doc comments. Nested objects generate separate named types using a Parent_Field naming convention to keep the generated code readable. Missing or nil schemas produce a map[string]any type alias, accommodating tools that lack output schemas. Update the code mode spec to mark Task 2 complete and revise subsequent tasks to reflect the flattened package structure. Task 3 now focuses solely on collision detection since FieldNameToGo already provides the pascal case conversion needed for tool names. Task 7 no longer creates a nested executor subpackage.
Add collision detection to the codemode package that validates tool names before code mode initialization. This prevents two types of naming conflicts that would cause issues when generating Go code from MCP tools.
The first check catches tools named "execute_go_code", which is reserved for the code mode tool itself. The second check detects when different tool names would produce identical Go function names after pascal case conversion (e.g., "get_weather" and "getWeather" both become "GetWeather"). Both checks return descriptive errors guiding users to resolve conflicts via the excludedTools configuration or by removing the conflicting MCP server.
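A minimal sketch of the two checks follows. It uses a simplified `toPascal` helper as a stand-in for the strcase conversion the PR relies on, so results may differ from the real library on unusual names. `checkCollisions` and the error wording are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
	"unicode/utf8"
)

const reservedToolName = "execute_go_code"

// toPascal upper-cases the first letter of every '_' or '-' separated segment.
// It is a simplified stand-in for a real case-conversion library.
func toPascal(name string) string {
	parts := strings.FieldsFunc(name, func(r rune) bool { return r == '_' || r == '-' })
	var b strings.Builder
	for _, p := range parts {
		r, size := utf8.DecodeRuneInString(p)
		b.WriteRune(unicode.ToUpper(r))
		b.WriteString(p[size:])
	}
	return b.String()
}

// checkCollisions rejects the reserved tool name and any two tool names that
// map to the same Go function name.
func checkCollisions(toolNames []string) error {
	sorted := append([]string(nil), toolNames...)
	sort.Strings(sorted) // deterministic error messages
	seen := make(map[string]string)
	for _, name := range sorted {
		if name == reservedToolName {
			return fmt.Errorf("tool %q uses a reserved name; exclude it or remove its server", name)
		}
		fn := toPascal(name)
		if prev, ok := seen[fn]; ok {
			return fmt.Errorf("tools %q and %q both become %s; exclude one of them", prev, name, fn)
		}
		seen[fn] = name
	}
	return nil
}

func main() {
	fmt.Println(checkCollisions([]string{"get_weather", "getWeather"}))
	fmt.Println(checkCollisions([]string{"get_weather", "get_city"})) // <nil>
}
```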
Remove the FieldNameToGo wrapper function from schema.go in favor of calling strcase.UpperCamelCase directly with an explanatory comment. The wrapper added no value since collision detection needs the same conversion logic and can use the library directly.
Apply gopls modernize fixes across the codebase: replace interface{} with any, use slices.Contains instead of manual loops, use strings.SplitSeq for iteration, and use range-over-int syntax for simple counted loops.
…for tools Add GenerateToolDefinitions function that takes a slice of MCP tools and produces the complete Go code needed for the execute_go_code tool description and main.go template. The function generates input/output struct definitions using the existing SchemaToGoType converter, plus function variable declarations with doc comments from tool descriptions. Tools without input schemas produce functions that only take context, while tools with input schemas include an input parameter. Tools lacking output schemas default to map[string]any. The output is deterministically ordered by tool name to ensure consistent code generation across runs.
Add GenerateMainGo function that produces the complete main.go file for code mode execution. The generated code includes MCP client initialization, server connections for each configured server, and function variable assignments that wire tools to their respective server sessions. The template handles all three transport types (stdio, http, sse) with proper configuration. Stdio servers support command arguments and environment variable overlays. HTTP and SSE servers support custom headers via a generated headerRoundTripper type, with each server receiving its own HTTP client instance containing its specific headers. Imports are generated conditionally based on actual usage to avoid dead code compilation errors. The template is embedded from a separate file using go:embed for maintainability. Server and tool initialization ordering is deterministic via alphabetical sorting. A compilation test verifies that all generated code variants actually compile successfully.
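The per-server header handling can be approximated with a small `http.RoundTripper` wrapper. The sketch below reuses the `headerRoundTripper` name from the commit message, but its fields, `newServerClient`, and the recording transport used for the demo are assumptions rather than the generated code.

```go
package main

import (
	"fmt"
	"net/http"
)

// headerRoundTripper adds a fixed set of headers to every outgoing request.
type headerRoundTripper struct {
	base    http.RoundTripper
	headers map[string]string
}

func (h *headerRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
	// A RoundTripper should not mutate the caller's request, so use a clone.
	clone := req.Clone(req.Context())
	for k, v := range h.headers {
		clone.Header.Set(k, v)
	}
	return h.base.RoundTrip(clone)
}

// newServerClient gives one server its own client with only its headers.
func newServerClient(base http.RoundTripper, headers map[string]string) *http.Client {
	return &http.Client{Transport: &headerRoundTripper{base: base, headers: headers}}
}

// recordingTransport captures the request instead of sending it over the
// network. It exists only so the sketch runs offline.
type recordingTransport struct{ last *http.Request }

func (r *recordingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	r.last = req
	return &http.Response{
		StatusCode: http.StatusOK,
		Header:     make(http.Header),
		Body:       http.NoBody,
		Request:    req,
	}, nil
}

// headerSeenByServer reports the value of key as the base transport sees it.
func headerSeenByServer(headers map[string]string, key string) string {
	rec := &recordingTransport{}
	resp, err := newServerClient(rec, headers).Get("http://example.invalid/mcp")
	if err != nil {
		return ""
	}
	resp.Body.Close()
	return rec.last.Header.Get(key)
}

func main() {
	headers := map[string]string{"Authorization": "Bearer token-a"}
	fmt.Println(headerSeenByServer(headers, "Authorization")) // Bearer token-a
}
```

In real use the base would typically be `http.DefaultTransport`. Building a separate client per server keeps one server's credentials from leaking into requests to another.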
Add GenerateExecuteGoCodeDescription and GenerateExecuteGoCodeTool functions that produce the tool definition for code mode. The description includes the runtime Go version, generated type definitions and function signatures from MCP tools, a template showing the expected code structure, an abbreviated main.go shape, and instructions for the LLM. GenerateExecuteGoCodeTool returns a complete gai.Tool with the description and input schema specifying the required code and executionTimeout parameters. The timeout is constrained to 1-300 seconds per the spec. Also adds a testing guideline to AGENTS.md requiring exact string matching in test assertions rather than partial matching with strings.Contains.
Add ExecuteCode function that creates a temporary sandbox for running LLM-generated Go code. The function creates a temp directory with a random suffix, writes go.mod with the MCP SDK dependency, generates main.go using the existing GenerateMainGo function, writes run.go with the LLM-provided code, runs go mod tidy to download dependencies, then builds and executes the binary. The implementation builds a binary with go build before execution rather than using go run directly. This preserves accurate exit codes from the child process, which is important for distinguishing between normal errors (exit code 1) and panics (exit code 2) as required by the spec. The temp directory is cleaned up via defer after execution completes. The executor accepts a timeout parameter in preparation for Task 8, which will add SIGINT/SIGKILL signal handling. Exit code classification for error handling will be added in Task 9.
Extend the code execution sandbox to enforce timeouts and handle graceful shutdown. The executor now uses Go's exec.Cmd.Cancel and WaitDelay fields to implement the timeout and signal handling behavior specified in the code mode spec. When the timeout expires or parent context is cancelled, the executor sends SIGINT to allow the child process to shut down gracefully. If the process doesn't exit within a 5-second grace period, Go automatically sends SIGKILL. The Cancel function returns os.ErrProcessDone to suppress context errors when the process exits cleanly after receiving SIGINT, ensuring accurate exit code reporting. This approach leverages Go's built-in subprocess management rather than manual goroutine orchestration, resulting in cleaner and more reliable signal propagation for both timeout and parent context cancellation scenarios.
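One plausible reading of the `Cancel` and `WaitDelay` arrangement described here is sketched below. `sandboxCommand`, `gracePeriod`, and the placeholder binary name are hypothetical, and the real executor may differ in how it treats signalling errors.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os"
	"os/exec"
	"time"
)

// gracePeriod is how long the child gets between the interrupt and the kill.
const gracePeriod = 5 * time.Second

// sandboxCommand builds a command that is interrupted, not killed outright,
// when ctx is cancelled or its deadline passes.
func sandboxCommand(ctx context.Context, binary string, args ...string) *exec.Cmd {
	cmd := exec.CommandContext(ctx, binary, args...)
	cmd.Cancel = func() error {
		err := cmd.Process.Signal(os.Interrupt)
		if err == nil || errors.Is(err, os.ErrProcessDone) {
			// Reporting ErrProcessDone lets Wait return the child's own exit
			// status rather than the context error.
			return os.ErrProcessDone
		}
		return err
	}
	// If the child is still running after the grace period, os/exec kills it.
	cmd.WaitDelay = gracePeriod
	return cmd
}

// gracefulStopConfigured reports whether a freshly built command carries both
// settings. Nothing is started, so the binary name is only a placeholder.
func gracefulStopConfigured() bool {
	cmd := sandboxCommand(context.Background(), "sandbox-binary")
	return cmd.Cancel != nil && cmd.WaitDelay == gracePeriod
}

func main() {
	fmt.Println("graceful stop configured:", gracefulStopConfigured())
}
```

A caller would normally wrap the parent context with `context.WithTimeout` using the `executionTimeout` value before passing it in, so the same path covers both timeouts and parent cancellation.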
Extend the executor to classify execution results into appropriate error types that downstream consumers (Task 10's tool callback) can use to determine whether to return output as a recoverable tool result or halt agent execution. The classification maps exit codes to error types: exit 0 returns nil (success), exits 1 and 2 return RecoverableError (Run() errors and panics respectively), and exit 3 returns FatalExecutionError (from fatalExit() in generated MCP setup code). Compilation errors from go build or go mod tidy also return RecoverableError since the LLM can adapt by fixing syntax issues. Other non-zero codes like -1 from SIGKILL are treated as recoverable to allow the LLM to retry with faster-running code. The error types are defined in a new errors.go file with RecoverableError containing both the output and exit code for context, while FatalExecutionError contains just the output since it always corresponds to exit code 3.
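The exit-code mapping can be sketched as follows. The type names match the commit message, but the `classify` and `isFatal` helpers and the error strings are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// RecoverableError is returned to the LLM as a tool result so it can adapt.
type RecoverableError struct {
	Output   string
	ExitCode int
}

func (e *RecoverableError) Error() string {
	return fmt.Sprintf("execution failed with exit code %d", e.ExitCode)
}

// FatalExecutionError halts the agent; it always corresponds to exit code 3.
type FatalExecutionError struct {
	Output string
}

func (e *FatalExecutionError) Error() string {
	return "fatal error in generated code (exit code 3)"
}

// classify maps a child exit code to the error the tool callback should see.
func classify(exitCode int, output string) error {
	switch exitCode {
	case 0:
		return nil
	case 3:
		return &FatalExecutionError{Output: output}
	default:
		// 1 = Run() returned an error, 2 = panic, anything else (such as -1
		// after SIGKILL) is also treated as recoverable.
		return &RecoverableError{Output: output, ExitCode: exitCode}
	}
}

// isFatal reports whether err should stop the agent.
func isFatal(err error) bool {
	var fatal *FatalExecutionError
	return errors.As(err, &fatal)
}

func main() {
	for _, code := range []int{0, 1, 2, 3, -1} {
		err := classify(code, "")
		fmt.Printf("exit %d: fatal=%v err=%v\n", code, isFatal(err), err)
	}
}
```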
Add documentation for timeout/SIGKILL scenarios (exit code -1) in the error handling section, clarifying that these are treated as recoverable errors since the LLM can adapt by generating faster code. Also fix a minor grammatical error in the compilation error description.
Implement the gai.ToolCallback interface for the execute_go_code tool, which serves as the bridge between LLM tool calls and the code execution sandbox. The callback parses the code and executionTimeout parameters from the tool call JSON, invokes the executor from Tasks 7-9, and translates execution results into appropriate responses. The error handling follows the classification established in the previous commit: successful executions and recoverable errors (compilation failures, Run() errors, panics, timeouts) return tool result messages that the LLM can adapt to, while fatal errors (exit code 3) and infrastructure failures propagate as errors to halt agent execution. This separation allows the LLM to iterate on code that fails to compile or run, while ensuring unrecoverable issues in the generated MCP setup code stop the agent immediately.
Wire code mode support into the generator creation pipeline. When code mode is enabled in the configuration, the generator now partitions MCP tools into two categories: tools accessible via generated Go code and excluded tools that remain as regular tool calls. The execute_go_code tool is generated and registered with the code-mode tools, while excluded tools are registered normally for direct LLM invocation. The FetchTools function in the MCP client was refactored to return tools grouped by server name rather than a flat map. This structure is necessary for code mode since the generated main.go needs to know which server each tool belongs to for MCP client setup. The new ToolData type includes both the gai.Tool for registration and the original *mcp.Tool for code generation. Collision detection runs on code-mode tools at startup before any tool registration occurs, catching reserved name conflicts and pascal case collisions early. When code mode is disabled or nil, the generator falls back to the original behavior of registering all tools directly.
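The partitioning step amounts to a split on the `excludedTools` list. The sketch below works on plain tool names for brevity, while the PR's `ToolData` type carries more than a name. `partitionTools` and the sample tool names are hypothetical.

```go
package main

import (
	"fmt"
	"slices"
)

// partitionTools splits tool names into those exposed through generated Go
// functions and those registered as regular tools because they are excluded.
func partitionTools(all, excluded []string) (codeMode, regular []string) {
	for _, name := range all {
		if slices.Contains(excluded, name) {
			regular = append(regular, name)
		} else {
			codeMode = append(codeMode, name)
		}
	}
	return codeMode, regular
}

func main() {
	codeMode, regular := partitionTools(
		[]string{"get_city", "get_weather", "bash"},
		[]string{"bash"},
	)
	fmt.Println(codeMode) // [get_city get_weather]
	fmt.Println(regular)  // [bash]
}
```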
When the non-streaming ResponsePrinterGenerator encounters an execute_go_code tool call, it now extracts the code parameter and renders it as a Go markdown block with syntax highlighting instead of displaying the raw JSON arguments. This provides better readability for users following along with code mode execution, as they see properly formatted Go code rather than escaped strings within JSON. The streaming printer remains unchanged and continues to render tool calls as JSON, which is acceptable given the incremental nature of streaming output. Edge cases such as malformed JSON, missing code parameters, or non-string code values gracefully fall back to the existing JSON rendering behavior.
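The extract-or-fall-back behavior might look roughly like this. `renderToolCall` is a hypothetical name, and rendering the fallback as a JSON code block is an assumption, since the existing JSON rendering is not shown in this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// fence is built at runtime so this example contains no literal fence marker.
var fence = strings.Repeat("`", 3)

// renderToolCall shows execute_go_code calls as a Go code block and everything
// else, including malformed or unexpected arguments, as JSON.
func renderToolCall(name string, rawArgs []byte) string {
	if name == "execute_go_code" {
		var args map[string]any
		if err := json.Unmarshal(rawArgs, &args); err == nil {
			// Only a string-valued "code" parameter gets the Go rendering.
			if code, ok := args["code"].(string); ok {
				return fence + "go\n" + code + "\n" + fence
			}
		}
	}
	return fence + "json\n" + string(rawArgs) + "\n" + fence
}

func main() {
	fmt.Println(renderToolCall("execute_go_code", []byte(`{"code":"package main","executionTimeout":30}`)))
	fmt.Println(renderToolCall("execute_go_code", []byte(`{"code":42}`)))
}
```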
MCP tools without output schemas can return content in any format - plain text, markdown, or arbitrary strings - not just JSON objects. The previous implementation used `map[string]any` which would fail at runtime when attempting to JSON unmarshal non-JSON content. The schema converter now generates `type XOutput = string` for tools with nil output schemas, and the generated `callMcpTool` function detects when the output type is `string` and returns the raw text content directly, bypassing JSON unmarshaling entirely. This allows LLM-generated code to receive and process the raw tool output as needed, whether that means parsing it as JSON, treating it as plain text, or handling it as markdown.
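The string special case can be expressed with a generic helper. The sketch below is an assumption about the shape of that logic, not the generated `callMcpTool` itself. `decodeToolText` and both output types are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// GetNotesOutput stands in for a tool with no output schema.
type GetNotesOutput = string

// GetWeatherOutput stands in for a tool with a structured output schema.
type GetWeatherOutput struct {
	Temp float64 `json:"temp"`
}

// decodeToolText returns the raw text when O is string and JSON-decodes the
// text into O otherwise.
func decodeToolText[O any](text string) (O, error) {
	var out O
	if s, ok := any(&out).(*string); ok {
		*s = text // no unmarshaling for schema-less tools
		return out, nil
	}
	if err := json.Unmarshal([]byte(text), &out); err != nil {
		return out, fmt.Errorf("decode tool output: %w", err)
	}
	return out, nil
}

func main() {
	notes, _ := decodeToolText[GetNotesOutput]("# Notes\nplain markdown")
	fmt.Println(notes)

	weather, err := decodeToolText[GetWeatherOutput](`{"temp":21.5}`)
	fmt.Println(weather.Temp, err) // 21.5 <nil>
}
```

Because `GetNotesOutput` is a type alias, not a new named type, the `*string` assertion matches it directly.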
… unset from zero values
When generating Go structs from JSON schemas, optional fields were previously generated as value types with `omitempty`. This caused issues with MCP tools that validate input strictly - an unset optional enum field would serialize as an empty string "" rather than being omitted entirely, triggering validation errors like "Invalid enum value. Expected 'fallback' | 'preferred', received ''".
Optional fields (those not in the schema's `required` array) are now generated as pointer types with `omitempty`. This allows proper distinction between "not set" (`nil`, omitted from JSON) and "explicitly set to zero value" (e.g., `ptr("")` for an intentionally empty string). A generic `ptr[T any](v T) *T` helper function is included in the generated `main.go` and documented in the tool description, enabling LLMs to easily create pointers from literals like `ptr("value")` or `ptr(42)`.
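The difference between unset and explicitly empty is easiest to see in the serialized JSON. In this sketch `SearchInput` and `encode` are hypothetical, while `ptr` follows the signature given above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SearchInput is a hypothetical generated input type with one required and one
// optional field.
type SearchInput struct {
	Query string  `json:"query"`
	Mode  *string `json:"mode,omitempty"` // allowed values: "fallback", "preferred"
}

// ptr returns a pointer to v, which makes optional fields easy to set from
// literals.
func ptr[T any](v T) *T { return &v }

// encode marshals in and returns the JSON text.
func encode(in SearchInput) string {
	b, err := json.Marshal(in)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(encode(SearchInput{Query: "go"}))                         // {"query":"go"}
	fmt.Println(encode(SearchInput{Query: "go", Mode: ptr("preferred")})) // {"query":"go","mode":"preferred"}
	fmt.Println(encode(SearchInput{Query: "go", Mode: ptr("")}))          // {"query":"go","mode":""}
}
```

With a plain `string` field and `omitempty`, the first and third calls would serialize identically, so the caller could not send an intentionally empty value.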
The `model system-prompt` command now also includes the resolved `CodeModeConfig` when rendering templates, ensuring the system prompt preview accurately reflects code mode settings.
…fields Update the code mode specification to document that optional fields (not in the schema's required array) are now generated as pointer types with omitempty JSON tags. This change allows distinguishing between unset fields and fields explicitly set to zero values, preventing validation errors from MCP servers that strictly validate enum and optional parameters. The spec now documents the ptr[T any] helper function in the generated main.go template and the tool description, showing examples of how LLMs can create pointers from literals when setting optional fields.
The execute_go_code tool was previously only registered when there were MCP tools available for code mode. This missed the key insight that code mode is valuable on its own - the tool provides LLMs with access to the full Go standard library for file I/O, HTTP requests, data processing, and complex control flow, independent of any MCP server integration. When code mode is enabled in the configuration, the execute_go_code tool is now always registered, regardless of whether MCP servers are configured or whether all tools are excluded via excludedTools. This allows users to leverage code mode purely for its Go execution capabilities without needing to set up any MCP servers. The spec and implementation task descriptions have been updated to reflect this behavior, and a new test verifies the tool registration logic for various code mode configurations.
Regenerates the JSON schema to include the CodeModeConfig type definition and its references in Defaults and ModelConfig. This reflects the code mode configuration options (enabled flag and excludedTools list) that were added to the config structs but were missing from the generated schema.
Added comprehensive documentation for the new code mode feature to both AGENTS.md and README.md.
AGENTS.md changes:
- Added codemode/ package description to project structure section
- Added code mode usage examples to build/test commands
- Added new Code Mode section covering configuration, implementation details, and technical notes including tool schema conversion, execution sandboxing, timeout handling, exit codes, and rendering behavior
README.md changes:
- Added Code Mode to features list
- Added new Code Mode section with detailed explanation of functionality and benefits
- Documented code mode configuration with examples
- Explained how execute_go_code tool works with example generated code
- Added guidance on when to enable/exclude code mode and tools
- Included security considerations for production deployments
- Added code mode example to configuration section
The documentation explains that code mode allows LLMs to compose multiple tool calls in a single execution by generating Go programs, reducing latency and enabling control flow, data processing, and standard library access.
…de and best practices This update substantially revises the agent instructions template to provide clearer, more comprehensive guidance. The new version adds detailed sections on code mode operations, execution models, and concrete patterns for common tasks like shell commands, concurrent fan-out, and HTTP requests. It includes new guidance on file editing constraints, software development practices, and commit message conventions. The template now better explains CPE's capabilities and establishes clear expectations for tone, verbosity, and error handling in responses.
…ode efficiency Add comprehensive token counting tests that validate code mode's design trade-offs using real Anthropic API token counts. The test suite demonstrates that while code mode introduces upfront overhead from the execute_go_code tool description, it achieves significant token savings when processing large intermediate results that don't require LLM analysis. The test uses gai.AnthropicGenerator.Count to measure actual token usage across four scenarios. Simple workflows with minimal tool composition show code mode has a base overhead of approximately 700 tokens from the execute_go_code tool description, resulting in 48-131% higher token usage for short conversations. However, the massive file search test reveals code mode's strength: when a ~4300 token file needs to be searched for a specific name, normal tool calling consumes 8553 tokens (as the entire file content flows through the LLM context) while code mode uses only 1291 tokens (84.9% savings), since the Go code processes the file locally and returns just the search result. The test generates a temporary file with 1200 fake names using the go-faker library to simulate realistic large file processing scenarios. This demonstrates the core insight from the code mode specification: the upfront overhead is amortized when intermediate tool results would otherwise bloat the context window. The test file is placed in docs/specs alongside the code mode specification document for easy reference during future development. The go.mod changes reflect the addition of go-faker/faker/v4 for test data generation and promote go-strcase from indirect to direct dependency, as it's used by the codemode package for pascal case tool name conversion.
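The 84.9% figure follows directly from the two token counts quoted above. A short check, with `savingsPercent` as a hypothetical helper name:

```go
package main

import "fmt"

// savingsPercent returns how much smaller codeMode is than baseline, as a
// percentage of baseline.
func savingsPercent(baseline, codeMode int) float64 {
	return float64(baseline-codeMode) / float64(baseline) * 100
}

func main() {
	// 8553 tokens with normal tool calling versus 1291 tokens with code mode.
	fmt.Printf("%.1f%%\n", savingsPercent(8553, 1291)) // 84.9%
}
```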
…rence and file operations Enhance the system prompt template with three new code mode sections that guide LLMs toward more idiomatic Go code generation. The additions establish clear preferences for when to use standard library functions versus shell commands, how to perform surgical file edits, and best practices for reading large files efficiently. The "Prefer standard library" section explicitly directs the model to use Go's native file operations, path manipulation, and time functions instead of shelling out to commands like find, ls, or date. This guidance improves code portability and reduces subprocess overhead while still acknowledging when shell commands like rg, git, and gh provide clear advantages. The "File editing" section promotes surgical edits using strings.Replace or regexp.ReplaceAllString over rewriting entire files, reducing I/O and preserving unrelated content. It includes specific guidance on handling backticks in string literals, a common pitfall when generating markdown or code that contains Go code blocks, demonstrating the concatenation pattern to avoid raw string literal conflicts. The "Reading large files" section discourages loading massive files into memory, instead recommending targeted extraction with rg context flags for pattern-based access and bufio.Scanner with line counting for range-based reads. The explicit prohibition against sed and inclusion of a working example guide the model toward portable, Go-native solutions. The go doc recommendation for Go source introspection aligns with existing AGENTS.md documentation practices. Testing with haiku confirmed the new instructions successfully shift behavior: file counting now uses filepath.Walk instead of find, date queries leverage the model's awareness of system info rather than executing commands, and line range requests generate bufio.Scanner loops rather than sed invocations.
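A range-based read with `bufio.Scanner` and line counting might look like the sketch below. It is not the example shipped in the system prompt template, and `readLineRange` and `rangeOf` are hypothetical names.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// readLineRange returns lines start through end (1-indexed, inclusive) from r
// and stops scanning as soon as the range has been read.
func readLineRange(r io.Reader, start, end int) ([]string, error) {
	var lines []string
	sc := bufio.NewScanner(r)
	for n := 1; sc.Scan(); n++ {
		if n < start {
			continue
		}
		if n > end {
			break
		}
		lines = append(lines, sc.Text())
	}
	return lines, sc.Err()
}

// rangeOf applies readLineRange to in-memory text; a real caller would pass an
// *os.File instead.
func rangeOf(text string, start, end int) []string {
	lines, err := readLineRange(strings.NewReader(text), start, end)
	if err != nil {
		return nil
	}
	return lines
}

func main() {
	fmt.Println(rangeOf("alpha\nbeta\ngamma\ndelta\n", 2, 3)) // [beta gamma]
}
```

Note that `bufio.Scanner` rejects lines longer than its buffer (64 KiB by default), so files with very long lines would need a call to `Scanner.Buffer` first.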
Overview
Code mode allows LLMs to execute Go programs that compose multiple MCP tool calls in a single execution, enabling control flow, data processing, and access to the Go standard library. This reduces latency by eliminating round-trips between the LLM and tools, while providing expressiveness beyond discrete tool calls.
What Changed
… `main.go` with MCP client setup, and runs LLM-generated code
… `execute_go_code` tool that wraps the execution engine and integrates with the agent pipeline
… `CreateToolCapableGenerator` with tool partitioning and collision checks
… `execute_go_code` when code mode is enabled, even without MCP servers
Key Features
Composability: Chain multiple tool calls in Go code with native control flow (loops, conditionals)
Standard Library Access: File I/O, HTTP requests, and data processing without additional tools
Efficiency: O(1) vs O(N) round-trips for N operations, reducing latency and token usage
Type Safety: Strongly-typed Go functions generated from MCP tool schemas with proper optional field handling
Error Recovery: LLM can iterate on compilation errors, panics, and timeouts; fatal errors halt execution
Configuration
Architecture
When enabled, CPE partitions MCP tools into:
… `main.go` template
The LLM generates complete Go files implementing `Run(ctx context.Context) error`. CPE:
… `go.mod`, generated `main.go`, and LLM's `run.go`
… `go mod tidy` to download dependencies
… `go build` to preserve accurate exit codes
Exit Code Classification
… `Run()` returned error (recoverable)
… `fatalExit()` in generated code (fatal - stops agent)
Testing
Documentation
… `docs/specs/code_mode.md` with examples and implementation tasks
Files Changed
… `internal/codemode/` with 14 implementation files and comprehensive tests
… `CodeModeConfig` to config types, loading, and resolution
… `execute_go_code`
… `FetchTools` to group tools by server
Closes #XX (if there's an issue)