/v1/responses streaming SSE missing fields required by Vercel AI SDK (output_index, id on function_call items, created_at/model) #20607
Summary
The /v1/responses streaming SSE implementation (from PR #18486) is missing three fields that the Vercel AI SDK's @ai-sdk/openai package requires via Zod schema validation. This prevents tools like OpenCode, Cursor, and other AI SDK-based clients from using the Responses API with llama-server for tool calling.
Chat Completions (/v1/chat/completions) works correctly. The model produces perfect structured tool calls. This is specifically about the Responses API streaming event format.
Missing fields
1. output_index (number) on multiple event types
Affected events: response.output_item.added, response.output_item.done, response.function_call_arguments.delta, response.output_text.delta
The AI SDK uses output_index to track which output item each event belongs to. Without it, the Zod schema validation fails (chunk.success = false) and the event is silently dropped.
Fix: Add a sequential counter starting at 0, incremented each time a new output item is added.
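A minimal sketch of that counter, written as a Node.js helper rather than the actual llama-server (C++) change; the event and field names come from this issue, but `makeIndexer` itself is hypothetical:

```javascript
// Hedged sketch: stamp a sequential output_index onto streaming events.
// The counter advances each time a new output item is added, so the first
// item gets index 0 and all of its deltas share that index.
function makeIndexer() {
  let outputIndex = -1; // becomes 0 on the first response.output_item.added
  const needsIndex = new Set([
    "response.output_item.added",
    "response.output_item.done",
    "response.function_call_arguments.delta",
    "response.output_text.delta",
  ]);
  return function patch(event) {
    if (event.type === "response.output_item.added") outputIndex += 1;
    if (needsIndex.has(event.type) && event.output_index === undefined) {
      event.output_index = Math.max(outputIndex, 0);
    }
    return event;
  };
}
```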
2. id (string) on function_call items
Affected events: response.output_item.added and response.output_item.done where item.type === "function_call"
llama-server sends call_id but not id on function_call items. The AI SDK schema requires both fields. When id is missing, the Zod discriminated union matches type === "function_call" but fails inner validation. The stream handler skips the event, hasFunctionCall never becomes true, and finishReason is always "stop" regardless of the model's actual output.
Fix: Copy call_id to id (or generate a unique id) on function_call items.
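Sketched as a small Node.js helper (the real fix belongs in llama-server's C++ response serialization; `ensureItemId` is an illustrative name):

```javascript
// Hedged sketch: mirror call_id into id on function_call items so the
// AI SDK's schema, which requires both fields, validates the item.
function ensureItemId(event) {
  const item = event.item;
  if (item && item.type === "function_call" && item.id === undefined) {
    item.id = item.call_id; // alternatively: generate a distinct unique id
  }
  return event;
}
```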
3. created_at (number) and model (string) on response.created
The AI SDK schema requires response.{id, created_at, model} but llama-server sends only response.{id, status}.
Fix: Add created_at (unix timestamp) and model (the model name from the request) to the response object in response.created events.
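A sketch of that patch in Node.js terms, assuming the model name is available from the request (`patchResponseCreated` and `modelName` are illustrative names, not llama-server internals):

```javascript
// Hedged sketch: fill in created_at (unix seconds) and model on the
// response object carried by response.created events.
function patchResponseCreated(event, modelName) {
  if (event.type === "response.created" && event.response) {
    if (event.response.created_at === undefined) {
      event.response.created_at = Math.floor(Date.now() / 1000);
    }
    if (event.response.model === undefined) {
      event.response.model = modelName;
    }
  }
  return event;
}
```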
What does NOT need fixing
The AI SDK has a catch-all that transforms unknown event types into { type: "unknown_chunk" }. These events are silently ignored and do NOT cause errors:
- response.reasoning_text.delta (vs AI SDK's expected response.reasoning_summary_text.delta)
- response.in_progress
- response.content_part.added/done
- response.output_text.done
Evidence
Tested with GPT-OSS-120B on llama.cpp build 8305 (d63aa39) using OpenCode v1.2.26 with @ai-sdk/openai. The model handles 11 tools + 48KB system prompts correctly at the API level (verified via curl). An 80-line Node.js SSE proxy that patches only these three fields makes tool calling work end-to-end through the AI SDK.
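For illustration, the patching step such a proxy performs can be sketched as a single SSE-line rewriter that applies all three fixes; this is a stateful sketch under assumed names, not the actual 80-line script, and the HTTP forwarding around it is omitted:

```javascript
// Hedged sketch: rewrite one SSE "data:" line at a time, applying the
// three fixes described above (output_index, item.id, created_at/model).
function createSsePatcher(modelName) {
  let outputIndex = -1;
  const needsIndex = new Set([
    "response.output_item.added",
    "response.output_item.done",
    "response.function_call_arguments.delta",
    "response.output_text.delta",
  ]);
  return function patchLine(line) {
    if (!line.startsWith("data: ") || line === "data: [DONE]") return line;
    let event;
    try { event = JSON.parse(line.slice(6)); } catch { return line; }
    // Fix 3: created_at and model on response.created
    if (event.type === "response.created" && event.response) {
      event.response.created_at ??= Math.floor(Date.now() / 1000);
      event.response.model ??= modelName;
    }
    // Fix 1: sequential output_index
    if (event.type === "response.output_item.added") outputIndex += 1;
    if (needsIndex.has(event.type)) {
      event.output_index ??= Math.max(outputIndex, 0);
    }
    // Fix 2: id on function_call items
    if (event.item && event.item.type === "function_call") {
      event.item.id ??= event.item.call_id;
    }
    return "data: " + JSON.stringify(event);
  };
}
```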
Reproduction
- Start llama-server with a tool-calling model and --jinja
- Configure OpenCode with "npm": "@ai-sdk/openai" pointing at the server
- Run opencode run "Create a file called test.txt" --format json
- Result: reason: "stop", no tool execution. The model produced a correct function_call, but the AI SDK silently dropped it.
Related
- PR server: /v1/responses (partial) #18486 (Responses API implementation)
- Issue Misc. bug: OpenAI API v1/responses llama-server #14702 (original feature request)
- Issue Feature Request: Support OpenAI Responses API (/v1/responses) in llama.cpp server #19138 (Responses API feature request)