
/v1/responses streaming SSE missing fields required by Vercel AI SDK (output_index, id on function_call items, created_at/model) #20607

@MidnightOnMars

Description

Summary

The /v1/responses streaming SSE implementation (from PR #18486) is missing three fields that the Vercel AI SDK's @ai-sdk/openai package requires via Zod schema validation. This prevents tools like OpenCode, Cursor, and other AI SDK-based clients from using the Responses API with llama-server for tool calling.

Chat Completions (/v1/chat/completions) works correctly: the model produces well-formed structured tool calls. This issue is specifically about the Responses API streaming event format.

Missing fields

1. output_index (number) on multiple event types

Affected events: response.output_item.added, response.output_item.done, response.function_call_arguments.delta, response.output_text.delta

The AI SDK uses output_index to track which output item each event belongs to. Without it, the Zod schema validation fails (chunk.success = false) and the event is silently dropped.

Fix: Add a sequential counter starting at 0, incremented each time a new output item is added.
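A minimal sketch of that counter, assuming events have already been parsed from SSE `data:` lines into plain objects (the function name and state shape are illustrative, not from the llama-server codebase):

```javascript
// Event types the AI SDK expects to carry output_index.
const EVENTS_NEEDING_INDEX = new Set([
  "response.output_item.added",
  "response.output_item.done",
  "response.function_call_arguments.delta",
  "response.output_text.delta",
]);

function makeIndexState() {
  return { next: 0, current: -1 };
}

function patchOutputIndex(event, state) {
  // Each new output item takes the next index; the first item gets 0.
  if (event.type === "response.output_item.added") {
    state.current = state.next++;
  }
  if (EVENTS_NEEDING_INDEX.has(event.type) && event.output_index === undefined) {
    event.output_index = state.current;
  }
  return event;
}
```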

2. id (string) on function_call items

Affected events: response.output_item.added and response.output_item.done where item.type === "function_call"

llama-server sends call_id but not id on function_call items. The AI SDK schema requires both fields. When id is missing, the Zod discriminated union matches type === "function_call" but fails inner validation. The stream handler skips the event, hasFunctionCall never becomes true, and finishReason is always "stop" regardless of the model's actual output.

Fix: Copy call_id to id (or generate a unique id) on function_call items.
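A sketch of that fix in the same parsed-event form (the function name is illustrative):

```javascript
// Mirror call_id into id on function_call items carried by
// response.output_item.added / response.output_item.done events.
function patchFunctionCallId(event) {
  const item = event.item;
  if (
    (event.type === "response.output_item.added" ||
      event.type === "response.output_item.done") &&
    item &&
    item.type === "function_call" &&
    item.id === undefined
  ) {
    // A freshly generated unique id would also satisfy the schema.
    item.id = item.call_id;
  }
  return event;
}
```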

3. created_at (number) and model (string) on response.created

The AI SDK schema requires response.{id, created_at, model} but llama-server sends only response.{id, status}.

Fix: Add created_at (unix timestamp) and model (the model name from the request) to the response object in response.created events.
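Sketched the same way (`modelName` would come from the request's `model` field; names are illustrative):

```javascript
// Fill in the two fields the AI SDK schema requires on response.created.
function patchResponseCreated(event, modelName) {
  if (event.type === "response.created" && event.response) {
    if (event.response.created_at === undefined) {
      event.response.created_at = Math.floor(Date.now() / 1000); // unix seconds
    }
    if (event.response.model === undefined) {
      event.response.model = modelName;
    }
  }
  return event;
}
```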

What does NOT need fixing

The AI SDK has a catch-all that transforms unknown event types into { type: "unknown_chunk" }. These events are silently ignored and do NOT cause errors:

  • response.reasoning_text.delta (vs AI SDK's expected response.reasoning_summary_text.delta)
  • response.in_progress
  • response.content_part.added/done
  • response.output_text.done

Evidence

Tested with GPT-OSS-120B on llama.cpp build 8305 (d63aa39) using OpenCode v1.2.26 with @ai-sdk/openai. The model handles 11 tools + 48KB system prompts correctly at the API level (verified via curl). An 80-line Node.js SSE proxy that patches only these three fields makes tool calling work end-to-end through the AI SDK.
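The core of such a proxy is roughly the following shape (a hedged sketch, not the actual 80-line proxy): each SSE `data:` line is parsed, passed through a patch function that applies the three field fixes above, and re-serialized; everything else passes through untouched.

```javascript
// Rewrite a single SSE line. `patch` is a caller-supplied function that
// takes a parsed event object and returns the (possibly mutated) event.
function rewriteSseLine(line, patch) {
  if (!line.startsWith("data: ")) return line; // event:/id:/blank lines pass through
  const payload = line.slice("data: ".length);
  if (payload === "[DONE]") return line; // stream terminator is not JSON
  return "data: " + JSON.stringify(patch(JSON.parse(payload)));
}
```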

Reproduction

  1. Start llama-server with a tool-calling model and --jinja
  2. Configure OpenCode with "npm": "@ai-sdk/openai" pointing at the server
  3. Run opencode run "Create a file called test.txt" --format json
  4. Result: reason: "stop", no tool execution. Model produced correct function_call but AI SDK silently dropped it.
