
fix: switch structured output to tool-call with reflection retry#879

Merged
tito merged 5 commits into main from mathieu/fix-structured-output-kimi
Feb 25, 2026
Conversation


@tito tito commented Feb 24, 2026

Problem

The two-pass structured output approach (TreeSummarize → JSON formatting via acomplete) had a 25% success rate with Kimi K2.5. The model would produce a valid text analysis but then fail to format it as valid JSON, causing topic detection and summarization failures in production.

Solution

Replaced StructuredOutputWorkflow with astructured_predict (tool-call / function-calling mode) + reflection retry loop:

  • astructured_predict uses the LLM's native tool-call interface to produce structured output directly as a Pydantic object, bypassing the fragile two-pass text→JSON pipeline
  • Reflection retry: on ValidationError or parse failure, the error is fed back to the LLM as a reflection prompt and the call is retried (up to LLM_PARSE_MAX_RETRIES)
  • min_length=10 on TopicResponse.title and TopicResponse.summary fields catches short/empty content via Pydantic validation, triggering reflection retry
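The retry loop described above can be sketched roughly as follows. This is a minimal illustration, not the actual `get_structured_response()` from `server/reflector/llm.py`: `predict` here is a stand-in for the `astructured_predict` call, the retry count and prompt wording are assumptions, and only the Pydantic `min_length` constraints mirror the PR directly.

```python
import asyncio
from pydantic import BaseModel, Field, ValidationError

LLM_PARSE_MAX_RETRIES = 3  # assumed value; the real limit comes from config


class TopicResponse(BaseModel):
    # min_length=10 makes short/empty output a ValidationError,
    # which the loop below turns into a reflection retry.
    title: str = Field(min_length=10)
    summary: str = Field(min_length=10)


async def structured_with_reflection(predict, prompt: str) -> TopicResponse:
    """Call `predict` (stand-in for llm.astructured_predict); on validation
    failure, feed the error back to the model and retry."""
    last_err = None
    for _ in range(LLM_PARSE_MAX_RETRIES):
        try:
            return await predict(prompt)
        except ValidationError as err:
            last_err = err
            # Reflection: append the validation error so the next
            # attempt can correct itself.
            prompt = (
                f"{prompt}\n\nThe previous attempt failed validation:\n{err}\n"
                "Fix the issues and respond again."
            )
    raise last_err
```

A stub `predict` that fails once and then returns a valid object exercises the reflection path: the first attempt raises `ValidationError` (title shorter than 10 characters), the second succeeds.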

Benchmark results

| Model | Old (two-pass) | New (tool-call + reflection) |
| --- | --- | --- |
| Kimi K2.5 | 25% (2/8) | 100% (10/10) |
| qwen2.5:14b | 100% (8/8) | 100% (7/7) |

Changes

  • server/reflector/llm.py — set is_function_calling_model=True, rewrote get_structured_response(), removed the dead StructuredOutputWorkflow class and its event types
  • server/tests/test_llm_retry.py — rewrote tests for the new astructured_predict approach (9 tests, all passing)

Full analysis

Structured output benchmark analysis

Test plan

  • uv run pytest tests/test_llm_retry.py -v — 9/9 pass
  • uv run pytest — full suite passes apart from pre-existing Redis/infra failures
  • 10-run benchmark with Kimi K2.5 via debug_topic.py — 10/10 success
  • 10-run benchmark with qwen2.5:14b via debug_topic.py — 7/7 success (3 timed out on local Ollama)

tito added 5 commits February 24, 2026 15:07
Replace the two-pass StructuredOutputWorkflow (TreeSummarize → acomplete)
with astructured_predict + reflection retry loop for structured LLM output.

- Enable function-calling mode (is_function_calling_model=True)
- Use astructured_predict with PromptTemplate for first attempt
- On ValidationError/parse failure, retry with reflection feedback
- Add min_length=10 to TopicResponse title/summary fields
- Remove dead StructuredOutputWorkflow class and its event types
- Rewrite tests to match new astructured_predict approach
The switch to astructured_predict dropped the texts parameter entirely,
causing summary prompts (participants, subjects, action items) to be
sent without the transcript content. Combine texts with the prompt
before calling astructured_predict, mirroring what TreeSummarize did.
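The fix in this commit amounts to joining the transcript chunks into the prompt before the structured call. A minimal sketch of that combination step (the function name and separator are illustrative, not the actual code):

```python
def combine_texts_with_prompt(texts: list[str], prompt: str) -> str:
    """Prepend the transcript chunks to the instruction, mirroring what
    TreeSummarize did internally before astructured_predict is called."""
    transcript = "\n\n".join(texts)
    return f"{transcript}\n\n{prompt}"
```

Without this step, the summary prompts (participants, subjects, action items) reached the model with no transcript content at all, which is the regression the commit fixes.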
@tito tito merged commit 5d54758 into main Feb 25, 2026
7 of 8 checks passed
@tito tito deleted the mathieu/fix-structured-output-kimi branch February 25, 2026 00:28
