
fix: switch structured output to tool-call with reflection retry#879

Merged
tito merged 5 commits into main from mathieu/fix-structured-output-kimi
Feb 25, 2026
Conversation


@tito tito commented Feb 24, 2026

Problem

The two-pass structured output approach (TreeSummarize → JSON formatting via acomplete) had a 25% success rate with Kimi K2.5. The model would produce a valid text analysis but then fail to format it as valid JSON, causing topic detection and summarization failures in production.

Solution

Replaced StructuredOutputWorkflow with astructured_predict (tool-call / function-calling mode) + reflection retry loop:

  • astructured_predict uses the LLM's native tool-call interface to produce structured output directly as a Pydantic object, bypassing the fragile two-pass text→JSON pipeline
  • Reflection retry: on ValidationError or parse failure, the error is fed back to the LLM as a reflection prompt and the call is retried (up to LLM_PARSE_MAX_RETRIES)
  • min_length=10 on TopicResponse.title and TopicResponse.summary fields catches short/empty content via Pydantic validation, triggering reflection retry
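The retry loop described above can be sketched roughly as follows. This is a minimal illustration, not the actual `get_structured_response()` from `server/reflector/llm.py`: `predict` here is a stand-in for the `astructured_predict` call, the retry count and prompt wording are assumptions, and only the Pydantic `min_length` constraints mirror the PR directly.

```python
import asyncio
from pydantic import BaseModel, Field, ValidationError

LLM_PARSE_MAX_RETRIES = 3  # assumed value; the real limit comes from config


class TopicResponse(BaseModel):
    # min_length=10 makes short/empty output a ValidationError,
    # which the loop below turns into a reflection retry.
    title: str = Field(min_length=10)
    summary: str = Field(min_length=10)


async def structured_with_reflection(predict, prompt: str) -> TopicResponse:
    """Call `predict` (stand-in for llm.astructured_predict); on validation
    failure, feed the error back to the model and retry."""
    last_err = None
    for _ in range(LLM_PARSE_MAX_RETRIES):
        try:
            return await predict(prompt)
        except ValidationError as err:
            last_err = err
            # Reflection: append the validation error so the next
            # attempt can correct itself.
            prompt = (
                f"{prompt}\n\nThe previous attempt failed validation:\n{err}\n"
                "Fix the issues and respond again."
            )
    raise last_err
```

A stub `predict` that fails once and then returns a valid object exercises the reflection path: the first attempt raises `ValidationError` (title shorter than 10 characters), the second succeeds.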

Benchmark results

| Model | Old (two-pass) | New (tool-call + reflection) |
| --- | --- | --- |
| Kimi K2.5 | 25% (2/8) | 100% (10/10) |
| qwen2.5:14b | 100% (8/8) | 100% (7/7) |

Changes

  • server/reflector/llm.py — set is_function_calling_model=True, rewrote get_structured_response(), removed the dead StructuredOutputWorkflow class and its event types
  • server/tests/test_llm_retry.py — rewrote tests for the new astructured_predict approach (9 tests, all passing)

Full analysis

Structured output benchmark analysis

Test plan

  • uv run pytest tests/test_llm_retry.py -v — 9/9 pass
  • uv run pytest — full suite passes apart from pre-existing Redis/infra failures
  • 10-run benchmark with Kimi K2.5 via debug_topic.py — 10/10 success
  • 10-run benchmark with qwen2.5:14b via debug_topic.py — 7/7 success (3 timed out on local Ollama)

tito added 5 commits February 24, 2026 15:07
Replace the two-pass StructuredOutputWorkflow (TreeSummarize → acomplete)
with astructured_predict + reflection retry loop for structured LLM output.

- Enable function-calling mode (is_function_calling_model=True)
- Use astructured_predict with PromptTemplate for first attempt
- On ValidationError/parse failure, retry with reflection feedback
- Add min_length=10 to TopicResponse title/summary fields
- Remove dead StructuredOutputWorkflow class and its event types
- Rewrite tests to match new astructured_predict approach
The switch to astructured_predict dropped the texts parameter entirely,
causing summary prompts (participants, subjects, action items) to be
sent without the transcript content. Combine texts with the prompt
before calling astructured_predict, mirroring what TreeSummarize did.
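The fix in this commit amounts to joining the transcript chunks into the prompt before the structured call. A minimal sketch of that combination step (the function name and separator are illustrative, not the actual code):

```python
def combine_texts_with_prompt(texts: list[str], prompt: str) -> str:
    """Prepend the transcript chunks to the instruction, mirroring what
    TreeSummarize did internally before astructured_predict is called."""
    transcript = "\n\n".join(texts)
    return f"{transcript}\n\n{prompt}"
```

Without this step, the summary prompts (participants, subjects, action items) reached the model with no transcript content at all, which is the regression the commit fixes.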
@tito tito merged commit 5d54758 into main Feb 25, 2026
7 of 8 checks passed
@tito tito deleted the mathieu/fix-structured-output-kimi branch February 25, 2026 00:28
