v0.16.7

@carenthomas carenthomas released this 31 Mar 19:28
· 3 commits to main since this release
f333247

Letta Server 0.16.7 Release Notes

173 commits since 0.16.6 | Released March 31, 2026

Highlights

Self-hosted users: this is a big upgrade. The default global context window is raised from 32k to 128k, the context window reset bug (LET-7991) is fixed, and compaction has been overhauled. If you've been running curl commands to patch your config after every ADE load, most of that pain should be gone.

Breaking Changes

  • Block limits are no longer enforced -- block limit validation has been deprecated and removed from the git memory sync path (#9977, #9983). Blocks can now grow freely. If you were relying on limits to cap per-turn cost, you'll need to manage block size via other means.
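One way to manage block size yourself is to cap content on the client before writing it. The sketch below is a minimal illustration, not part of the Letta API: `cap_block_value` and `MAX_BLOCK_CHARS` are hypothetical names, and it assumes a character budget is an acceptable proxy for per-turn cost.

```python
MAX_BLOCK_CHARS = 5000  # hypothetical budget; pick one that fits your cost target


def cap_block_value(value: str, max_chars: int = MAX_BLOCK_CHARS) -> str:
    """Return value trimmed to at most max_chars, keeping the most recent text."""
    if max_chars <= 0:
        return ""
    if len(value) <= max_chars:
        return value
    # Keep the tail, on the assumption that newer content matters more.
    return value[-max_chars:]
```

Keeping the tail rather than the head is a design choice; if the start of a block holds the important content, slice from the front instead.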

Context Window & Compaction (21 fixes)

This is the biggest category of fixes, and these bugs hit self-hosted users hardest.

  • Global context window default raised from 32k to 128k (#9993) -- self-hosted servers no longer default to 32k for unknown models
  • Context window preserved on conversation model override (LET-7991, #9986) -- the bug where non-default conversations fell back to 32k is fixed
  • Compaction overflow fixes (#9897) -- addresses the double-compaction and runaway compaction loops
  • Compaction model resets on agent model change (#10031) -- switching your agent's model no longer leaves the old summarizer model behind
  • Summarizer prompt improved (#10314) -- now remembers plan files, GitHub PRs, and other structured content during summarization
  • BYOK summarization fixed (#10152) -- summarizer provider fallback no longer fires for BYOK requests
  • Better error surfacing -- context window exceeded errors now have descriptive messages (#10135, #10171), and compaction now warns about system prompt size (#10058)

Gemini (2 fixes)

  • thought_signature preserved on function calls without reasoning (LET-8166, #10237) -- the bug blocking all Gemini 2.5+/3.x multi-turn tool calling is fixed
  • Streaming interface crash fixed (LET-8129, #10306) -- self.model is now initialized in the SimpleGeminiStreamingInterface constructor

Memory & memfs (10 fixes, 4 features)

  • available_skills block no longer duplicates in system prompt (#10006, #10011, #10021) -- three separate fixes for the skills block multiplying and inflating context (LET-8013)
  • Git memory sync deferred until stream close (#9951) -- reduces mid-stream sync failures
  • System prompt recompiles on agent creation with git memory (#9950) -- new git-enabled agents no longer start with empty compiled context
  • Projection-style git memory rendering (#10211) -- new rendering approach for memfs content in system prompts
  • Manual block edits via API trigger recompile (#9775) -- no more stale context after API block updates
  • Conversation recompile endpoint (#9848) -- POST /v1/conversations/{id}/recompile is now available
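As a rough illustration of calling the recompile endpoint listed above, the sketch below builds the request with the Python standard library. Only the path `POST /v1/conversations/{id}/recompile` comes from these notes; the base URL, the bearer-token header, and the empty request body are assumptions, and `build_recompile_request` is a hypothetical helper.

```python
import urllib.request


def build_recompile_request(
    base_url: str, conversation_id: str, token: str
) -> urllib.request.Request:
    """Build (but do not send) a POST to the conversation recompile endpoint."""
    url = f"{base_url.rstrip('/')}/v1/conversations/{conversation_id}/recompile"
    return urllib.request.Request(
        url,
        method="POST",
        # Assumption: the server accepts a bearer token in this header.
        headers={"Authorization": f"Bearer {token}"},
    )
```

To send it, pass the result to `urllib.request.urlopen(...)` against your own server address. Check the API reference for the exact authentication scheme and response shape.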

Conversations (7 features)

  • Conversation forking (#10234, #10263) -- fork conversations with shared message history, including the default conversation
  • Sort conversations by last_message_at (#10190)
  • Idempotent conversation streaming (#10147) -- OTID-based retry safety
  • Request-scoped system overrides (#10227) -- per-request system prompt modifications

Streaming & Reliability (13 fixes)

  • OTID retry hardening (#10229, #10209) -- stream resume with backoff, race condition fixes
  • Conversation lock released earlier (#10203) -- reduces contention on concurrent requests
  • Better error messages (#10207) -- known LLM errors now surface descriptive messages instead of generic failures
  • BYOK error tagging (#10204, #10311) -- errors now include is_byok flag for debugging
  • stream_incomplete diagnosis (#10033) -- BaseException catching to identify root causes
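These notes do not describe how the server implements OTID-based retry safety. The general client-side pattern it supports, reusing one identifier across retries with exponential backoff, might look roughly like the sketch below. `retry_with_stable_id` and the `send` callable are hypothetical, and treating `ConnectionError` as the only retryable failure is an assumption.

```python
import time
import uuid
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_stable_id(
    send: Callable[[str], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send(request_id), retrying on ConnectionError with the same id."""
    request_id = str(uuid.uuid4())  # generated once, reused on every retry
    for attempt in range(attempts):
        try:
            return send(request_id)
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the last failure
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise ValueError("attempts must be at least 1")
```

Because the identifier never changes between attempts, a server that deduplicates on it can treat a retry as the same request rather than a new one.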

Model Support (14 features, 20 fixes)

  • GPT-5.4 -- full support including mini, nano, and fast variants (#9798, #10043)
  • GLM-5 -- GLM-5, GLM-5.1, GLM-5 Turbo, GLM-4.7 (#10317, #10285, #9994)
  • MiniMax M2.7 (#10093)
  • Baseten -- added as provider with full frontend integration, serverless auto mode, reasoning support (#10250, #9846, #9998)
  • Fireworks (#9780) and zAI coding provider (#10064)
  • Opus 4.6 / Sonnet 4.6 -- adaptive thinking tokens no longer incorrectly capped (#9795)
  • OpenAI proxy cleanup -- extra fields removed (#9949), parallel tool calling supported (#9879)

Security

  • Local filesystem access via ImageContent bypass blocked (#3256, #10329) -- file:/// URLs in images are now rejected
  • Internal MCP server targets blocked (#10009)
  • SECURITY.md added (#3228)
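If you validate image URLs in your own client or proxy as well, a scheme allowlist is one simple approach. The sketch below is not the server's actual check: `is_allowed_image_url` and `ALLOWED_IMAGE_SCHEMES` are hypothetical, and the set of allowed schemes is an assumption.

```python
from urllib.parse import urlparse

# Assumption: only these schemes are acceptable for image sources.
ALLOWED_IMAGE_SCHEMES = {"http", "https", "data"}


def is_allowed_image_url(url: str) -> bool:
    """Return True only if the URL uses an allowlisted scheme."""
    scheme = urlparse(url.strip()).scheme.lower()
    return scheme in ALLOWED_IMAGE_SCHEMES
```

An allowlist rejects `file:///` URLs along with any other scheme that was not explicitly permitted, which is safer than blocking `file` alone.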

Infrastructure

  • Readiness enforcement scaffold (M1-M3 metrics pipeline) -- request pressure, DB pool, SSE lifecycle, event loop lag monitoring
  • Multi-agent tools moved to less privileged execution environment (#9779)
  • Subagents auto-hidden on creation (#10096)
  • WebSocket transport for OpenAI Responses API (#9841)

For self-hosted users upgrading from 0.16.6: This release addresses the majority of issues reported in the community over the past month. The context window default change alone (#9993) eliminates the most common source of "everything breaks when I open ADE" complaints.

Full Changelog: 0.16.6...0.16.7