APIConnectionError when running letta-evals with letta_agent target #158

@jasonlarkin

Description

Summary

letta-evals run fails with a generic APIConnectionError: Connection error. when connecting to Letta Cloud agents, even though every isolated connectivity test passes.

Environment

  • letta-evals version: 0.9.0
  • letta-client version: 1.6.2
  • Python version: 3.11
  • OS: WSL (Windows Subsystem for Linux)
  • Network: Direct connection to Letta Cloud (no proxy)

Expected Behavior

letta-evals run should successfully connect to Letta Cloud agents and run evaluations.

Actual Behavior

All samples fail with:

Failed to run agent for sample X after 0 retries. Final error: APIConnectionError: Connection error.
[X] ⚠ ERROR: Connection error.

The error message is very generic and provides no details about the underlying issue.
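Since the reported message hides the root cause, a small stdlib-only helper that walks Python's exception chain (`__cause__`/`__context__`) can surface it. This is a hypothetical debugging aid, not part of letta-evals; the `explain` name and the simulated DNS failure are illustrative assumptions:

```python
def explain(exc: BaseException) -> str:
    """Render an exception and its full __cause__/__context__ chain."""
    parts = []
    seen = set()  # guard against cycles in the chain
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        parts.append(f"{type(exc).__name__}: {exc}")
        exc = exc.__cause__ or exc.__context__
    return "\n  caused by: ".join(parts)


# Example: a generic wrapper hiding a simulated DNS failure.
try:
    try:
        raise OSError("Name or service not known")
    except OSError as e:
        raise ConnectionError("Connection error.") from e
except ConnectionError as wrapped:
    print(explain(wrapped))
```

Wrapping the failing `letta-evals` call in a handler like this (or patching the runner temporarily) would show whether a DNS, TLS, or timeout error sits underneath the generic message.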

What We've Verified

✅ All Isolated Tests Pass

  1. AsyncLetta client in isolation (scripts/test_async_letta.py):

    • ✓ Works without project_id
    • ✓ Works with project_id
    • ✓ Can retrieve agents
    • ✓ Can stream messages
  2. AsyncLetta with anyio (scripts/test_letta_evals_connection.py):

    • ✓ Works with anyio.run() (same async runtime as letta-evals)
  3. Exact streaming call (scripts/test_streaming_call.py):

    • ✓ Replicates the exact client.agents.messages.stream() call that letta-evals makes
    • ✓ Works perfectly in isolation
  4. Suite validation:

    • letta-evals validate passes successfully
  5. Network connectivity:

    • ✓ DNS resolution works
    • ✓ Direct Letta API calls work
    • ✓ No proxy settings

❌ What Doesn't Work

  • letta-evals run fails with APIConnectionError for all samples
  • Error occurs when letta-evals tries to connect to the agent target
  • Generic error message masks the underlying issue

Configuration

Suite YAML (evals/letta/suite.yaml):

name: casamigo-buyer-agent-eval
target:
  kind: letta_agent
  agent_id: <agent-id>
  project_id: project-NL-J6EjN2LNimjtna8pv
dataset: dataset.jsonl
graders:
  casamigo_rubric_grader:
    kind: model_judge
    prompt_path: rubric.md
gate:
  kind: simple
  metric_key: casamigo_rubric_grader
  op: gte
  value: 0.0

Environment Variables:

  • LETTA_API_KEY: ✓ Set
  • LETTA_PROJECT_ID: ✓ Set
  • OPENAI_API_KEY: ✓ Set (for model_judge)
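For reproducers, a quick sanity check that all three variables are actually visible to the process can rule out shell-export issues (common under WSL). This is a hypothetical helper, not part of letta-evals:

```python
import os

# The three variables this suite relies on, per the configuration above.
REQUIRED_VARS = ["LETTA_API_KEY", "LETTA_PROJECT_ID", "OPENAI_API_KEY"]


def missing_vars(env) -> list:
    """Return required variables that are unset or empty in the given mapping."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars(os.environ)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```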

Code Investigation

From .venv/lib/python3.11/site-packages/letta_evals/runner.py:

  • Client is created at line 93: self.client = AsyncLetta(**client_kwargs)
  • client_kwargs includes only api_key, base_url, and timeout (line 87-91)
  • project_id is read but not passed to AsyncLetta client (line 85)
  • However, project_id is passed to target.run() method (line 336 in _get_or_run_trajectory)
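To make the observed mismatch concrete, here is a stdlib-only paraphrase of the construction pattern described above. The function name and argument handling are assumptions sketched from reading the installed source, not the actual letta_evals code:

```python
# Hypothetical paraphrase of the client construction seen in
# letta_evals/runner.py -- NOT the actual source.
def build_client_kwargs(api_key, base_url, timeout, project_id=None):
    # project_id is read from the suite config...
    kwargs = {"api_key": api_key, "base_url": base_url, "timeout": timeout}
    # ...but is never added to kwargs, so AsyncLetta(**kwargs) is created
    # without it; project_id is only forwarded later to target.run().
    return kwargs


kwargs = build_client_kwargs(
    "key", "https://api.letta.com", 60.0,
    project_id="project-NL-J6EjN2LNimjtna8pv",
)
print("project_id" in kwargs)
```

If the Letta Cloud backend requires project scoping at client-construction time, this gap could explain why the same streaming call succeeds in isolation (where the test scripts pass `project_id` directly) but fails inside the runner.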

From .venv/lib/python3.11/site-packages/letta_evals/targets/letta_agent.py:

  • Streaming call at line 105: await self.client.agents.messages.stream(...)
  • This exact call works in isolation

Attempted Fixes

  1. ✅ Reduced concurrency (--max-concurrent 1) - no change
  2. ✅ Network tweaks (timeouts, IPv4, proxy settings) - no change
  3. ✅ Verified all isolated components work - all pass

Hypothesis

The issue appears to be specific to how the letta-evals framework manages the client lifecycle or handles errors, rather than with:

  • Network connectivity
  • AsyncLetta client itself
  • Streaming API calls
  • WSL networking
  • anyio vs asyncio

The generic error message suggests exceptions are being caught and re-raised without preserving the original error details.
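The loss-of-detail pattern can be sketched in plain Python. Without `from`, the original exception survives only as `__context__` in tracebacks, so any handler that logs just `str(exc)` drops it; chaining explicitly with `raise ... from exc` makes the root cause recoverable via `__cause__`. Both functions below are illustrative, not letta-evals code:

```python
def run_sample_lossy():
    try:
        raise TimeoutError("TLS handshake timed out after 30s")
    except Exception:
        # Re-raising without `from`: the original is kept only as
        # __context__, so code that reports str(exc) loses the detail.
        raise ConnectionError("Connection error.")


def run_sample_chained():
    try:
        raise TimeoutError("TLS handshake timed out after 30s")
    except Exception as exc:
        # `from exc` sets __cause__ explicitly, so callers and logs
        # can recover the original error.
        raise ConnectionError("Connection error.") from exc


try:
    run_sample_chained()
except ConnectionError as exc:
    print(f"root cause: {exc.__cause__}")
```

If the runner's error reporting adopted the chained form (and printed `__cause__`), failures like the one above would self-diagnose.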

Request

  1. More detailed error output: The generic "Connection error" message makes debugging impossible. Could the underlying exception details be surfaced?

  2. Investigation: Since all isolated tests pass, the issue most likely lies in how letta-evals uses the AsyncLetta client. Could you investigate:

    • Client lifecycle management
    • Error handling that might be masking real errors
    • Concurrent request handling
    • Any framework-specific configuration needed
  3. Workaround: Is there a known workaround or configuration we're missing?

Test Scripts

All test scripts are available in casamigo-letta/backend/scripts/:

  • test_async_letta.py - Basic AsyncLetta connectivity
  • test_letta_evals_connection.py - AsyncLetta with anyio
  • test_streaming_call.py - Exact streaming call replication

Additional Context

This is blocking our evaluation workflow. We have a workaround using model_judge (which requires OpenAI API key), but we'd prefer to use Letta's infrastructure directly.
