Summary
`letta-evals run` fails with a generic `APIConnectionError: Connection error.` when connecting to Letta Cloud agents, even though all isolated connectivity tests pass.
Environment
- letta-evals version: 0.9.0
- letta-client version: 1.6.2
- Python version: 3.11
- OS: WSL (Windows Subsystem for Linux)
- Network: Direct connection to Letta Cloud (no proxy)
Expected Behavior
`letta-evals run` should successfully connect to Letta Cloud agents and run evaluations.
Actual Behavior
All samples fail with:
```
Failed to run agent for sample X after 0 retries. Final error: APIConnectionError: Connection error.
[X] ⚠ ERROR: Connection error.
```
The error message is very generic and provides no details about the underlying issue.
What We've Verified
✅ All Isolated Tests Pass
- **AsyncLetta client in isolation** (`scripts/test_async_letta.py`):
  - ✓ Works without `project_id`
  - ✓ Works with `project_id`
  - ✓ Can retrieve agents
  - ✓ Can stream messages
- **AsyncLetta with anyio** (`scripts/test_letta_evals_connection.py`):
  - ✓ Works with `anyio.run()` (the same async runtime letta-evals uses)
- **Exact streaming call** (`scripts/test_streaming_call.py`):
  - ✓ Replicates the exact `client.agents.messages.stream()` call that letta-evals makes
  - ✓ Works perfectly in isolation
- **Suite validation**:
  - ✓ `letta-evals validate` passes successfully
- **Network connectivity**:
  - ✓ DNS resolution works
  - ✓ Direct Letta API calls work
  - ✓ No proxy settings are in effect
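For reference, the DNS part of these network checks can be reproduced with a small stand-alone probe (stdlib only; the `api.letta.com` hostname below is an assumption — substitute whatever endpoint your deployment actually talks to):

```python
# Minimal DNS-resolution probe, mirroring the "DNS resolution works" check above.
import socket

def can_resolve(host: str) -> bool:
    """Return True if DNS resolution succeeds for the given host on port 443."""
    try:
        socket.getaddrinfo(host, 443, type=socket.SOCK_STREAM)
        return True
    except socket.gaierror:
        return False

if __name__ == "__main__":
    host = "api.letta.com"  # assumption: replace with your actual Letta endpoint
    print(f"{host}: {'resolves' if can_resolve(host) else 'DNS failure'}")
```

A passing probe rules out DNS as the culprit, which is consistent with the checklist above pointing away from the network layer.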
❌ What Doesn't Work
- `letta-evals run` fails with `APIConnectionError` for all samples
- The error occurs when letta-evals tries to connect to the agent target
- The generic error message masks the underlying issue
Configuration
Suite YAML (`evals/letta/suite.yaml`):

```yaml
name: casamigo-buyer-agent-eval
target:
  kind: letta_agent
  agent_id: <agent-id>
  project_id: project-NL-J6EjN2LNimjtna8pv
dataset: dataset.jsonl
graders:
  casamigo_rubric_grader:
    kind: model_judge
    prompt_path: rubric.md
gate:
  kind: simple
  metric_key: casamigo_rubric_grader
  op: gte
  value: 0.0
```
Environment Variables:
- `LETTA_API_KEY`: ✓ Set
- `LETTA_PROJECT_ID`: ✓ Set
- `OPENAI_API_KEY`: ✓ Set (for `model_judge`)
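To rule out a silently unset variable, a quick preflight can fail fast before invoking the runner (variable names taken from the list above):

```python
# Preflight check for the environment variables the suite depends on.
import os

REQUIRED_ENV = ("LETTA_API_KEY", "LETTA_PROJECT_ID", "OPENAI_API_KEY")

def missing_env(env=os.environ) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV if not env.get(name)]

if __name__ == "__main__":
    missing = missing_env()
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
    print("All required environment variables are set.")
```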
Code Investigation
From `.venv/lib/python3.11/site-packages/letta_evals/runner.py`:
- The client is created at line 93: `self.client = AsyncLetta(**client_kwargs)`
- `client_kwargs` includes only `api_key`, `base_url`, and `timeout` (lines 87-91)
- `project_id` is read (line 85) but not passed to the `AsyncLetta` client
- However, `project_id` is passed to the `target.run()` method (line 336, in `_get_or_run_trajectory`)

From `.venv/lib/python3.11/site-packages/letta_evals/targets/letta_agent.py`:
- Streaming call at line 105: `await self.client.agents.messages.stream(...)`
- This exact call works in isolation
Attempted Fixes
- ✅ Reduced concurrency (`--max-concurrent 1`): no change
- ✅ Network tweaks (timeouts, forcing IPv4, proxy settings): no change
- ✅ Verified that every isolated component works: all pass
Hypothesis
The issue appears to lie in how the letta-evals framework manages the client lifecycle or handles errors, not in:
- Network connectivity
- AsyncLetta client itself
- Streaming API calls
- WSL networking
- anyio vs asyncio
The generic error message suggests exceptions are being caught and re-raised without preserving the original error details.
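A minimal illustration of the suspected pattern (toy names, not letta-evals internals): when a handler re-raises a new exception and surfaces only its own message, the root cause disappears from the output, but Python still records it on `__cause__`/`__context__`, so it remains recoverable:

```python
class APIConnectionError(Exception):
    """Stand-in for the generic error letta-evals surfaces."""

def flaky_call():
    # Stand-in for the real underlying failure (e.g. an SSL or DNS error).
    raise OSError("TLS handshake failed: certificate verify failed")

def run_sample():
    try:
        flaky_call()
    except Exception as exc:
        # Re-raising like this leaves only a generic message at the surface...
        raise APIConnectionError("Connection error.") from exc

def root_cause(exc: BaseException) -> BaseException:
    """Walk the exception chain back to the original error."""
    while exc.__cause__ is not None or exc.__context__ is not None:
        exc = exc.__cause__ or exc.__context__
    return exc

try:
    run_sample()
except APIConnectionError as exc:
    # ...but the chain still holds the detail this report is asking for.
    print(f"surface: {exc}")
    print(f"root cause: {root_cause(exc)!r}")
```

If letta-evals formats only `str(exc)` when reporting a failed sample, a helper like `root_cause` (or `traceback.format_exception`, which prints the whole chain) would expose the real error.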
Request
- **More detailed error output**: the generic "Connection error." message makes debugging nearly impossible. Can the underlying exception details (ideally the full chained traceback) be surfaced?
- **Investigation**: since all isolated tests pass, the issue appears to be in how letta-evals uses the AsyncLetta client. Could you look into:
  - client lifecycle management
  - error handling that might be masking the real error
  - concurrent request handling
  - any framework-specific configuration we might be missing
- **Workaround**: is there a known workaround or configuration we're missing?
Related Issues
- `letta_judge` grader with `agent_id` bug (different issue, already reported)
Test Scripts
All test scripts are available in `casamigo-letta/backend/scripts/`:
- `test_async_letta.py`: basic AsyncLetta connectivity
- `test_letta_evals_connection.py`: AsyncLetta with anyio
- `test_streaming_call.py`: exact replication of the streaming call
Additional Context
This is blocking our evaluation workflow. We have a workaround using the `model_judge` grader (which requires an OpenAI API key), but we'd prefer to use Letta's infrastructure directly.