APIConnectionError when running letta-evals with letta_agent target #158

@jasonlarkin

Description

Summary

letta-evals run fails with a generic APIConnectionError: Connection error. when connecting to Letta Cloud agents, even though every isolated connectivity test passes.

Environment

  • letta-evals version: 0.9.0
  • letta-client version: 1.6.2
  • Python version: 3.11
  • OS: WSL (Windows Subsystem for Linux)
  • Network: Direct connection to Letta Cloud (no proxy)

Expected Behavior

letta-evals run should successfully connect to Letta Cloud agents and run evaluations.

Actual Behavior

All samples fail with:

Failed to run agent for sample X after 0 retries. Final error: APIConnectionError: Connection error.
[X] ⚠ ERROR: Connection error.

The error message is very generic and provides no details about the underlying issue.
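Since the reported message hides the root cause, a small stdlib-only helper that walks Python's exception chain (`__cause__`/`__context__`) can surface it. This is a hypothetical debugging aid, not part of letta-evals; the `explain` name and the simulated DNS failure are illustrative assumptions:

```python
def explain(exc: BaseException) -> str:
    """Render an exception and its full __cause__/__context__ chain."""
    parts = []
    seen = set()  # guard against cycles in the chain
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        parts.append(f"{type(exc).__name__}: {exc}")
        exc = exc.__cause__ or exc.__context__
    return "\n  caused by: ".join(parts)


# Example: a generic wrapper hiding a simulated DNS failure.
try:
    try:
        raise OSError("Name or service not known")
    except OSError as e:
        raise ConnectionError("Connection error.") from e
except ConnectionError as wrapped:
    print(explain(wrapped))
```

Wrapping the failing `letta-evals` call in a handler like this (or patching the runner temporarily) would show whether a DNS, TLS, or timeout error sits underneath the generic message.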

What We've Verified

✅ All Isolated Tests Pass

  1. AsyncLetta client in isolation (scripts/test_async_letta.py):

    • ✓ Works without project_id
    • ✓ Works with project_id
    • ✓ Can retrieve agents
    • ✓ Can stream messages
  2. AsyncLetta with anyio (scripts/test_letta_evals_connection.py):

    • ✓ Works with anyio.run() (same async runtime as letta-evals)
  3. Exact streaming call (scripts/test_streaming_call.py):

    • ✓ Replicates the exact client.agents.messages.stream() call that letta-evals makes
    • ✓ Works perfectly in isolation
  4. Suite validation:

    • letta-evals validate passes successfully
  5. Network connectivity:

    • ✓ DNS resolution works
    • ✓ Direct Letta API calls work
    • ✓ No proxy settings

❌ What Doesn't Work

  • letta-evals run fails with APIConnectionError for all samples
  • Error occurs when letta-evals tries to connect to the agent target
  • Generic error message masks the underlying issue

Configuration

Suite YAML (evals/letta/suite.yaml):

name: casamigo-buyer-agent-eval
target:
  kind: letta_agent
  agent_id: <agent-id>
  project_id: project-NL-J6EjN2LNimjtna8pv
dataset: dataset.jsonl
graders:
  casamigo_rubric_grader:
    kind: model_judge
    prompt_path: rubric.md
gate:
  kind: simple
  metric_key: casamigo_rubric_grader
  op: gte
  value: 0.0

Environment Variables:

  • LETTA_API_KEY: ✓ Set
  • LETTA_PROJECT_ID: ✓ Set
  • OPENAI_API_KEY: ✓ Set (for model_judge)
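For reproducers, a quick sanity check that all three variables are actually visible to the process can rule out shell-export issues (common under WSL). This is a hypothetical helper, not part of letta-evals:

```python
import os

# The three variables this suite relies on, per the configuration above.
REQUIRED_VARS = ["LETTA_API_KEY", "LETTA_PROJECT_ID", "OPENAI_API_KEY"]


def missing_vars(env) -> list:
    """Return required variables that are unset or empty in the given mapping."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars(os.environ)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```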

Code Investigation

From .venv/lib/python3.11/site-packages/letta_evals/runner.py:

  • Client is created at line 93: self.client = AsyncLetta(**client_kwargs)
  • client_kwargs includes only api_key, base_url, and timeout (line 87-91)
  • project_id is read but not passed to AsyncLetta client (line 85)
  • However, project_id is passed to target.run() method (line 336 in _get_or_run_trajectory)
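To make the observed mismatch concrete, here is a stdlib-only paraphrase of the construction pattern described above. The function name and argument handling are assumptions sketched from reading the installed source, not the actual letta_evals code:

```python
# Hypothetical paraphrase of the client construction seen in
# letta_evals/runner.py -- NOT the actual source.
def build_client_kwargs(api_key, base_url, timeout, project_id=None):
    # project_id is read from the suite config...
    kwargs = {"api_key": api_key, "base_url": base_url, "timeout": timeout}
    # ...but is never added to kwargs, so AsyncLetta(**kwargs) is created
    # without it; project_id is only forwarded later to target.run().
    return kwargs


kwargs = build_client_kwargs(
    "key", "https://api.letta.com", 60.0,
    project_id="project-NL-J6EjN2LNimjtna8pv",
)
print("project_id" in kwargs)
```

If the Letta Cloud backend requires project scoping at client-construction time, this gap could explain why the same streaming call succeeds in isolation (where the test scripts pass `project_id` directly) but fails inside the runner.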

From .venv/lib/python3.11/site-packages/letta_evals/targets/letta_agent.py:

  • Streaming call at line 105: await self.client.agents.messages.stream(...)
  • This exact call works in isolation

Attempted Fixes

  1. ✅ Reduced concurrency (--max-concurrent 1) - no change
  2. ✅ Network tweaks (timeouts, IPv4, proxy settings) - no change
  3. ✅ Verified all isolated components work - all pass

Hypothesis

The issue appears to be specific to how the letta-evals framework manages the client lifecycle or handles errors, rather than with:

  • Network connectivity
  • AsyncLetta client itself
  • Streaming API calls
  • WSL networking
  • anyio vs asyncio

The generic error message suggests exceptions are being caught and re-raised without preserving the original error details.
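The loss-of-detail pattern can be sketched in plain Python. Without `from`, the original exception survives only as `__context__` in tracebacks, so any handler that logs just `str(exc)` drops it; chaining explicitly with `raise ... from exc` makes the root cause recoverable via `__cause__`. Both functions below are illustrative, not letta-evals code:

```python
def run_sample_lossy():
    try:
        raise TimeoutError("TLS handshake timed out after 30s")
    except Exception:
        # Re-raising without `from`: the original is kept only as
        # __context__, so code that reports str(exc) loses the detail.
        raise ConnectionError("Connection error.")


def run_sample_chained():
    try:
        raise TimeoutError("TLS handshake timed out after 30s")
    except Exception as exc:
        # `from exc` sets __cause__ explicitly, so callers and logs
        # can recover the original error.
        raise ConnectionError("Connection error.") from exc


try:
    run_sample_chained()
except ConnectionError as exc:
    print(f"root cause: {exc.__cause__}")
```

If the runner's error reporting adopted the chained form (and printed `__cause__`), failures like the one above would self-diagnose.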

Request

  1. More detailed error output: The generic "Connection error" message makes debugging impossible. Could the underlying exception details be surfaced?

  2. Investigation: Since all isolated tests pass, the issue most likely lies in how letta-evals uses the AsyncLetta client. Could you investigate:

    • Client lifecycle management
    • Error handling that might be masking real errors
    • Concurrent request handling
    • Any framework-specific configuration needed
  3. Workaround: Is there a known workaround or configuration we're missing?

Test Scripts

All test scripts are available in casamigo-letta/backend/scripts/:

  • test_async_letta.py - Basic AsyncLetta connectivity
  • test_letta_evals_connection.py - AsyncLetta with anyio
  • test_streaming_call.py - Exact streaming call replication

Additional Context

This is blocking our evaluation workflow. We have a workaround using model_judge (which requires OpenAI API key), but we'd prefer to use Letta's infrastructure directly.
