`letta_judge` grader with `agent_id` validates but fails at runtime (#156)

## Problem Summary

The `letta_judge` grader with the `agent_id` parameter validates successfully but fails at runtime with:
```
FileNotFoundError: Agent file not found: /path/to/.venv/lib/python3.11/site-packages/letta_evals/graders/letta-evals-judge-agent.af
```

## Expected Behavior

When `kind: letta_judge` is used with `agent_id` in `suite.yaml`, the grader should use the specified Letta Cloud agent to perform evaluations, relying on Letta's built-in LLM access rather than requiring an OpenAI API key.

## Actual Behavior

The suite configuration validates successfully:
```bash
$ letta-evals validate evals/letta/suite.yaml
✓ Suite 'casamigo-buyer-agent-eval' is valid
```

At runtime, however, it fails: the grader looks for a bundled agent file (`letta-evals-judge-agent.af`) that is not present in the installed package, instead of using the provided `agent_id`.

## Configuration

```yaml
graders:
  casamigo_rubric_grader:
    kind: letta_judge
    agent_id: agent-10d4286d-74a9-4fc6-82ff-a751bda72449
    prompt_path: rubric.md
    extractor: last_assistant
```

## Environment

- letta-evals version: 0.9.0
- Python: 3.11
- Platform: Linux (WSL)

## Error Traceback

```
File "/mnt/c/Users/jason/Documents/casamigo/casamigo-letta/backend/.venv/lib/python3.11/site-packages/letta_evals/graders/agent_judge.py", line 119, in _validate_agent_file
    raise FileNotFoundError(f"Agent file not found: {self.agent_file}")
FileNotFoundError: Agent file not found: /mnt/c/Users/jason/Documents/casamigo/casamigo-letta/backend/.venv/lib/python3.11/site-packages/letta_evals/graders/letta-evals-judge-agent.af
```

## Evidence

1. **Configuration validation passes**: The suite.yaml with `agent_id` validates successfully, indicating the schema accepts this parameter:
   ```bash
   $ letta-evals validate evals/letta/suite.yaml
   ✓ Suite 'casamigo-buyer-agent-eval' is valid
   ```

2. **Documentation indicates support**: Official documentation suggests `letta_judge` with `agent_id` is the correct approach to use Letta's built-in LLM access without requiring OpenAI API keys.

3. **Runtime implementation behavior**: Despite validation passing, the runtime code path in `AgentJudgeGrader._validate_agent_file()` always looks for a bundled agent file (`letta-evals-judge-agent.af`) regardless of whether `agent_id` is provided. The code appears to:
   - Always call `_validate_agent_file()` during initialization
   - Look for a hardcoded file path (`letta-evals-judge-agent.af`)
   - Not check for or use the provided `agent_id` parameter

4. **Error traceback location**: The error occurs in `agent_judge.py` line 119 in `_validate_agent_file()`, which is called during initialization, suggesting the file validation happens unconditionally.
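The control flow inferred in points 3 and 4 can be reproduced without letta-evals installed. The class below is a hypothetical model of the observed behavior, not the actual `AgentJudgeGrader` source: it stores `agent_id` but never consults it, falls back to the bundled file name from the traceback whenever no `agent_file` is given, and validates that file unconditionally in `__init__`.

```python
import tempfile
from pathlib import Path
from typing import Optional

# File name taken from the FileNotFoundError in this issue.
BUNDLED_NAME = "letta-evals-judge-agent.af"


class SuspectedJudgeGrader:
    """Hypothetical model of the suspected control flow.

    NOT the letta_evals source; it only mirrors the observed behavior.
    """

    def __init__(
        self,
        package_dir: Path,
        agent_id: Optional[str] = None,
        agent_file: Optional[Path] = None,
    ) -> None:
        self.agent_id = agent_id  # stored, but never consulted below
        # Falls back to the bundled file whenever agent_file is omitted,
        # even when agent_id was supplied.
        self.agent_file = agent_file if agent_file is not None else package_dir / BUNDLED_NAME
        self._validate_agent_file()  # runs unconditionally

    def _validate_agent_file(self) -> None:
        if not self.agent_file.exists():
            raise FileNotFoundError(f"Agent file not found: {self.agent_file}")


if __name__ == "__main__":
    # An empty temporary directory stands in for an install without the .af file.
    with tempfile.TemporaryDirectory() as tmp:
        try:
            SuspectedJudgeGrader(
                Path(tmp), agent_id="agent-10d4286d-74a9-4fc6-82ff-a751bda72449"
            )
        except FileNotFoundError as exc:
            print(exc)  # same message shape as the traceback above
```

If this model is accurate, any configuration that sets only `agent_id` fails in the same way, which matches the reproduction steps.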

## Steps to Reproduce

1. Create a Letta agent to act as the grader (e.g., with a `submit_grade` tool)
2. Configure suite.yaml with `kind: letta_judge` and `agent_id: <your-agent-id>`
3. Run `letta-evals validate suite.yaml` - ✅ validates successfully
4. Run `letta-evals run suite.yaml` - ❌ fails with FileNotFoundError
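To confirm that the installed package really lacks the file, its expected location can be checked directly. This is a minimal sketch assuming, from the traceback path, that the file is expected in the `graders/` subdirectory of the installed `letta_evals` package; `bundled_judge_file` is a hypothetical helper, not part of letta-evals.

```python
import importlib.util
from pathlib import Path
from typing import Optional

# File name taken from the FileNotFoundError in this issue.
JUDGE_FILE_NAME = "letta-evals-judge-agent.af"


def bundled_judge_file(package: str = "letta_evals") -> Optional[Path]:
    """Return where the bundled judge file would live, or None if the
    package is not installed (find_spec does not import the package)."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at the package's __init__.py; the traceback shows
    # the .af file is expected next to agent_judge.py in graders/.
    return Path(spec.origin).parent / "graders" / JUDGE_FILE_NAME


if __name__ == "__main__":
    path = bundled_judge_file()
    if path is None:
        print("letta_evals is not installed in this environment")
    else:
        print(f"{path}: {'present' if path.exists() else 'MISSING'}")
```

Run it with the same virtualenv interpreter that runs `letta-evals`; on the affected install it should report the file as missing.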

## Expected Fix

The `AgentJudgeGrader` should:
1. When `agent_id` is provided, use that agent via Letta Cloud API
2. When `agent_file` is provided, use the local `.af` file
3. Not look for a bundled `letta-evals-judge-agent.af` file when `agent_id` is specified
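The resolution order in the list above can be sketched as a standalone function rather than as a patch to `AgentJudgeGrader`. `resolve_judge`, `RemoteJudge`, and `LocalJudge` are hypothetical names. Falling back to the bundled file when neither option is set, and rejecting configurations that set both, are assumptions, not documented letta-evals behavior.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Union


@dataclass(frozen=True)
class RemoteJudge:
    """Judge backed by an existing agent, addressed by its ID (hypothetical)."""
    agent_id: str


@dataclass(frozen=True)
class LocalJudge:
    """Judge loaded from a local .af agent file (hypothetical)."""
    agent_file: Path


def resolve_judge(
    agent_id: Optional[str],
    agent_file: Optional[Path],
    bundled_file: Path,
) -> Union[RemoteJudge, LocalJudge]:
    # Assumption: setting both options is treated as a configuration error.
    if agent_id and agent_file is not None:
        raise ValueError("set either agent_id or agent_file, not both")
    # 1. agent_id given: use the existing agent; no file lookup at all.
    if agent_id:
        return RemoteJudge(agent_id)
    # 2. agent_file given; otherwise (assumption) fall back to the bundled file.
    path = agent_file if agent_file is not None else bundled_file
    if not path.exists():
        raise FileNotFoundError(f"Agent file not found: {path}")
    return LocalJudge(path)


if __name__ == "__main__":
    missing = Path("/nonexistent/letta-evals-judge-agent.af")
    # Succeeds even though the bundled file does not exist.
    print(resolve_judge("agent-10d4286d-74a9-4fc6-82ff-a751bda72449", None, missing))
```

Keeping the existence check inside the file branch means an `agent_id`-only configuration never touches the package directory, which is what item 3 asks for.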

## Related Context

The `model_judge` grader bypasses Letta's built-in GPT-4.1 access and requires a direct OpenAI API key, even though Letta provides model access. The `letta_judge` (agent-as-judge) approach is documented as the way to use Letta's built-in LLM access, but the runtime implementation appears to support only `agent_file`, not `agent_id`.
