Commit 496e793

Tinker Integration (#245)
1 parent 80d306f commit 496e793

19 files changed: 2926 additions, 0 deletions

.github/workflows/tests.yml

Lines changed: 1 addition & 0 deletions
@@ -41,6 +41,7 @@ jobs:
 --group torch-cpu \
 --group torch-stable \
 --group trl \
+--group tinker \
 --group agents \
 --no-default-groups
 if: matrix.setup == 'slow'

examples/README.md

Lines changed: 1 addition & 0 deletions
@@ -10,5 +10,6 @@ This catalog highlights the examples shipped with Agent-lightning.
1010
| [search_r1](./search_r1) | Framework-free Search-R1 reinforcement learning training workflow with a retrieval backend. | **Unmaintained** — last verified with Agent-lightning v0.1.2 |
1111
| [spider](./spider) | Text-to-SQL reinforcement learning training on the Spider dataset using LangGraph. | [![spider workflow status](https://github.com/microsoft/agent-lightning/actions/workflows/badge-spider.yml/badge.svg)](https://github.com/microsoft/agent-lightning/actions/workflows/badge-spider.yml) |
1212
| [unsloth](./unsloth) | Supervised fine-tuning example powered by Unsloth with 4-bit quantization and LoRA. | [![unsloth workflow status](https://github.com/microsoft/agent-lightning/actions/workflows/badge-unsloth.yml/badge.svg)](https://github.com/microsoft/agent-lightning/actions/workflows/badge-unsloth.yml) |
13+
| [tinker](./tinker) | Reinforcement learning with Tinker as the backend training service. | **Unmaintained** — last verified with Agent-lightning v0.2.1 |
1314

1415
*NOTE: The CI status shown here does not take workflows running with the latest dependencies into account, which is why we reference the corresponding `badge-*` workflows instead. Each example's own README also displays its `examples-*` workflow status whenever the project is maintained by CI.*

examples/tinker/.env.example

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
TINKER_API_KEY=<your_tinker_api_key>

# Commonly used keys for debugging, testing and training
OPENAI_BASE_URL=<your_openai_base_url>
OPENAI_API_KEY=<your_openai_api_key>
WANDB_API_KEY=<your_wandb_api_key>

# Needed for CrewAI example: Temporarily disable CrewAI telemetry to make AgentOps work
CREWAI_DISABLE_TELEMETRY=true

examples/tinker/README.md

Lines changed: 182 additions & 0 deletions
@@ -0,0 +1,182 @@
1+
# Tinker + Agent-lightning Integration
2+
3+
This example shows how to use [Tinker's reinforcement-learning infrastructure](https://tinker-docs.thinkingmachines.ai/) as a fine-tuning backend for agents written against Agent-lightning. You author the agent exactly the way you would for deployment, while the bridge code reconstructs Tinker-compatible trajectories from Agent-lightning traces.
4+
5+
**NOTE: The example is tested and compatible with Agent-lightning v0.2.x, but it's not yet maintained on CI due to the cost of running the Tinker training service.**
6+
7+
## How this differs from the original Tinker Cookbook RL recipe
8+
9+
Real-world agent apps orchestrate logic in familiar frameworks (CrewAI, LangChain, AutoGen, OpenAI Agents, etc.) or by calling OpenAI-compatible REST APIs. A simple number-guessing agent might look like this:
10+
11+
```python
import openai

# MAX_TURNS, gold_answer, and extract_number are assumed to be defined elsewhere.


def guess_number_agent():
    client = openai.OpenAI()
    messages = [{"role": "user", "content": "Guess a number between 1 and 100."}]
    for _ in range(MAX_TURNS):
        response = client.chat.completions.create(model="gpt-4.1", messages=messages)
        response_content = response.choices[0].message.content
        messages.append({"role": "assistant", "content": response_content})
        guessed_number = extract_number(response_content)
        if guessed_number == gold_answer:
            return 1.0  # correct guess: full reward
        elif guessed_number < gold_answer:
            messages.append({"role": "user", "content": "Too low"})
        else:
            messages.append({"role": "user", "content": "Too high"})
    return 0.0  # ran out of turns without a correct guess
```
28+
29+
The reference [Tinker Cookbook example](https://github.com/thinking-machines-lab/tinker-cookbook/tree/51d9e8226f2dcf82ceac272c734a5f6e3b4f0203/tinker_cookbook/recipes/multiplayer_rl/guess_number), however, expects you to rewrite the same logic as a callback-style `Env`. The Cookbook then runs a simple loop that alternates between a language model (`TokenCompleter`) and the `Env`:
30+
31+
```python
# Message, SYSTEM_PROMPT, MAX_TURNS, message_to_tokens, tokens_to_message, and
# extract_number are assumed to be defined elsewhere.
class GuessNumberEnv:
    def __init__(self, gold_answer: int):
        self.system_prompt: Message = {"role": "system", "content": SYSTEM_PROMPT}
        self.turns: list[Message] = []
        self.gold_answer: int = gold_answer

    async def initial_observation(self) -> list[int]:
        return message_to_tokens(self.system_prompt)

    async def step(self, action_tokens: list[int]) -> tuple[list[int], float, bool]:
        action_message = tokens_to_message(action_tokens)
        guessed_number = extract_number(action_message["content"])

        if guessed_number == self.gold_answer:
            text, reward = "Correct", 1.0
        elif guessed_number < self.gold_answer:
            text, reward = "Too low", 0.0
        else:
            text, reward = "Too high", 0.0

        self.turns.append(action_message)
        # The environment's feedback becomes the next user turn.
        self.turns.append({"role": "user", "content": text})
        episode_done = reward == 1 or len(self.turns) // 2 >= MAX_TURNS
        return message_to_tokens(self.turns), reward, episode_done
```
57+
58+
As agents grow more complex, writing them in callback style becomes increasingly painful. You have to break the control flow whenever an LLM call is required, which fragments the code and makes it harder to maintain.
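To make the contrast concrete, here is a minimal sketch of the kind of driver loop that a callback-style `Env` plugs into. `ToyEnv`, `make_bisecting_policy`, and `run_episode` are hypothetical stand-ins that exchange plain integers instead of tokens; they are not the Tinker Cookbook API. The point is only that the loop, rather than the agent code, owns the control flow.

```python
import asyncio

MAX_TURNS = 7  # hypothetical turn budget for this sketch


class ToyEnv:
    """Hypothetical callback-style environment; observations are plain ints."""

    def __init__(self, gold_answer: int) -> None:
        self.gold_answer = gold_answer
        self.turns = 0

    async def initial_observation(self) -> int:
        return 0  # 0 means "no feedback yet"

    async def step(self, guess: int) -> tuple[int, float, bool]:
        self.turns += 1
        reward = 1.0 if guess == self.gold_answer else 0.0
        # Feedback: -1 = too low, +1 = too high, 0 = correct.
        feedback = (guess > self.gold_answer) - (guess < self.gold_answer)
        done = reward == 1.0 or self.turns >= MAX_TURNS
        return feedback, reward, done


def make_bisecting_policy(low: int = 1, high: int = 100):
    """Stand-in for the language model: bisects the range using the feedback."""
    state = {"low": low, "high": high, "last": None}

    async def policy(observation: int) -> int:
        if state["last"] is not None:
            if observation < 0:  # previous guess was too low
                state["low"] = state["last"] + 1
            elif observation > 0:  # previous guess was too high
                state["high"] = state["last"] - 1
        state["last"] = (state["low"] + state["high"]) // 2
        return state["last"]

    return policy


async def run_episode(env, policy) -> float:
    """Alternate between policy and environment until the episode ends."""
    observation = await env.initial_observation()
    total_reward = 0.0
    done = False
    while not done:
        action = await policy(observation)
        observation, reward, done = await env.step(action)
        total_reward += reward
    return total_reward


total_reward = asyncio.run(run_episode(ToyEnv(gold_answer=37), make_bisecting_policy()))
```

Every piece of agent logic has to be split across `step` calls so that this loop can interleave model calls, which is the fragmentation described above.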
59+
60+
Agent-lightning hides that translation step: you keep the first style for development and production, while the framework queues tasks to the store, rebuilds trajectories from spans, and feeds them to the training loop. This example shows how to make Tinker's original training loop work with Agent-lightning.
61+
62+
## Included files
63+
64+
| Path | Purpose |
| ---- | ------- |
| `hello.py` | Minimal end-to-end fine-tuning example. Trains a model to repeat small identity strings. |
| `q20_agent.py` | CrewAI flow that powers the 20 Questions player, answerer, and mock search tool. Shared by training and evaluation. **Unrelated to Agent-lightning or Tinker.** |
| `q20_train.py` | Reinforcement-learning driver that adapts the Cookbook loop to Agent-lightning rollouts. Supports dry-run, distributed training, and search tool toggles. **Related to both Agent-lightning and Tinker.** |
| `q20_evaluate.py` | Offline evaluator that reuses the CrewAI flow to benchmark any OpenAI- or Qwen-backed model against the provided dataset. **Related to Tinker only.** |
| `q20_nouns.csv` | Categories and answers used for training and validation. Contains `split` and `search_enabled` metadata. |
| `agl_tinker/` | Bridge package for integrating Agent-lightning with Tinker (see breakdown below). |
| `tests/test_tinker_llm.py` | Sanity tests for the custom LiteLLM provider. Run with `pytest examples/tinker/tests`. |
| `.env.example` | Template for environment variables required by LiteLLM, CrewAI helpers, and the hosted Tinker service. |
74+
75+
`agl_tinker/` components:
76+
77+
| Path | Purpose |
| ---- | ------- |
| `agl_tinker/algo.py` | Agent-lightning `Algorithm` wrapper that plugs the training loop into `agl.Trainer`. |
| `agl_tinker/env.py` | Dummy env and dataset builders that adapt Agent-lightning tasks to Tinker expectations. |
| `agl_tinker/llm.py` | LiteLLM custom provider backed by the Tinker sampling client. |
| `agl_tinker/rollout.py` | Span-to-trajectory reconstruction and rollout batching helpers. |
| `agl_tinker/train.py` | RL training loop adapted from the Tinker Cookbook. |
84+
85+
## Setup
86+
87+
**1. Install dependencies.** From the repo root:
88+
89+
```bash
uv sync --frozen --extra apo --group dev --group agents --group tinker
```
92+
93+
If you are not using `uv`, make sure `tinker`, `tinker_cookbook`, `litellm`, `crewai`, and Agent-lightning are available in the same environment.
94+
95+
**2. Copy the environment template and fill in credentials:**
96+
97+
```bash
cp examples/tinker/.env.example examples/tinker/.env
```
100+
101+
- `OPENAI_API_KEY` / `OPENAI_BASE_URL`: routes helper agents (answerer, search, tool simulations) through a LiteLLM or OpenAI-compatible endpoint.
- `TINKER_API_KEY`: required to talk to the hosted Tinker training service. Skip if you are using OpenAI models only.
- `WANDB_API_KEY`: optional, enables Weights & Biases logging when configured in `q20_train.py`.
- `CREWAI_DISABLE_TELEMETRY=true`: keeps CrewAI from emitting its own telemetry so that Agent-lightning tracing stays coherent.
105+
106+
**3. Load the environment before running commands,** e.g. with `dotenv run -- <command>`, or export the variables manually.
107+
108+
## Running the Hello 1024 example
109+
110+
This is the quickest way to see the integration in action. It fine-tunes a Qwen model so it introduces itself with the target identity.
111+
112+
**One-click workflow (spawns store, algorithm, and runners in a single process)**
113+
114+
```bash
dotenv run python hello.py oneclick
```
117+
118+
The script will pick free ports for the LiteLLM proxy and Agent-lightning store, then iterate through the synthetic dataset of identities.
119+
120+
**Distributed workflow (useful for inspecting each component)**
121+
122+
```bash
agl store --port 4747
dotenv run python hello.py algo
dotenv run python hello.py runner
```
127+
128+
Start the commands in separate terminals. The algorithm process connects to the existing store, while the runner process launches eight worker processes by default. Logs are written to `examples/tinker/logs/hello`.
129+
130+
## Training the 20 Questions agent
131+
132+
The 20 Questions setup mirrors the official Cookbook recipe but drives rollouts through the shared CrewAI flow.
133+
134+
**Dry run (in-memory store and LiteLLM proxy)**
135+
136+
```bash
dotenv run python q20_train.py dryrun
```
139+
140+
This is useful for verifying that the CrewAI flow, reward emission, and span reconstruction succeed on a handful of samples without touching the hosted Tinker service.
141+
142+
**Full distributed training**
143+
144+
```bash
agl store --port 4747
dotenv run python q20_train.py algo --model qwen30b --search --port 4747
dotenv run python q20_train.py runner --port 4747 --n-runners 4
```
149+
150+
`--model` selects the Tinker-hosted checkpoint (`qwen4b` or `qwen30b`). Add `--search` to enable the mocked search tool, which relies on the helper LLM defined in the environment variables (the example uses an LLM-powered search simulation instead of a real API). Training metrics and checkpoints are recorded under `examples/tinker/logs/q20_*`.
151+
152+
You can run additional runner processes at any time; they register with the store and start dequeuing tasks immediately.
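The queue-and-worker pattern behind this can be illustrated with a small in-process analogy. The sketch below uses `asyncio.Queue` as a stand-in for the store and coroutines as stand-ins for runner processes; the names `runner` and `run_tasks` are hypothetical, and the real store and runners communicate across processes rather than inside one event loop.

```python
import asyncio


async def runner(name, queue, results):
    """Pull tasks until cancelled; each task stands in for one rollout."""
    while True:
        task = await queue.get()
        try:
            await asyncio.sleep(0)  # placeholder for executing the agent
            results.append((name, task))
        finally:
            queue.task_done()


async def run_tasks(tasks, n_runners):
    queue = asyncio.Queue()
    results = []
    for task in tasks:
        queue.put_nowait(task)
    workers = [
        asyncio.create_task(runner(f"runner-{i}", queue, results))
        for i in range(n_runners)
    ]
    await queue.join()  # wait until every queued task has been processed
    for worker in workers:
        worker.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results


completed = asyncio.run(run_tasks(list(range(10)), n_runners=4))
```

Because workers only ever pull from the shared queue, adding more of them changes throughput but not the set of tasks that get completed.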
153+
154+
## Evaluating a model on 20 Questions
155+
156+
Reuse the CrewAI flow to benchmark any OpenAI-compatible model (hosted on Tinker, OpenAI, or another LiteLLM backend):
157+
158+
```bash
dotenv run python q20_evaluate.py \
  --model Qwen/Qwen3-30B-A3B-Instruct-2507 \
  --output-file logs/twenty_questions_results.jsonl \
  --search
```
164+
165+
Results append to the specified JSONL file so you can compute aggregate stats later.
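A minimal sketch of computing such aggregate stats follows. The record fields `correct` and `category` are assumptions made for illustration; check the actual keys written by `q20_evaluate.py` and pass them via `success_key` and `group_key`.

```python
import json
from collections import defaultdict


def summarize_results(lines, success_key="correct", group_key="category"):
    """Compute the success rate per group from an iterable of JSONL lines.

    The default key names are hypothetical; adjust them to the real schema.
    """
    totals = defaultdict(lambda: [0, 0])  # group -> [successes, attempts]
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        group = str(record.get(group_key, "all"))
        totals[group][0] += int(bool(record.get(success_key, False)))
        totals[group][1] += 1
    return {group: wins / attempts for group, (wins, attempts) in totals.items()}
```

Since a file object iterates line by line, you can pass it directly: `with open("logs/twenty_questions_results.jsonl") as f: print(summarize_results(f))`.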
166+
167+
## How the bridge works
168+
169+
The `agl_tinker` package leaves the Tinker and Tinker Cookbook codebases untouched by emulating the interfaces they expect:
170+
171+
- `AGLDatasetBuilder` and `AGLDummyEnv` wrap plain Agent-lightning datasets so batches still yield Tinker `EnvGroupBuilder` objects, even though rollouts run remotely.
172+
- `do_group_of_group_rollouts` (in [`rollout.py`](agl_tinker/rollout.py)) enqueues tasks to the Agent-lightning store, waits for runners to finish, then reconstructs `Trajectory` objects from span triplets collected by `TracerTraceToTriplet`.
173+
- `TinkerLLM` implements LiteLLM's `CustomLLM` so the training loop can update sampling clients and expose them through an OpenAI-compatible endpoint without rewriting agent code.
174+
- `agl_tinker.algo.Tinker` satisfies Agent-lightning's `Algorithm` contract, meaning you can launch training via `agl.Trainer` alongside other algorithms, schedulers, or resources.
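The span-to-trajectory step can be pictured with the following sketch. `Triplet`, `Transition`, and `triplets_to_trajectory` are hypothetical simplifications written for this illustration; they are not the Agent-lightning or Tinker types, and the real helpers in `rollout.py` carry more information (for example log-probabilities).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Triplet:
    """Hypothetical stand-in for one LLM call recovered from a span."""

    prompt_token_ids: list[int]
    response_token_ids: list[int]
    reward: Optional[float] = None


@dataclass
class Transition:
    """Hypothetical stand-in for one step of a training trajectory."""

    observation: list[int]
    action: list[int]
    reward: float


def triplets_to_trajectory(triplets: list[Triplet]) -> list[Transition]:
    """Turn ordered triplets into transitions, rejecting any without token IDs."""
    transitions = []
    for index, triplet in enumerate(triplets):
        if not triplet.prompt_token_ids or not triplet.response_token_ids:
            raise ValueError(f"Triplet {index} has no token_ids")
        transitions.append(
            Transition(
                observation=triplet.prompt_token_ids,
                action=triplet.response_token_ids,
                reward=triplet.reward if triplet.reward is not None else 0.0,
            )
        )
    return transitions
```

The early failure on missing token IDs reflects why the proxy must return them: without the exact tokens the model saw and produced, there is nothing to train on.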
175+
176+
Because spans and rewards are emitted by the same rollout function you would deploy, evaluation and production stay in sync—no separate simulator code paths to maintain.
177+
178+
## Troubleshooting tips
179+
180+
- If the runner logs show `Triplet has no token_ids`, ensure your LiteLLM proxy returns logprobs and token IDs, and that the token IDs are present in the store. The provided adapter requires them to rebuild trajectories. See the debugging tutorial for more details.
181+
- CrewAI telemetry must stay disabled (see `.env.example`) so AgentOps traces remain self-contained; otherwise, you may see malformed traces.
182+
- Tune `learning_rate`, `batch_size` and `group_size` carefully. The training is sensitive to these hyper-parameters.
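One reason `group_size` matters: group-based RL recipes commonly center each rollout's reward on the mean of its group, so a group whose rollouts all receive the same reward contributes no learning signal. The sketch below illustrates that arithmetic. It is an assumption about the general technique, not a copy of the training loop in `agl_tinker/train.py`, and `group_centered_advantages` is a hypothetical name.

```python
def group_centered_advantages(rewards_per_group):
    """Subtract each group's mean reward from the rewards in that group."""
    advantages = []
    for rewards in rewards_per_group:
        mean = sum(rewards) / len(rewards)
        advantages.append([reward - mean for reward in rewards])
    return advantages


# A mixed group yields non-zero advantages; a uniform group yields all zeros.
example = group_centered_advantages([[1.0, 0.0, 0.0, 1.0], [0.0, 0.0]])
```

With small groups and sparse rewards, uniform groups become frequent, which is one way an unlucky choice of `group_size` can stall training.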
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
# Copyright (c) Microsoft. All rights reserved.

examples/tinker/agl_tinker/algo.py

Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
# Copyright (c) Microsoft. All rights reserved.

"""Agent-lightning glue around the Tinker reinforcement-learning algorithm.

This implements Agent-lightning's [`Algorithm`][agentlightning.Algorithm] interface
for quick one-click running.
"""

from __future__ import annotations

import logging
from typing import Any, Optional

import chz

from agentlightning.adapter import TracerTraceToTriplet
from agentlightning.algorithm import Algorithm
from agentlightning.llm_proxy import LLMProxy
from agentlightning.types import Dataset

from .train import Config, main_training_loop

logger = logging.getLogger(__name__)


class Tinker(Algorithm):
    """A wrapper around `agl_tinker.train` that uses Agent-lightning resources.

    Compared to the `agl_tinker.train` function, this class:

    * Pulls the store, tracer adapter, and LiteLLM proxy from the ambient
      Agent-lightning runtime instead of constructing its own.
    * Replaces the dataset configured in ``Config`` with the datasets provided
      by Agent-lightning so existing resource loaders (e.g., `agl.Dataset`)
      keep working.
    * Ensures the adapter is `TracerTraceToTriplet` because rollouts are
      reconstructed from spans rather than via Tinker's native data construction.
    """

    def __init__(self, config: Config) -> None:
        """Store the training configuration."""
        self.config = config

    async def run(
        self, train_dataset: Optional[Dataset[Any]] = None, val_dataset: Optional[Dataset[Any]] = None
    ) -> None:
        """Execute the Tinker training loop with Agent-lightning resources.

        Args:
            train_dataset: Dataset injected by Agent-lightning for training.
            val_dataset: Dataset injected by Agent-lightning for evaluation.

        Raises:
            ValueError: If mandatory datasets are missing or if the adapter is
                not a [`TracerTraceToTriplet`][agentlightning.TracerTraceToTriplet] instance.

        This mirrors `agl_tinker.train.main` but instead of launching
        a brand-new LiteLLM proxy it reuses (or lazily creates) the proxy
        managed by the Algorithm base class, so rollouts stay visible to the
        Agent-lightning store.
        """
        if train_dataset is None or val_dataset is None:
            raise ValueError("train_dataset and val_dataset are required")

        config = chz.replace(  # type: ignore
            self.config,
            dataset_builder=chz.replace(  # type: ignore
                self.config.dataset_builder, train_dataset=train_dataset, val_dataset=val_dataset
            ),
        )

        store = self.get_store()
        adapter = self.get_adapter()
        if not isinstance(adapter, TracerTraceToTriplet):
            raise ValueError("Adapter must be a TracerTraceToTriplet")
        llm_proxy = self.get_llm_proxy()
        if llm_proxy is None:
            logger.warning("No LLM proxy found, creating one for you.")

            llm_proxy = LLMProxy(
                port=config.llm_proxy_port,
                model_list=[],
                store=store,
            )

        await main_training_loop(config, store, adapter, llm_proxy)  # type: ignore
