**examples/README.md** (1 addition, 0 deletions)

This catalog highlights the examples shipped with Agent-lightning.

|[search_r1](./search_r1)| Framework-free Search-R1 reinforcement learning training workflow with a retrieval backend. |**Unmaintained** — last verified with Agent-lightning v0.1.2 |
|[spider](./spider)| Text-to-SQL reinforcement learning training on the Spider dataset using LangGraph. |[](https://github.com/microsoft/agent-lightning/actions/workflows/badge-spider.yml)|
|[unsloth](./unsloth)| Supervised fine-tuning example powered by Unsloth with 4-bit quantization and LoRA. |[](https://github.com/microsoft/agent-lightning/actions/workflows/badge-unsloth.yml)|
|[tinker](./tinker)| Reinforcement learning with Tinker as the backend training service. |**Unmaintained** — last verified with Agent-lightning v0.2.1 |

*NOTE: The CI status does not take workflows that run against the latest dependencies into account; that's why we reference the corresponding `badge-*` workflows instead. Each example's own README also displays its `examples-*` workflow status whenever the example is maintained by CI.*

**examples/tinker/README.md** (17 additions, 7 deletions)

This example shows how to use [Tinker's reinforcement-learning infrastructure](https://tinker-docs.thinkingmachines.ai/) as a fine-tuning backend for agents written against Agent-lightning. You author the agent exactly the way you would for deployment, while the bridge code reconstructs Tinker-compatible trajectories from Agent-lightning traces.

**NOTE: The example is tested and compatible with Agent-lightning v0.2.x, but it's not yet maintained on CI due to the cost of running the Tinker training service.**

## How this differs from the original Tinker Cookbook RL recipe

As agents grow more complex, writing them in callback style becomes increasingly painful. You have to break the control flow whenever an LLM call is required, which fragments the code and makes it harder to maintain.

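A minimal, hypothetical sketch of the contrast (the function and class names below are illustrative stand-ins, not APIs from Agent-lightning, Tinker, or CrewAI):

```python
# Illustrative only: `llm_call` and the callback class are made-up stand-ins,
# not part of this example or of either framework's API.

def llm_call(prompt: str) -> str:
    """Placeholder for a chat-completion request."""
    return f"<model reply to: {prompt}>"

# Straight-line control flow: the agent reads top to bottom and calls the
# LLM wherever it needs a completion.
def guess_the_noun(category: str) -> str:
    question = llm_call(f"Ask a yes/no question about a {category}.")
    answer = llm_call(f"Answer this question about the secret {category}: {question}")
    return llm_call(f"Given Q: {question!r} and A: {answer!r}, guess the noun.")

# Callback style: each LLM response arrives in a separate handler, so the
# single flow above is fragmented into steps plus shared mutable state.
class GuessTheNounCallbacks:
    def __init__(self, category: str) -> None:
        self.category = category
        self.question: str | None = None

    def on_start(self) -> str:
        return f"Ask a yes/no question about a {self.category}."

    def on_question(self, question: str) -> str:
        self.question = question
        return f"Answer this question about the secret {self.category}: {question}"

    def on_answer(self, answer: str) -> str:
        return f"Given Q: {self.question!r} and A: {answer!r}, guess the noun."
```
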
Agent-lightning hides that translation step: you keep the first style for development and production, while the framework queues tasks to the store, rebuilds trajectories from spans, and feeds them to the training loop. This example shows how to make Tinker's original training loop work with Agent-lightning.

## Included files
|`q20_train.py`| Reinforcement-learning driver that adapts the Cookbook loop to Agent-lightning rollouts. Supports dry-run, distributed training, and search tool toggles. **Related to both Agent-lightning and Tinker.**|
|`q20_evaluate.py`| Offline evaluator that reuses the CrewAI flow to benchmark any OpenAI- or Qwen-backed model against the provided dataset. **Related to Tinker only.**|
|`q20_nouns.csv`| Categories and answers used for training and validation. Contains `split` and `search_enabled` metadata. |
|`agl_tinker/`| Bridge package for integrating Agent-lightning with Tinker (see breakdown below). |
|`tests/test_tinker_llm.py`| Sanity tests for the custom LiteLLM provider. Run with `pytest examples/tinker/tests`. |
|`.env.example`| Template for environment variables required by LiteLLM, CrewAI helpers, and the hosted Tinker service. |

`agl_tinker/` components:

| Path | Purpose |
| ---- | ------- |
|`agl_tinker/algo.py`| Agent-lightning `Algorithm` wrapper that plugs the training loop into `agl.Trainer`. |
|`agl_tinker/env.py`| Dummy env and dataset builders that adapt Agent-lightning tasks to Tinker expectations. |
|`agl_tinker/llm.py`| LiteLLM custom provider backed by the Tinker sampling client. |
|`agl_tinker/rollout.py`| Span-to-trajectory reconstruction and rollout batching helpers. |
|`agl_tinker/train.py`| RL training loop adapted from the Tinker Cookbook. |
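
As a rough orientation for `agl_tinker/llm.py`, the sketch below shows the general shape of a LiteLLM custom provider. It is a simplified stand-in under assumed names (`TinkerLikeLLM`, the `tinker-like` provider prefix, and the mock response are all hypothetical); the real provider forwards requests to the Tinker sampling client and returns logprobs and token IDs instead of canned text.

```python
# Sketch of the LiteLLM custom-provider pattern; names are hypothetical and the
# response is mocked. The actual agl_tinker/llm.py wraps Tinker's sampling client.
import litellm
from litellm import CustomLLM


class TinkerLikeLLM(CustomLLM):
    def completion(self, *args, **kwargs) -> litellm.ModelResponse:
        # A real provider would sample from the backend and attach logprobs and
        # token IDs so trajectories can be reconstructed from traces later.
        return litellm.completion(
            model="gpt-3.5-turbo",
            messages=kwargs.get("messages", []),
            mock_response="(sampled text would go here)",
        )


# Register the handler so model names prefixed with `tinker-like/` route to it.
litellm.custom_provider_map = [
    {"provider": "tinker-like", "custom_handler": TinkerLikeLLM()}
]

response = litellm.completion(
    model="tinker-like/qwen4b",
    messages=[{"role": "user", "content": "Is it an animal?"}],
)
print(response.choices[0].message.content)
```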

```
dotenv run python q20_train.py runner --port 4747 --n-runners 4
```

`--model` selects the Tinker-hosted checkpoint (`qwen4b` or `qwen30b`). Add `--search` to enable the mocked search tool, which relies on the helper LLM defined in the environment variables (the example uses an LLM-powered search simulation instead of a real API). Training metrics and checkpoints are recorded under `examples/tinker/logs/q20_*`.

You can run additional runner processes at any time; they register with the store and start dequeuing tasks immediately.
## Troubleshooting tips

- If the runner logs show `Triplet has no token_ids`, ensure your LiteLLM proxy returns logprobs and token IDs, and that the token IDs are present in the store. The provided adapter requires them to rebuild trajectories. See the debugging tutorial for more details; a quick logprobs check is sketched after this list.
- CrewAI telemetry must stay disabled (see `.env.example`) so AgentOps traces remain self-contained; otherwise, you may see malformed traces.
- Tune `learning_rate`, `batch_size`, and `group_size` carefully; training is sensitive to these hyperparameters.
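
A quick, hypothetical way to verify the logprobs part of the first tip (the base URL, API key, and model name below are placeholders for whatever your LiteLLM proxy actually serves; token IDs are carried separately by the tracing layer, so this only checks logprobs):

```python
# Placeholder values throughout: point base_url and model at your own LiteLLM proxy.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000/v1", api_key="sk-placeholder")

response = client.chat.completions.create(
    model="qwen4b",  # placeholder; use the model name your proxy exposes
    messages=[{"role": "user", "content": "Is it an animal?"}],
    logprobs=True,
)

choice = response.choices[0]
if choice.logprobs is None or not choice.logprobs.content:
    print("No logprobs returned; trajectory reconstruction will fail.")
else:
    for item in choice.logprobs.content[:5]:
        print(item.token, item.logprob)
```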