
Commit 4776134

update Readme
1 parent 94e5d5d commit 4776134

File tree

2 files changed: 18 additions, 7 deletions


examples/README.md

Lines changed: 1 addition & 0 deletions
@@ -10,5 +10,6 @@ This catalog highlights the examples shipped with Agent-lightning.
 | [search_r1](./search_r1) | Framework-free Search-R1 reinforcement learning training workflow with a retrieval backend. | **Unmaintained** — last verified with Agent-lightning v0.1.2 |
 | [spider](./spider) | Text-to-SQL reinforcement learning training on the Spider dataset using LangGraph. | [![spider workflow status](https://github.com/microsoft/agent-lightning/actions/workflows/badge-spider.yml/badge.svg)](https://github.com/microsoft/agent-lightning/actions/workflows/badge-spider.yml) |
 | [unsloth](./unsloth) | Supervised fine-tuning example powered by Unsloth with 4-bit quantization and LoRA. | [![unsloth workflow status](https://github.com/microsoft/agent-lightning/actions/workflows/badge-unsloth.yml/badge.svg)](https://github.com/microsoft/agent-lightning/actions/workflows/badge-unsloth.yml) |
+| [tinker](./tinker) | Reinforcement learning with Tinker as the backend training service. | **Unmaintained** — last verified with Agent-lightning v0.2.1 |
 
 *NOTE: The CI status shown here deliberately ignores workflows that run with the latest dependencies; that is why we reference the corresponding `badge-*` workflows instead. Each example's own README also displays its `examples-*` workflow status whenever the example is maintained by CI.*

examples/tinker/README.md

Lines changed: 17 additions & 7 deletions
@@ -2,7 +2,7 @@
 
 This example shows how to use [Tinker's reinforcement-learning infrastructure](https://tinker-docs.thinkingmachines.ai/) as a fine-tuning backend for agents written against Agent-lightning. You author the agent exactly the way you would for deployment, while the bridge code reconstructs Tinker-compatible trajectories from Agent-lightning traces.
 
-**It's tested and compatible with Agent-lightning v0.2.x, but it's not yet maintained on CI due to the cost of running the Tinker training service.**
+**NOTE: The example is tested and compatible with Agent-lightning v0.2.x, but it's not yet maintained on CI due to the cost of running the Tinker training service.**
 
 ## How this differs from the original Tinker Cookbook RL recipe
 
@@ -55,9 +55,9 @@ class GuessNumberEnv:
         return message_to_tokens(self.turns), reward, episode_done
 ```
 
-As you might expect, when the agents get complex, writing agents in the callback-style will get more and more suffering. It requires you to break the control flow when you need a LLM call, thus making the code fragmented and hard to maintain.
+As agents grow more complex, writing them in callback style becomes increasingly painful. You have to break the control flow whenever an LLM call is required, which fragments the code and makes it harder to maintain.
 
-Agent-lightning hides that translation step: you keep the first style for development and production, while the Agent-lightning framework queues tasks to the Agent-lightning store, rebuilds trajectories from spans, and feeds them to the training loop. This example shows how to make Tinker's original training loop work with Agent-lightning.
+Agent-lightning hides that translation step: you keep the first style for development and production, while the framework queues tasks to the store, rebuilds trajectories from spans, and feeds them to the training loop. This example shows how to make Tinker's original training loop work with Agent-lightning.
 
 ## Included files
 
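To make the "rebuilds trajectories from spans" step described above concrete, here is a minimal, self-contained sketch. It is not the actual `agl_tinker/rollout.py` code; all names (`Span`, `Triplet`, `spans_to_triplets`) are illustrative, and it assumes the simple setup where only the episode outcome is scored:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    """A simplified trace record for one LLM call (illustrative only)."""
    prompt_token_ids: List[int]
    response_token_ids: List[int]


@dataclass
class Triplet:
    """One training example: prompt tokens, response tokens, reward."""
    prompt_token_ids: List[int]
    response_token_ids: List[int]
    reward: float


def spans_to_triplets(spans: List[Span], final_reward: float) -> List[Triplet]:
    """Broadcast the episode-level reward to every LLM call in the rollout.

    Real adapters may discount or do finer-grained credit assignment;
    broadcasting the final reward matches the common outcome-only RL setup.
    """
    return [
        Triplet(s.prompt_token_ids, s.response_token_ids, final_reward)
        for s in spans
    ]


# Two LLM calls traced during one episode; the second prompt grows
# because the conversation history accumulates.
spans = [Span([1, 2, 3], [4, 5]), Span([1, 2, 3, 4, 5, 6], [7])]
triplets = spans_to_triplets(spans, final_reward=1.0)
```

The point is only that a trajectory is recovered from traces after the fact, so the agent code itself never has to yield control at each LLM call.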
@@ -68,10 +68,20 @@ Agent-lightning hides that translation step: you keep the first style for development
 | `q20_train.py` | Reinforcement-learning driver that adapts the Cookbook loop to Agent-lightning rollouts. Supports dry-run, distributed training, and search tool toggles. **Related to both Agent-lightning and Tinker.** |
 | `q20_evaluate.py` | Offline evaluator that reuses the CrewAI flow to benchmark any OpenAI- or Qwen-backed model against the provided dataset. **Related to Tinker only.** |
 | `q20_nouns.csv` | Categories and answers used for training and validation. Contains `split` and `search_enabled` metadata. |
-TODO: explain agl_tinker subfolder files.
+| `agl_tinker/` | Bridge package for integrating Agent-lightning with Tinker (see breakdown below). |
 | `tests/test_tinker_llm.py` | Sanity tests for the custom LiteLLM provider. Run with `pytest examples/tinker/tests`. |
 | `.env.example` | Template for environment variables required by LiteLLM, CrewAI helpers, and the hosted Tinker service. |
 
+`agl_tinker/` components:
+
+| Path | Purpose |
+| ---- | ------- |
+| `agl_tinker/algo.py` | Agent-lightning `Algorithm` wrapper that plugs the training loop into `agl.Trainer`. |
+| `agl_tinker/env.py` | Dummy env and dataset builders that adapt Agent-lightning tasks to Tinker expectations. |
+| `agl_tinker/llm.py` | LiteLLM custom provider backed by the Tinker sampling client. |
+| `agl_tinker/rollout.py` | Span-to-trajectory reconstruction and rollout batching helpers. |
+| `agl_tinker/train.py` | RL training loop adapted from the Tinker Cookbook. |
+
 ## Setup
 
 **1. Install dependencies.** From the repo root:
@@ -93,7 +103,7 @@ cp examples/tinker/.env.example examples/tinker/.env
 - `WANDB_API_KEY`: optional, enables Weights & Biases logging when configured in `q20_train.py`.
 - `CREWAI_DISABLE_TELEMETRY=true`: keeps CrewAI from emitting its own telemetry so that Agent-lightning tracing stays coherent.
 
-3. Load the environment before running commands, e.g. `dotenv run -- examples here` or export variables manually.
+3. Load the environment before running commands, e.g. `dotenv run -- <command>` or export the variables manually.
 
 ## Running the Hello 1024 example
 
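For readers unfamiliar with `dotenv run`, it simply loads `KEY=VALUE` pairs from the `.env` file into the process environment before executing the command. A rough stdlib-only sketch of that behavior (illustrative only; the real `python-dotenv` also handles quoting, multiline values, and interpolation):

```python
def load_env_file(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


# Illustrative .env content; the keys match .env.example, the values do not.
sample = """
# Tinker example environment
WANDB_API_KEY=dummy-key
CREWAI_DISABLE_TELEMETRY=true
"""
parsed = load_env_file(sample)
```

In the real workflow you would then run the training commands under `dotenv run --` so these variables are visible to LiteLLM and the CrewAI helpers.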
@@ -137,7 +147,7 @@ dotenv run python q20_train.py algo --model qwen30b --search --port 4747
 dotenv run python q20_train.py runner --port 4747 --n-runners 4
 ```
 
-`--model` selects the Tinker-hosted checkpoint (`qwen4b` or `qwen30b`). Add `--search` to enable the mocked search tool, which relies on the helper LLM defined in the environment variables (again, because we are short on budget to use a real search engine API). Training metrics and checkpoints are recorded under `examples/tinker/logs/q20_*`.
+`--model` selects the Tinker-hosted checkpoint (`qwen4b` or `qwen30b`). Add `--search` to enable the mocked search tool, which relies on the helper LLM defined in the environment variables (the example uses an LLM-powered search simulation instead of a real API). Training metrics and checkpoints are recorded under `examples/tinker/logs/q20_*`.
 
 You can run additional runner processes at any time; they register with the store and start dequeuing tasks immediately.
 
@@ -167,6 +177,6 @@ Because spans and rewards are emitted by the same rollout function you would deploy
 
 ## Troubleshooting tips
 
-- If the runner logs show `Triplet has no token_ids`, ensure your LiteLLM proxy returns logprobs and token IDs and the token IDs can be found in the store. The provided adapter requires them to rebuild trajectories. See the debugging tutorial for more details.
+- If the runner logs show `Triplet has no token_ids`, ensure your LiteLLM proxy returns logprobs and token IDs, and that the token IDs are present in the store. The provided adapter requires them to rebuild trajectories. See the debugging tutorial for more details.
 - CrewAI telemetry must stay disabled (see `.env.example`) so AgentOps traces remain self-contained; otherwise, you may see malformed traces.
 - Tune `learning_rate`, `batch_size` and `group_size` carefully. The training is sensitive to these hyper-parameters.
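One way to reproduce the `Triplet has no token_ids` condition locally is to scan rebuilt triplets before training. This is a hedged sketch, not the adapter's actual API: the dict keys and the helper name are illustrative, and it only shows the kind of check implied by the error message:

```python
def find_missing_token_ids(triplets):
    """Return indices of triplets whose token IDs were not recovered.

    A triplet is usable for training only if both prompt and response
    token IDs came back from the proxy and landed in the store.
    """
    missing = []
    for i, t in enumerate(triplets):
        if not t.get("prompt_token_ids") or not t.get("response_token_ids"):
            missing.append(i)
    return missing


# The second triplet simulates a rollout whose token IDs never reached
# the store, which is what triggers the warning in the runner logs.
rollout = [
    {"prompt_token_ids": [1, 2], "response_token_ids": [3], "reward": 1.0},
    {"prompt_token_ids": [], "response_token_ids": [3], "reward": 0.0},
]
bad = find_missing_token_ids(rollout)
```

If such a check flags entries, the fix is on the proxy side (enable logprobs and token ID return), not in the training loop.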
