This directory contains a collection of examples that demonstrate how to use the TRL library for various applications. We provide both **scripts** for advanced use cases and **notebooks** for an easy start and interactive experimentation.
The notebooks are self-contained and can run on **free Colab**, while the scripts can run on **single GPU, multi-GPU, or DeepSpeed** setups.
## Getting Started
Install TRL and additional dependencies as follows:
```bash
pip install --upgrade trl[quantization]
```
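
To confirm the installation succeeded, you can optionally print the installed version (`trl.__version__` is the package's top-level version attribute):

```bash
# Prints the installed TRL version
python -c "import trl; print(trl.__version__)"
```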
Check for additional optional dependencies [here](https://github.com/huggingface/trl/blob/main/pyproject.toml).
For scripts, you will also need an 🤗 Accelerate config file (recommended for multi-GPU settings):
```bash
accelerate config # will prompt you to define the training configuration
```
This allows you to run scripts with `accelerate launch` in single or multi-GPU settings.
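
For example, a minimal launch might look like this (a sketch; the script path and the `--all_arguments_of_the_script` placeholder follow the conventions used in the tables below):

```bash
# Uses the config created by `accelerate config` above; replace the
# placeholder with the script's actual arguments.
accelerate launch trl/scripts/sft.py --all_arguments_of_the_script
```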
## Notebooks
These notebooks are easier to run and are designed for quick experimentation with TRL. The list of notebooks can be found in the [`trl/examples/notebooks/`](https://github.com/huggingface/trl/tree/main/examples/notebooks/) directory.
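
The notebooks are designed for Colab, but if you prefer to run them locally, here is a minimal sketch (assuming Jupyter is not already installed in your environment):

```bash
# Install JupyterLab and open it rooted at the notebooks directory
pip install jupyterlab
jupyter lab examples/notebooks/
```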
| Notebook | Description | Open in Colab |
|----------|-------------|---------------|
|[`sft_trl_lora_qlora.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/sft_trl_lora_qlora.ipynb)| Supervised Fine-Tuning (SFT) using QLoRA on free Colab |[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_trl_lora_qlora.ipynb)|
|[`sft_qwen_vl.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/sft_qwen_vl.ipynb)| Supervised Fine-Tuning (SFT) of Qwen3-VL with QLoRA using TRL on free Colab |[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_qwen_vl.ipynb)|
|[`grpo_qwen3_vl.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/grpo_qwen3_vl.ipynb)| GRPO training of Qwen3-VL with QLoRA using TRL on free Colab |[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/grpo_qwen3_vl.ipynb)|
### Legacy / Older Notebooks
- [`best_of_n.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/best_of_n.ipynb): This notebook demonstrates how to use the "Best of N" sampling strategy with TRL when fine-tuning your model with PPO.
- [`gpt2-sentiment.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/gpt2-sentiment.ipynb): This notebook demonstrates how to reproduce the GPT2 IMDB sentiment tuning example in a Jupyter notebook.
- [`gpt2-sentiment-control.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/gpt2-sentiment-control.ipynb): This notebook demonstrates how to reproduce the GPT2 sentiment control example in a Jupyter notebook.
## Scripts
Scripts are maintained in the [`trl/scripts`](https://github.com/huggingface/trl/blob/main/trl/scripts) and [`examples/scripts`](https://github.com/huggingface/trl/blob/main/examples/scripts) directories. They show how to use different trainers such as `SFTTrainer`, `PPOTrainer`, `DPOTrainer`, `GRPOTrainer`, and more.
| File | Description |
| --- | --- |
|[`examples/scripts/bco.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/bco.py)| This script shows how to use the [`KTOTrainer`] with the BCO loss to fine-tune a model to increase instruction-following, truthfulness, honesty, and helpfulness using the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset. |
|[`examples/scripts/cpo.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/cpo.py)| This script shows how to use the [`CPOTrainer`] to fine-tune a model to increase helpfulness and harmlessness using the [Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf) dataset. |
|[`trl/scripts/dpo.py`](https://github.com/huggingface/trl/blob/main/trl/scripts/dpo.py)| This script shows how to use the [`DPOTrainer`] to fine-tune a model. |
|[`examples/scripts/dpo_vlm.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/dpo_vlm.py)| This script shows how to use the [`DPOTrainer`] to fine-tune a Vision Language Model to reduce hallucinations using the [openbmb/RLAIF-V-Dataset](https://huggingface.co/datasets/openbmb/RLAIF-V-Dataset) dataset. |
|[`examples/scripts/ppo/ppo.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/ppo/ppo.py)| This script shows how to use the [`PPOTrainer`] to fine-tune a model to improve its ability to continue text with positive sentiment or physically descriptive language. |
|[`examples/scripts/ppo/ppo_tldr.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/ppo/ppo_tldr.py)| This script shows how to use the [`PPOTrainer`] to fine-tune a model to improve its ability to generate TL;DR summaries. |
|[`examples/scripts/prm.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/prm.py)| This script shows how to use the [`PRMTrainer`] to fine-tune a Process-supervised Reward Model (PRM). |
|[`examples/scripts/reward_modeling.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/reward_modeling.py)| This script shows how to use the [`RewardTrainer`] to train an Outcome Reward Model (ORM) on your own dataset. |
|[`examples/scripts/rloo.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/rloo.py)| This script shows how to use the [`RLOOTrainer`] to fine-tune a model to improve its ability to solve math questions. |
|[`trl/scripts/sft.py`](https://github.com/huggingface/trl/blob/main/trl/scripts/sft.py)| This script shows how to use the [`SFTTrainer`] to fine-tune a model. |
|[`examples/scripts/sft_gemma3.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/sft_gemma3.py)| This script shows how to use the [`SFTTrainer`] to fine-tune a Gemma 3 model. |
|[`examples/scripts/sft_video_llm.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/sft_video_llm.py)| This script shows how to use the [`SFTTrainer`] to fine-tune a Video Language Model. |
|[`examples/scripts/sft_vlm.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/sft_vlm.py)| This script shows how to use the [`SFTTrainer`] to fine-tune a Vision Language Model in a chat setting. The script has only been tested with [LLaVA 1.5](https://huggingface.co/llava-hf/llava-1.5-7b-hf), [LLaVA 1.6](https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf), and [Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) models, so users may see unexpected behaviour in other model architectures. |
|[`examples/scripts/sft_vlm_gemma3.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/sft_vlm_gemma3.py)| This script shows how to use the [`SFTTrainer`] to fine-tune a Gemma 3 model on vision-to-text tasks. |
|[`examples/scripts/sft_vlm_smol_vlm.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/sft_vlm_smol_vlm.py)| This script shows how to use the [`SFTTrainer`] to fine-tune a SmolVLM model. |
|[`examples/scripts/xpo.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/xpo.py)| This script shows how to use the [`XPOTrainer`] to fine-tune a model. |
## Distributed Training (for scripts)
You can run scripts on multiple GPUs with 🤗 Accelerate:
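
Here is a sketch of such a launch, reusing the placeholders used elsewhere in this README (swap `{NUM_GPUS}` for the number of GPUs on your machine and `--all_arguments_of_the_script` for your actual arguments):

```bash
# Launch the script across all requested GPU processes
accelerate launch --num_processes {NUM_GPUS} trl/scripts/sft.py --all_arguments_of_the_script
```
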
You can also adjust the parameters of the 🤗 Accelerate config file to suit your needs (e.g. training in mixed precision).