| Reinforcement Learning |[`GRPOTrainer`]| Efficient Online Training with GRPO and vLLM in TRL |[Sergio Paniego](https://huggingface.co/sergiopaniego)|[Link](https://huggingface.co/learn/cookbook/grpo_vllm_online_training)|[](https://colab.research.google.com/github/huggingface/cookbook/blob/main/notebooks/en/grpo_vllm_online_training.ipynb)|
| Reinforcement Learning |[`GRPOTrainer`]| Post training an LLM for reasoning with GRPO in TRL |[Sergio Paniego](https://huggingface.co/sergiopaniego)|[Link](https://huggingface.co/learn/cookbook/fine_tuning_llm_grpo_trl)|[](https://colab.research.google.com/github/huggingface/cookbook/blob/main/notebooks/en/fine_tuning_llm_grpo_trl.ipynb)|
| Reinforcement Learning |[`GRPOTrainer`]| Mini-R1: Reproduce Deepseek R1 „aha moment“ a RL tutorial |[Philipp Schmid](https://huggingface.co/philschmid)|[Link](https://www.philschmid.de/mini-deepseek-r1)|[](https://colab.research.google.com/github/philschmid/deep-learning-pytorch-huggingface/blob/main/training/mini-deepseek-r1-aha-grpo.ipynb)|
| Reinforcement Learning |[`GRPOTrainer`]| RL on LLaMA 3.1-8B with GRPO and Unsloth optimizations |[Andrea Manzoni](https://huggingface.co/AManzoni)|[Link](https://colab.research.google.com/github/amanzoni1/fine_tuning/blob/main/RL_LLama3_1_8B_GRPO.ipynb)|[](https://colab.research.google.com/github/amanzoni1/fine_tuning/blob/main/RL_LLama3_1_8B_GRPO.ipynb)|