1 parent 5382de8 commit 63ac70c
README.md
@@ -83,6 +83,12 @@ Please refer to the example datasets to prepare your own dataset.
- Text dataset: https://huggingface.co/datasets/hiyouga/math12k
- Vision-text dataset: https://huggingface.co/datasets/hiyouga/geometry3k
+## How to Understand GRPO in EasyR1
+
+
+- To learn about the GRPO algorithm, you can refer to [Hugging Face's blog](https://huggingface.co/learn/cookbook/fine_tuning_llm_grpo_trl).
## Other Baselines
- [CLEVR-70k-Counting](examples/run_qwen2_5_vl_3b_clevr.sh): Train the Qwen2.5-VL-3B-Instruct model on counting problem.
assets/easyr1_grpo.png (743 KB)