## Paper

[Dr. GRPO Paper](https://github.com/sail-sg/understand-r1-zero)

## Motivation/Benefits

- Fixes optimization bias while maintaining reasoning performance
- Reduces average incorrect response length by 38% (Fig. 5 in paper)
- Backward-compatible with existing GRPO workflows
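The "optimization bias" in the first bullet comes from two normalizations in the GRPO objective: each response's token-level loss is divided by its own length |o_i|, and group advantages are divided by the group's reward standard deviation. Dr. GRPO drops both. The sketch below is illustrative only and makes these assumptions: an on-policy step where the probability ratio is 1, no clipping or KL term, and a constant normalizer `max_len` (assumed here to be the maximum generation length). The helper names `group_advantages` and `per_token_weights` are hypothetical and do not come from the linked repository.

```python
from statistics import mean, pstdev


def group_advantages(rewards, scale_by_std=True, eps=1e-4):
    """Group-relative advantages: reward minus the group mean, optionally
    divided by the group std (GRPO scales by std, Dr. GRPO does not)."""
    mu = mean(rewards)
    adv = [r - mu for r in rewards]
    if scale_by_std:
        sd = pstdev(rewards)
        adv = [a / (sd + eps) for a in adv]
    return adv


def per_token_weights(rewards, lengths, dr_grpo=False, max_len=None):
    """Hypothetical helper: coefficient applied to every token of response i
    in an on-policy, unclipped surrogate loss (ratio assumed to be 1)."""
    if dr_grpo:
        if max_len is None:
            raise ValueError("dr_grpo=True needs a constant normalizer max_len")
        adv = group_advantages(rewards, scale_by_std=False)
        # Assumption: one shared constant for all responses, so there is
        # no per-response length normalization.
        return [a / max_len for a in adv]
    adv = group_advantages(rewards, scale_by_std=True)
    # Original GRPO: each response's token sum is divided by its own length |o_i|.
    return [a / n for a, n in zip(adv, lengths)]


# One correct answer and two incorrect answers of different lengths.
rewards = [1.0, 0.0, 0.0]
lengths = [100, 100, 400]

grpo = per_token_weights(rewards, lengths)
dr = per_token_weights(rewards, lengths, dr_grpo=True, max_len=400)

print("GRPO   :", [round(w, 5) for w in grpo])
print("Dr.GRPO:", [round(w, 5) for w in dr])
```

Under the GRPO weighting, the 400-token incorrect response receives a per-token penalty four times smaller than the 100-token incorrect one, so long wrong answers are discouraged less. Under the Dr. GRPO weighting, both incorrect responses receive the same per-token penalty regardless of length, which is consistent with the shorter incorrect responses reported above.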