Implement Dual-Clip PPO Algorithm #784
Conversation
verl/trainer/config/ppo_trainer.yaml
```yaml
grad_clip: 1.0
clip_ratio: 0.2
use_dual_clip: False # add Dual-clip PPO from https://arxiv.org/pdf/1912.09729
clip_ratio_c: 3i # lower bound of the value for Dual-clip PPO from https://arxiv.org/pdf/1912.09729
```
typo here?
Fixed the typo in 73c3349.
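For reference, a sketch of what the corrected fragment presumably looks like, assuming the intended value was the numeric lower bound 3.0 (the exact committed values are in 73c3349):

```yaml
grad_clip: 1.0
clip_ratio: 0.2
# lower bound c for Dual-clip PPO (https://arxiv.org/pdf/1912.09729); must be > 1
clip_ratio_c: 3.0
```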
I guess the lower clip should be applied by default; could you remove the `use_dual_clip` option? Thanks
Fixed in this commit.
Fixed the Dual-Clip implementation bug after referring to the code from PPO dual clip and PPOxFamily: the dual-clip lower bound is only applied when advantages < 0.
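A minimal NumPy sketch of that behavior (the function name and signature are illustrative, not verl's actual API): the lower bound `c * A` is only taken when the advantage is negative, so the positive-advantage branch is the standard clipped PPO objective.

```python
import numpy as np

def dual_clip_ppo_objective(log_ratio, advantages, clip_eps=0.2, clip_c=3.0):
    """Per-token dual-clip PPO objective (hypothetical helper, not verl's API)."""
    ratio = np.exp(log_ratio)
    surr1 = ratio * advantages
    surr2 = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    clipped = np.minimum(surr1, surr2)  # standard PPO clipped objective
    # lower-bound the objective by c * A, but only where A < 0
    dual_clipped = np.maximum(clipped, clip_c * advantages)
    return np.where(advantages < 0, dual_clipped, clipped)
```

With a very large ratio and a negative advantage, the objective is bounded below by `clip_c * A` instead of growing arbitrarily negative; positive advantages are untouched.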
Could you please fix the format?
I have already run the
Could you help rebase onto main? The format issue was fixed in the main branch.
Force-pushed from 203ef0a to 7fa176d.
Done, please rerun the CI test.
Add the [Dual-Clip PPO](https://arxiv.org/pdf/1912.09729) algorithm to enhance the current PPO implementation. Dual-Clip PPO applies a lower bound to the policy objective when the advantage is negative, so that the loss, even when multiplied by a very large ratio, does not exceed that bound. The concept is illustrated in the figure below:

<img width="626" alt="Clipboard_Screenshot_1743047374" src="https://github.com/user-attachments/assets/93952edc-30c8-477e-bc3d-4770fabe55b8" />

So the final PPO loss is:

<img width="624" alt="Clipboard_Screenshot_1743047410" src="https://github.com/user-attachments/assets/5900490b-f64a-4bde-87d6-8359615b3337" />

This adjustment modifies the final PPO loss calculation, which could improve training stability and performance in certain scenarios. I believe integrating this feature could provide significant benefits, and I look forward to feedback on this suggestion.
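For reference, the dual-clip objective from the paper can be written out as follows, using $r$ for the probability ratio, $A$ for the advantage, $\epsilon$ for the clip range, and $c > 1$ for the lower-bound coefficient:

```latex
L(r, A) =
\begin{cases}
\min\bigl(rA,\ \operatorname{clip}(r,\, 1-\epsilon,\, 1+\epsilon)\, A\bigr), & A \ge 0 \\[4pt]
\max\Bigl(\min\bigl(rA,\ \operatorname{clip}(r,\, 1-\epsilon,\, 1+\epsilon)\, A\bigr),\ cA\Bigr), & A < 0
\end{cases}
```

When $A < 0$, the outer $\max$ with $cA$ prevents the objective from becoming arbitrarily negative as $r$ grows; when $A \ge 0$, it reduces to the standard clipped PPO objective.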