
[BUG] pg_loss only clipping negative value #783

@HuihuiChyan

Description


# Quoted from verl's compute_policy_loss; imports and signature reconstructed for context.
import torch
import verl.utils.torch_functional as verl_F

def compute_policy_loss(old_log_prob, log_prob, advantages, eos_mask, cliprange):
    negative_approx_kl = log_prob - old_log_prob
    ratio = torch.exp(negative_approx_kl)  # importance ratio pi_theta / pi_old
    ppo_kl = verl_F.masked_mean(-negative_approx_kl, eos_mask)
    pg_losses = -advantages * ratio  # unclipped surrogate loss
    pg_losses2 = -advantages * torch.clamp(ratio, 1.0 - cliprange, 1.0 + cliprange)  # clipped surrogate loss
    pg_loss = verl_F.masked_mean(torch.max(pg_losses, pg_losses2), eos_mask)
    pg_clipfrac = verl_F.masked_mean(torch.gt(pg_losses2, pg_losses).float(), eos_mask)
    return pg_loss, pg_clipfrac, ppo_kl
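
For concreteness, here is a toy example (numbers invented, not from the experiment) of how the torch.max rule behaves for each sign of the advantage:

import torch

# Toy numbers for illustration only.
advantages = torch.tensor([1.0, -1.0])
ratio = torch.tensor([1.5, 1.5])  # outside the clip range [0.8, 1.2]
cliprange = 0.2

pg_losses = -advantages * ratio                                                   # tensor([-1.5000, 1.5000])
pg_losses2 = -advantages * torch.clamp(ratio, 1.0 - cliprange, 1.0 + cliprange)   # tensor([-1.2000, 1.2000])

print(torch.max(pg_losses, pg_losses2))
# tensor([-1.2000, 1.5000]) -> clipped for the positive advantage,
# unclipped for the negative one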

In your implementation, the clipping of pg_loss only takes effect when pg_loss is very small, since torch.max can only replace a loss with a larger (clipped) one. However, pg_loss can also be positive (because advantages can be negative). In that case, should the clipping depend on the sign of pg_loss (namely, clip with torch.max when pg_loss is positive, and with torch.min when pg_loss is negative)?
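
A minimal sketch of the sign-dependent rule proposed above, assuming a hypothetical helper sign_dependent_clip that is not part of verl:

import torch

def sign_dependent_clip(pg_losses, pg_losses2):
    # The rule suggested above: take the elementwise max where the unclipped
    # loss is positive, and the elementwise min where it is negative.
    return torch.where(
        pg_losses > 0,
        torch.max(pg_losses, pg_losses2),
        torch.min(pg_losses, pg_losses2),
    )

For reference, standard PPO takes the elementwise max of the two surrogate losses unconditionally, as the pessimistic bound on the objective for either sign of the advantage.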


I noticed this because in my PPO experiment, pg_loss goes sky-high at certain steps:

[Image: training curve from the experiment, showing pg_loss spiking at certain steps]
