
Double constraints for updating actor? #109


Description

@KhoiDOO

Hi ShangtongZhang,

I'm confused about the adaptive KL divergence that you use when updating the actor model (in the case with separate actor and critic networks). In your code you apply both the clipped objective and an adaptive approximate KL: the actor is only updated while $\text{approx-kl} \le 1.5 \times \text{target-kl}$. After reading the PPO paper, it seems to me that the adaptive KL mechanism belongs to TRPO rather than to the clipped objective, since TRPO imposes a KL constraint (Equation 4 in the paper). With both constraints (the clip and the adaptive KL) applied at the same time, the actor becomes very hard to update.
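For reference, here is a minimal sketch of how I read the current gated update (all names here, such as `gated_clip_update`, `target_kl`, and `clip_eps`, are my own placeholders, not the identifiers in your code):

import torch

def gated_clip_update(actor_opt, ratio, advantage, old_log_prob, new_log_prob,
                      target_kl=0.01, clip_eps=0.2):
    # Approximate KL between the old and new policy on this batch.
    approx_kl = (old_log_prob - new_log_prob).mean()
    # Gate: skip the actor update once the approximate KL grows too large.
    if approx_kl > 1.5 * target_kl:
        return None
    # Clipped surrogate; negated because the optimizer minimizes.
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    actor_loss = -torch.min(ratio * advantage, clipped).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
    return actor_loss.item()

So the clip and the KL gate both constrain the same actor update.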

From my viewpoint, you are using $L^{CLIP}$ and the (TRPO-like) $L^{KLPEN}$ at the same time, whereas $L^{KLPEN}$ on its own should be constructed as:

surr = ratio * advantage

# Adapt the KL penalty coefficient (beta) as in Section 4 of the PPO paper
if kl_after < target_kl / 1.5:
    kl_coef /= 2
elif kl_after > target_kl * 1.5:
    kl_coef *= 2
else:
    print("KL is close enough")

# L^KLPEN is maximized, so we minimize its negative
actor_loss = -(surr - kl_coef * kl_after).mean()

# Backwarding the actor loss ...
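(Here `kl_after` is the mean KL measured after the gradient step, `kl_coef` plays the role of $\beta$, and, as I understand the paper, $\beta$ is adapted once per policy update rather than per minibatch.)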

[image: the adaptive KL coefficient update rule from the PPO paper]

After the KL coefficient $\beta$ is adapted, it is used to compute the loss and its gradient via Equation 8 of the paper.

[image: Equation (8) from the PPO paper]
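Reproduced here for reference, that objective is:

$$L^{KLPEN}(\theta) = \hat{\mathbb{E}}_t\left[\frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_\text{old}}(a_t \mid s_t)}\hat{A}_t \;-\; \beta\,\mathrm{KL}\big[\pi_{\theta_\text{old}}(\cdot \mid s_t),\ \pi_\theta(\cdot \mid s_t)\big]\right]$$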

Moreover, only one of $L^{KLPEN}$ or $L^{CLIP}$ is used during training, not both.

[image: excerpt from the PPO paper on the choice of surrogate objective]
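A minimal sketch of what I mean, with a hypothetical `variant` switch selecting exactly one objective (all names are placeholders):

import torch

def actor_loss(variant, ratio, advantage, kl, kl_coef, clip_eps=0.2):
    # Exactly one PPO surrogate is used per run, never both at once.
    if variant == "clip":
        clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
        return -torch.min(ratio * advantage, clipped).mean()
    if variant == "kl_pen":
        return -(ratio * advantage - kl_coef * kl).mean()
    raise ValueError(f"unknown PPO variant: {variant}")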
