Description
Hi ShangtongZhang,
I'm confused about the adaptive KL divergence you use to update the actor model (in the case with two separate networks, one actor and one critic). In your code you use both the clipped objective and the adaptive approximate KL, and if the
From my point of view, it looks like you are combining the CLIP objective with a TRPO-style adaptive KL penalty:
surr = ratio * advantage
if kl_after <= 0.66 * target_kl:
    kl_coef /= 2
elif kl_after > 1.5 * target_kl:
    kl_coef *= 2
else:
    print("KL is close enough")
actor_loss = surr - kl_coef * kl_after
# After computing the KL coefficient, backpropagate the actor loss ...
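For reference, the adaptive rule in the snippet matches the one from the PPO paper (Schulman et al., 2017), where the penalty coefficient is halved when the measured KL falls below `target_kl / 1.5` (≈ 0.66 × `target_kl`) and doubled when it exceeds `1.5 * target_kl`. Here is a minimal, self-contained sketch of that update rule; the function and argument names are my own, not from the repo's code:

```python
def update_kl_coef(kl, kl_coef, target_kl):
    """Adaptive KL penalty update from the PPO paper (Schulman et al., 2017).

    Halves the coefficient when the measured KL is well below the target,
    doubles it when well above, and leaves it unchanged inside the band.
    Names here are illustrative, not taken from the repository.
    """
    if kl < target_kl / 1.5:   # KL too small: relax the penalty
        kl_coef /= 2
    elif kl > target_kl * 1.5:  # KL too large: tighten the penalty
        kl_coef *= 2
    return kl_coef

# Usage: the coefficient adapts between policy updates.
coef = update_kl_coef(kl=0.005, kl_coef=1.0, target_kl=0.01)
print(coef)  # KL is below the band, so the coefficient is halved -> 0.5
```

Note that this adaptation happens once per policy update (after measuring the KL of the new policy against the old one), not per minibatch.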
And, only


