
Conversation

@paradite
Copy link
Collaborator

Fixes #38

This PR introduces a number of features and bug fixes to improve the rl-ts PPO algorithm and bring it closer to the sb3 (Stable-Baselines3) implementation:

  • Add a Categorical distribution and MLPCategoricalActor, and use them for environments with discrete action spaces
  • Mini-batch updates of batch_size for n_epochs in each iteration, instead of a fixed number of update iterations per epoch
  • Fix the vf loss when the env terminates by reaching the max episode step, by bootstrapping the reward for the last step
  • Update the advantage formula to match sb3's updated implementation
  • Combine the optimizer and loss computation for pi and vf
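For the discrete-action case, the Categorical distribution in the first bullet can be sketched roughly like this (an illustrative TypeScript sketch, not the actual rl-ts code; the class and method names are my own):

```typescript
// Minimal Categorical distribution over discrete actions,
// parameterized by unnormalized logits (illustrative sketch).
class Categorical {
  private probs: number[];

  constructor(logits: number[]) {
    // Softmax with max-subtraction for numerical stability.
    const max = Math.max(...logits);
    const exps = logits.map((l) => Math.exp(l - max));
    const sum = exps.reduce((a, b) => a + b, 0);
    this.probs = exps.map((e) => e / sum);
  }

  // Draw one action index via inverse-CDF sampling.
  sample(): number {
    let r = Math.random();
    for (let i = 0; i < this.probs.length; i++) {
      r -= this.probs[i];
      if (r <= 0) return i;
    }
    return this.probs.length - 1;
  }

  // Log-probability of a given action, used in the PPO ratio.
  logProb(action: number): number {
    return Math.log(this.probs[action]);
  }
}
```

An MLPCategoricalActor would then map observations to logits and wrap them in this distribution, mirroring what the Gaussian actor does for continuous actions.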

Overall, the PPO implementation is more stable (lower KL, no longer hitting the max KL limit) and performs better (able to reach the optimal reward for CartPole within 50k steps).
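The bootstrapping fix for episodes cut off at the max episode step can be illustrated as follows (a minimal sketch; the helper name and signature are my own, not from the repo — the idea, as in sb3, is that truncation is not a true terminal state, so the critic's value estimate for the final observation is folded into the last reward):

```typescript
// Illustrative sketch: when an episode ends because it hit the
// max-episode-step limit (truncation, not a real terminal state),
// add the discounted value estimate of the final observation to the
// last reward so the value-function target is not biased toward zero.
function bootstrapLastReward(
  rewards: number[],
  truncated: boolean,
  lastValueEstimate: number, // critic's v(s_T) for the final observation
  gamma: number
): number[] {
  const out = rewards.slice();
  if (truncated && out.length > 0) {
    out[out.length - 1] += gamma * lastValueEstimate;
  }
  return out;
}
```

With a true terminal state (e.g. the pole actually falling), `truncated` is false and the rewards are left unchanged.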

The new performance compared against the old implementation:

[Screenshot, 2023-03-20: training curves comparing the new implementation against the old]

@StoneT2000
Copy link
Owner

by the way @paradite would you be interested in maintaining this package moving forward? I don't think I'll have any time soon to maintain this. If so we can chat over discord/dm to figure this out.

@bmarotta
Copy link

bmarotta commented Apr 7, 2024

@StoneT2000 : Are you ever going to accept this PR and publish a new package version on npm? I also noticed that PPO doesn't work if the ObservationSpace is a Dict.

@StoneT2000
Copy link
Owner

Sorry I haven't been able to look at this in forever, @paradite what is the best way perhaps to transfer ownership on npm?

@paradite
Copy link
Collaborator Author

> Sorry I haven't been able to look at this in forever, @paradite what is the best way perhaps to transfer ownership on npm?

Hi @StoneT2000. If you trust me enough, you can transfer the npm package to me:

https://www.npmjs.com/~paradite

I do have 2FA on npm so it should be fine. I'll do some sanity test and publish a new version.

@StoneT2000
Copy link
Owner

I gave you maintainer access @paradite. I think it may also be better to transfer this repository over to you. It seems you already have a repo called rl-ts; you may need to remove it before I can transfer.

@paradite
Copy link
Collaborator Author

> I gave you maintainer access @paradite. I think it may also be better to transfer this repository over to you. It seems you have a repo called rl-ts already, you may need to remove it before i can transfer

OK, so I used to have a repo at https://github.com/paradite/rl-ts, but I have since renamed and transferred it to https://github.com/ai-simulator/rl-ts-ppo-patch (made it a private repo now since it has lots of extra stuff).

Now I only have https://github.com/paradite/rl-ts-public, which is where this PR is coming from.

It looks like GitHub is still honoring the old repo name and not allowing the transfer. In that case, maybe the next best alternative is for you to either transfer this repo into a public org like rl-ts and add me as a maintainer, or just add me as a maintainer on this personal repo of yours.

@StoneT2000
Copy link
Owner

I've invited you as a maintainer. Feel free to make your own org if you want as well!

@paradite paradite merged commit 0f3e1d3 into StoneT2000:main Apr 25, 2024
@paradite
Copy link
Collaborator Author

Merging first and running tests to see if anything else needs to be changed before publishing a new npm package.

@paradite
Copy link
Collaborator Author

Two issues after merging:

  • The deploy key for this repo is no longer working, resulting in failed CircleCI runs. (Might need help from @StoneT2000 to resolve.)
  • The VPG test is failing after merging in the PPO changes (I've changed the implementation in actor-critic, so VPG is likely affected). I'll look into how to fix it. It's been a while since I worked on this, so I need to spend some time digging.
[Screenshot, 2024-04-25: CI output showing the failing VPG test]

@StoneT2000
Copy link
Owner

There are actually no deploy keys at the moment; I think you should be able to add them yourself.



Development

Successfully merging this pull request may close these issues.

Update PPO for better performance and closer alignment to sb3
