Ppo patch #39
Conversation
By the way @paradite, would you be interested in maintaining this package moving forward? I don't think I'll have time to maintain it anytime soon. If so, we can chat over Discord/DM to figure this out.
@StoneT2000: Are you ever going to accept this PR and publish a new package version on npm? I also noticed that PPO doesn't work if the ObservationSpace is a Dict.
Sorry I haven't been able to look at this in forever. @paradite, what is the best way to transfer ownership on npm?
Hi @StoneT2000. If you trust me enough, you can transfer the npm package to me: https://www.npmjs.com/~paradite. I have 2FA enabled on npm, so it should be fine. I'll do some sanity tests and publish a new version.
I gave you maintainer access, @paradite. I think it may also be better to transfer this repository over to you. It seems you already have a repo called rl-ts; you may need to remove it before I can transfer.

OK, so I used to have a repo at https://github.com/paradite/rl-ts, but I have since renamed it and transferred it to https://github.com/ai-simulator/rl-ts-ppo-patch (I made it a private repo now since it has lots of extra stuff). Now I only have https://github.com/paradite/rl-ts-public, which is where this PR is coming from. It looks like GitHub is honoring the old repo name and not allowing the transfer. In that case, maybe the next best alternative is for you to either transfer this repo into a public org like
I've invited you as a maintainer. Feel free to make your own org if you want as well! |
Merging first and running tests to see if anything else needs to be changed before publishing a new npm package.
Two issues after merging:
There are actually no deploy keys at the moment; I think you should be able to add them yourself.

Fixes #38
This PR introduces a number of features and bug fixes to make the rl-ts PPO algorithm better and closer to the sb3 implementation:

- Add a `Categorical` distribution and `MLPCategoricalActor`, and use them for environments with a discrete action space
- Use minibatches of `batch_size` for `n_epochs` in each iteration, instead of a fixed number of iterations per epoch

Overall, the PPO implementation is more stable (lower KL, not hitting the max KL limit) and performs better (able to reach the optimal reward for CartPole within 50k steps).
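For readers unfamiliar with the first change: a categorical policy head turns the actor network's logits into a probability distribution over discrete actions, which is then sampled and scored for the PPO objective. A minimal sketch of what such a distribution does (illustrative only; the class name matches the PR's `Categorical` but this is not the actual rl-ts implementation):

```typescript
// Illustrative categorical distribution over discrete actions.
// Construct from raw logits; sample() draws an action index and
// logProb() returns the log-probability PPO needs for its ratio.
class Categorical {
  private probs: number[];

  constructor(logits: number[]) {
    // Numerically stable softmax: subtract the max logit before exp.
    const max = Math.max(...logits);
    const exps = logits.map((l) => Math.exp(l - max));
    const sum = exps.reduce((a, b) => a + b, 0);
    this.probs = exps.map((e) => e / sum);
  }

  // Inverse-CDF sampling: walk the probabilities until the
  // cumulative mass exceeds a uniform random draw.
  sample(): number {
    let r = Math.random();
    for (let i = 0; i < this.probs.length; i++) {
      r -= this.probs[i];
      if (r <= 0) return i;
    }
    return this.probs.length - 1;
  }

  logProb(action: number): number {
    return Math.log(this.probs[action]);
  }
}
```

An `MLPCategoricalActor` would then feed observations through an MLP to produce the logits and wrap them in this distribution.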
The new performance compared against the old implementation:
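The minibatch change above (sb3-style `batch_size` / `n_epochs` instead of a fixed iteration count) can be sketched as follows. This is a hypothetical helper, not rl-ts code: it generates shuffled minibatch index sets covering the whole rollout once per epoch, for `n_epochs` epochs.

```typescript
// Illustrative sketch of sb3-style minibatching: for each of nEpochs
// passes, shuffle the rollout indices and split them into contiguous
// minibatches of size batchSize (the last batch may be smaller).
function minibatchIndices(n: number, batchSize: number, nEpochs: number): number[][] {
  const batches: number[][] = [];
  for (let epoch = 0; epoch < nEpochs; epoch++) {
    // Fresh Fisher-Yates shuffle each epoch so batches differ between passes.
    const idx = Array.from({ length: n }, (_, i) => i);
    for (let i = n - 1; i > 0; i--) {
      const j = Math.floor(Math.random() * (i + 1));
      [idx[i], idx[j]] = [idx[j], idx[i]];
    }
    for (let start = 0; start < n; start += batchSize) {
      batches.push(idx.slice(start, start + batchSize));
    }
  }
  return batches;
}
```

Each epoch visits every collected transition exactly once, which is what keeps the per-update KL small compared with repeating a fixed number of full-batch iterations.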