
Conversation

@paradite
Copy link
Collaborator

Fixes #38

This PR introduces a number of features and bug fixes to improve the rl-ts PPO algorithm and bring it closer to the sb3 (Stable-Baselines3) implementation:

  • Add a Categorical distribution and MLPCategoricalActor, and use them for environments with discrete action spaces
  • Mini-batch updates of batch_size for n_epochs in each iteration, instead of a fixed number of update iterations per epoch
  • Fix the vf loss when the env terminates by reaching the max episode step, by bootstrapping the reward for the last step
  • Update the advantage formula to match sb3's updated implementation
  • Combine the optimizer and loss computation for pi and vf
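For the discrete-action case, the Categorical distribution in the first bullet can be sketched roughly like this (an illustrative TypeScript sketch, not the actual rl-ts code; the class and method names are my own):

```typescript
// Minimal Categorical distribution over discrete actions,
// parameterized by unnormalized logits (illustrative sketch).
class Categorical {
  private probs: number[];

  constructor(logits: number[]) {
    // Softmax with max-subtraction for numerical stability.
    const max = Math.max(...logits);
    const exps = logits.map((l) => Math.exp(l - max));
    const sum = exps.reduce((a, b) => a + b, 0);
    this.probs = exps.map((e) => e / sum);
  }

  // Draw one action index via inverse-CDF sampling.
  sample(): number {
    let r = Math.random();
    for (let i = 0; i < this.probs.length; i++) {
      r -= this.probs[i];
      if (r <= 0) return i;
    }
    return this.probs.length - 1;
  }

  // Log-probability of a given action, used in the PPO ratio.
  logProb(action: number): number {
    return Math.log(this.probs[action]);
  }
}
```

An MLPCategoricalActor would then map observations to logits and wrap them in this distribution, mirroring what the Gaussian actor does for continuous actions.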

Overall, the PPO implementation is more stable (lower KL, no longer hitting the max KL limit) and performs better (able to reach the optimal reward for CartPole within 50k steps).
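The bootstrapping fix for episodes cut off at the max episode step can be illustrated as follows (a minimal sketch; the helper name and signature are my own, not from the repo — the idea, as in sb3, is that truncation is not a true terminal state, so the critic's value estimate for the final observation is folded into the last reward):

```typescript
// Illustrative sketch: when an episode ends because it hit the
// max-episode-step limit (truncation, not a real terminal state),
// add the discounted value estimate of the final observation to the
// last reward so the value-function target is not biased toward zero.
function bootstrapLastReward(
  rewards: number[],
  truncated: boolean,
  lastValueEstimate: number, // critic's v(s_T) for the final observation
  gamma: number
): number[] {
  const out = rewards.slice();
  if (truncated && out.length > 0) {
    out[out.length - 1] += gamma * lastValueEstimate;
  }
  return out;
}
```

With a true terminal state (e.g. the pole actually falling), `truncated` is false and the rewards are left unchanged.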

The new performance compared against the old implementation:

[Screenshot, 2023-03-20: training curves comparing the new implementation against the old]

@StoneT2000
Copy link
Owner

by the way @paradite would you be interested in maintaining this package moving forward? I don't think I'll have any time soon to maintain this. If so we can chat over discord/dm to figure this out.

@bmarotta
Copy link

bmarotta commented Apr 7, 2024

@StoneT2000 : Are you ever going to accept this PR and publish a new package version on npm? I also noticed that PPO doesn't work if the ObservationSpace is a Dict.

@StoneT2000
Copy link
Owner

Sorry I haven't been able to look at this in forever, @paradite what is the best way perhaps to transfer ownership on npm?

@paradite
Copy link
Collaborator Author

> Sorry I haven't been able to look at this in forever, @paradite what is the best way perhaps to transfer ownership on npm?

Hi @StoneT2000. If you trust me enough, you can transfer the npm package to me:

https://www.npmjs.com/~paradite

I do have 2FA on npm so it should be fine. I'll do some sanity test and publish a new version.

@StoneT2000
Copy link
Owner

I gave you maintainer access @paradite. I think it may also be better to transfer this repository over to you. It seems you already have a repo called rl-ts; you may need to remove it before I can transfer.

@paradite
Copy link
Collaborator Author

> I gave you maintainer access @paradite. I think it may also be better to transfer this repository over to you. It seems you have a repo called rl-ts already, you may need to remove it before i can transfer

OK, so I used to have a repo at https://github.com/paradite/rl-ts, but I have since renamed and transferred it to https://github.com/ai-simulator/rl-ts-ppo-patch (made it a private repo now since it has lots of extra stuff).

Now I only have https://github.com/paradite/rl-ts-public, which is where this PR is coming from.

It looks like GitHub is still honoring the old repo name and not allowing the transfer. In that case, maybe the next best alternative is for you to either transfer this repo into a public org like rl-ts and add me as a maintainer, or just add me as a maintainer on this personal repo of yours.

@StoneT2000
Copy link
Owner

I've invited you as a maintainer. Feel free to make your own org if you want as well!

@paradite paradite merged commit 0f3e1d3 into StoneT2000:main Apr 25, 2024
@paradite
Copy link
Collaborator Author

Merging first and running tests to see if anything else needs to be changed before publishing a new npm package.

@paradite
Copy link
Collaborator Author

Two issues after merging:

  • The deploy key for this repo is no longer working, resulting in failed CircleCI runs. (Might need help from @StoneT2000 to resolve.)
  • The VPG test is failing after merging in the PPO changes (I've changed the implementation in actor-critic, so VPG is likely affected). I'll look into how to fix it. It's been a while since I worked on this, so I need to spend some time digging.
[Screenshot, 2024-04-25: CI output showing the failing VPG test]

@StoneT2000
Copy link
Owner

There are actually no deploy keys at the moment; I think you should be able to add them yourself.



Development

Successfully merging this pull request may close these issues.

Update PPO for better performance and closer alignment to sb3
