Add custom arch for off-policy actor/critic networks #182

araffin · 2020-10-11T14:54:07Z

Description

Feature described in #113, the main difference with the on-policy equivalent is that we don't allow shared layers (only in the feature extractor) and therefore, net_arch can be a dict.

Motivation and Context

I have raised an issue to propose this change (required for new features and bug fixes)

closes #113

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (update in the documentation)

Checklist:

I've read the CONTRIBUTION guide (required)
I have updated the changelog accordingly (required).
My change requires a change to the documentation.
I have updated the tests accordingly (required for a bug fix or a new feature).
I have updated the documentation accordingly.
I have reformatted the code using make format (required)
I have checked the codestyle using make check-codestyle and make lint (required)
I have ensured make pytest and make type both pass. (required)

Note: we are using a maximum length of 127 characters per line

docs/guide/custom_policy.rst

stable_baselines3/common/torch_layers.py

docs/guide/custom_policy.rst

Miffyli · 2020-10-11T20:30:26Z

Left some small comments. This might be bit of work, but as the issue lies in the correct updating of parameters (a crucial thing for these algorithms), it could make sense to add a test to check that these updates happen correctly. Such code would also be helpful in the future and also for any other off-policy algorithms using similar ideas ^^.

araffin · 2020-10-11T20:58:10Z

it could make sense to add a test to check that these updates happen correctly.

What do you mean exactly?

Miffyli · 2020-10-11T21:10:33Z

What do you mean exactly?

That all main network parameters are updated to what is in target network (check it both ways: that all parameters of main network were updated, and that there was no extra parameters in target network).

Edit: I meant the other way around (main network parameters to target), but same idea applies.

araffin · 2020-10-11T21:14:06Z

That all main network parameters are updated to what is in target network (check it both ways: that all parameters of main network were updated, and that there was no extra parameters in target network).

I think this should go in #135
regarding that issue, we also need to check that the main network was untouched. I'm not sure how easy it is to test that.

Miffyli · 2020-10-11T21:24:03Z

Hmm ok, I am just mainly worried that the parameter updates do not match and this causes some issues, hence tests would be nice. I think just checking that state_dicts make sense before and after some training could do the trick.

Miffyli · 2020-10-12T19:51:42Z

I went through things and realized polyak-update would raise an exception if parameters were not updated (and updating of parameters is already tested). I added one missing piece of the puzzle which should now confirm that updating of parameters checks out. One final thing would be to make sure model.parameters() includes all parameters correctly, but that is unlikely to fail and can be added later. Other than that LGTM!

araffin · 2020-10-12T19:55:21Z

I went through things and realized polyak-update would raise an exception if parameters were not updated (and updating of parameters is already tested).

for param, target_param in zip(params, target_params): won't raise any error if the length is different?
interesting...

includes all parameters correctly, but that is unlikely to fail and can be added later.

yes, I would rather do that in #135 if we do it

* Add custom arch for off-policy actor/critic networks * Fix type hints * Address comments * Make sure number of updated parameters match in polyak * Add zip_strict for strict-length zipping * Fix building docs * Add test for zip strict * Faster tests Co-authored-by: Anssi "Miffyli" Kanervisto <[email protected]>

araffin added 2 commits October 11, 2020 16:52

Add custom arch for off-policy actor/critic networks

a09d77d

Fix type hints

ead484e

araffin requested review from AdamGleave, Miffyli, ernestum and hill-a October 11, 2020 15:02

Miffyli reviewed Oct 11, 2020

View reviewed changes

docs/guide/custom_policy.rst Outdated Show resolved Hide resolved

Miffyli reviewed Oct 11, 2020

View reviewed changes

stable_baselines3/common/torch_layers.py Outdated Show resolved Hide resolved

Miffyli reviewed Oct 11, 2020

View reviewed changes

docs/guide/custom_policy.rst Outdated Show resolved Hide resolved

Address comments

c05f763

Merge branch 'master' into feat/custom-offpolicy-arch

fb443b5

Make sure number of updated parameters match in polyak

18589ab

Miffyli and others added 5 commits October 12, 2020 23:14

Add zip_strict for strict-length zipping

4dbe14c

Fix building docs

44c8f5b

Merge branch 'master' into feat/custom-offpolicy-arch

33a1d85

Add test for zip strict

9ec406f

Faster tests

47a6ce3

araffin merged commit 2599f04 into master Oct 13, 2020

araffin deleted the feat/custom-offpolicy-arch branch October 13, 2020 10:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add custom arch for off-policy actor/critic networks #182

Add custom arch for off-policy actor/critic networks #182

Uh oh!

araffin commented Oct 11, 2020 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Miffyli commented Oct 11, 2020 •

edited

Loading

Uh oh!

araffin commented Oct 11, 2020

Uh oh!

Miffyli commented Oct 11, 2020 •

edited

Loading

Uh oh!

araffin commented Oct 11, 2020

Uh oh!

Miffyli commented Oct 11, 2020

Uh oh!

Miffyli commented Oct 12, 2020

Uh oh!

araffin commented Oct 12, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Add custom arch for off-policy actor/critic networks #182

Add custom arch for off-policy actor/critic networks #182

Uh oh!

Conversation

araffin commented Oct 11, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Motivation and Context

Types of changes

Checklist:

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Miffyli commented Oct 11, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

araffin commented Oct 11, 2020

Uh oh!

Miffyli commented Oct 11, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

araffin commented Oct 11, 2020

Uh oh!

Miffyli commented Oct 11, 2020

Uh oh!

Miffyli commented Oct 12, 2020

Uh oh!

araffin commented Oct 12, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

araffin commented Oct 11, 2020 •

edited

Loading

Miffyli commented Oct 11, 2020 •

edited

Loading

Miffyli commented Oct 11, 2020 •

edited

Loading