Skip to content

Conversation

@araffin
Copy link
Member

@araffin araffin commented Oct 11, 2020

Description

Feature described in #113, the main difference with the on-policy equivalent is that we don't allow shared layers (only in the feature extractor) and therefore, net_arch can be a dict.

Motivation and Context

  • I have raised an issue to propose this change (required for new features and bug fixes)

closes #113

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Checklist:

  • I've read the CONTRIBUTION guide (required)
  • I have updated the changelog accordingly (required).
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.
  • I have reformatted the code using make format (required)
  • I have checked the codestyle using make check-codestyle and make lint (required)
  • I have ensured make pytest and make type both pass. (required)

Note: we are using a maximum length of 127 characters per line

@Miffyli
Copy link
Collaborator

Miffyli commented Oct 11, 2020

Left some small comments. This might be bit of work, but as the issue lies in the correct updating of parameters (a crucial thing for these algorithms), it could make sense to add a test to check that these updates happen correctly. Such code would also be helpful in the future and also for any other off-policy algorithms using similar ideas ^^.

@araffin
Copy link
Member Author

araffin commented Oct 11, 2020

it could make sense to add a test to check that these updates happen correctly.

What do you mean exactly?

@Miffyli
Copy link
Collaborator

Miffyli commented Oct 11, 2020

What do you mean exactly?

That all main network parameters are updated to what is in target network (check it both ways: that all parameters of main network were updated, and that there was no extra parameters in target network).

Edit: I meant the other way around (main network parameters to target), but same idea applies.

@araffin
Copy link
Member Author

araffin commented Oct 11, 2020

That all main network parameters are updated to what is in target network (check it both ways: that all parameters of main network were updated, and that there was no extra parameters in target network).

I think this should go in #135
regarding that issue, we also need to check that the main network was untouched. I'm not sure how easy it is to test that.

@Miffyli
Copy link
Collaborator

Miffyli commented Oct 11, 2020

Hmm ok, I am just mainly worried that the parameter updates do not match and this causes some issues, hence tests would be nice. I think just checking that state_dicts make sense before and after some training could do the trick.

@Miffyli
Copy link
Collaborator

Miffyli commented Oct 12, 2020

I went through things and realized polyak-update would raise an exception if parameters were not updated (and updating of parameters is already tested). I added one missing piece of the puzzle which should now confirm that updating of parameters checks out. One final thing would be to make sure model.parameters() includes all parameters correctly, but that is unlikely to fail and can be added later. Other than that LGTM!

@araffin
Copy link
Member Author

araffin commented Oct 12, 2020

I went through things and realized polyak-update would raise an exception if parameters were not updated (and updating of parameters is already tested).

for param, target_param in zip(params, target_params): won't raise any error if the length is different?
interesting...

includes all parameters correctly, but that is unlikely to fail and can be added later.

yes, I would rather do that in #135 if we do it

@araffin araffin merged commit 2599f04 into master Oct 13, 2020
@araffin araffin deleted the feat/custom-offpolicy-arch branch October 13, 2020 10:01
leor-c pushed a commit to leor-c/stable-baselines3 that referenced this pull request Nov 12, 2020
* Add custom arch for off-policy actor/critic networks

* Fix type hints

* Address comments

* Make sure number of updated parameters match in polyak

* Add zip_strict for strict-length zipping

* Fix building docs

* Add test for zip strict

* Faster tests

Co-authored-by: Anssi "Miffyli" Kanervisto <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature request] Allow different network architectures for off-policy actor/critic

3 participants