Add custom arch for off-policy actor/critic networks #182
Conversation
Left some small comments. This might be a bit of work, but since the issue lies in the correct updating of parameters (a crucial thing for these algorithms), it could make sense to add a test that checks these updates happen correctly. Such a test would also be helpful in the future for any other off-policy algorithms using similar ideas ^^.
What do you mean exactly?
That all main network parameters are updated to what is in the target network (check it both ways: that all parameters of the main network were updated, and that there were no extra parameters in the target network). Edit: I meant the other way around (main network parameters copied to the target), but the same idea applies.
I think this should go in #135
Hmm ok, I am mainly worried that the parameter updates do not match and that this causes issues, hence tests would be nice. I think just checking that the state_dicts make sense before and after some training could do the trick.
I went through things and realized polyak-update would raise an exception if parameters were not updated (and updating of parameters is already tested). I added one missing piece of the puzzle which should now confirm that updating of parameters checks out. One final thing would be to make sure |
yes, I would rather do that in #135 if we do it |
* Add custom arch for off-policy actor/critic networks
* Fix type hints
* Address comments
* Make sure number of updated parameters match in polyak
* Add zip_strict for strict-length zipping
* Fix building docs
* Add test for zip strict
* Faster tests

Co-authored-by: Anssi "Miffyli" Kanervisto <[email protected]>
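The "zip_strict" and "number of updated parameters match in polyak" items above can be illustrated with a minimal sketch. It is not the code merged in this PR: it uses plain Python floats instead of network parameters, and the function bodies and the `tau` argument name are assumptions made for illustration only.

```python
from itertools import zip_longest


def zip_strict(*iterables):
    """Like zip(), but raise ValueError if the iterables differ in length."""
    sentinel = object()
    for combo in zip_longest(*iterables, fillvalue=sentinel):
        # A sentinel in the tuple means one iterable ran out before the others.
        if any(item is sentinel for item in combo):
            raise ValueError("Iterables have different lengths")
        yield combo


def polyak_update(params, target_params, tau):
    """Return soft-updated targets: (1 - tau) * target + tau * param.

    Hypothetical float-based version; a real implementation would
    update tensors in place.
    """
    return [(1.0 - tau) * t + tau * p for p, t in zip_strict(params, target_params)]


# Two main-network values blended halfway into two target values.
new_targets = polyak_update([1.0, 2.0], [0.0, 0.0], tau=0.5)
```

Because the pairing goes through `zip_strict`, a target network with missing or extra parameters raises an error instead of being silently truncated, which is the failure mode discussed in the conversation.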
Description
Feature described in #113. The main difference from the on-policy equivalent is that we don't allow shared layers (only in the feature extractor), and therefore `net_arch` can be a dict.
Motivation and Context
closes #113
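As a rough illustration of the dict form of `net_arch` mentioned in the description, the sketch below splits an architecture specification into separate actor and critic layer sizes. The helper name `split_actor_critic_arch` and the `pi`/`qf` key names are assumptions for this example and are not taken from the PR text.

```python
def split_actor_critic_arch(net_arch):
    """Return (actor_layers, critic_layers) from a list or dict spec.

    Hypothetical helper: a list applies the same layer sizes to both
    networks, a dict gives each network its own sizes. No layers are shared.
    """
    if isinstance(net_arch, list):
        # Same sizes for both networks, but separate copies (no sharing).
        return list(net_arch), list(net_arch)
    if not isinstance(net_arch, dict):
        raise TypeError("net_arch must be a list or a dict")
    missing = {"pi", "qf"} - set(net_arch)
    if missing:
        raise ValueError(f"net_arch dict is missing keys: {sorted(missing)}")
    return list(net_arch["pi"]), list(net_arch["qf"])


# Smaller actor, larger critic.
actor_arch, critic_arch = split_actor_critic_arch({"pi": [64, 64], "qf": [400, 300]})
```

Returning copies rather than the original lists keeps the actor and critic specifications independent, in line with the "no shared layers" constraint.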
Types of changes
Checklist:
* make format (required)
* make check-codestyle and make lint (required)
* make pytest and make type both pass. (required)

Note: we are using a maximum length of 127 characters per line