Commit 1b49e41
Take mean, rather than sum, of the q-learning loss over batch in DQN baseline.
While this probably makes little to no difference to the optimization, it does allow easier comparison of losses for different agents by making the loss invariant to the batch size.
Resolves #23.
PiperOrigin-RevId: 309683380
Change-Id: Id5fbefbd10af4e9ee58ab8add887fd8e8c50c033
1 parent 8e118a7 commit 1b49e41
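To illustrate the batch-size invariance described in the commit message, here is a minimal pure-Python sketch. It is not the code changed by this commit: the function names, the 0.5 * error^2 per-sample loss, and the numbers are all assumptions chosen for illustration.

```python
# Illustrative sketch only: names, loss form, and values are hypothetical,
# not taken from the DQN baseline touched by this commit.

def td_errors(q_values, targets):
    """Per-sample TD errors: target minus predicted Q-value."""
    return [t - q for q, t in zip(q_values, targets)]

def sum_loss(errors):
    """Summed loss over the batch (assumed 0.5 * e^2 per sample)."""
    return sum(0.5 * e * e for e in errors)

def mean_loss(errors):
    """The same per-sample loss, averaged over the batch."""
    return sum_loss(errors) / len(errors)

# A batch of two samples with TD errors [1.0, 2.0].
small = td_errors([1.0, 2.0], [2.0, 4.0])
# The same errors repeated four times: a batch of eight samples.
large = small * 4

sum_small = sum_loss(small)    # grows with batch size
sum_large = sum_loss(large)
mean_small = mean_loss(small)  # stays the same across batch sizes
mean_large = mean_loss(large)

print(sum_small, sum_large)
print(mean_small, mean_large)
```

With the summed loss, the reported value scales with the number of samples, so agents trained with different batch sizes log losses on different scales. The mean removes that factor. It also scales the gradient by 1/batch size, which for plain SGD acts like a smaller learning rate, while adaptive optimizers such as Adam are largely insensitive to a constant rescaling of the loss.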
1 file changed, +1 -1: line 129 replaced, with lines 126-128 and 130-132 as unchanged context.