
Commit 1b49e41

aslanides authored and copybara-github committed
Take mean, rather than sum, of the q-learning loss over batch in DQN baseline.
While this probably makes little to no difference to the optimization, it does allow easier comparison of losses for different agents by making the loss invariant to the batch size. Resolves #23.

PiperOrigin-RevId: 309683380
Change-Id: Id5fbefbd10af4e9ee58ab8add887fd8e8c50c033
1 parent 8e118a7 commit 1b49e41

1 file changed: 1 addition, 1 deletion

bsuite/baselines/tf/dqn/agent.py

Lines changed: 1 addition & 1 deletion
@@ -126,7 +126,7 @@ def _training_step(self, transitions: Sequence[tf.Tensor]) -> tf.Tensor:
     # One-step Q-learning loss.
     target = r_t + d_t * self._discount * qa_t
     td_error = qa_tm1 - target
-    loss = 0.5 * tf.reduce_sum(td_error**2)  # []
+    loss = 0.5 * tf.reduce_mean(td_error**2)  # []

     # Update the online network via SGD.
     variables = self._online_network.trainable_variables
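
For context, the sketch below illustrates why averaging rather than summing makes the loss invariant to the batch size. It is a standalone example, not the bsuite agent itself: the tensor names, shapes, and random values are illustrative assumptions.

# Minimal sketch, assuming TF2 eager mode; tensors stand in for a batch of
# transitions and are not the bsuite agent's internals.
import tensorflow as tf

batch_size = 32
discount = 0.99

# Placeholder batch of transitions, each of shape [batch_size].
r_t = tf.random.uniform([batch_size])      # rewards
d_t = tf.ones([batch_size])                # continuation flags (1 - done)
qa_tm1 = tf.random.uniform([batch_size])   # Q(s_{t-1}, a_{t-1})
qa_t = tf.random.uniform([batch_size])     # bootstrap value at s_t

target = r_t + d_t * discount * qa_t
td_error = qa_tm1 - target

loss_sum = 0.5 * tf.reduce_sum(td_error ** 2)    # grows with batch_size
loss_mean = 0.5 * tf.reduce_mean(td_error ** 2)  # invariant to batch_size

With the summed loss, doubling the batch size roughly doubles the reported loss even if per-transition errors are unchanged; the mean keeps reported losses comparable across agents and batch sizes, which is the motivation stated in the commit message.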
