Commit 97830a3

Replace deprecated list with tuple indexing in PPOTrainer (#4356)
1 parent d275418 commit 97830a3

1 file changed: +1 -1 lines changed

trl/trainer/ppo_trainer.py

Lines changed: 1 addition & 1 deletion
@@ -563,7 +563,7 @@ def repeat_generator():
 rewards = non_score_reward.clone()
 actual_start = torch.arange(rewards.size(0), device=rewards.device)
 actual_end = torch.where(sequence_lengths_p1 < rewards.size(1), sequence_lengths_p1, sequence_lengths)
-rewards[[actual_start, actual_end]] += scores
+rewards[actual_start, actual_end] += scores
 
 # 5. whiten rewards
 if args.whiten_rewards:
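
The changed line adds each sequence's score to exactly one position of its row: row `actual_start[i]`, column `actual_end[i]`. The following is a minimal sketch of that behavior on toy data, not the trainer's actual code path. The tensor values are hypothetical, and defining `sequence_lengths_p1` as `sequence_lengths + 1` is an assumption based on its name, since the diff does not show where it is computed.

```python
import torch

# Hypothetical toy data; rewards has shape (batch, seq_len) as in the diff.
rewards = torch.zeros(3, 4)
scores = torch.tensor([1.0, 2.0, 3.0])
sequence_lengths = torch.tensor([1, 3, 2])
# Assumption: "p1" means "plus one"; the diff does not show this definition.
sequence_lengths_p1 = sequence_lengths + 1

# Same index construction as in the diff.
actual_start = torch.arange(rewards.size(0), device=rewards.device)
actual_end = torch.where(
    sequence_lengths_p1 < rewards.size(1), sequence_lengths_p1, sequence_lengths
)

# Comma-separated (tuple) indexing pairs the two index tensors element-wise,
# so scores[i] is added at rewards[actual_start[i], actual_end[i]].
rewards[actual_start, actual_end] += scores

print(actual_end.tolist())  # [2, 3, 3]
print(rewards.tolist())
# [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 2.0], [0.0, 0.0, 0.0, 3.0]]

# If the indices already live in a list, converting it to a tuple gives the
# same multi-dimensional indexing without passing a list as the index.
index_list = [actual_start, actual_end]
picked = rewards[tuple(index_list)]
print(picked.tolist())  # [1.0, 2.0, 3.0]
```

In the second row, `sequence_lengths_p1` equals the row width (4), so `torch.where` falls back to `sequence_lengths` (3) and the score still lands on a valid column.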