In `rllm/rllm/trainer/verl/agent_ppo_trainer.py`, the blocks at L304-307 and L309-319 both compute `old_log_prob` via `self.actor_rollout_wg.compute_log_prob(batch)` and union the result into `batch`:
```python
# L304-307
with marked_timer("old_log_prob", timing_raw):
    old_log_prob = self.actor_rollout_wg.compute_log_prob(batch)
    batch = batch.union(old_log_prob)  # first union

# L309-319
with marked_timer("old_log_prob", timing_raw, color="blue"):
    old_log_prob = self.actor_rollout_wg.compute_log_prob(batch)  # recomputed
    # ... (processing)
    batch = batch.union(old_log_prob)  # second union
```
This repetition triggers an error in `union_tensor_dict`:

```
AssertionError: old_log_probs in tensor_dict1 and tensor_dict2 are not the same object
```
The error occurs because the two `old_log_prob` computations (L304-307 and L309-319) produce distinct tensor objects. By the time the second union runs, `batch` already contains `old_log_probs` from the first union, and the newly computed tensor fails the `equal()` check in `union_tensor_dict`.
Is this repeated computation an unintended leftover (e.g., from an incomplete refactor or a debugging session)? If it is intentional, what is the intended logic here, and how should the assertion error caused by the duplicate key be resolved?
Thanks!