Question: Reason for repeated computation of old_log_probs in L304-307 and L309-319 of agent_ppo_trainer.py #303


In `rllm/rllm/trainer/verl/agent_ppo_trainer.py`, the code blocks at L304-307 and L309-319 both compute `old_log_prob` via `self.actor_rollout_wg.compute_log_prob(batch)` and union the result into `batch`:  

```python
# L304-307
with marked_timer("old_log_prob", timing_raw):
    old_log_prob = self.actor_rollout_wg.compute_log_prob(batch)
    batch = batch.union(old_log_prob)  # First union

# L309-319
with marked_timer("old_log_prob", timing_raw, color="blue"):
    old_log_prob = self.actor_rollout_wg.compute_log_prob(batch)  # Recomputed
    # ... (processing)
    batch = batch.union(old_log_prob)  # Second union
```  


This repetition triggers an error in `union_tensor_dict`:  
`AssertionError: old_log_probs in tensor_dict1 and tensor_dict2 are not the same object`  

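The error occurs because each of the two calls (L304-307 and L309-319) produces a new `old_log_probs` tensor. When the second `union` is called, `batch` already contains `old_log_probs` from the first union, so `union_tensor_dict` compares the existing tensor with the new one using `equal()`. Despite the wording of the message, this is a value comparison, not an identity check: the assertion fires because the recomputed `old_log_probs` do not exactly match the ones already stored in `batch`.  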
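The merge rule described above can be illustrated with a minimal pure-Python sketch. It assumes only the behavior stated in this issue (keys missing from the first dict are copied over; keys already present must compare equal) and uses plain lists in place of tensors. `union_like_verl` and the numbers are hypothetical; this is not verl's actual `union_tensor_dict`.

```python
def union_like_verl(d1: dict, d2: dict) -> dict:
    """Hypothetical mimic of the merge rule described above (not verl's code)."""
    for key, value in d2.items():
        if key not in d1:
            # New key: copy it into the first dict.
            d1[key] = value
        else:
            # Shared key: values must match, otherwise the merge is rejected.
            assert d1[key] == value, f"{key} in tensor_dict1 and tensor_dict2 are not the same object"
    return d1


batch = {"responses": [1, 2, 3]}

# First computation: "old_log_probs" is a new key, so the union succeeds.
first = {"old_log_probs": [-0.51, -1.20, -0.33]}
batch = union_like_verl(batch, first)

# Re-unioning a result with identical values passes, since only values are compared.
batch = union_like_verl(batch, {"old_log_probs": [-0.51, -1.20, -0.33]})

# A second computation whose values differ slightly (hypothetical numbers) is rejected.
second = {"old_log_probs": [-0.51, -1.21, -0.33]}
try:
    union_like_verl(batch, second)
except AssertionError as err:
    print(err)  # old_log_probs in tensor_dict1 and tensor_dict2 are not the same object
```

Under this rule, a recomputation only survives the second union if it reproduces the first result exactly; any numerical difference between the two calls trips the assertion.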


Is this repeated computation an unintended leftover (e.g., from incomplete refactoring or debugging)? If not, what is the intended logic here, and how should the assertion error caused by the duplicate be resolved?  
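For reference, two workarounds seem plausible depending on the answer. The sketch below again mimics the described merge rule in pure Python; the helper `union_like_verl` and all values are hypothetical. Option A keeps a single computation and a single union; option B discards the stale `old_log_probs` before merging the recomputed ones. Which computation should be kept is exactly what this question asks, so this only illustrates the mechanics and is not a proposed patch for rllm.

```python
def union_like_verl(d1: dict, d2: dict) -> dict:
    # Hypothetical mimic: copy new keys, require equal values on shared keys.
    for key, value in d2.items():
        if key in d1:
            assert d1[key] == value, f"{key} in tensor_dict1 and tensor_dict2 are not the same object"
        else:
            d1[key] = value
    return d1


first = {"old_log_probs": [-0.51, -1.20, -0.33]}
second = {"old_log_probs": [-0.51, -1.21, -0.33]}  # hypothetical recomputation

# Option A: compute once and union once, so no key is ever merged twice.
batch_a = union_like_verl({"responses": [1, 2, 3]}, first)

# Option B: if recomputing is intended, drop the stale key first so the
# second union inserts the new values instead of comparing against the old ones.
batch_b = union_like_verl({"responses": [1, 2, 3]}, first)
batch_b.pop("old_log_probs")
batch_b = union_like_verl(batch_b, second)
print(batch_b["old_log_probs"])  # [-0.51, -1.21, -0.33]
```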

Thanks!
