Description
Background
I am training with verl, utilizing a GRM as the reward model. To achieve pipeline parallelism between actor rollouts and reward scoring, I've modified the PPOTrainer to asynchronously query the GRM immediately after a sample's rollout is complete. I discovered a bug in the process of implementing this.
Bug Description
The utility function union_numpy_dict in verl/protocol.py fails when processing dictionaries that contain NumPy arrays with 3 or more dimensions. The function internally uses pandas.DataFrame for equality comparison, but pd.DataFrame() only accepts 1-D or 2-D inputs, leading to a ValueError.
Steps to Reproduce
The following minimal script will reliably reproduce the error:
import numpy as np
from verl.protocol import union_numpy_dict
# 1. Create a 3D NumPy array
arr_3d = np.arange(8).reshape((2, 2, 2))
# 2. Call union_numpy_dict with dictionaries containing this array
# This will raise a ValueError
union_numpy_dict(
{"a": arr_3d},
{"a": arr_3d}
)
Expected Behavior
The function should execute successfully without raising an exception, correctly identifying that the two 3D arrays are equal and returning the merged dictionary.
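One possible direction, shown only as a sketch and not as the actual verl implementation: compare the arrays with NumPy directly, so the check does not depend on the number of dimensions. The name union_numpy_dict_nd is hypothetical. The NaN-aware comparison is applied only to float and complex dtypes, so this sketch does not treat NaN values inside object arrays as equal, which a DataFrame-based comparison may handle differently.

```python
import numpy as np


def union_numpy_dict_nd(dict1, dict2):
    """Hypothetical N-D-safe merge: copy dict2 into dict1 and require
    that any key present in both holds equal arrays."""
    for key, val in dict2.items():
        if key in dict1:
            a, b = dict1[key], val
            if not (isinstance(a, np.ndarray) and isinstance(b, np.ndarray)):
                raise TypeError(f"values for key {key!r} must be np.ndarray")
            # Treat NaNs as equal only for float/complex dtypes;
            # np.isnan is not defined for object arrays.
            nan_aware = a.dtype.kind in "fc" and b.dtype.kind in "fc"
            if a.shape != b.shape or not np.array_equal(a, b, equal_nan=nan_aware):
                raise ValueError(f"key {key!r} holds different values in the two dicts")
        dict1[key] = val
    return dict1


# Same input as the reproduction script above: a 3D array under a shared key.
arr_3d = np.arange(8).reshape((2, 2, 2))
merged = union_numpy_dict_nd({"a": arr_3d}, {"a": arr_3d.copy()})
```

With this approach the 3D case from the reproduction script merges without error, while arrays that differ in shape or content under the same key still raise an exception.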