[training_utils] fix: enforce 1D object array shape for non-tensor data in collate_fn (#2741)

kibitzing · web-flow · commit 23aa10533f64 · 2025-07-30T13:07:47.000+08:00
### What does this PR do?

This PR updates the `collate_fn` logic inside `verl.utils.dataset.rl_dataset` to consistently handle non-tensor fields as 1D object arrays, preventing runtime errors during concatenation in downstream code such as `recipe/dapo/dapo_ray_trainer.py`.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

* Tested at: https://github.com/kibitzing/verl/tree/test_tool_n1
  * Note: This branch is for testing purposes only and is not intended for merge.
* The data used for testing comes from the `train.parquet` and `test.parquet` files released by the [Tool N1 repository](https://github.com/NVlabs/Tool-N1).
* Part of the training script:

```bash
python3 -m recipe.dapo.main_dapo \
    data.train_files=$HOME/Tool-N1/verl/verl/data/train.parquet \
    data.val_files=$HOME/Tool-N1/verl/verl/data/test.parquet \
    data.prompt_key=prompt \
    data.truncation='left' \
    data.max_prompt_length=2048 \
    data.max_response_length=4096 \
    data.gen_batch_size=32 \
    data.train_batch_size=24 \
    actor_rollout_ref.rollout.n=5 \
    algorithm.adv_estimator=grpo \
    algorithm.filter_groups.enable=True \
    algorithm.filter_groups.max_num_gen_batches=10 \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-3B-Instruct \
    ...
```

### Before vs After Behavior (Real Output Logs)

* Before: Inconsistent Shape

```
(TaskRunner pid=114826) Training from scratch
(TaskRunner pid=114826) new_batch.non_tensor_batch["conversations"].shape=(32, 1)
(TaskRunner pid=114826) num_prompt_in_batch=3 < prompt_bsz=24
(TaskRunner pid=114826) num_gen_batches=1. Keep generating...
(TaskRunner pid=114826) new_batch.non_tensor_batch["conversations"].shape=(32, 1)
(TaskRunner pid=114826) num_prompt_in_batch=8 < prompt_bsz=24
(TaskRunner pid=114826) num_gen_batches=2. Keep generating...
(TaskRunner pid=114826) new_batch.non_tensor_batch["conversations"].shape=(32, 1)
(TaskRunner pid=114826) num_prompt_in_batch=13 < prompt_bsz=24
(TaskRunner pid=114826) num_gen_batches=3. Keep generating...
(TaskRunner pid=114826) new_batch.non_tensor_batch["conversations"].shape=(32,)
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)
```

This caused shape inconsistency across steps, leading to downstream errors during concatenation.

* After: Consistent (32,) Shape

```
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
(TaskRunner pid=133725) num_prompt_in_batch=4 < prompt_bsz=24
(TaskRunner pid=133725) num_gen_batches=1. Keep generating...
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
(TaskRunner pid=133725) num_prompt_in_batch=10 < prompt_bsz=24
(TaskRunner pid=133725) num_gen_batches=2. Keep generating...
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
(TaskRunner pid=133725) num_prompt_in_batch=12 < prompt_bsz=24
(TaskRunner pid=133725) num_gen_batches=3. Keep generating...
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
(TaskRunner pid=133725) num_prompt_in_batch=15 < prompt_bsz=24
(TaskRunner pid=133725) num_gen_batches=4. Keep generating...
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
(TaskRunner pid=133725) num_prompt_in_batch=19 < prompt_bsz=24
(TaskRunner pid=133725) num_gen_batches=5. Keep generating...
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
(TaskRunner pid=133725) num_prompt_in_batch=23 < prompt_bsz=24
(TaskRunner pid=133725) num_gen_batches=6. Keep generating...
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
```

With the updated logic, the shape is consistently (32,).

* The issue was traced back to the `"conversations"` field in the Tool N1 dataset. This key contains a list of human–gpt messages. In most examples, it's a single-turn conversation (list with length 1), but in some cases, it's a multi-turn conversation (list with length > 1).

### Design & Code Changes

The current `collate_fn` processes non-tensor values with:

https://github.com/volcengine/verl/blob/1df03f3abf96f59cb90c684f93a71ee0bbb57f49/verl/utils/dataset/rl_dataset.py#L62-L63

While this generally works, it leads to a subtle issue: If `val` is a list of lists and all inner lists happen to be of the same length, NumPy will interpret it as a 2D array with shape (N, L). However, in many RL scenarios, the structure of non-tensor data (e.g. variable-length lists across batches) is not guaranteed to be uniform, which means:

- One batch may produce shape `(N, L)`
- Another may produce `(N,)` where each element is a list of different lengths
- Another may have shape `(N, L')`

This causes downstream errors like:

`ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)`

Specifically, this occurs when multiple step-wise batches are concatenated with:

https://github.com/volcengine/verl/blob/1df03f3abf96f59cb90c684f93a71ee0bbb57f49/recipe/dapo/dapo_ray_trainer.py#L240

To enforce consistent 1D object arrays regardless of content, this PR replaces the original line with:

```python
for key, val in non_tensors.items():
    non_tensors[key] = np.empty(len(val), dtype=object)
    non_tensors[key][:] = val
```

This ensures that `non_tensors[key]` always has shape (N,), which makes concatenation in downstream logic safer.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
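
### Minimal reproduction (illustrative)

The shape inconsistency described under "Design & Code Changes" can be reproduced outside of verl with a few lines of NumPy. The sketch below is not code from this PR: the helper names `to_object_array_old` and `to_object_array_new` and the sample data are hypothetical stand-ins for the `"conversations"` field. The "new" helper mirrors the `np.fromiter(val, dtype=object, count=len(val))` call that appears in the `rl_dataset.py` diff of this commit; `np.fromiter` with `dtype=object` is assumed to be available, which requires NumPy 1.23 or later.

```python
import numpy as np


def to_object_array_old(values):
    """Hypothetical helper mirroring the pre-PR line: shape is inferred from nesting."""
    return np.array(values, dtype=object)


def to_object_array_new(values):
    """Hypothetical helper mirroring the post-PR line: one slot per sample."""
    return np.fromiter(values, dtype=object, count=len(values))


# Made-up stand-ins for the "conversations" field (not real dataset rows).
single_turn = [
    [{"from": "human", "value": "hi"}],
    [{"from": "human", "value": "hello"}],
]
mixed_turns = [
    [{"from": "human", "value": "hi"}],
    [{"from": "human", "value": "hi"}, {"from": "gpt", "value": "hello"}],
]

# Old behaviour: equal-length inner lists become a 2D array, ragged ones stay 1D.
old_a = to_object_array_old(single_turn)
old_b = to_object_array_old(mixed_turns)
print("old shapes:", old_a.shape, old_b.shape)

# Concatenating a 2D batch with a 1D batch raises the ValueError seen in the logs.
try:
    np.concatenate([old_a, old_b])
    old_concat_failed = False
except ValueError as err:
    old_concat_failed = True
    print("old concat error:", err)

# New behaviour: both batches are 1D object arrays whose elements are the lists.
new_a = to_object_array_new(single_turn)
new_b = to_object_array_new(mixed_turns)
merged = np.concatenate([new_a, new_b])
print("new shapes:", new_a.shape, new_b.shape, "merged:", merged.shape)
```

With the 1D layout each array element is the original Python list for that sample, so consumers index per sample (`merged[i]`) rather than relying on a second array axis.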
diff --git a/recipe/spin/fsdp_workers.py b/recipe/spin/fsdp_workers.py
@@ -18,6 +18,7 @@
 import os
 import warnings
 
+import numpy as np
 import psutil
 import torch
 import torch.distributed
@@ -483,11 +484,13 @@ def _switch_chat_template(self, data: DataProto):
         rm_attention_mask = []
 
         for i in range(data.batch.batch_size[0]):
+            if not isinstance(data.non_tensor_batch["raw_prompt"][i], list | np.ndarray):
+                raise TypeError(
+                    f"raw_prompt must be a list or numpy array, got {type(data.non_tensor_batch['raw_prompt'][i])}"
+                )
+
             # extract raw prompt
-            if isinstance(data.non_tensor_batch["raw_prompt"][i], list):
-                chat: list = data.non_tensor_batch["raw_prompt"][i]
-            else:
-                chat: list = data.non_tensor_batch["raw_prompt"][i].tolist()
+            chat: list = list(data.non_tensor_batch["raw_prompt"][i])
 
             # extract response
             response_ids = data.batch["responses"][i]
diff --git a/tests/utils/dataset/test_rl_collate_fn_on_cpu.py b/tests/utils/dataset/test_rl_collate_fn_on_cpu.py
@@ -0,0 +1,72 @@
+# Copyright 2025 Bytedance Ltd. and/or its affiliates
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import torch
+
+
+def test_rl_collate_fn():
+    from verl.utils.dataset.rl_dataset import collate_fn
+
+    max_prompt_length = 5
+
+    test_data = [
+        {
+            # test tensor
+            "input_ids": torch.randint(0, 10, (max_prompt_length,)),
+            # test fixed length (1) list within a batch
+            "messages": [{"role": "user", "content": "Hi."}],
+            # test variable length list within a batch
+            "raw_prompt_ids": [1, 2, 3, 4],
+            # test string
+            "ability": "math",
+            # test dict
+            "reward_model": {"ground_truth": 5, "style": "rule"},
+            # test empty dict
+            "tools_kwargs": {},
+        },
+        {
+            "input_ids": torch.randint(0, 10, (max_prompt_length,)),
+            "messages": [{"role": "user", "content": "Hello."}],
+            "raw_prompt_ids": [1, 2, 3],
+            "ability": "toolcall",
+            "reward_model": {
+                "ground_truth": '[{"name": "rgb_to_cmyk", "arguments": {"r": 0, "g": 0, "b": 255}}]',
+                "style": "rule",
+            },
+            "tools_kwargs": {},
+        },
+    ]
+
+    batch_size = len(test_data)
+    batch = collate_fn(test_data)
+
+    # Tensor part
+    assert batch["input_ids"].shape == (batch_size, max_prompt_length)
+    assert isinstance(batch["input_ids"], torch.Tensor)
+
+    # Non-tensor parts
+    expected_types = {
+        "messages": list,
+        "raw_prompt_ids": list,
+        "ability": str,
+        "reward_model": dict,
+        "tools_kwargs": dict,
+    }
+
+    for key, dtype in expected_types.items():
+        assert batch[key].shape == (batch_size,), (
+            f"Expected shape {(batch_size,)} for '{key}', but got {batch[key].shape}"
+        )
+        assert isinstance(batch[key][0], dtype), (
+            f"'{key}' should contain elements of type {dtype}, but got {type(batch[key][0])}"
+        )
diff --git a/verl/experimental/agent_loop/agent_loop.py b/verl/experimental/agent_loop/agent_loop.py
@@ -298,8 +298,11 @@ async def generate_sequences(self, batch: DataProto) -> DataProto:
         )
 
         for agent_name, messages, trajectory in zip(agent_names, raw_prompts, trajectory_info, strict=True):
+            if not isinstance(messages, list | np.ndarray):
+                raise TypeError(f"messages must be a list or numpy array, got {type(messages)}")
+
             tasks.append(
-                asyncio.create_task(self._run_agent_loop(agent_name, messages.tolist(), sampling_params, trajectory))
+                asyncio.create_task(self._run_agent_loop(agent_name, list(messages), sampling_params, trajectory))
             )
         outputs = await asyncio.gather(*tasks)
 
diff --git a/verl/utils/dataset/rl_dataset.py b/verl/utils/dataset/rl_dataset.py
@@ -60,7 +60,7 @@ def collate_fn(data_list: list[dict]) -> dict:
         tensors[key] = torch.stack(val, dim=0)
 
     for key, val in non_tensors.items():
-        non_tensors[key] = np.array(val, dtype=object)
+        non_tensors[key] = np.fromiter(val, dtype=object, count=len(val))
 
     return {**tensors, **non_tensors}
 
diff --git a/verl/workers/fsdp_workers.py b/verl/workers/fsdp_workers.py
@@ -22,6 +22,7 @@
 from dataclasses import asdict
 from typing import Any
 
+import numpy as np
 import psutil
 import torch
 import torch.distributed
@@ -1526,11 +1527,13 @@ def _switch_chat_template(self, data: DataProto):
         rm_attention_mask = []
 
         for i in range(data.batch.batch_size[0]):
+            if not isinstance(data.non_tensor_batch["raw_prompt"][i], list | np.ndarray):
+                raise TypeError(
+                    f"raw_prompt must be a list or numpy array, got {type(data.non_tensor_batch['raw_prompt'][i])}"
+                )
+
             # extract raw prompt
-            if isinstance(data.non_tensor_batch["raw_prompt"][i], list):
-                chat: list = data.non_tensor_batch["raw_prompt"][i]
-            else:
-                chat: list = data.non_tensor_batch["raw_prompt"][i].tolist()
+            chat: list = list(data.non_tensor_batch["raw_prompt"][i])
 
             # extract response
             response_ids = data.batch["responses"][i]
diff --git a/verl/workers/rollout/sglang_rollout/sglang_rollout.py b/verl/workers/rollout/sglang_rollout/sglang_rollout.py
@@ -660,15 +660,15 @@ def _batch_level_generate_sequences(self, prompts: DataProto, **kwargs) -> DataP
                 {"prompt_token_ids": raw_prompt_ids} for raw_prompt_ids in non_tensor_batch.pop("raw_prompt_ids")
             ]
 
-        # Ensure token IDs are lists or numpy arrays
         for input_data in sglang_inputs:
-            if isinstance(input_data["prompt_token_ids"], np.ndarray):
-                input_data["prompt_token_ids"] = input_data["prompt_token_ids"].tolist()
-            elif not isinstance(input_data["prompt_token_ids"], list):
+            # Ensure token IDs are lists or numpy arrays
+            if not isinstance(input_data["prompt_token_ids"], list | np.ndarray):
                 raise TypeError(
                     f"prompt_token_ids must be a list or numpy array, got {type(input_data['prompt_token_ids'])}"
                 )
 
+            input_data["prompt_token_ids"] = list(input_data["prompt_token_ids"])
+
         # Extract token IDs and image data for SGLang Engine
         idx_list = [input_data["prompt_token_ids"] for input_data in sglang_inputs]
         image_list = [input_data.get("image_data", None) for input_data in sglang_inputs]
@@ -1266,12 +1266,15 @@ def _preprocess_prompt_to_async_rollout_requests(self, prompts: DataProto, n: in
             else:
                 _interaction_kwargs = {}
 
+            if not isinstance(raw_prompt, list | np.ndarray):
+                raise TypeError(f"raw_prompt must be a list or numpy array, got {type(raw_prompt)}")
+
             req = AsyncRolloutRequest(
                 batch_data_id=data_idx,
                 rollout_offset=0,
                 request_id=str(uuid4()),
                 state=AsyncRolloutRequestStateEnum.PENDING,
-                messages=raw_prompt.tolist(),
+                messages=list(raw_prompt),
                 multi_modal_data=multi_modal_data,
                 tool_schemas=_tool_schemas,
                 tools_kwargs=_tools_kwargs,
diff --git a/verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py b/verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py
@@ -276,16 +276,15 @@ def generate_sequences(self, prompts: DataProto, **kwargs) -> DataProto:
                 {"prompt_token_ids": raw_prompt_ids} for raw_prompt_ids in non_tensor_batch.pop("raw_prompt_ids")
             ]
 
-        # ensure the type of `prompt_token_ids` passed to vllm is list[int]
-        # https://github.com/volcengine/verl/pull/772
         for input_data in vllm_inputs:
-            if isinstance(input_data["prompt_token_ids"], np.ndarray):
-                input_data["prompt_token_ids"] = input_data["prompt_token_ids"].tolist()
-            elif not isinstance(input_data["prompt_token_ids"], list):
+            # Ensure token IDs are lists or numpy arrays
+            if not isinstance(input_data["prompt_token_ids"], list | np.ndarray):
                 raise TypeError(
                     f"prompt_token_ids must be a list or numpy array, got {type(input_data['prompt_token_ids'])}"
                 )
 
+            input_data["prompt_token_ids"] = list(input_data["prompt_token_ids"])
+
         do_sample = prompts.meta_info.get("do_sample", True)
         is_validate = prompts.meta_info.get("validate", False)
         if not do_sample:
