
Conversation


@kibitzing kibitzing commented Jul 24, 2025

What does this PR do?

This PR updates the collate_fn logic inside verl.utils.dataset.rl_dataset to consistently handle non-tensor fields as 1D object arrays, preventing runtime errors during concatenation in downstream code such as recipe/dapo/dapo_ray_trainer.py.

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

python3 -m recipe.dapo.main_dapo \
        data.train_files=$HOME/Tool-N1/verl/verl/data/train.parquet \
        data.val_files=$HOME/Tool-N1/verl/verl/data/test.parquet \
        data.prompt_key=prompt \
        data.truncation='left' \
        data.max_prompt_length=2048 \
        data.max_response_length=4096 \
        data.gen_batch_size=32 \
        data.train_batch_size=24 \
        actor_rollout_ref.rollout.n=5 \
        algorithm.adv_estimator=grpo \
        algorithm.filter_groups.enable=True \
        algorithm.filter_groups.max_num_gen_batches=10 \
        actor_rollout_ref.model.path=Qwen/Qwen2.5-3B-Instruct \
        ...

Before vs After Behavior (Real Output Logs)

  • Before: Inconsistent Shape
(TaskRunner pid=114826) Training from scratch
(TaskRunner pid=114826) new_batch.non_tensor_batch["conversations"].shape=(32, 1)
(TaskRunner pid=114826) num_prompt_in_batch=3 < prompt_bsz=24
(TaskRunner pid=114826) num_gen_batches=1. Keep generating...
(TaskRunner pid=114826) new_batch.non_tensor_batch["conversations"].shape=(32, 1)
(TaskRunner pid=114826) num_prompt_in_batch=8 < prompt_bsz=24
(TaskRunner pid=114826) num_gen_batches=2. Keep generating...
(TaskRunner pid=114826) new_batch.non_tensor_batch["conversations"].shape=(32, 1)
(TaskRunner pid=114826) num_prompt_in_batch=13 < prompt_bsz=24
(TaskRunner pid=114826) num_gen_batches=3. Keep generating...
(TaskRunner pid=114826) new_batch.non_tensor_batch["conversations"].shape=(32,)
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)

This caused shape inconsistency across steps, leading to downstream errors during concatenation.

  • After: Consistent (32,) Shape
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
(TaskRunner pid=133725) num_prompt_in_batch=4 < prompt_bsz=24
(TaskRunner pid=133725) num_gen_batches=1. Keep generating...
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
(TaskRunner pid=133725) num_prompt_in_batch=10 < prompt_bsz=24
(TaskRunner pid=133725) num_gen_batches=2. Keep generating...
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
(TaskRunner pid=133725) num_prompt_in_batch=12 < prompt_bsz=24
(TaskRunner pid=133725) num_gen_batches=3. Keep generating...
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
(TaskRunner pid=133725) num_prompt_in_batch=15 < prompt_bsz=24
(TaskRunner pid=133725) num_gen_batches=4. Keep generating...
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
(TaskRunner pid=133725) num_prompt_in_batch=19 < prompt_bsz=24
(TaskRunner pid=133725) num_gen_batches=5. Keep generating...
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
(TaskRunner pid=133725) num_prompt_in_batch=23 < prompt_bsz=24
(TaskRunner pid=133725) num_gen_batches=6. Keep generating...
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)

With the updated logic, the shape is consistently (32,).

  • The issue was traced back to the "conversations" field in the Tool N1 dataset. This key contains a list of human–gpt messages. In most examples, it's a single-turn conversation (list with length 1), but in some cases, it's a multi-turn conversation (list with length > 1).

Design & Code Changes

The current collate_fn processes non-tensor values with:

for key, val in non_tensors.items():
    non_tensors[key] = np.array(val, dtype=object)

While this generally works, it leads to a subtle issue:
If val is a list of lists and all inner lists happen to be of the same length, NumPy will interpret it as a 2D array with shape (N, L).
However, in many RL scenarios, the structure of non-tensor data (e.g. variable-length lists across batches) is not guaranteed to be uniform, which means:

  • One batch may produce shape (N, L)
  • Another may produce (N,) where each element is a list of different lengths
  • Another may have shape (N, L')
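This shape instability is easy to reproduce with a toy example, independent of verl; the two calls below differ only in whether the inner lists have equal lengths:

```python
import numpy as np

# Uniform inner lengths: np.array infers a 2D array.
uniform = [["turn-1"], ["turn-1"]]
a = np.array(uniform, dtype=object)
print(a.shape)  # (2, 1)

# Ragged inner lengths: np.array falls back to a 1D object array.
ragged = [["turn-1"], ["turn-1", "turn-2"]]
b = np.array(ragged, dtype=object)
print(b.shape)  # (2,)

# Concatenating a 2D batch with a 1D batch then fails.
try:
    np.concatenate([a, b])
except ValueError as e:
    print(e)
```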

This causes downstream errors like:
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)

Specifically, this occurs when multiple step-wise batches are concatenated with:

batch = new_batch if batch is None else DataProto.concat([batch, new_batch])

To enforce consistent 1D object arrays regardless of content, this PR replaces the original line with:

for key, val in non_tensors.items():
    non_tensors[key] = np.empty(len(val), dtype=object)
    non_tensors[key][:] = val

This ensures that non_tensors[key] always has shape (N,), which makes concatenation in downstream logic safer.
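The effect of the pre-allocation trick can be checked in isolation; here to_1d_object_array is a hypothetical helper name wrapping the two lines above, not a function in verl:

```python
import numpy as np

def to_1d_object_array(val):
    # Pre-allocate a 1D object array and fill it, so NumPy never
    # infers a 2D shape from nested lists of uniform length.
    arr = np.empty(len(val), dtype=object)
    arr[:] = val
    return arr

uniform = [["a"], ["b"]]       # np.array would make this (2, 1)
ragged = [["a"], ["b", "c"]]   # np.array would make this (2,)
u = to_1d_object_array(uniform)
r = to_1d_object_array(ragged)
print(u.shape, r.shape)              # (2,) (2,)
print(np.concatenate([u, r]).shape)  # (4,); concatenation now succeeds
```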

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

  • [x] Read the Contribute Guide.
  • [x] Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
  • [ ] Add / Update the documentation.
  • [x] Add unit or end-to-end test(s) to the CI workflow to cover all the code. If not feasible, explain why: ...
  • [x] Once your PR is ready for CI, send a message in the ci-request channel in the verl Slack workspace.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a critical bug in the collate_fn function, ensuring consistent array shapes for non-tensor data and preventing downstream errors. The change is well-explained and effective. I've suggested using np.fromiter for potential performance improvements.

Comment on lines 63 to 64
non_tensors[key] = np.empty(len(val), dtype=object)
non_tensors[key][:] = val


Severity: high

The current implementation uses np.empty followed by assignment. A more concise and potentially performant approach is to use np.fromiter to create the 1D object array directly, as collate_fn is on a critical performance path for data loading.

non_tensors[key] = np.fromiter(val, dtype=object, count=len(val))
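For reference, the suggested one-liner behaves the same as the two-line version on both uniform and ragged inputs; note that object-dtype support in np.fromiter requires a reasonably recent NumPy (older releases rejected dtype=object):

```python
import numpy as np

uniform = [["a"], ["b"]]
ragged = [["a"], ["b", "c"]]

u = np.fromiter(uniform, dtype=object, count=len(uniform))
r = np.fromiter(ragged, dtype=object, count=len(ragged))
print(u.shape, r.shape)  # (2,) (2,)
print(r[1])              # ['b', 'c']
```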

Contributor Author


Thanks for the suggestion!
I've updated the code to use np.fromiter(val, dtype=object, count=len(val)) as recommended.


@eric-haibin-lin eric-haibin-lin left a comment


Would you mind adding a unit test?


CLAassistant commented Jul 25, 2025

CLA assistant check
All committers have signed the CLA.

Contributor Author

Hello @eric-haibin-lin,

I've added a unit test for rl_dataset.collate_fn to verify the 1D object array behavior for non-tensor fields.
Please let me know if you'd like any adjustments or further coverage!


kibitzing commented Jul 28, 2025

Hello,

I've investigated the test failures in this PR, and here’s a summary:

  1. Failure Cause:
    5 test cases are failing due to a behavior change directly related to this PR. (1, 2, 3, 4, 5)
    The failures stem from tolist() being called on an object that is now a list, whereas it used to be a NumPy array.
    Previously, it was returned as an np.array (with dtype=object) because the messages field always had exactly 2 turns — which, when converted to a list of lists with uniform inner length, resulted in NumPy interpreting it as a regular array rather than a nested list.
    With the current change, it consistently returns a shape of (N,) and preserves the original dtype, so it will return a plain list in such cases.

  2. Plan to Fix:
    To fix the tests, I will remove the unnecessary tolist() calls in the code paths exercised by the tests.
    Since the returned type is now naturally a list, tolist() is not only redundant but fails when called on a native Python list.
    (A safer compromise would be to replace tolist() with list(obj), which works consistently whether the input is a NumPy array or a plain Python list.)
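The difference is easy to see with a minimal sketch (the variable names here are illustrative, not verl code): list() accepts both containers, while .tolist() only exists on ndarrays.

```python
import numpy as np

arr = np.empty(2, dtype=object)
arr[:] = [["a"], ["b", "c"]]   # 1D object ndarray of lists
plain = [["a"], ["b", "c"]]    # what collate_fn may now return

# list() works uniformly on both.
print(list(arr) == list(plain))  # True

# .tolist() only exists on ndarrays; a plain list raises AttributeError.
try:
    plain.tolist()
except AttributeError as e:
    print(e)
```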

  3. About Previous Branching Logic:
    It looks like this issue was previously encountered and handled via an if–else branch on the object type in PR #772 ("prompt_token_ids should be list[int] instead of np.array").
    With the new logic introduced in this PR, that conditional handling might no longer be necessary. I'd like to discuss whether it's safe to remove that legacy branching logic.
    Here are some parts I'd like to discuss and potentially revise:

@vermouth1992

Could you fix the legacy problem by calling list(obj)? Thanks!

@kibitzing

Sure, I'll update it with list(obj). Thanks for the suggestion!


kibitzing commented Jul 29, 2025

Hello, @vermouth1992
I replaced the legacy branching logic with list(obj) as discussed.

While updating, I realized that calling list() directly could lead to unexpected behavior if a str or bytes object ever sneaks in, so I added a type check that explicitly allows only list and np.ndarray (keeping np.ndarray for now; we could gradually remove it and allow only list).

Let me know if you think a stricter or simpler check would be better!
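A sketch of such a guard (the function name and error message here are hypothetical, not the exact code in the PR):

```python
import numpy as np

def to_list(obj):
    # Guard against str/bytes: list("abc") would silently explode a
    # string into characters, so only list and np.ndarray are accepted.
    if not isinstance(obj, (list, np.ndarray)):
        raise TypeError(f"expected list or np.ndarray, got {type(obj).__name__}")
    return list(obj)

arr = np.empty(2, dtype=object)
arr[:] = [["x"], ["y", "z"]]
print(to_list(arr) == to_list([["x"], ["y", "z"]]))  # True

try:
    to_list("oops")
except TypeError as e:
    print(e)
```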

@wuxibin89 wuxibin89 merged commit 23aa105 into volcengine:main Jul 30, 2025
49 of 53 checks passed
yellowbee686 pushed a commit to yellowbee686/verl that referenced this pull request Jul 31, 2025
Juniper1021 pushed a commit to Juniper1021/verl that referenced this pull request Aug 7, 2025
whatadayG pushed a commit to whatadayG/verl that referenced this pull request Sep 5, 2025