
Conversation


@kibitzing kibitzing commented Jul 24, 2025

What does this PR do?

This PR updates the collate_fn logic inside verl.utils.dataset.rl_dataset to consistently handle non-tensor fields as 1D object arrays, preventing runtime errors during concatenation in downstream code such as recipe/dapo/dapo_ray_trainer.py.

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

python3 -m recipe.dapo.main_dapo \
        data.train_files=$HOME/Tool-N1/verl/verl/data/train.parquet \
        data.val_files=$HOME/Tool-N1/verl/verl/data/test.parquet \
        data.prompt_key=prompt \
        data.truncation='left' \
        data.max_prompt_length=2048 \
        data.max_response_length=4096 \
        data.gen_batch_size=32 \
        data.train_batch_size=24 \
        actor_rollout_ref.rollout.n=5 \
        algorithm.adv_estimator=grpo \
        algorithm.filter_groups.enable=True \
        algorithm.filter_groups.max_num_gen_batches=10 \
        actor_rollout_ref.model.path=Qwen/Qwen2.5-3B-Instruct \
        ...

Before vs After Behavior (Real Output Logs)

  • Before: Inconsistent Shape
(TaskRunner pid=114826) Training from scratch
(TaskRunner pid=114826) new_batch.non_tensor_batch["conversations"].shape=(32, 1)
(TaskRunner pid=114826) num_prompt_in_batch=3 < prompt_bsz=24
(TaskRunner pid=114826) num_gen_batches=1. Keep generating...
(TaskRunner pid=114826) new_batch.non_tensor_batch["conversations"].shape=(32, 1)
(TaskRunner pid=114826) num_prompt_in_batch=8 < prompt_bsz=24
(TaskRunner pid=114826) num_gen_batches=2. Keep generating...
(TaskRunner pid=114826) new_batch.non_tensor_batch["conversations"].shape=(32, 1)
(TaskRunner pid=114826) num_prompt_in_batch=13 < prompt_bsz=24
(TaskRunner pid=114826) num_gen_batches=3. Keep generating...
(TaskRunner pid=114826) new_batch.non_tensor_batch["conversations"].shape=(32,)
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)

This caused shape inconsistency across steps, leading to downstream errors during concatenation.

  • After: Consistent (32,) Shape
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
(TaskRunner pid=133725) num_prompt_in_batch=4 < prompt_bsz=24
(TaskRunner pid=133725) num_gen_batches=1. Keep generating...
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
(TaskRunner pid=133725) num_prompt_in_batch=10 < prompt_bsz=24
(TaskRunner pid=133725) num_gen_batches=2. Keep generating...
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
(TaskRunner pid=133725) num_prompt_in_batch=12 < prompt_bsz=24
(TaskRunner pid=133725) num_gen_batches=3. Keep generating...
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
(TaskRunner pid=133725) num_prompt_in_batch=15 < prompt_bsz=24
(TaskRunner pid=133725) num_gen_batches=4. Keep generating...
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
(TaskRunner pid=133725) num_prompt_in_batch=19 < prompt_bsz=24
(TaskRunner pid=133725) num_gen_batches=5. Keep generating...
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
(TaskRunner pid=133725) num_prompt_in_batch=23 < prompt_bsz=24
(TaskRunner pid=133725) num_gen_batches=6. Keep generating...
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)

With the updated logic, the shape is consistently (32,).

  • The issue was traced back to the "conversations" field in the Tool N1 dataset. This key contains a list of human–gpt messages. In most examples, it's a single-turn conversation (list with length 1), but in some cases, it's a multi-turn conversation (list with length > 1).

Design & Code Changes

The current collate_fn processes non-tensor values with:

for key, val in non_tensors.items():
    non_tensors[key] = np.array(val, dtype=object)

While this generally works, it leads to a subtle issue:
If val is a list of lists and all inner lists happen to be of the same length, NumPy will interpret it as a 2D array with shape (N, L).
However, in many RL scenarios, the structure of non-tensor data (e.g. variable-length lists across batches) is not guaranteed to be uniform, which means:

  • One batch may produce shape (N, L)
  • Another may produce (N,) where each element is a list of different lengths
  • Another may have shape (N, L')
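This shape instability is easy to reproduce with a toy example, independent of verl; the two calls below differ only in whether the inner lists have equal lengths:

```python
import numpy as np

# Uniform inner lengths: np.array infers a 2D array.
uniform = [["turn-1"], ["turn-1"]]
a = np.array(uniform, dtype=object)
print(a.shape)  # (2, 1)

# Ragged inner lengths: np.array falls back to a 1D object array.
ragged = [["turn-1"], ["turn-1", "turn-2"]]
b = np.array(ragged, dtype=object)
print(b.shape)  # (2,)

# Concatenating a 2D batch with a 1D batch then fails.
try:
    np.concatenate([a, b])
except ValueError as e:
    print(e)
```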

This causes downstream errors like:
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)

Specifically, this occurs when multiple step-wise batches are concatenated with:

batch = new_batch if batch is None else DataProto.concat([batch, new_batch])

To enforce consistent 1D object arrays regardless of content, this PR replaces the original line with:

for key, val in non_tensors.items():
    non_tensors[key] = np.empty(len(val), dtype=object)
    non_tensors[key][:] = val

This ensures that non_tensors[key] always has shape (N,), which makes concatenation in downstream logic safer.
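The effect of the pre-allocation trick can be checked in isolation; here to_1d_object_array is a hypothetical helper name wrapping the two lines above, not a function in verl:

```python
import numpy as np

def to_1d_object_array(val):
    # Pre-allocate a 1D object array and fill it, so NumPy never
    # infers a 2D shape from nested lists of uniform length.
    arr = np.empty(len(val), dtype=object)
    arr[:] = val
    return arr

uniform = [["a"], ["b"]]       # np.array would make this (2, 1)
ragged = [["a"], ["b", "c"]]   # np.array would make this (2,)
u = to_1d_object_array(uniform)
r = to_1d_object_array(ragged)
print(u.shape, r.shape)              # (2,) (2,)
print(np.concatenate([u, r]).shape)  # (4,); concatenation now succeeds
```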

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

  • [x] Read the Contribute Guide.
  • [x] Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
  • [ ] Add / Update the documentation.
  • [x] Add unit or end-to-end test(s) to the CI workflow to cover all the code. If not feasible, explain why: ...
  • [x] Once your PR is ready for CI, send a message in the ci-request channel in the verl Slack workspace.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a critical bug in the collate_fn function, ensuring consistent array shapes for non-tensor data and preventing downstream errors. The change is well-explained and effective. I've suggested using np.fromiter for potential performance improvements.

Comment on lines 63 to 64
non_tensors[key] = np.empty(len(val), dtype=object)
non_tensors[key][:] = val


Severity: high

The current implementation uses np.empty followed by assignment. A more concise and potentially performant approach is to use np.fromiter to create the 1D object array directly, as collate_fn is on a critical performance path for data loading.

non_tensors[key] = np.fromiter(val, dtype=object, count=len(val))
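For reference, the suggested one-liner behaves the same as the two-line version on both uniform and ragged inputs; note that object-dtype support in np.fromiter requires a reasonably recent NumPy (older releases rejected dtype=object):

```python
import numpy as np

uniform = [["a"], ["b"]]
ragged = [["a"], ["b", "c"]]

u = np.fromiter(uniform, dtype=object, count=len(uniform))
r = np.fromiter(ragged, dtype=object, count=len(ragged))
print(u.shape, r.shape)  # (2,) (2,)
print(r[1])              # ['b', 'c']
```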

Contributor Author


Thanks for the suggestion!
I've updated the code to use np.fromiter(val, dtype=object, count=len(val)) as recommended.


@eric-haibin-lin eric-haibin-lin left a comment


Would you mind adding a unit test?


CLAassistant commented Jul 25, 2025

CLA assistant check
All committers have signed the CLA.

Contributor Author

Hello @eric-haibin-lin,

I've added a unit test for rl_dataset.collate_fn to verify the 1D object array behavior for non-tensor fields.
Please let me know if you'd like any adjustments or further coverage!


kibitzing commented Jul 28, 2025

Hello,

I've investigated the test failures in this PR, and here’s a summary:

  1. Failure Cause:
    5 test cases are failing due to a behavior change directly related to this PR. (1, 2, 3, 4, 5)
    The failures stem from tolist() being called on an object that is now a list, whereas it used to be a NumPy array.
    Previously, it was returned as an np.array (with dtype=object) because the messages field always had exactly 2 turns — which, when converted to a list of lists with uniform inner length, resulted in NumPy interpreting it as a regular array rather than a nested list.
    With the current change, it consistently returns a shape of (N,) and preserves the original dtype, so it will return a plain list in such cases.

  2. Plan to Fix:
    To fix the tests, I will remove the unnecessary tolist() calls in the code paths exercised by the tests.
    Since the returned type is now naturally a list, tolist() is not only redundant but fails when called on a native Python list.
    (A safer compromise would be to replace tolist() with list(obj), which works consistently whether the input is a NumPy array or a plain Python list.)
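The difference is easy to see with a minimal sketch (the variable names here are illustrative, not verl code): list() accepts both containers, while .tolist() only exists on ndarrays.

```python
import numpy as np

arr = np.empty(2, dtype=object)
arr[:] = [["a"], ["b", "c"]]   # 1D object ndarray of lists
plain = [["a"], ["b", "c"]]    # what collate_fn may now return

# list() works uniformly on both.
print(list(arr) == list(plain))  # True

# .tolist() only exists on ndarrays; a plain list raises AttributeError.
try:
    plain.tolist()
except AttributeError as e:
    print(e)
```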

  3. About Previous Branching Logic:
    It looks like this issue was previously encountered and handled via an if–else branch on the object type in PR #772 ("prompt_token_ids should be list[int] instead of np.array").
    With the new logic introduced in this PR, that conditional handling might no longer be necessary. I'd like to discuss whether it's safe to remove that legacy branching logic.
    Here are some parts I'd like to discuss and potentially revise:

@vermouth1992

Could you fix the legacy problem by calling list(obj)? Thanks!

@kibitzing

Sure, I'll update it with list(obj). Thanks for the suggestion!


kibitzing commented Jul 29, 2025

Hello, @vermouth1992
I replaced the legacy branching logic with list(obj) as discussed.

While updating, I realized that calling list() directly could lead to unexpected behavior if a str or bytes object ever sneaks in, so I added a type check that explicitly allows only list and np.ndarray (keeping np.ndarray for now; we could gradually remove it and allow only list).

Let me know if you think a stricter or simpler check would be better!
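A sketch of such a guard (the function name and error message here are hypothetical, not the exact code in the PR):

```python
import numpy as np

def to_list(obj):
    # Guard against str/bytes: list("abc") would silently explode a
    # string into characters, so only list and np.ndarray are accepted.
    if not isinstance(obj, (list, np.ndarray)):
        raise TypeError(f"expected list or np.ndarray, got {type(obj).__name__}")
    return list(obj)

arr = np.empty(2, dtype=object)
arr[:] = [["x"], ["y", "z"]]
print(to_list(arr) == to_list([["x"], ["y", "z"]]))  # True

try:
    to_list("oops")
except TypeError as e:
    print(e)
```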

@wuxibin89 wuxibin89 merged commit 23aa105 into volcengine:main Jul 30, 2025
49 of 53 checks passed
yellowbee686 pushed a commit to yellowbee686/verl that referenced this pull request Jul 31, 2025
Juniper1021 pushed a commit to Juniper1021/verl that referenced this pull request Aug 7, 2025
whatadayG pushed a commit to whatadayG/verl that referenced this pull request Sep 5, 2025