[training_utils] fix: enforce 1D object array shape for non-tensor data in collate_fn #2741
Conversation
Code Review
This pull request addresses a critical bug in the collate_fn function, ensuring consistent array shapes for non-tensor data and preventing downstream errors. The change is well-explained and effective. I've suggested using np.fromiter for potential performance improvements.
verl/utils/dataset/rl_dataset.py (outdated)

```python
non_tensors[key] = np.empty(len(val), dtype=object)
non_tensors[key][:] = val
```
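To see the reviewed pattern in isolation, here is a small sketch in plain NumPy (the sample data is invented for illustration). It shows why pre-allocating a 1D object array avoids the 2D shape that `np.array` would otherwise infer from uniform nested lists:

```python
import numpy as np

# A batch where every sample's "conversations" list happens to have length 1.
val = [[{"from": "human"}], [{"from": "human"}], [{"from": "human"}]]

# Direct conversion lets NumPy infer a 2D (N, 1) array from uniform inner lists.
naive = np.array(val, dtype=object)
assert naive.shape == (3, 1)

# Pre-allocating a 1D object array fixes the shape to (N,) up front; the
# slice assignment then stores each inner list as a single array element.
arr = np.empty(len(val), dtype=object)
arr[:] = val
assert arr.shape == (3,)
assert isinstance(arr[0], list)
```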
The current implementation uses np.empty followed by assignment. A more concise and potentially performant approach is to use np.fromiter to create the 1D object array directly, as collate_fn is on a critical performance path for data loading.
```python
non_tensors[key] = np.fromiter(val, dtype=object, count=len(val))
```
Thanks for the suggestion!
I've updated the code to use np.fromiter(val, dtype=object, count=len(val)) as recommended.
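For reference, `np.fromiter` with `dtype=object` (supported since NumPy 1.23) always yields a 1D array of exactly `count` elements, whether the items are uniform or ragged. A quick sketch with made-up data:

```python
import numpy as np

# Uniform inner lists: np.fromiter still yields shape (N,), never (N, L).
uniform = np.fromiter([[1], [2], [3]], dtype=object, count=3)
assert uniform.shape == (3,)

# Ragged inner lists land on the same 1D shape.
ragged = np.fromiter([[1], [2, 3], [4, 5, 6]], dtype=object, count=3)
assert ragged.shape == (3,)
assert ragged[1] == [2, 3]

# The two can now be concatenated without a dimension mismatch.
assert np.concatenate([uniform, ragged]).shape == (6,)
```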
Would you mind adding a unit test?
Hello @eric-haibin-lin, I've added a unit test for `rl_dataset.collate_fn` to verify the 1D object array behavior for non-tensor fields.
Hello, I've investigated the test failures in this PR, and here's a summary:
Could you fix the legacy problem by calling `list(obj)`? Thanks!
Sure, I'll update it with
Hello @vermouth1992, while updating, I realized that calling … Let me know if you think a stricter or simpler check would be better!
[training_utils] fix: enforce 1D object array shape for non-tensor data in collate_fn (volcengine#2741)

### What does this PR do?

This PR updates the `collate_fn` logic inside `verl.utils.dataset.rl_dataset` to consistently handle non-tensor fields as 1D object arrays, preventing runtime errors during concatenation in downstream code such as `recipe/dapo/dapo_ray_trainer.py`.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

* Tested at: https://github.com/kibitzing/verl/tree/test_tool_n1
  * Note: This branch is for testing purposes only and is not intended for merge.
* The data used for testing comes from the `train.parquet` and `test.parquet` files released by the [Tool N1 repository](https://github.com/NVlabs/Tool-N1).
* Part of the training script:

```bash
python3 -m recipe.dapo.main_dapo \
    data.train_files=$HOME/Tool-N1/verl/verl/data/train.parquet \
    data.val_files=$HOME/Tool-N1/verl/verl/data/test.parquet \
    data.prompt_key=prompt \
    data.truncation='left' \
    data.max_prompt_length=2048 \
    data.max_response_length=4096 \
    data.gen_batch_size=32 \
    data.train_batch_size=24 \
    actor_rollout_ref.rollout.n=5 \
    algorithm.adv_estimator=grpo \
    algorithm.filter_groups.enable=True \
    algorithm.filter_groups.max_num_gen_batches=10 \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-3B-Instruct \
    ...
```

### Before vs After Behavior (Real Output Logs)

* Before: inconsistent shape

```
(TaskRunner pid=114826) Training from scratch
(TaskRunner pid=114826) new_batch.non_tensor_batch["conversations"].shape=(32, 1)
(TaskRunner pid=114826) num_prompt_in_batch=3 < prompt_bsz=24
(TaskRunner pid=114826) num_gen_batches=1. Keep generating...
(TaskRunner pid=114826) new_batch.non_tensor_batch["conversations"].shape=(32, 1)
(TaskRunner pid=114826) num_prompt_in_batch=8 < prompt_bsz=24
(TaskRunner pid=114826) num_gen_batches=2. Keep generating...
(TaskRunner pid=114826) new_batch.non_tensor_batch["conversations"].shape=(32, 1)
(TaskRunner pid=114826) num_prompt_in_batch=13 < prompt_bsz=24
(TaskRunner pid=114826) num_gen_batches=3. Keep generating...
(TaskRunner pid=114826) new_batch.non_tensor_batch["conversations"].shape=(32,)
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)
```

  This caused shape inconsistency across steps, leading to downstream errors during concatenation.

* After: consistent (32,) shape

```
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
(TaskRunner pid=133725) num_prompt_in_batch=4 < prompt_bsz=24
(TaskRunner pid=133725) num_gen_batches=1. Keep generating...
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
(TaskRunner pid=133725) num_prompt_in_batch=10 < prompt_bsz=24
(TaskRunner pid=133725) num_gen_batches=2. Keep generating...
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
(TaskRunner pid=133725) num_prompt_in_batch=12 < prompt_bsz=24
(TaskRunner pid=133725) num_gen_batches=3. Keep generating...
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
(TaskRunner pid=133725) num_prompt_in_batch=15 < prompt_bsz=24
(TaskRunner pid=133725) num_gen_batches=4. Keep generating...
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
(TaskRunner pid=133725) num_prompt_in_batch=19 < prompt_bsz=24
(TaskRunner pid=133725) num_gen_batches=5. Keep generating...
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
(TaskRunner pid=133725) num_prompt_in_batch=23 < prompt_bsz=24
(TaskRunner pid=133725) num_gen_batches=6. Keep generating...
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
```

  With the updated logic, the shape is consistently (32,).

* The issue was traced back to the `"conversations"` field in the Tool N1 dataset. This key contains a list of human–gpt messages. In most examples, it's a single-turn conversation (a list of length 1), but in some cases it's a multi-turn conversation (a list of length > 1).

### Design & Code Changes

The current `collate_fn` processes non-tensor values with:

https://github.com/volcengine/verl/blob/1df03f3abf96f59cb90c684f93a71ee0bbb57f49/verl/utils/dataset/rl_dataset.py#L62-L63

While this generally works, it leads to a subtle issue: if `val` is a list of lists and all inner lists happen to be of the same length, NumPy will interpret it as a 2D array with shape (N, L). However, in many RL scenarios, the structure of non-tensor data (e.g. variable-length lists across batches) is not guaranteed to be uniform, which means:

- One batch may produce shape `(N, L)`
- Another may produce `(N,)` where each element is a list of a different length
- Another may have shape `(N, L')`

This causes downstream errors like:

`ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)`

Specifically, this occurs when multiple step-wise batches are concatenated with:

https://github.com/volcengine/verl/blob/1df03f3abf96f59cb90c684f93a71ee0bbb57f49/recipe/dapo/dapo_ray_trainer.py#L240

To enforce consistent 1D object arrays regardless of content, this PR replaces the original line with:

```python
for key, val in non_tensors.items():
    non_tensors[key] = np.empty(len(val), dtype=object)
    non_tensors[key][:] = val
```

This ensures that `non_tensors[key]` always has shape (N,), which makes concatenation in downstream logic safer.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
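The shape instability described above is easy to reproduce with plain NumPy; the toy data below stands in for the "conversations" field and is not the actual dataset:

```python
import numpy as np

# Batch where every "conversations" entry is single-turn: NumPy infers (N, 1).
batch_a = np.array([[1], [2], [3]], dtype=object)
assert batch_a.shape == (3, 1)

# Batch with one multi-turn entry: the ragged data falls back to shape (N,).
batch_b = np.array([[1], [2, 3], [4]], dtype=object)
assert batch_b.shape == (3,)

# Concatenating the step-wise batches then fails exactly as in the logs above.
raised = False
try:
    np.concatenate([batch_a, batch_b])
except ValueError as e:
    raised = "same number of dimensions" in str(e)
assert raised
```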