
Conversation

@HollowMan6 HollowMan6 (Collaborator) commented Aug 30, 2025

What does this PR do?

Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review.

Related to:

This PR adds support for mixing multi-modal and pure-text data in the same dataset by allowing the value under the `image_key` or `video_key` to be set to None (an empty list is already supported). Most importantly, for `Qwen2VLImageProcessorFast`, when handling pure-text data we should keep the `images`/`videos` parameters as None instead of passing an empty list; otherwise, a bug in transformers currently makes things end up in an error:

  File "torchdata/stateful_dataloader/worker.py", line 242, in _worker_loop
    data = fetcher.fetch(index)  # type: ignore[union-attr]
  File "torch/utils/data/_utils/fetch.py", line 52, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "torch/utils/data/_utils/fetch.py", line 52, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "verl/utils/dataset/rl_dataset.py", line 248, in __getitem__
    model_inputs = self.processor(text=[raw_prompt], images=images, videos=videos, return_tensors="pt")
  File "transformers/models/qwen2_5_vl/processing_qwen2_5_vl.py", line 150, in __call__
    image_inputs = self.image_processor(images=images, **output_kwargs["images_kwargs"])
  File "transformers/image_processing_utils_fast.py", line 637, in __call__
    return self.preprocess(images, *args, **kwargs)
  File "transformers/models/qwen2_vl/image_processing_qwen2_vl_fast.py", line 151, in preprocess
    return super().preprocess(images, videos, **kwargs)
  File "transformers/image_processing_utils_fast.py", line 662, in preprocess
    return self._preprocess_image_like_inputs(
  File "transformers/models/qwen2_vl/image_processing_qwen2_vl_fast.py", line 173, in _preprocess_image_like_inputs
    batch_feature = self._preprocess(images, **kwargs)
  File "transformers/models/qwen2_vl/image_processing_qwen2_vl_fast.py", line 211, in _preprocess
    grouped_images, grouped_images_index = group_images_by_shape(images, disable_grouping=disable_grouping)
  File "transformers/image_transforms.py", line 917, in group_images_by_shape
    device = images[0][0].device if is_nested else images[0].device
IndexError: list index out of range
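
As a minimal sketch of the intended behavior (the helper name and surrounding variables are illustrative, not the exact code in `verl/utils/dataset/rl_dataset.py`), the dataset can normalize a missing key, None, or an empty list to None before calling the processor:

```python
# Minimal sketch, assuming a dict-like row and a Hugging Face processor; the
# helper and variable names are illustrative, not the exact verl implementation.
def build_model_inputs(processor, raw_prompt, row_dict, image_key="images", video_key="videos"):
    # Treat a missing key, None, or an empty list all as "no multi-modal data".
    images = row_dict.get(image_key) or None
    videos = row_dict.get(video_key) or None

    # Passing None (rather than []) keeps Qwen2VLImageProcessorFast from
    # hitting the IndexError in group_images_by_shape for pure-text rows.
    return processor(text=[raw_prompt], images=images, videos=videos, return_tensors="pt")
```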

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

@HollowMan6 HollowMan6 (Collaborator, Author) commented:

To get training on mixed multi-modal and pure-text data working, aside from this PR, I also have to set `data.return_multi_modal_inputs=False`, even though I'm only doing single-turn training. Otherwise we get `KeyError: 'pixel_values'` from `verl/workers/actor/dp_actor.py`, line 107, in `<listcomp>`: `[inputs[key] for inputs in micro_batch["multi_modal_inputs"]], dim=0`. I'm not sure about the proper way to fix this, as a single micro_batch can contain data of different modalities.

Besides, I also have to disable Ulysses sequence parallelism, as setting any value greater than 1 causes the following error, which I'm also not sure how to fix:

  File "verl/single_controller/ray/base.py", line 701, in func
    return getattr(self.worker_dict[key], name)(*args, **kwargs)
  File "verl/single_controller/base/decorator.py", line 430, in inner
    return func(*args, **kwargs)
  File "verl/utils/profiler/profile.py", line 256, in wrapper
    return func(self_instance, *args, **kwargs_inner)
  File "verl/workers/fsdp_workers.py", line 809, in compute_log_prob
    output, entropys = self.actor.compute_log_prob(data=data, calculate_entropy=True)
  File "verl/utils/profiler/performance.py", line 105, in f
    return self.log(decorated_function, *args, **kwargs)
  File "verl/utils/profiler/performance.py", line 118, in log
    output = func(*args, **kwargs)
  File "verl/workers/actor/dp_actor.py", line 343, in compute_log_prob
    entropy, log_probs = self._forward_micro_batch(
  File "verl/workers/actor/dp_actor.py", line 177, in _forward_micro_batch
    output = self.actor_module(
  File "torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "torch/nn/modules/module.py", line 1857, in _call_impl
    return inner()
  File "torch/nn/modules/module.py", line 1805, in inner
    result = forward_call(*args, **kwargs)
  File "transformers/utils/generic.py", line 940, in wrapper
    output = func(self, *args, **kwargs)
  File "transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py", line 1485, in forward
    outputs = self.model(
  File "torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py", line 1313, in forward
    outputs = self.language_model(
  File "torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "verl/models/transformers/monkey_patch.py", line 194, in ulysses_wrapped_decoder_forward
    return original_forward(self, *args, **call_kwargs)
  File "verl/models/transformers/monkey_patch.py", line 194, in ulysses_wrapped_decoder_forward
    return original_forward(self, *args, **call_kwargs)
  File "transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py", line 883, in forward
    "full_attention": create_causal_mask(**mask_kwargs),
  File "transformers/masking_utils.py", line 787, in create_causal_mask
    early_exit, attention_mask, packed_sequence_mask, kv_length, kv_offset = _preprocess_mask_arguments(
  File "transformers/masking_utils.py", line 738, in _preprocess_mask_arguments
    position_ids = position_ids.expand(batch_size, -1)
RuntimeError: The expanded size of the tensor (1) must match the existing size (0) at non-singleton dimension 0.  Target sizes: [1, -1].  Tensor sizes: [0, 64248]

Any suggestions on these two issues are welcome!

@gemini-code-assist gemini-code-assist (Contributor) commented:

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

@HollowMan6 HollowMan6 (Collaborator, Author) commented:

/gemini review

@gemini-code-assist gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request addresses a bug that occurs when processing datasets containing a mix of multi-modal and text-only data, specifically when an empty list is provided for images or videos. The changes correctly ensure that if image or video data is absent or an empty list, the corresponding variables are set to None before being passed to the processor. This fix is applied consistently in recipe/deepeyes/deepeyes.py and verl/utils/dataset/rl_dataset.py. The modifications are well-targeted, effectively resolving the IndexError and making the data loading process more robust. The code quality is good, and I have no further comments.

@vermouth1992 vermouth1992 merged commit 1422720 into volcengine:main Aug 31, 2025
53 of 58 checks passed
@HollowMan6 HollowMan6 deleted the multimodal branch August 31, 2025 09:58
hiyouga pushed a commit that referenced this pull request Sep 19, 2025
…3315)

### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

Address the first issue in
#3281 (comment)

More work on top of #1999

Currently, the code gets the keys from the first row of the microbatch. This
can go wrong if the dataset is a mixture of pure-text and multi-modal data and
the first row in the microbatch happens to be a pure-text one (with no
`pixel_values` or `image_grid_thw` among its keys) while the microbatch still
contains multi-modal data.

This PR fixes the issue by collecting all available keys for
`multi_modal_inputs` across the microbatch, so that the multi-modal tensors
can be concatenated together without ignoring some of them in the situation
above.
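
A minimal sketch of this key-collection idea (simplified, assuming
`micro_batch["multi_modal_inputs"]` is a list of per-sample dicts as in the
`dp_actor.py` list comprehension mentioned earlier; not the exact verl code):

```python
import torch

# Minimal sketch, not the exact verl implementation: take the union of
# multi-modal keys across the whole micro batch instead of only the keys of
# the first row, then concatenate per key, skipping rows that lack that key.
def concat_multi_modal_inputs(micro_batch):
    all_keys = set()
    for inputs in micro_batch["multi_modal_inputs"]:
        all_keys.update(inputs.keys())

    return {
        key: torch.cat(
            [inputs[key] for inputs in micro_batch["multi_modal_inputs"] if key in inputs],
            dim=0,
        )
        for key in all_keys
    }
```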

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

Signed-off-by: Hollow Man <[email protected]>
VocabVictor pushed a commit to VocabVictor/verl-plus that referenced this pull request Sep 24, 2025
…(#3315)

wuxibin89 pushed a commit that referenced this pull request Sep 25, 2025
### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

Address the 2nd issue in
#3281 (comment)

Currently, if we use Ulysses SP, we rely on `multi_modal_inputs` to check
whether the model is multi-modal, but this can go wrong when we set
`data.return_multi_modal_inputs=False`, as that field won't exist even for a
VLM.

As a result, a more reliable check is to see whether the `vision_config` field
is present in `self.actor_module.config`, referring to
https://github.com/volcengine/verl/blob/1985eb14ff3e28985087804f939b5ea3a26e32cf/verl/workers/fsdp_workers.py#L317-L320
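
A minimal sketch of this kind of check (assuming a Hugging Face-style model
config; not the exact code at the referenced lines):

```python
# Minimal sketch, not the exact verl implementation: detect a VLM from the
# model config rather than from the presence of multi_modal_inputs in the batch.
def is_vision_language_model(actor_module) -> bool:
    # Multi-modal configs such as Qwen2.5-VL expose a `vision_config`
    # sub-config; text-only language models do not.
    return getattr(actor_module.config, "vision_config", None) is not None
```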

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

Signed-off-by: Hollow Man <[email protected]>
masoudhashemi pushed a commit to masoudhashemi/verl that referenced this pull request Oct 19, 2025
…olcengine#3315)

masoudhashemi pushed a commit to masoudhashemi/verl that referenced this pull request Oct 19, 2025
…3282)

techkang pushed a commit to techkang/verl that referenced this pull request Oct 31, 2025
…olcengine#3315)

techkang pushed a commit to techkang/verl that referenced this pull request Oct 31, 2025
…3282)

mtian8 pushed a commit to mtian8/verl that referenced this pull request Nov 1, 2025
…olcengine#3315)

mtian8 pushed a commit to mtian8/verl that referenced this pull request Nov 1, 2025
…3282)

wangboxiong320 pushed a commit to wangboxiong320/verl that referenced this pull request Nov 1, 2025
…3282)
