fix: Passing empty list to images/videos for some multi-modal models #40569
Conversation
zucchini-nlp left a comment
Thanks for the PR @doubao2021-ai! We had another one in progress (#36682) which is more complete and fixes all models. I'd prefer that one to be merged.
Hi @zucchini-nlp, thanks for your reply! I think #36682 doesn't consider the situation when there is no image at all and the batch size is 1, i.e., when an empty `images` list is passed.
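For concreteness, a minimal sketch of that edge case (the checkpoint and prompt are illustrative placeholders, not taken from this thread):

```python
from transformers import AutoProcessor

# Illustrative checkpoint.
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct")

# Batch of size 1 with no images at all, passed as an empty list rather than None.
model_inputs = processor(
    text=["What is the capital of France?"],
    images=[],
    videos=[],
    return_tensors="pt",
)
# Without a guard for empty lists, this reaches group_images_by_shape and
# raises "IndexError: list index out of range" (see the traceback below in this thread).
```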
@HollowMan6 if there are no images at all, it is recommended to simply pass `images=None`.
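A minimal illustration of that recommendation, reusing the processor from the sketch above (the prompt is again a placeholder):

```python
# Text-only sample: pass None (or simply omit the arguments) instead of empty lists.
model_inputs = processor(
    text=["What is the capital of France?"],
    images=None,
    videos=None,
    return_tensors="pt",
)
```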
Yeah, that's what I had proposed in volcengine/verl#3281, but it may also be good to enforce some checks on the transformers library side.
For example, this PR fixes the following error when we pass an empty list
of `images` to Qwen2.5-VL:
```log
File "torchdata/stateful_dataloader/worker.py", line 242, in _worker_loop
data = fetcher.fetch(index) # type: ignore[union-attr]
File "torch/utils/data/_utils/fetch.py", line 52, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "torch/utils/data/_utils/fetch.py", line 52, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "verl/utils/dataset/rl_dataset.py", line 248, in __getitem__
model_inputs = self.processor(text=[raw_prompt], images=images, videos=videos, return_tensors="pt")
File "transformers/models/qwen2_5_vl/processing_qwen2_5_vl.py", line 150, in __call__
image_inputs = self.image_processor(images=images, **output_kwargs["images_kwargs"])
File "transformers/image_processing_utils_fast.py", line 637, in __call__
return self.preprocess(images, *args, **kwargs)
File "transformers/models/qwen2_vl/image_processing_qwen2_vl_fast.py", line 151, in preprocess
return super().preprocess(images, videos, **kwargs)
File "transformers/image_processing_utils_fast.py", line 662, in preprocess
return self._preprocess_image_like_inputs(
File "transformers/models/qwen2_vl/image_processing_qwen2_vl_fast.py", line 173, in _preprocess_image_like_inputs
batch_feature = self._preprocess(images, **kwargs)
File "transformers/models/qwen2_vl/image_processing_qwen2_vl_fast.py", line 211, in _preprocess
grouped_images, grouped_images_index = group_images_by_shape(images, disable_grouping=disable_grouping)
File "transformers/image_transforms.py", line 917, in group_images_by_shape
device = images[0][0].device if is_nested else images[0].device
IndexError: list index out of range
```
Signed-off-by: Hollow Man <[email protected]>
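As an illustration of the kind of library-side check suggested above, here is a hedged sketch of the general idea (not the actual diff in this PR): normalize empty image/video containers to `None` at the processor boundary, so helpers such as `group_images_by_shape` never receive an empty list.

```python
def _empty_to_none(visual_inputs):
    """Return None for [], [[]], [None], etc.; pass real inputs through unchanged."""
    if visual_inputs is None:
        return None
    if isinstance(visual_inputs, (list, tuple)):
        # Flatten one level of nesting (per-sample lists) and check for emptiness.
        flattened = [
            item
            for entry in visual_inputs
            for item in (entry if isinstance(entry, (list, tuple)) else [entry])
        ]
        if all(item is None for item in flattened):
            return None
    return visual_inputs


# Hypothetical placement inside a processor __call__, before dispatching to the
# image processor (the names below mirror the traceback but are illustrative):
# images = _empty_to_none(images)
# videos = _empty_to_none(videos)
# if images is not None:
#     image_inputs = self.image_processor(images=images, **output_kwargs["images_kwargs"])
```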
[For maintainers] Suggested jobs to run (before merge): run-slow: glm4v, llama4, qwen2_vl, qwen3_vl
Hey @HollowMan6, the feature was already added to transformers in one of the older PRs. I think you can close this one now. Passing empty lists for images should work.
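If so, a quick sanity check could look like this (checkpoint and prompt are illustrative, and this assumes a transformers version that already includes that change):

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct")  # illustrative checkpoint

# With the fix in place, empty lists should behave like passing None instead of
# raising IndexError in group_images_by_shape.
out = processor(text=["text-only prompt"], images=[], videos=[], return_tensors="pt")
print(out.keys())
```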
Oh okay, good to know! Thanks!
What does this PR do?
Alternative root fix for volcengine/verl#3281
For example, this PR fixes the error shown in the traceback above when we pass an empty list of `images` to Qwen2.5-VL.
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@ArthurZucker, @amyeroberts, @qubvel