fix: Passing empty list to images/videos for some multi-modal models #40569
Conversation
zucchini-nlp left a comment
Thanks for the PR @doubao2021-ai! We had another one in progress (#36682) which is more complete and fixes all models. I'd prefer that one to be merged.
Hi @zucchini-nlp, thanks for your reply! I think #36682 doesn't consider the situation when there is no image at all and the batch size is 1, i.e., when an empty `images` list is passed.
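For concreteness, a minimal sketch of that edge case (the checkpoint and prompt are illustrative placeholders, not taken from this thread):

```python
from transformers import AutoProcessor

# Illustrative checkpoint.
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct")

# Batch of size 1 with no images at all, passed as an empty list rather than None.
model_inputs = processor(
    text=["What is the capital of France?"],
    images=[],
    videos=[],
    return_tensors="pt",
)
# Without a guard for empty lists, this reaches group_images_by_shape and
# raises "IndexError: list index out of range" (see the traceback below in this thread).
```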
@HollowMan6 if there are no images at all, it is recommended to simply pass `images=None`.
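A minimal illustration of that recommendation, reusing the processor from the sketch above (the prompt is again a placeholder):

```python
# Text-only sample: pass None (or simply omit the arguments) instead of empty lists.
model_inputs = processor(
    text=["What is the capital of France?"],
    images=None,
    videos=None,
    return_tensors="pt",
)
```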
Yeah, that's what I had proposed in volcengine/verl#3281, but it may also be good to enforce some checks on the transformers library side.
For example, this PR fixes the following error when we pass an empty list
of `images` to Qwen2.5-VL:
```log
File "torchdata/stateful_dataloader/worker.py", line 242, in _worker_loop
data = fetcher.fetch(index) # type: ignore[union-attr]
File "torch/utils/data/_utils/fetch.py", line 52, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "torch/utils/data/_utils/fetch.py", line 52, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "verl/utils/dataset/rl_dataset.py", line 248, in __getitem__
model_inputs = self.processor(text=[raw_prompt], images=images, videos=videos, return_tensors="pt")
File "transformers/models/qwen2_5_vl/processing_qwen2_5_vl.py", line 150, in __call__
image_inputs = self.image_processor(images=images, **output_kwargs["images_kwargs"])
File "transformers/image_processing_utils_fast.py", line 637, in __call__
return self.preprocess(images, *args, **kwargs)
File "transformers/models/qwen2_vl/image_processing_qwen2_vl_fast.py", line 151, in preprocess
return super().preprocess(images, videos, **kwargs)
File "transformers/image_processing_utils_fast.py", line 662, in preprocess
return self._preprocess_image_like_inputs(
File "transformers/models/qwen2_vl/image_processing_qwen2_vl_fast.py", line 173, in _preprocess_image_like_inputs
batch_feature = self._preprocess(images, **kwargs)
File "transformers/models/qwen2_vl/image_processing_qwen2_vl_fast.py", line 211, in _preprocess
grouped_images, grouped_images_index = group_images_by_shape(images, disable_grouping=disable_grouping)
File "transformers/image_transforms.py", line 917, in group_images_by_shape
device = images[0][0].device if is_nested else images[0].device
IndexError: list index out of range
```
Signed-off-by: Hollow Man <[email protected]>
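As an illustration of the kind of library-side check suggested above, here is a hedged sketch of the general idea (not the actual diff in this PR): normalize empty image/video containers to `None` at the processor boundary, so helpers such as `group_images_by_shape` never receive an empty list.

```python
def _empty_to_none(visual_inputs):
    """Return None for [], [[]], [None], etc.; pass real inputs through unchanged."""
    if visual_inputs is None:
        return None
    if isinstance(visual_inputs, (list, tuple)):
        # Flatten one level of nesting (per-sample lists) and check for emptiness.
        flattened = [
            item
            for entry in visual_inputs
            for item in (entry if isinstance(entry, (list, tuple)) else [entry])
        ]
        if all(item is None for item in flattened):
            return None
    return visual_inputs


# Hypothetical placement inside a processor __call__, before dispatching to the
# image processor (the names below mirror the traceback but are illustrative):
# images = _empty_to_none(images)
# videos = _empty_to_none(videos)
# if images is not None:
#     image_inputs = self.image_processor(images=images, **output_kwargs["images_kwargs"])
```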
[For maintainers] Suggested jobs to run (before merge): run-slow: glm4v, llama4, qwen2_vl, qwen3_vl
Hey @HollowMan6, the feature was already added to transformers in one of the older PRs. I think you can close this one now. Passing empty lists for images should work.
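If so, a quick sanity check could look like this (checkpoint and prompt are illustrative, and this assumes a transformers version that already includes that change):

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct")  # illustrative checkpoint

# With the fix in place, empty lists should behave like passing None instead of
# raising IndexError in group_images_by_shape.
out = processor(text=["text-only prompt"], images=[], videos=[], return_tensors="pt")
print(out.keys())
```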
Oh okay, good to know! Thanks!
What does this PR do?
Alternative root fix for volcengine/verl#3281
For example, this PR fixes the error shown in the traceback above when we pass an empty list of `images` to Qwen2.5-VL.
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@ArthurZucker, @amyeroberts, @qubvel