-
Notifications
You must be signed in to change notification settings - Fork 31k
Fix max_pixels/min_pixels ignored in Qwen2VLImageProcessorFast.from_pretrained
#41954
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix max_pixels/min_pixels ignored in Qwen2VLImageProcessorFast.from_pretrained
#41954
Conversation
…retrained() When loading Qwen2VL image processor with from_pretrained(max_pixels=X), the parameter was stored as an attribute but not synchronized with the size dict that controls actual image resizing behavior. This caused images to be resized using default max_pixels=16,777,216 instead of the user's specified value, resulting in excessive token counts. This fix implements max_pixels and min_pixels as properties that automatically synchronize with the size dict when set via setattr() during from_pretrained() loading. The properties ensure size['longest_edge'] and size['shortest_edge'] stay in sync with max_pixels and min_pixels attributes respectively. The property-based approach is safe and localized to Qwen2VL, avoiding potential impacts on other image processors.
|
Hi everyone, if anyone is reviewing this PR or wants to review it, do not hesitate to tell me if there's anything I can do to get the rest of the tests to go through (first time contributing to HF but I made sure that the bug exists and is solved with my PR). |
|
[For maintainers] Suggested jobs to run (before merge) run-slow: qwen2_vl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @ReinforcedKnowledge ! Thanks a lot for raising this issue, I was able to reproduce it, but the issue actually comes from bad logic in from_pretrained, and can affect other image processors, so I opened another PR to fix it here #41997.
Thanks again for flagging this!
|
@yonigozlan Thank you for your response! I wanted to update the |
|
Sorry I missed that you had already pointed out the issue with |
|
It's okay! Don't hesitate to close this PR when you merge yours if needed |
|
Closing this as the fix PR is merged :) |
TL;DR
When loading Qwen2VL image processor with
from_pretrained(max_pixels=X), the parameter is stored as an attribute but not synchronized with the size dict that controls actual image resizing behavior. This causes images to be resized using defaultmax_pixels=16,777,216instead of the user's specified value, resulting in excessive token counts.This fix implements
max_pixelsandmin_pixelsas properties that automatically synchronize with the size dict when set viasetattr()duringfrom_pretrained()loading. The properties ensuresize['longest_edge']andsize['shortest_edge']stay in sync withmax_pixelsandmin_pixelsattributes respectively.The property-based approach is safe and localized to
Qwen2VL, avoiding potential impacts on other image processors but this issue might concern other processors and I'm not totally fan of the property lookup.We could argue there is no need for two sources of truth for the same thing (
size['longest_edge']andmax_pixels) but we can also argue that the base class should / might be more clever about these user kwargs that don't synchronize with processor configs around their__init__logic.Since it's my first PR / issue to
transformersI thought I'd give a more detailed explanation:Issue
When loading Qwen2VL/Qwen3VL image processor with:
The expected behavior:
processor.max_pixels = 200,000processor.size['longest_edge'] = 200,000That way the images get resized to ~200k pixels giving us around ~195 tokens.
Actual behavior (before fix):
processor.max_pixels = 200,000which is correctprocessor.size['longest_edge'] = 16,777,216which is incorrect (loaded from config)Root cause analysis
The Qwen2VL processor uses dual representation for pixel constraints:
self.max_pixels: an attributeself.size['longest_edge']: the actual value used during image processingThe
__init__method is designed to keep these synchronized:However, when loading via
from_pretrained()we lose this synchronization because:ImageProcessingMixin.from_dict()calls[cls(**config_dict)](https://github.com/huggingface/transformers/blob/02c324f43fe0ef5d484e846417e5f3bf4484524c/src/transformers/image_processing_base.py#L374)withoutmax_pixelsmax_pixels=None)from_dict()setsmax_pixelsviasetattr()which is too late, just sets the value to the attributeSo as a result:
max_pixelsis stored correctly butsize['longest_edge']is never updatedSome other fixes IMHO
The "obvious" fix would be to add special handling in
image_processing_base.py:This would forward
max_pixelsto__init__()before the synchronization logic runs. This would allow us not to track every past and future processor that has similar logic as Qwen2VL's image processor. But the issue is that:__init__(). If a processor's__init__doesn't have**kwargs, it would raise aTypeErrorI guess and we can't guarantee all processors handle this parameter.But Qwen2VL's config doesn't have
max_pixelsin the dict, so this pattern wouldn't help.That's why I went with the properties instead, minimal scope,
max_pixels(ormin_pixels) andsize['longest_edge'](orsize['shortest_edge']) stay synchronized wether the class is instantiated directly, or withfrom_pretrainedand wether the user down the line changes the attribute directly or not.This PR implements a safer, processor-specific fix using Python
Future Considerations
While this PR fixes the issue for Qwen2VL/Qwen3VL, the underlying issue in
from_dict()could affect other processors with similar synchronization patterns.Related issue
#41955