[Bugfix] Fix max image size for PaddleOCR-VL #28442
Signed-off-by: Roger Wang <[email protected]>
Code Review
This pull request aims to fix a bug in dummy data generation for PaddleOCR-VL to correctly calculate the maximum image embedding size. The changes involve removing a redundant if condition for resizing and updating the logic in get_image_size_with_most_features.
While the intention is correct, I've found a flaw in the new logic for get_image_size_with_most_features. The current implementation is brittle and only works correctly if the maximum number of tokens is a perfect square. I've provided a critical review comment with a more robust implementation to ensure it works for all cases.
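The perfect-square concern can be illustrated with a small sketch. This is not the reviewer's suggested implementation, which is not shown here; the function names `square_grid`, `best_grid`, and `grid_to_image_size`, and the `patch_size=14` / `merge_size=2` defaults, are assumptions made for illustration only.

```python
import math


def square_grid(max_tokens: int) -> tuple[int, int]:
    """Brittle approach: a square grid uses the full budget only for perfect squares."""
    side = math.isqrt(max_tokens)
    return side, side


def best_grid(max_tokens: int) -> tuple[int, int]:
    """Pick the (rows, cols) grid with the most tokens within the budget.

    Ties are broken in favour of the grid closest to square (largest rows).
    """
    best = (1, max_tokens)
    for rows in range(1, math.isqrt(max_tokens) + 1):
        cols = max_tokens // rows
        # Compare by token count first, then by how square the grid is.
        if (rows * cols, rows) > (best[0] * best[1], best[0]):
            best = (rows, cols)
    return best


def grid_to_image_size(
    grid: tuple[int, int],
    patch_size: int = 14,  # assumed value, hypothetical
    merge_size: int = 2,   # assumed value, hypothetical
) -> tuple[int, int]:
    """Convert a token grid to (height, width) in pixels."""
    unit = patch_size * merge_size  # pixels per side covered by one merged token
    rows, cols = grid
    return rows * unit, cols * unit
```

With a budget of 3600 tokens both functions return a 60 x 60 grid. With a budget of 3599, the square approach falls back to 59 x 59 (3481 tokens), while the search finds 59 x 61 and uses the whole budget.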
💡 Codex Review
Here are some automated review suggestions for this pull request.
Signed-off-by: Roger Wang <[email protected]>
/gemini review
Code Review
This pull request fixes a bug in calculating the maximum image size for PaddleOCR-VL models, which is important for memory profiling. The change correctly computes the image dimensions that yield the maximum number of features based on max_pixels. Additionally, it cleans up some redundant code related to image resizing. The core logic of the fix is sound, but I've identified a potential edge case that could lead to a crash.
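As a rough sketch of deriving an image size from `max_pixels`, the snippet below computes the largest square image whose sides are multiples of the merged patch size, then counts the resulting features. It is not the code from this PR: the helper names and the `patch_size=14` / `merge_size=2` defaults are assumptions. The guard for a too-small `max_pixels` is one conceivable edge case, not necessarily the one the review refers to.

```python
import math


def max_feature_image_size(
    max_pixels: int,
    patch_size: int = 14,  # assumed value, hypothetical
    merge_size: int = 2,   # assumed value, hypothetical
) -> tuple[int, int]:
    """Return (height, width) of the largest square image within max_pixels."""
    unit = patch_size * merge_size  # pixels per side covered by one merged token
    side_tokens = math.isqrt(max_pixels // (unit * unit))
    if side_tokens == 0:
        # Without this check the result would be a 0 x 0 image.
        raise ValueError(f"max_pixels={max_pixels} is smaller than one {unit}x{unit} unit")
    side = side_tokens * unit
    return side, side


def num_image_features(
    height: int,
    width: int,
    patch_size: int = 14,
    merge_size: int = 2,
) -> int:
    """Count merged tokens produced by an image of the given size."""
    unit = patch_size * merge_size
    return (height // unit) * (width // unit)
```

Under these assumed values, a `max_pixels` of 2822400 gives a 1680 x 1680 image and 3600 features, and a 392 x 392 image gives 196 features.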
Signed-off-by: Roger Wang <[email protected]> Signed-off-by: xuebwang-amd <[email protected]>
Signed-off-by: Roger Wang <[email protected]> (cherry picked from commit 4fd4b74)
Signed-off-by: Roger Wang <[email protected]>
Signed-off-by: Roger Wang <[email protected]> Signed-off-by: Xingyu Liu <[email protected]>
Purpose
This PR fixes a bug in dummy data generation so that the dummy image data results in the maximum number of features/embeddings. It also cleans up some code, since resizing is always assumed to be turned on.
Max image embedding size is 196.
Max image embedding size is 3600.
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.