[Bugfix] Fix max image size for PaddleOCR-VL #28442

ywang96 · 2025-11-11T03:57:55Z

Purpose

This PR fixes the bug in dummy data generation for the image data that would result in the max number of features/embeddings as well as cleaning up some code since it's always assumed that resizing is turned on.

On main

Encoder cache will be initialized with a budget of 2048 tokens, and profiled with 10 image items of the maximum feature size.

Max image embedding size is 196.

This branch

Encoder cache will be initialized with a budget of 3600 tokens, and profiled with 1 image items of the maximum feature size.

Max image embedding size is 3600.

Test Plan

Test Result

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Roger Wang <[email protected]>

gemini-code-assist

Code Review

This pull request aims to fix a bug in dummy data generation for PaddleOCR-VL to correctly calculate the maximum image embedding size. The changes involve removing a redundant if condition for resizing and updating the logic in get_image_size_with_most_features.

While the intention is correct, I've found a flaw in the new logic for get_image_size_with_most_features. The current implementation is brittle and only works correctly if the maximum number of tokens is a perfect square. I've provided a critical review comment with a more robust implementation to ensure it works for all cases.

vllm/model_executor/models/paddleocr_vl.py

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

vllm/model_executor/models/paddleocr_vl.py

Signed-off-by: Roger Wang <[email protected]>

ywang96 · 2025-11-11T04:11:45Z

/gemini review

gemini-code-assist

Code Review

This pull request fixes a bug in calculating the maximum image size for PaddleOCR-VL models, which is important for memory profiling. The change correctly computes the image dimensions that yield the maximum number of features based on max_pixels. Additionally, it cleans up some redundant code related to image resizing. The core logic of the fix is sound, but I've identified a potential edge case that could lead to a crash.

vllm/model_executor/models/paddleocr_vl.py

Signed-off-by: Roger Wang <[email protected]> Signed-off-by: xuebwang-amd <[email protected]>

Signed-off-by: Roger Wang <[email protected]> (cherry picked from commit 4fd4b74)

Signed-off-by: Roger Wang <[email protected]>

Signed-off-by: Roger Wang <[email protected]> Signed-off-by: Xingyu Liu <[email protected]>

ywang96 added 2 commits November 11, 2025 03:53

update

bcc885b

Signed-off-by: Roger Wang <[email protected]>

Merge branch 'main' into fix-paddle

8cebc72

gemini-code-assist bot reviewed Nov 11, 2025

View reviewed changes

vllm/model_executor/models/paddleocr_vl.py Outdated Show resolved Hide resolved

chatgpt-codex-connector bot reviewed Nov 11, 2025

View reviewed changes

vllm/model_executor/models/paddleocr_vl.py Outdated Show resolved Hide resolved

refine

aae63f5

Signed-off-by: Roger Wang <[email protected]>

gemini-code-assist bot reviewed Nov 11, 2025

View reviewed changes

vllm/model_executor/models/paddleocr_vl.py Show resolved Hide resolved

DarkLight1337 approved these changes Nov 11, 2025

View reviewed changes

DarkLight1337 enabled auto-merge (squash) November 11, 2025 05:02

github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 11, 2025

DarkLight1337 merged commit 4fd4b74 into vllm-project:main Nov 11, 2025
54 checks passed

ywang96 added this to the v0.11.1 milestone Nov 13, 2025

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Nov 13, 2025

[Bugfix] Fix max image size for PaddleOCR-VL (vllm-project#28442)

66066b6

Signed-off-by: Roger Wang <[email protected]> Signed-off-by: xuebwang-amd <[email protected]>

khluu pushed a commit that referenced this pull request Nov 16, 2025

[Bugfix] Fix max image size for PaddleOCR-VL (#28442)

3a9ea77

Signed-off-by: Roger Wang <[email protected]> (cherry picked from commit 4fd4b74)

devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025

[Bugfix] Fix max image size for PaddleOCR-VL (vllm-project#28442)

bad665b

Signed-off-by: Roger Wang <[email protected]>

charlotte12l pushed a commit to charlotte12l/vllm that referenced this pull request Dec 5, 2025

[Bugfix] Fix max image size for PaddleOCR-VL (vllm-project#28442)

658d714

Signed-off-by: Roger Wang <[email protected]> Signed-off-by: Xingyu Liu <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

[Bugfix] Fix max image size for PaddleOCR-VL #28442

[Bugfix] Fix max image size for PaddleOCR-VL #28442

Uh oh!

ywang96 commented Nov 11, 2025 •

edited by github-actions bot

Loading

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

Uh oh!

ywang96 commented Nov 11, 2025

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

[Bugfix] Fix max image size for PaddleOCR-VL #28442

[Bugfix] Fix max image size for PaddleOCR-VL #28442

Uh oh!

Conversation

ywang96 commented Nov 11, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Test Plan

Test Result

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

ywang96 commented Nov 11, 2025

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ywang96 commented Nov 11, 2025 •

edited by github-actions bot

Loading