[New Model]Donut model #23229

princepride · 2025-08-20T02:55:01Z

Purpose

Fixes #18850.
This PR adds support for Donut-like models.

It implements the Donut model and the structurally similar Dolphin model. Since the Donut model uses a Swin Transformer as its vision backbone, this PR also includes the implementation for the Swin model.
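For readers unfamiliar with the backbone: a Swin Transformer computes attention inside local, non-overlapping windows of the feature map. The sketch below only illustrates that partitioning step on a plain 2D list. It is not the code added in vllm/model_executor/models/swin.py, and the name `window_partition` and its parameters are assumptions chosen for illustration.

```python
def window_partition(grid, window):
    """Split a 2D grid into non-overlapping window x window blocks.

    Illustrative only: real implementations operate on batched tensors.
    Blocks are returned in row-major order (left to right, top to bottom).
    """
    height, width = len(grid), len(grid[0])
    if height % window or width % window:
        raise ValueError("grid size must be divisible by the window size")
    blocks = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            # Take `window` rows, and from each row a `window`-wide slice.
            blocks.append([row[left:left + window]
                           for row in grid[top:top + window]])
    return blocks


# A 4x4 grid holding the values 0..15, split into four 2x2 windows.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
blocks = window_partition(grid, 2)
```

Attention is then computed within each block independently, which keeps the cost proportional to the number of windows rather than to the square of the full grid size.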

Test Plan

Donut Model Test

Script

python examples/offline_inference/encoder_decoder_multimodal.py -m donut

Result

Dolphin Model Tests

Script

python examples/offline_inference/dolphin.py

Result

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>

gemini-code-assist

Code Review

This pull request introduces support for Donut-like models, including Donut, Dolphin, and the Swin Transformer backbone. The changes are extensive, adding new model implementations and example inference scripts. My review focuses on correctness and potential bugs. I've identified a critical bug in the Donut model implementation that would cause a TypeError during image validation, and a high-severity issue in the Dolphin example script that could lead to a ZeroDivisionError. I've provided code suggestions to fix both issues.

vllm/model_executor/models/donut.py

examples/offline_inference/dolphin.py
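The review summary does not quote the lines involved, so the following is only a hypothetical illustration of the kind of guard that avoids a ZeroDivisionError when a script derives a resize ratio from image dimensions. The helper name `safe_scale` and its parameters are assumptions, not code from this PR.

```python
def safe_scale(width: int, height: int, target: int) -> float:
    """Return the factor that scales the longer side to `target`.

    Hypothetical helper: rejects degenerate sizes up front instead of
    letting the division below fail with ZeroDivisionError.
    """
    longest = max(width, height)
    if longest <= 0:
        raise ValueError(f"invalid image size: {width}x{height}")
    return target / longest


scale = safe_scale(200, 100, 100)
```

Raising a ValueError with the offending size makes the failure easier to trace than a bare ZeroDivisionError from deep inside preprocessing.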

princepride · 2025-08-20T02:58:08Z

@DarkLight1337 @Isotr0py Could you review it? I messed up the previous PR (#23187) during a rebase, so I've created a new one. Thank you.

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>

github-actions · 2025-08-20T03:28:06Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

docs/models/supported_models.md

tests/models/registry.py

vllm/model_executor/models/registry.py

vllm/model_executor/models/swin.py

vllm/model_executor/models/donut.py

vllm/model_executor/models/swin.py

Isotr0py · 2025-08-20T05:14:21Z

examples/offline_inference/dolphin.py

Let's add examples in examples/offline_inference/encoder_decoder_multimodal.py and examples/offline_inference/vision_language.py instead of adding new files in examples/offline_inference/.

I was looking at the images in the S3 bucket and it seems there aren't any suitable for OCR tasks. This is particularly true for the Dolphin model, whose OCR task is similar to executing a workflow. Different prompts will determine whether to segment or parse the document. At the same time, depending on the parsing tags, it will decide whether to parse text or icons. That's why I've added two example files.

I used fetch_image to load the image and moved donut to encoder_decoder_multimodal.py.

I also removed the --task argument from the dolphin example.

It looks like the independent dolphin example still got merged. We don't really want model-specific examples, as they clutter the examples directory and make it harder for new users to find what they need.
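The workflow described in this thread (the prompt selects segmentation or parsing, and the parsing tag selects text or icon handling) can be sketched as a small dispatch function. This is a sketch under stated assumptions: the stage names, tag names, and prompt strings below are placeholders invented for illustration, not the prompts used by the Dolphin model or by the example script.

```python
from typing import Optional

# Placeholder prompts: hypothetical strings, not the model's real prompts.
PROMPTS = {
    "layout": "<placeholder: segment the page into elements>",
    "text": "<placeholder: read the text in this crop>",
    "icon": "<placeholder: describe the icon in this crop>",
}


def select_prompt(stage: str, tag: Optional[str] = None) -> str:
    """Pick a prompt for the given stage and, when parsing, the element tag."""
    if stage == "segment":
        # Stage 1: the whole page is segmented, so no tag is needed.
        return PROMPTS["layout"]
    if stage == "parse":
        # Stage 2: the tag produced by segmentation selects the parser.
        if tag not in ("text", "icon"):
            raise ValueError(f"unknown parsing tag: {tag!r}")
        return PROMPTS[tag]
    raise ValueError(f"unknown stage: {stage!r}")


page_prompt = select_prompt("segment")
crop_prompt = select_prompt("parse", tag="text")
```

Because the second stage depends on tags produced by the first, a single generic example script cannot easily cover both calls with one fixed prompt.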

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>

vllm/model_executor/models/swin.py

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>

princepride · 2025-08-22T03:09:15Z

@Isotr0py @DarkLight1337 I adjusted the example code and updated the task plan. Please review it. Thank you.

tests/models/multimodal/processing/test_common.py

tests/models/registry.py

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>

DarkLight1337 · 2025-08-24T10:38:21Z

Thanks!

xxx

954427a

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>

princepride requested review from DarkLight1337, WoosukKwon, alexm-redhat, comaniac, hmellor, njhill, robertgshaw2-redhat, youkaichao, ywang96 and zhuohan123 as code owners August 20, 2025 02:55

mergify bot added the documentation, multi-modality, new-model, and v1 labels Aug 20, 2025

gemini-code-assist bot reviewed Aug 20, 2025

vllm/model_executor/models/donut.py (outdated)

examples/offline_inference/dolphin.py

xxx

61dd593

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>

DarkLight1337 reviewed Aug 20, 2025

Isotr0py reviewed Aug 20, 2025

princepride added 2 commits August 20, 2025 14:23

xxx

9f6602f

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>

xxx

1fd5d8a

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>

DarkLight1337 reviewed Aug 20, 2025

vllm/model_executor/models/swin.py (outdated)

princepride added 3 commits August 21, 2025 10:53

xxx

7da83d7

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>

xxx

7818022

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>

xxx

ec1d907

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>

princepride requested review from DarkLight1337 and Isotr0py August 22, 2025 03:07

DarkLight1337 reviewed Aug 24, 2025

tests/models/multimodal/processing/test_common.py (outdated)

DarkLight1337 reviewed Aug 24, 2025

tests/models/multimodal/processing/test_common.py (outdated)

DarkLight1337 reviewed Aug 24, 2025

tests/models/registry.py (outdated)

keep the placement in alphabetical order

9f4627a

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
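Keeping registry entries in alphabetical order, as this commit does, can be checked mechanically. A minimal sketch, assuming the registry is a plain dict keyed by architecture name; `find_out_of_order` and the sample keys are hypothetical and not part of vLLM's test suite.

```python
def find_out_of_order(keys):
    """Return adjacent (previous, current) pairs where current sorts first.

    Uses plain case-sensitive string comparison; an empty result means the
    keys are already in ascending order.
    """
    keys = list(keys)
    return [(prev, cur) for prev, cur in zip(keys, keys[1:]) if cur < prev]


# Hypothetical registry: dicts preserve insertion order, so iterating over
# the keys reflects the order in which entries appear in the source file.
sample_registry = {"AModel": None, "CModel": None, "BModel": None}
misplaced = find_out_of_order(sample_registry)
```

Reporting the offending pairs, rather than a bare true or false, tells the contributor exactly which entry to move.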

DarkLight1337 approved these changes Aug 24, 2025

princepride requested a review from DarkLight1337 August 24, 2025 10:38

DarkLight1337 enabled auto-merge (squash) August 24, 2025 10:38

DarkLight1337 merged commit 416f059 into vllm-project:main Aug 24, 2025
42 checks passed

johnnynunez pushed a commit to johnnynunez/vllm that referenced this pull request Aug 24, 2025

[New Model]Donut model (vllm-project#23229)

bdf3c3d

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com> Signed-off-by: johnnynunez <johnnynuca14@gmail.com>

epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025

[New Model]Donut model (vllm-project#23229)

b5a93f9

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>

xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025

[New Model]Donut model (vllm-project#23229)

2a0643a

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com> Signed-off-by: Xiao Yu <xiao.yu@amd.com>

zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025

[New Model]Donut model (vllm-project#23229)

aee6b56

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>

mengxingkongzhouhan pushed a commit to mengxingkongzhouhan/vllm that referenced this pull request Aug 30, 2025

[New Model]Donut model (vllm-project#23229)

8056e25

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>

zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Sep 3, 2025

[New Model]Donut model (vllm-project#23229)

6035369

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>

ekagra-ranjan pushed a commit to ekagra-ranjan/vllm that referenced this pull request Sep 4, 2025

[New Model]Donut model (vllm-project#23229)

9154599

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com> Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>

mfournioux mentioned this pull request Sep 23, 2025

Remove V0 Encoder-Decoder Support #24907

Merged

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

[New Model]Donut model (vllm-project#23229)

cb3ab13

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>

Review comments: Isotr0py (Aug 20, 2025), princepride (Aug 20, 2025), princepride (Aug 22, 2025, two comments), hmellor (Aug 25, 2025).
