-
-
Notifications
You must be signed in to change notification settings - Fork 12.9k
[New Model]Donut model #23229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[New Model]Donut model #23229
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces support for Donut-like models, including Donut, Dolphin, and the Swin Transformer backbone. The changes are extensive, adding new model implementations and example inference scripts. My review focuses on correctness and potential bugs. I've identified a critical bug in the Donut model implementation that would cause a TypeError during image validation, and a high-severity issue in the Dolphin example script that could lead to a ZeroDivisionError. I've provided code suggestions to fix both issues.
|
@DarkLight1337 @Isotr0py Could you review it? I messed up the previous PR (#23187) during a rebase, so I've created a new one. Thank you. |
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add examples in examples/offline_inference/encoder_decoder_multimodal.py and examples/offline_inference/vision_language.py instead of adding new files in examples/offline_inference/.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was looking at the images in the S3 bucket and it seems there aren't any suitable for OCR tasks. This is particularly true for the Dolphin model, whose OCR task is similar to executing a workflow. Different prompts will determine whether to segment or parse the document. At the same time, depending on the parsing tags, it will decide whether to parse text or icons. That's why I've added two example files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used fetch_image to load the image and moved donut to encoder_decoder_multimodal.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also removed the --task in dolphin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the independent dolphin example still got merged. We don't really want to have model specific examples as that will clutter the examples and make it harder for new users to find what they need
|
@Isotr0py @DarkLight1337 I adjust the example code and update the task plan, please review it. thank you. |
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
|
Thanks! |
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com> Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com> Signed-off-by: Xiao Yu <xiao.yu@amd.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com> Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Purpose
FIX #18850.
This PR adds support for Donut-like models, resolving issue #18850.
It implements the Donut model and the structurally similar Dolphin model. Since the Donut model uses a Swin Transformer as its vision backbone, this PR also includes the implementation for the Swin model.
Test Plan
Donut Model Test
Script
Result
Dolphin Model Tests
Script
Result