Add support for DeepseekAI's DeepseekVL #36248

geetu040 · 2025-02-18T07:41:43Z

What does this PR do?

This PR adds DeepseekAI's DeepseekVL model to Hugging Face Transformers.

DeepseekVL is an open-source vision-language (VL) model designed for real-world vision and language understanding applications. It has general multimodal understanding capabilities and can process logical diagrams, web pages, formula recognition, scientific literature, natural images, and embodied intelligence in complex scenarios.

Relevant Links

Research Paper: DeepSeek-VL: Towards Real-World Vision-Language Understanding
Authors: Haoyu Lu, Wen Liu, Bo Zhang, et al.
Implementation: github.com/deepseek-ai/DeepSeek-VL
Model Weights: huggingface.co/collections/deepseek-ai/deepseek-vl

CC: @Benjamin-eecs, @RERV (github contributors of deepseek-ai/DeepSeek-VL)

Before submitting

Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@ArthurZucker, @Rocketknight1, @Cyrilvallez, @zucchini-nlp

TODOs

geetu040 · 2025-02-24T04:51:00Z

@zucchini-nlp , @Rocketknight1, @Cyrilvallez

Deepseek-VL uses Sam as the backbone for encoding high-resolution images.
More specifically, the backbone is SamVisionEncoder rather than SamModel, and SamVisionEncoder is not available as a public class. In other words, you can do the following with SamModel but not with SamVisionEncoder:

from transformers import SamConfig, SamModel
config = SamConfig()
model = SamModel(config)

I think that we should rename SamVisionEncoder -> SamVisionModel, inherit it from SamPreTrainedModel and make it accessible to the user. I don't think it breaks backward compatibility in any way.

Otherwise, we would have to copy all the classes that build SamVisionEncoder for deepseek. There is nothing wrong with this either but having a SamVisionModel along with a SamModel makes sense, since it might benefit someone else as well.

If you think having a SamVisionModel makes sense, should that be done in a separate PR?

Btw, final results would look like this

from transformers import SamVisionConfig, SamVisionModel
config = SamVisionConfig()
model = SamVisionModel(config)

and SamVisionConfig is already available publicly.

zucchini-nlp · 2025-02-24T08:26:22Z

@geetu040 we had a similar situation with ideficsVision, afair. Yes, in that case we can just make it public and add it to the docs. Renaming would be breaking though; imo we can leave the name as is

geetu040 · 2025-02-25T05:25:04Z

@zucchini-nlp is it okay to do it in the same PR, or should I create a new one?

zucchini-nlp · 2025-02-25T08:31:51Z

@geetu040 imo a new PR will make it easier for us to iterate and review

geetu040 · 2025-02-26T09:41:51Z

Hi @zucchini-nlp, I am working on the SamVisionEncoder (going to create the PR soon) and I have a quick question.
I realized that SamVisionAttention and SamVisionSdpaAttention produce attn_weights of different shapes when output_attentions=True.

Can you please answer these 2 questions:

Is it allowed in transformers for the two attention implementations to produce outputs of different shapes?
And let's suppose we do something that changes the shape of the attn_weights returned with output_attentions=True; does that break backward compatibility?

zucchini-nlp · 2025-02-26T10:07:36Z

@geetu040 no, they are not expected to have different shapes. Usually using sdpa attention means that no attn_weights are returned, so they should be available only through 'eager' attention modules

I see that the weights are calculated on top of SDPA by a manual matmul of key and query, which imo defeats the purpose of using SDPA in the first place. Can you remove the returned attention and raise a warning similar to what is done in ViT?
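
As an illustration of the suggestion above, here is a minimal pure-Python sketch of a warn-and-fall-back pattern: the fused path returns no attention weights, so a request for output_attentions triggers a warning and switches to the eager path. The function names (eager_attention, fused_attention, attention) and the implementation argument are hypothetical stand-ins, not the Sam or ViT code in transformers, and fused_attention only mimics a kernel that does not expose its weights. Whether the real change falls back to eager or simply returns no weights is an assumption here.

```python
import math
import warnings


def eager_attention(q, k, v):
    """Manual attention: builds the weight matrix explicitly, so it can return it."""
    scale = 1.0 / math.sqrt(len(q[0]))
    weights = []
    for qi in q:
        # Scaled dot product of one query against every key.
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    # Weighted sum of the value rows for each query.
    out = [
        [sum(w * vj[d] for w, vj in zip(row, v)) for d in range(len(v[0]))]
        for row in weights
    ]
    return out, weights


def fused_attention(q, k, v):
    """Hypothetical stand-in for a fused SDPA kernel: same output, no weights."""
    out, _ = eager_attention(q, k, v)
    return out


def attention(q, k, v, output_attentions=False, implementation="sdpa"):
    """Dispatch between the two paths; weights only ever come from 'eager'."""
    if implementation == "sdpa" and output_attentions:
        warnings.warn(
            "sdpa does not return attention weights; falling back to eager.",
            UserWarning,
        )
        implementation = "eager"
    if implementation == "eager":
        out, weights = eager_attention(q, k, v)
        return out, (weights if output_attentions else None)
    return fused_attention(q, k, v), None
```

With this shape of dispatch, both implementations agree on what they return: the sdpa path always yields None for the weights, and any weights that are returned come from a single code path, so their shape cannot differ between implementations.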

geetu040 · 2025-02-26T10:08:44Z

@zucchini-nlp sure I'll do that.

src/transformers/models/deepseek_vl/modeling_deepseek_vl.py

zucchini-nlp · 2025-07-18T07:21:05Z

Left a few tiny comments 😄

geetu040 · 2025-07-18T12:59:54Z

@ArthurZucker @zucchini-nlp

Thanks for the reviews! I've addressed all the feedback, this should be ready for another look.

For checkpoints, see this: #36248-issuecomment

ArthurZucker · 2025-07-21T10:01:09Z

On it!

github-actions · 2025-07-22T10:01:42Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, deepseek_vl, deepseek_vl_hybrid, janus

ArthurZucker

Let's go, thanks for the hard work and good PR! 🚀

* upload initial code
* update deepseek-vl adaptor
* update hierarchy of vision model classes
* udpate aligner model
* add text model
* Added Image Processor
* Added Image Processor
* Added Image Processor
* apply masks
* remove projection; add aligner
* remove interpolate_pos_encoding
* remove unused params in config
* cleaning
* Add the __init__ file
* added processing deepseek_vl class
* modified the deepseek-vl processor
* modified the deepseek-vl processor
* update __init__
* Update the image processor class name
* Added Deepseek to src/transformers/__init__.py file
* Added Deepseek to image_processing_auto.py
* update the __init__ file
* update deepseek_vl image processor
* Update Deepseek Processor
* upload fast image processor
* Revert "upload fast image processor" This reverts commit 68c8fd5.
* update image processor
* flatten heirarchy
* remove DeepseekVLModel
* major update (complete modeling)
* auto modeling and other files
* formatting
* fix quality
* replace torchvision in modeling
* set default do_normalize to False
* add fast image processor template using tool
* update image processors
* add fast image processor to other files
* update liscense
* Added deepseek image testcases
* update image test
* update processor
* write CHAT_TEMPLATE
* update model for processor
* fix processor
* minor fixes and formatting
* fix image processing and tests
* fix interpolation in sam
* fix output_attentions in DeepseekVLModel
* upload test_modeling
* fix tests because of vocab size
* set use_high_res_vision=False in tests
* fix all modeling tests
* fix styling
* remove explicit background_color from image processors
* added test_processor
* added test_processor
* fix processor tests
* update docs
* update docs
* update docs
* update conversion script
* Fixed typos
* minor fixes from review
  - remove model_id comments in examples
  - remove from pre-trained auto mapping
  - move to image-text-to-text from vision-to-seq in auto mapping
  - add image_token_index to __init__ for config
  - remove outdated temporary config in conversion script
  - update example to use chat_template in docstring example
  - update liscense 2021->2025
* fix type in config docstring Co-authored-by: Raushan Turganbay <[email protected]>
* update get_image_features
* fix config
* improve DeepseekVLImageProcessor.preprocess
* return image_hidden_states
* use AutoTokenizer and AutoImageProcessor in Processor
* fix model outputs
* make num_image_tokens configurable
* fix docstring of processor
* move system prompt to chat template
* fix repo consistency
* fix return_dict
* replace SamVisionEncoder with SamVisionModel
* update to remove deepcopy
* 🛠️ Major Architectural Changes (Adds DeepseekVLHybrid)
* fix quality checks
* add missing hybrid in auto modeling
* run make style
* update sam_hq
* update high_res_size in test
* update docs following huggingface#36979
* update code with auto_docstring
* update conversion scripts
* fix style
* fix failing test because of tuple
* set weights_only=True in conversion script
* use safetensors.torch.load_file instead of torch.load in conversion script
* make output_dir optional in conversion script
* fix code snippets in docs (now the examples work fine)
* integration tests for DeepseekVL
* update expected texts
* make style
* integration tests for DeepseekVLHybrid
* fix class name
* update expected texts for hybrid
* run "make style"
* update since changes in main
* run make-style
* nits since changes in main
* undo changes in sam
* fix tests
* fix tests; update with main
* update with main: output_attention/output_hidden_states
* fix copied part in deepseek_vl
* run fix-copies
* fix output_hidden_states
* sam: fix _init_weigths
* use modular for DeepseekVL
* make image processor more modular
* modular: use JanusPreTrainedModel
* janus: provide kwargs in loss
* update processors in conversion script
* Revert "sam: fix _init_weigths" This reverts commit db625d0.
* run fix-copies
---------
Co-authored-by: Shakib-IO <[email protected]>
Co-authored-by: Raushan Turganbay <[email protected]>

geetu040 and others added 15 commits February 18, 2025 12:27

upload initial code

f3d1896

update deepseek-vl adaptor

b904f22

update hierarchy of vision model classes

7d44bee

udpate aligner model

a3734d6

Merge branch 'main' into deepseek-vl

d0305b2

add text model

abea4eb

Added Image Processor

65886ec

Added Image Processor

19a7666

Added Image Processor

9c3c544

apply masks

1e49a1f

Merge remote-tracking branch 'fork/deepseek-vl' into deepseek-vl

972ee16

remove projection; add aligner

52c80c1

remove interpolate_pos_encoding

d362c9d

remove unused params in config

7d51093

cleaning

8d32560

This was referenced Mar 2, 2025

Create and Expose SamVisionModel as public for better accessibility #36493

Merged

🚨🚨🚨 Fix sdpa in sam and refactor relative position embeddings #36422

Merged

Add the __init__ file

c72cc51

Shakib-IO force-pushed the deepseek-vl branch from 8d32560 to c72cc51 Compare March 2, 2025 17:55

Shakib-IO added 4 commits March 3, 2025 22:01

added processing deepseek_vl class

16a4f4f

modified the deepseek-vl processor

a55b781

modified the deepseek-vl processor

834ecba

update __init__

4249fc3

geetu040 added 2 commits July 16, 2025 21:14

use modular for DeepseekVL

0a2ff70

make image processor more modular

b88e03f

zucchini-nlp reviewed Jul 18, 2025

src/transformers/models/deepseek_vl/modeling_deepseek_vl.py (outdated)

zucchini-nlp reviewed Jul 18, 2025

src/transformers/models/deepseek_vl/modeling_deepseek_vl.py (outdated)

zucchini-nlp requested a review from ArthurZucker July 18, 2025 07:21

geetu040 added 5 commits July 18, 2025 16:33

Merge branch 'main' into deepseek-vl

6e36f94

Merge branch 'main' into deepseek-vl

adc2292

Merge branch 'main' into deepseek-vl

52bdcf8

modular: use JanusPreTrainedModel

91295c2

janus: provide kwargs in loss

96862d8

geetu040 added 2 commits July 18, 2025 18:15

update processors in conversion script

029d82c

Merge branch 'main' into deepseek-vl

8ce7d42

geetu040 added 2 commits July 22, 2025 15:00

Revert "sam: fix _init_weigths"

1c8bf23

This reverts commit db625d0.

Merge branch 'main' into deepseek-vl

41bc04e

run fix-copies

e8d05ef

ArthurZucker approved these changes Jul 25, 2025

ArthurZucker merged commit 69cff31 into huggingface:main Jul 25, 2025
23 checks passed

rasmi mentioned this pull request Aug 6, 2025

convert_deepseek_vl_weights_to_hf.py not included in v4.55.0 release. #39966

Closed

2 tasks

geetu040 mentioned this pull request Oct 16, 2025

Add DeepseekVLV2 Model #41333

Open

4 participants
