[VLM] Post-layernorm override and quant config in vision encoder #9217

DarkLight1337 · 2024-10-10T04:17:02Z

Also, `QuantizationConfig` and the associated `prefix` argument are now passed to vision towers to maintain consistency. Nevertheless, since the vision tower is not yet quantized by existing methods, we ignore it and add a code comment explaining this.
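To make the description concrete, here is a minimal sketch of a vision tower constructor that accepts `quant_config` and `prefix` and allows the post-layernorm decision to be overridden. The names `require_post_norm`, `num_hidden_layers_override`, `quant_config`, and `prefix` come from this PR's commit titles and diff; the classes `ToyVisionConfig`, `ToyEncoderLayer`, and `ToyVisionTower`, and the default rule for the post layernorm, are assumptions made for illustration and are not the vLLM implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyVisionConfig:
    # Hypothetical config holding only the field this sketch needs.
    num_hidden_layers: int = 4


@dataclass
class ToyEncoderLayer:
    # Each layer records the quant config and the dotted name it was built with.
    quant_config: Optional[dict]
    prefix: str


class ToyVisionTower:
    """Hypothetical vision tower; not the vLLM class."""

    def __init__(
        self,
        config: ToyVisionConfig,
        quant_config: Optional[dict] = None,
        *,
        num_hidden_layers_override: Optional[int] = None,
        require_post_norm: Optional[bool] = None,
        prefix: str = "",
    ) -> None:
        num_layers = (
            config.num_hidden_layers
            if num_hidden_layers_override is None
            else num_hidden_layers_override
        )
        # quant_config and prefix are threaded through to every layer so the
        # vision tower has the same constructor shape as the language model.
        self.layers = [
            ToyEncoderLayer(
                quant_config=quant_config,
                prefix=f"{prefix}.layers.{i}" if prefix else f"layers.{i}",
            )
            for i in range(num_layers)
        ]
        # Assumed default: keep the post layernorm only when the full encoder
        # is built. Callers can force either behavior with require_post_norm.
        if require_post_norm is None:
            require_post_norm = num_layers == config.num_hidden_layers
        self.has_post_layernorm = require_post_norm
```

In this sketch, a model that reads features from an intermediate layer could pass `num_hidden_layers_override=2` and get no post layernorm by default, while a model that needs it regardless could pass `require_post_norm=True`.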

FIX #9186


DarkLight1337 · 2024-10-10T04:20:19Z

cc @litianjian

litianjian · 2024-10-10T08:52:12Z

> cc @litianjian

OK, I will update the LLaVA models after this PR.

DarkLight1337 · 2024-10-11T15:50:31Z

@ywang96 PTAL when you have time.

litianjian · 2024-10-22T14:50:49Z

@DarkLight1337 Would you mind sharing the current progress?

DarkLight1337 · 2024-10-22T15:04:40Z

It's ready for review. @ywang96 is away, so I have asked @Isotr0py for a review.

DarkLight1337 · 2024-10-22T15:06:12Z

Sorry for the delay, I have been focused on other stuff lately.

Isotr0py

LGTM. Thanks for enabling this!

…der + fix quant args (vllm-project#9217) Co-authored-by: Isotr0py <[email protected]> Signed-off-by: Alvant <[email protected]>

gshtras · 2024-10-28T20:58:09Z

vllm/model_executor/models/mllama.py

-            MllamaVisionEncoderLayer(config, is_gated)
-            for _ in range(num_layers)
+            MllamaVisionEncoderLayer(config,
+                                     quant_config=quant_config,


This change breaks partially quantized models such as https://huggingface.co/amd/Llama-3.2-11B-Vision-Instruct-FP8-KV
Since the vision part is not quantized, passing this config now causes on-the-fly quantization to be applied (https://github.com/ROCm/vllm/blob/main/vllm/model_executor/layers/quantization/utils/w8a8_utils.py#L126), which can't work with the vision MLP because it is 3-dimensional. It eventually crashes at https://github.com/ROCm/vllm/blob/main/vllm/_custom_ops.py#L711
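The failure mode described above can be illustrated on shapes alone. This is a sketch under stated assumptions: `toy_2d_only_op` is a hypothetical stand-in for a kernel that accepts only 2-D inputs, and `apply_with_flatten` shows the generic technique of collapsing leading dimensions before such a kernel and restoring them afterwards. Whether that technique is suitable for the real kernel is not established in this thread.

```python
from math import prod
from typing import Tuple


def toy_2d_only_op(shape: Tuple[int, ...]) -> Tuple[int, ...]:
    # Hypothetical stand-in for a kernel that rejects anything but 2-D input.
    if len(shape) != 2:
        raise ValueError(f"expected a 2-D input, got {len(shape)}-D")
    return shape


def as_2d(shape: Tuple[int, ...]) -> Tuple[int, int]:
    # Collapse all leading dimensions into one: (batch, seq, hidden) becomes
    # (batch * seq, hidden).
    *leading, hidden = shape
    return (prod(leading), hidden)


def apply_with_flatten(shape: Tuple[int, ...]) -> Tuple[int, ...]:
    # Run the 2-D-only op on the flattened shape, then restore the leading
    # dimensions of the original input.
    _, out_cols = toy_2d_only_op(as_2d(shape))
    return (*shape[:-1], out_cols)
```

Calling `toy_2d_only_op` directly with a `(batch, seq, hidden)` shape raises, while `apply_with_flatten` returns the original shape.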

DarkLight1337 · 2024-10-29

@mgoin can you help with this so we can handle both partially and fully quantized models?

gshtras · 2024-10-29

Proposed a fix in #9800

mgoin · 2024-10-29

@DarkLight1337 I am working on this for as many models as possible in #9772
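One way to picture handling both partially and fully quantized checkpoints is to choose the quantization config per module prefix. The helper below is a hypothetical sketch, not the approach taken in #9800 or #9772: `quant_config_for` and its `unquantized_prefixes` parameter are names invented here, and the config is represented as a plain dict.

```python
from typing import Optional, Sequence


def quant_config_for(
    prefix: str,
    quant_config: Optional[dict],
    unquantized_prefixes: Sequence[str],
) -> Optional[dict]:
    """Return quant_config unless the module sits under an unquantized prefix."""
    for skipped in unquantized_prefixes:
        # Match on whole dotted components so "vision_model" does not also
        # match an unrelated module such as "vision_model_extra".
        if prefix == skipped or prefix.startswith(skipped + "."):
            return None
    return quant_config
```

With `unquantized_prefixes=["vision_model"]`, a layer named `vision_model.layers.0.mlp` would be built without quantization while `language_model.layers.0.mlp` keeps the config. A fully quantized checkpoint would pass an empty list.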

DarkLight1337 added 3 commits October 10, 2024 04:06

Enable require_post_norm in vision encoders

b7abed2

Fix quant_config not passed to vision tower

dd4e42f

Remove redundant load_weights

b9ae0d9

DarkLight1337 requested a review from ywang96 October 10, 2024 04:17

DarkLight1337 added 4 commits October 10, 2024 08:42

Add explicit note that quantization not supported

4ed2425

Format

d167685

Add note

a839727

format

90deb9d

Add prefix

143ccc0

DarkLight1337 added 4 commits October 11, 2024 16:33

Fix prefixes

a3ca5fd

Update idefics

527bf39

format

a35f49c

Merge branch 'main' into require-post-norm

fdc6687

DarkLight1337 changed the title ~~[VLM] Enable overriding whether post layernorm is used in vision encoder~~ [VLM] Enable overriding whether post layernorm is used in vision encoder + normalize quant args Oct 18, 2024

DarkLight1337 changed the title ~~[VLM] Enable overriding whether post layernorm is used in vision encoder + normalize quant args~~ [VLM] Enable overriding whether post layernorm is used in vision encoder + fix quant args Oct 18, 2024

DarkLight1337 requested a review from Isotr0py October 22, 2024 15:04

DarkLight1337 and others added 7 commits October 22, 2024 15:13

Merge branch 'main' into require-post-norm

7149e2e

Add num_hidden_layers_override

f69c008

Merge branch 'main' into require-post-norm

99e1917

Try use quant_config

6e8670f

patch internvl awq config

e676b72

make mypy happy

989c4de

Fix typo

0224b75

DarkLight1337 added the ready label Oct 23, 2024

Move prefix to be closer to variable declaration

ef4c253

DarkLight1337 force-pushed the require-post-norm branch from 576c0f2 to ef4c253 Compare October 23, 2024 07:56

Isotr0py approved these changes Oct 23, 2024

DarkLight1337 enabled auto-merge (squash) October 23, 2024 11:02

DarkLight1337 merged commit c18e1a3 into main Oct 23, 2024
63 checks passed

DarkLight1337 deleted the require-post-norm branch October 23, 2024 11:28

DarkLight1337 mentioned this pull request Oct 23, 2024

[Bugfix] Fix _init_vision_model in NVLM_D model #9611

Merged

mgoin mentioned this pull request Oct 23, 2024

[Bugfix] Use "vision_model" prefix for MllamaVisionModel #9628

Merged

litianjian mentioned this pull request Oct 24, 2024

[Bugfix]Disable the post_norm layer of the vision encoder for LLaVA models #9653

Merged

gshtras reviewed Oct 28, 2024

gshtras mentioned this pull request Oct 29, 2024

[Bugfix][Quantization]Fix support for non quantized visual layers in otherwise quantized mllama model, including missing scaling factors #9800

Closed

DarkLight1337 mentioned this pull request Oct 29, 2024

[RFC]: Multi-modality Support on vLLM #4194

Open

54 tasks

DarkLight1337 changed the title ~~[VLM] Enable overriding whether post layernorm is used in vision encoder + fix quant args~~ [VLM] Enable overriding post layernorm usage + fix quant args Nov 1, 2024

DarkLight1337 changed the title ~~[VLM] Enable overriding post layernorm usage + fix quant args~~ [VLM] Enable post layernorm override and quant config in vision encoder Nov 1, 2024

DarkLight1337 changed the title ~~[VLM] Enable post layernorm override and quant config in vision encoder~~ [VLM] Post-layernorm override and quant config in vision encoder Nov 1, 2024

6 participants
