[Model] Add Gemma3 GGUF multimodal support #27772
Merged
Commits (53 total; diff shows changes from 51 commits):
6cb4b22  Add Gemma3 GGUF multimodal support (lucianommartins)
f9f1db1  Remove deprecated V0 compatibility code (lucianommartins)
387d5ed  Address architectural feedback: (lucianommartins)
dd162d6  Address feedbacks (lucianommartins)
be3fa26  Fix ReadTheDocs type annotations in utils.py (lucianommartins)
67a6006  Fix type annotation imports - move to module level (lucianommartins)
1b36cb5  fix: restore model_arch parameter for GGUF dtype handling (lucianommartins)
2e90620  fix: restore model_arch parameter for GGUF dtype handling (lucianommartins)
2a55c43  Address reviewer feedback: remove hardcoded values, apply fail-fast (lucianommartins)
ee439e4  refactor: eliminate code duplication and generalize multimodal GGUF s… (lucianommartins)
2b91abf  refactor: address code review feedback for Gemma3 GGUF (lucianommartins)
18276d4  Addressing reviews/feedbacks. (lucianommartins)
0ce2423  reverting cosmetic changes. (lucianommartins)
50a66c8  refactor: reorganize GGUF utilities and extract config patching (lucianommartins)
a480c72  Add Gemma3 GGUF multimodal generation tests (lucianommartins)
e86ae43  test: split Gemma3 tests into separate GGUF and HF files (lucianommartins)
7cc3406  clean (Isotr0py)
7c42bf3  avoid reading GGUF multiple times (Isotr0py)
4401fa5  better hf_config patch (Isotr0py)
ba1536b  clean test (Isotr0py)
556a044  remove redundant vibe coding test (Isotr0py)
d3a69b9  update autom tensor mapping (Isotr0py)
f6f48ec  gguf: implement automatic mmproj weight mapping with filtering (lucianommartins)
dcd2d5e  fix(gguf): resolve Gemma3 multimodal parameter prefix mismatch (lucianommartins)
ea8a03c  fix(gguf): resolve Gemma3 multimodal parameter prefix mismatch (lucianommartins)
2f5eac0  bump gguf version (Isotr0py)
4579031  refactor(gguf): use official GGUF constants for vision config extraction (lucianommartins)
05ce7f3  clean (Isotr0py)
19539e5  fix Gemma3 GGUF multimodal test with correct processor loading (lucianommartins)
8c7c735  unify unqunatized weights handling (Isotr0py)
466f1ef  revert unnecessary changes (Isotr0py)
a7934c9  update gemma3mm embed_input_ids (Isotr0py)
8365694  fix (Isotr0py)
b699a6d  clean model config (Isotr0py)
f915c21  clean processor to load processor from tokenizer (Isotr0py)
6d0ae74  feat(gguf): Finish gguf loader code cleanup (lucianommartins)
9e0e6f5  move tokenizer validation to model_config (Isotr0py)
9846458  fix test (Isotr0py)
2f2de1c  revert gemma3 text backbone (Isotr0py)
211131f  compatability with unsloth gguf (Isotr0py)
f0b0b03  fix broken inc quant (Isotr0py)
803a166  revert unnecessary changes (Isotr0py)
be831a7  revert unnecessary changes (Isotr0py)
12a26cf  clean and correct some comments (Isotr0py)
5caf036  update test (Isotr0py)
9077fc4  fix deadlock (Isotr0py)
8d6cfd4  gemini (Isotr0py)
86a99d2  Add vocab_size override and improve automatic weight mapping (lucianommartins)
9c071b9  fix: Update CI test configurations for Gemma3 compatibility (lucianommartins)
5a245bd  refactor(gguf): remove redundant vocab_size extraction (lucianommartins)
ef3a0ab  fix(v1): add hasattr check before calling generate_attention_masks (lucianommartins)
ea1525a  revert (Isotr0py)
46d8746  Merge branch 'main' into main (Isotr0py)
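For orientation, a minimal offline-inference sketch of what this PR enables is shown below. It mirrors the new test added in this PR (local GGUF backbone path as `model`, original Hugging Face repo as `tokenizer`); the assumption that the mmproj file is picked up automatically once it sits next to the backbone in the HF cache is inferred from the test, not confirmed API behavior. The `temperature`/`max_tokens` values are illustrative.

```python
# Hedged sketch: Gemma3 GGUF multimodal inference with vLLM, assuming the
# mmproj file is discovered from the same cache directory as the backbone.
from huggingface_hub import hf_hub_download
from vllm import LLM, SamplingParams
from vllm.assets.image import ImageAsset

repo = "google/gemma-3-4b-it-qat-q4_0-gguf"
# Download the mmproj first so it lands next to the backbone in the HF cache
# (the new test does the same; automatic discovery is an assumption here).
hf_hub_download(repo, filename="mmproj-model-f16-4B.gguf")
backbone = hf_hub_download(repo, filename="gemma-3-4b-it-q4_0.gguf")

llm = LLM(
    model=backbone,                    # local GGUF backbone file
    tokenizer="google/gemma-3-4b-it",  # tokenizer from the original HF repo
    max_model_len=4096,
)

image = ImageAsset("stop_sign").pil_image
outputs = llm.generate(
    {
        "prompt": "<start_of_image>Describe this image in detail:",
        "multi_modal_data": {"image": image},
    },
    SamplingParams(temperature=0.0, max_tokens=32),
)
print(outputs[0].outputs[0].text)
```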
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Some comments aren't visible on the classic Files Changed page.
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
tests/models/multimodal/generation/test_multimodal_gguf.py (new file: 115 additions, 0 deletions)
```python
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project

from typing import Literal, NamedTuple

import pytest
from huggingface_hub import hf_hub_download
from pytest import MarkDecorator

from tests.quantization.utils import is_quant_method_supported
from vllm.assets.image import ImageAsset
from vllm.utils.torch_utils import set_default_torch_num_threads

from ....conftest import PromptImageInput, VllmRunner
from ...utils import check_logprobs_close


class GGUFMMTestConfig(NamedTuple):
    original_model: str
    gguf_repo: str
    gguf_backbone: str
    gguf_mmproj: str
    prompt: list[str]
    mm_data: dict[Literal["images"], PromptImageInput]
    max_model_len: int = 4096
    marks: list[MarkDecorator] = []

    @property
    def gguf_model(self):
        hf_hub_download(self.gguf_repo, filename=self.gguf_mmproj)
        return hf_hub_download(self.gguf_repo, filename=self.gguf_backbone)


GEMMA3_CONFIG = GGUFMMTestConfig(
    original_model="google/gemma-3-4b-it",
    gguf_repo="google/gemma-3-4b-it-qat-q4_0-gguf",
    gguf_backbone="gemma-3-4b-it-q4_0.gguf",
    gguf_mmproj="mmproj-model-f16-4B.gguf",
    prompt=["<start_of_image>Describe this image in detail:"],
    mm_data={"images": [ImageAsset("stop_sign").pil_image]},
    marks=[pytest.mark.core_model],
)

MODELS_TO_TEST = [GEMMA3_CONFIG]


def run_multimodal_gguf_test(
    vllm_runner: type[VllmRunner],
    model: GGUFMMTestConfig,
    dtype: str,
    max_tokens: int,
    num_logprobs: int,
):
    # Run gguf model.
    with (
        set_default_torch_num_threads(1),
        vllm_runner(
            model_name=model.gguf_model,
            enforce_eager=True,
            tokenizer_name=model.original_model,
            dtype=dtype,
            max_model_len=model.max_model_len,
        ) as gguf_model,
    ):
        gguf_outputs = gguf_model.generate_greedy_logprobs(
            prompts=model.prompt,
            max_tokens=max_tokens,
            num_logprobs=num_logprobs,
            **model.mm_data,
        )

    # Run unquantized model.
    with vllm_runner(
        model_name=model.original_model,
        enforce_eager=True,  # faster tests
        dtype=dtype,
        max_model_len=model.max_model_len,
    ) as original_model:
        original_outputs = original_model.generate_greedy_logprobs(
            prompts=model.prompt,
            max_tokens=max_tokens,
            num_logprobs=num_logprobs,
            **model.mm_data,
        )

    check_logprobs_close(
        outputs_0_lst=original_outputs,
        outputs_1_lst=gguf_outputs,
        name_0="original",
        name_1="gguf",
    )


@pytest.mark.skipif(
    not is_quant_method_supported("gguf"),
    reason="gguf is not supported on this GPU type.",
)
@pytest.mark.parametrize(
    "model",
    [
        pytest.param(test_config, marks=test_config.marks)
        for test_config in MODELS_TO_TEST
    ],
)
@pytest.mark.parametrize("dtype", ["bfloat16"])
@pytest.mark.parametrize("max_tokens", [32])
@pytest.mark.parametrize("num_logprobs", [10])
def test_models(
    vllm_runner: type[VllmRunner],
    model: GGUFMMTestConfig,
    dtype: str,
    max_tokens: int,
    num_logprobs: int,
) -> None:
    run_multimodal_gguf_test(vllm_runner, model, dtype, max_tokens, num_logprobs)
```
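Additional GGUF multimodal checkpoints can be covered by appending another `GGUFMMTestConfig` to `MODELS_TO_TEST`. The entry below is a hypothetical illustration only; the repo and file names are placeholders, not models validated by this PR.

```python
# Hypothetical example entry; repo and file names are placeholders only.
NEW_MODEL_CONFIG = GGUFMMTestConfig(
    original_model="some-org/some-multimodal-model",       # placeholder
    gguf_repo="some-org/some-multimodal-model-q4_0-gguf",  # placeholder
    gguf_backbone="some-multimodal-model-q4_0.gguf",       # placeholder
    gguf_mmproj="mmproj-model-f16.gguf",                   # placeholder
    prompt=["<start_of_image>Describe this image in detail:"],
    mm_data={"images": [ImageAsset("stop_sign").pil_image]},
)

MODELS_TO_TEST = [GEMMA3_CONFIG, NEW_MODEL_CONFIG]
```

The suite can then be run with, for example, `pytest tests/models/multimodal/generation/test_multimodal_gguf.py` on a GPU where the `gguf` quantization method is supported.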