[Model] Siglip2 Model Support #27566
Conversation
Documentation preview: https://vllm--27566.org.readthedocs.build/en/27566/
Code Review
This pull request extends vLLM's capabilities to support SigLIP2 models, building upon the existing SigLIP architecture. The changes include updating the dummy token ID for tokenizer compatibility, adding SigLIP2 to the test suite and documentation, and enhancing the model configuration loading to automatically infer model architecture when it's not explicitly defined. My review focuses on the robustness of this new configuration logic. I've identified one high-severity issue where the check for a missing architecture field could be more robust to handle cases like an empty list, which would otherwise cause model loading to fail.
/gemini review
Code Review
This pull request extends the SigLIP model support to SigLIP2, including handling model configurations and updating the dummy token ID. The changes involve modifications to the documentation, test files, and model configuration files. The review focuses on ensuring the correctness of the dummy token ID and the robustness of the configuration handling.
```python
if not config.architectures:
    if config.model_type not in MODEL_MAPPING_NAMES:
        raise ValueError(f"Model type {config.model_type} not supported")
    model_type = MODEL_MAPPING_NAMES[config.model_type]
    config.update({"architectures": [model_type]})
```
The added logic to automatically determine the architecture for models without an explicit architectures field is good for robustness. However, raising a ValueError if the model_type is not found in MODEL_MAPPING_NAMES might be too strict. A more graceful fallback could involve attempting to load the model with trust_remote_code=True or issuing a warning and proceeding with a default architecture. This could prevent the system from failing completely when encountering a new or less common model type. Also, consider adding a log message to indicate when this automatic architecture detection is being used, which can help with debugging and understanding the system's behavior.
Suggested change:

```python
if not config.architectures:
    if config.model_type not in MODEL_MAPPING_NAMES:
        logger.warning(
            f"Model type {config.model_type} not found in MODEL_MAPPING_NAMES. "
            "Attempting to proceed without explicit architecture."
        )
        # Optionally, try loading with trust_remote_code or assign a default architecture here
        config.update({"architectures": ["AutoModel"]})  # Example: setting a default architecture
    else:
        model_type = MODEL_MAPPING_NAMES[config.model_type]
        config.update({"architectures": [model_type]})
```
DarkLight1337 left a comment
LGTM, thanks
FYI this is breaking
Curious why this was merged with CI failures?
Oh, I didn't think this PR was related to chat utils, so I assumed the failure was unrelated.
Really, we should never assume this no matter how unlikely it seems, unless the same failure has been seen on main. This keeps happening: breakage due to an incorrect assumption, which then has a much wider blast radius.
I agree in principle; unfortunately the entrypoints tests have been quite flaky these past few weeks, so it's easy to accidentally miss these true positives...
Does the HF implementation of the model support dynamic image size? In any case, I doubt that has anything to do with the issue you faced, since the issue occurs even with a text-only prompt.
I'm not talking about the bug. I'm using SigLIP2, and the output of the Processor contains additional fields beyond pixel_values, but at inference time the model only gets pixel_values; the others are discarded.
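For context, a minimal sketch of the processor behavior being described; the checkpoint name and the exact output keys are assumptions for illustration, not verified against this PR:

```python
# Sketch: the SigLIP2 NaFlex-style processor returns more than pixel_values,
# e.g. an attention mask and the spatial shapes of the variable-size patches.
from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("google/siglip2-base-patch16-naflex")
inputs = processor(images=Image.new("RGB", (224, 224)), return_tensors="pt")
# Expected keys (assumption): pixel_values, pixel_attention_mask, spatial_shapes
print(list(inputs.keys()))
```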
It looks like on the transformers side, SigLIP2 supports resizing image embeddings, whereas SigLIP does not.
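A rough sketch of what resizing image (position) embeddings typically involves, purely illustrative rather than the HF or vLLM implementation:

```python
import torch
import torch.nn.functional as F

def resize_position_embeddings(pos_emb: torch.Tensor, new_h: int, new_w: int) -> torch.Tensor:
    """Bilinearly interpolate a (num_patches, dim) position-embedding grid
    to a new patch grid; assumes the original grid is square."""
    num_patches, dim = pos_emb.shape
    side = int(num_patches ** 0.5)
    grid = pos_emb.reshape(1, side, side, dim).permute(0, 3, 1, 2)  # (1, dim, H, W)
    grid = F.interpolate(grid, size=(new_h, new_w), mode="bilinear", align_corners=False)
    return grid.permute(0, 2, 3, 1).reshape(new_h * new_w, dim)

# e.g. a 16x16 grid of 768-dim embeddings resized to 24x24
resized = resize_position_embeddings(torch.randn(256, 768), 24, 24)
print(resized.shape)  # torch.Size([576, 768])
```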
I have pushed #28365 to fix the SigLIP batch text output error. @DarkLight1337 @sleepwalker2017
Hi @piood @DarkLight1337, any tips on how we can add these variants?
@apoorv-stackav As for the second and third problems, I can't reproduce them locally; in my own environment the dimensions are correct.
The max_length parameter is implemented by the processor. Is it easy to implement max_length in vLLM?
Indeed, the padding should be done by the tokenizer. You should be able to do this by passing the relevant tokenization arguments.
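For instance, a minimal sketch of tokenizer-side padding; the checkpoint name and the 64-token max_length are assumptions for illustration:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/siglip2-base-patch16-224")
batch = tokenizer(
    ["a photo of a cat", "a photo of a dog"],
    padding="max_length",  # pad every prompt to a fixed length
    max_length=64,         # SigLIP-family text towers commonly use 64 tokens
    truncation=True,
    return_tensors="pt",
)
print(batch["input_ids"].shape)  # (2, 64)
```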
@apoorv-stackav, I believe the SigLIP model only uses the hidden state of the last token for the sequence embedding. Therefore, padding shouldn't affect the final result. Please correct me if I'm wrong.
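For reference, a minimal sketch of the last-token pooling described above; illustrative only, not the vLLM code:

```python
import torch

def last_token_pool(last_hidden_state: torch.Tensor) -> torch.Tensor:
    # last_hidden_state: (batch, seq_len, hidden_dim).
    # The embedding is the hidden state at the final sequence position;
    # with fixed max_length padding, that position is the same for every prompt.
    return last_hidden_state[:, -1, :]

pooled = last_token_pool(torch.randn(2, 64, 768))
print(pooled.shape)  # torch.Size([2, 768])
```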
@piood, thanks for the great work adding SigLIP2 support. I tested it with
Which version of transformers are you using? You might have to update it.
Also, this PR doesn't cover the NaViT variant.
@zac-wang-nv siglip2navit is implemented in
Hi @piood @DarkLight1337,

Model config: the text encoder is a 1152 → 1536 projection, while the vision encoder already operates at 1536.

Serving command / Request / Behavior: [details not captured]

Question: Is there a configuration/task flag I'm missing to bypass the text prompt path for image-only requests, or is this indeed a bug in the merge logic for multimodal embeddings? I'd love guidance on whether I should pad the text embeddings to 1536 before merging, or if there's an existing vision-only embedding pathway I can use. Thanks!
Same problem here (v0.11.2):
@FlyingYanglu @jtruetsch There are some tricky issues here. For [...]. Since fixing this might require many changes to [...]. Maybe we can keep [...].
Let me open a PR to facilitate that.
Opened #29741
Thank you so much for the help! @piood @DarkLight1337
Hi team, @piood @DarkLight1337, sorry for the trouble again. It looks like padding-to-max_length is the officially expected behavior for SigLIP2 text preprocessing (as noted in the HF docs), so I'm trying to keep my tokenizer/preprocessor aligned with that. However, I found a couple of issues:
Because of this, the only reliable way to match HF behavior right now is to pass in pre-tokenized IDs manually. It would be super helpful if the tokenization kwargs could be overridden.
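A sketch of the manual workaround described above; the model name is an assumption, and the exact vLLM arguments may differ across versions:

```python
from transformers import AutoTokenizer
from vllm import LLM

tokenizer = AutoTokenizer.from_pretrained("google/siglip2-base-patch16-224")
ids = tokenizer(
    "a photo of a cat",
    padding="max_length", max_length=64, truncation=True,
)["input_ids"]

# Hand the pre-padded token IDs to vLLM directly, bypassing its tokenization.
llm = LLM(model="google/siglip2-base-patch16-224", task="embed")
outputs = llm.embed({"prompt_token_ids": ids})
print(len(outputs[0].outputs.embedding))
```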
@noooop can you help with this?
Opened #29794 to support tokenization_kwargs override.
Purpose
Extend SigLIP model support to SigLIP2, building upon the existing SigLIP embedding architecture.
This PR extends multimodal embedding capabilities to support SigLIP2 models, which are widely used for vision-language tasks.
Resolves [New Model]: Google SigLip 2 (#13663) and [Feature]: Support serving of CLIP/SigLIP embeddings (#25581). This PR builds upon SigLIP embedding support ([Model] Siglip Embedding Support, #27324) to add SigLIP2 support.
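For illustration, a minimal usage sketch of the added support; this is not the PR's actual test plan, and the model name, task flag, and image-prompt convention are assumptions:

```python
from PIL import Image
from vllm import LLM

# Serve a SigLIP2 checkpoint with the embedding/pooling runner.
llm = LLM(model="google/siglip2-base-patch16-224", task="embed")

# Text embedding.
text_out = llm.embed("a photo of a cat")

# Image embedding; the empty-prompt convention here is an assumption.
image = Image.new("RGB", (224, 224))
image_out = llm.embed({"prompt": "", "multi_modal_data": {"image": image}})
```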
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.