[OpenVINO] Support Qwen3VL model#1551
Conversation
This reverts commit 8e7cdd2.
|
@IlyasMoutawwakil @echarlaix could you please review the PR again? |
|
|
||
|
|
||
| @register_in_tasks_manager( | ||
| "qwen3_vl_text", |
There was a problem hiding this comment.
could you extend on why we need this config (instead of adding any modifications to the qwen3_vl config, depending on _behavior) ?
There was a problem hiding this comment.
If we want to avoid a separate text config, the qwen3_vl config should support handling of past key values. Simple inheritance from TextDecoderWithPositionIdsOnnxConfig does not solve this issue, as the config still behaves like a visual-language config and the past key values functionality is not triggered for some reason.
I tried several approaches to work around this, but none of them work properly. If avoiding a separate config is crucial, I suggest making this in a separate PR, as this model is awaited by the customer and it seems to be not trivial.
| **kwargs, | ||
| ): | ||
| # Clear cached rope delta from previous generations | ||
| self.rope_deltas = None |
There was a problem hiding this comment.
not directly related to this PR but what if the models forward is called multiple times (not through generate) ?
There was a problem hiding this comment.
I suppose accuracy will be low, as rope_deltas need to be cleared before each forward.
Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>
Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>
|
@echarlaix thanks for the review! I will apply remaining comments today. |
|
@echarlaix @IlyasMoutawwakil please review this PR. |
|
@popovaan, @echarlaix, I recommend to run slow tests after precommit tests are complete. Otherwise, multiple model download requests from both scopes (precommit and slow tests) can lead to network errors. |
## Description This PR enables Qwen3-VL model in GenAI VLM pipeline. Supports SDPA + PA backends in VLM pipeline and Continuous Batching pipeline (both `generate()` and `add_request()` APIs). Depends on ~[Optimum Intel PR](huggingface/optimum-intel#1551 latest Optimum Intel and `transformers>=4.57.0` for model exporting. CVS-175825 Resolves #2998 ## Checklist: - [x] Tests have been updated or added to cover the new code. - [x] This patch fully addresses the ticket. - [x] I have made corresponding changes to the documentation.
What does this PR do?
Conversion cmd-line for
Qwen/Qwen3-VL-2B-Instruct:optimum-cli export openvino -m Qwen/Qwen3-VL-2B-Instruct ./Qwen3-VLInference of
Qwen/Qwen3-VL-2B-Instructusing OpenVINO backend:Continuation of #1452
Before submitting