
[OpenVINO] Support Qwen3VL model#1551

Merged
echarlaix merged 100 commits into huggingface:main from popovaan:qwen3vl on Feb 9, 2026

Conversation

@popovaan
Collaborator

@popovaan popovaan commented Dec 11, 2025

What does this PR do?

Conversion cmd-line for Qwen/Qwen3-VL-2B-Instruct:

optimum-cli export openvino -m Qwen/Qwen3-VL-2B-Instruct ./Qwen3-VL
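The same export command also accepts weight compression via the standard `--weight-format` option of `optimum-cli export openvino`; for example (output directory name is illustrative):

```shell
# Optional: export with int4 weight compression instead of the default precision
optimum-cli export openvino -m Qwen/Qwen3-VL-2B-Instruct --weight-format int4 ./Qwen3-VL-int4
```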

Inference of Qwen/Qwen3-VL-2B-Instruct using OpenVINO backend:

from transformers import AutoTokenizer, AutoProcessor
from transformers.video_utils import load_video
from huggingface_hub import hf_hub_download
from optimum.intel.openvino import OVModelForVisualCausalLM

model_dir = "./Qwen3-VL/"

tokenizer = AutoTokenizer.from_pretrained(model_dir)
processor = AutoProcessor.from_pretrained(model_dir)
model = OVModelForVisualCausalLM.from_pretrained(model_dir)

# Prepare video input
video_path = hf_hub_download(
    repo_id="raushan-testing-hf/videos-test",
    filename="sample_demo_1.mp4",
    repo_type="dataset",
)
input_video, _ = load_video(video_path, num_frames=10, backend="opencv")
question = "Why is this video funny?"
inputs = model.preprocess_inputs(processor=processor, text=question, video=input_video)

# Run inference
output_ids = model.generate(**inputs, max_new_tokens=10)
output_text = tokenizer.decode(output_ids[0])

print(output_text)
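Note that `generate` returns the prompt tokens followed by the newly generated tokens, so decoding `output_ids[0]` directly echoes the prompt. A minimal sketch of trimming the prompt before decoding (plain Python lists stand in for the tensor slicing `output_ids[:, inputs["input_ids"].shape[1]:]` you would do in practice):

```python
# Sketch: keep only the tokens generated after the prompt, so the decoded
# string contains just the model's answer.
def trim_generated(output_ids, prompt_len):
    return [seq[prompt_len:] for seq in output_ids]

full_output = [[101, 7, 8, 9, 42, 43, 44]]  # 4 prompt ids + 3 generated ids
new_tokens = trim_generated(full_output, prompt_len=4)
print(new_tokens)  # [[42, 43, 44]]
```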

Continuation of #1452

Before submitting

  • [N/A] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@popovaan
Collaborator Author

popovaan commented Feb 2, 2026

@IlyasMoutawwakil @echarlaix could you please review the PR again?

Collaborator

@echarlaix echarlaix left a comment


Thanks a lot for the PR @popovaan !!



@register_in_tasks_manager(
"qwen3_vl_text",
Collaborator


could you expand on why we need this config (instead of adding any modifications to the qwen3_vl config, depending on _behavior)?

Collaborator Author

@popovaan popovaan Feb 5, 2026


If we want to avoid a separate text config, the qwen3_vl config would have to support handling of past key values. Simple inheritance from TextDecoderWithPositionIdsOnnxConfig does not solve this: the config still behaves like a visual-language config, and the past-key-values functionality is not triggered for some reason.

I tried several approaches to work around this, but none of them works properly. If avoiding a separate config is crucial, I suggest doing this in a separate PR, as the model is awaited by a customer and the change does not seem trivial.
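The trade-off under discussion can be illustrated with a hypothetical sketch (these class names and the `inputs()` shape are illustrative, not the optimum-intel API): branching on `_behavior` inside one vision-language config requires that config to also wire in past-key-values inputs, whereas a dedicated text config carries decoder-with-past inputs directly.

```python
# Hypothetical sketch of the two config layouts being compared.
from enum import Enum

class Behavior(Enum):
    VISION = "vision"
    LANGUAGE = "language"

class Qwen3VLConfig:
    """Single vision-language config branching on _behavior."""
    def __init__(self, behavior=Behavior.VISION):
        self._behavior = behavior

    def inputs(self):
        if self._behavior is Behavior.LANGUAGE:
            # past_key_values handling would have to be added here, which the
            # visual-language base path does not trigger on its own.
            return {"input_ids": {}, "position_ids": {}}
        return {"pixel_values": {}}

class Qwen3VLTextConfig:
    """Separate text config: decoder-with-past inputs are declared directly."""
    def inputs(self):
        return {"input_ids": {}, "position_ids": {}, "past_key_values": {}}
```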

**kwargs,
):
# Clear cached rope delta from previous generations
self.rope_deltas = None
Collaborator


not directly related to this PR, but what if the model's forward is called multiple times (not through generate)?

Collaborator Author


I suppose accuracy would be low, as rope_deltas needs to be cleared before each forward.
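The caching issue under discussion can be sketched minimally (hypothetical code, not the PR's implementation): a rope delta computed at prefill must be reset whenever a new sequence starts, otherwise a later standalone forward reuses a stale value.

```python
# Sketch: reset the cached rope delta on every prefill (past_key_values is None),
# reuse it only on decode steps of the same sequence.
class RopeDeltaCache:
    def __init__(self):
        self.rope_deltas = None

    def forward(self, input_ids, past_key_values=None):
        if past_key_values is None:
            # New sequence: clear any delta cached by a previous generation.
            self.rope_deltas = None
        if self.rope_deltas is None:
            self.rope_deltas = len(input_ids)  # stand-in for the real computation
        return self.rope_deltas

m = RopeDeltaCache()
first = m.forward([1, 2, 3])                    # prefill: delta computed (3)
decode = m.forward([5], past_key_values="kv")   # decode step: cached delta reused (3)
second = m.forward([9])                         # new prefill: delta recomputed (1)
```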

popovaan and others added 6 commits February 4, 2026 11:17
Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>
Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>
@popovaan
Collaborator Author

popovaan commented Feb 5, 2026

@echarlaix thanks for the review! I will apply remaining comments today.
@IlyasMoutawwakil could you please review this PR again too?

@rkazants rkazants requested a review from echarlaix February 5, 2026 11:46
@popovaan
Collaborator Author

popovaan commented Feb 5, 2026

@echarlaix @IlyasMoutawwakil please review this PR.

@rkazants rkazants mentioned this pull request Feb 9, 2026
@echarlaix echarlaix added the openvino-slow Runs OpenVINO slow tests with different versions of transformers label Feb 9, 2026
Collaborator

@echarlaix echarlaix left a comment


Thanks a lot @popovaan for the great addition !

@rkazants rkazants removed the openvino-slow Runs OpenVINO slow tests with different versions of transformers label Feb 9, 2026
@rkazants
Collaborator

rkazants commented Feb 9, 2026

@popovaan, @echarlaix, I recommend running the slow tests after the precommit tests are complete. Otherwise, multiple model download requests from both scopes (precommit and slow tests) can lead to network errors.

@rkazants rkazants added the openvino-slow Runs OpenVINO slow tests with different versions of transformers label Feb 9, 2026
@echarlaix echarlaix merged commit 6ff3bfc into huggingface:main Feb 9, 2026
29 of 56 checks passed
github-merge-queue bot pushed a commit to openvinotoolkit/openvino.genai that referenced this pull request Mar 12, 2026
## Description
This PR enables Qwen3-VL model in GenAI VLM pipeline.
Supports SDPA + PA backends in VLM pipeline and Continuous Batching
pipeline (both `generate()` and `add_request()` APIs).

Depends on [Optimum Intel PR](huggingface/optimum-intel#1551); requires the latest Optimum Intel and `transformers>=4.57.0` for model exporting.

CVS-175825

Resolves #2998 

## Checklist:
- [x] Tests have been updated or added to cover the new code.
- [x] This patch fully addresses the ticket.
- [x] I have made corresponding changes to the documentation.
@echarlaix echarlaix mentioned this pull request Mar 30, 2026
@peterchen-intel peterchen-intel mentioned this pull request Mar 31, 2026
3 tasks

Labels

openvino-slow Runs OpenVINO slow tests with different versions of transformers

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants