Conversation

@ugolowic
Collaborator
This commit fixes errors in the video-comprehension example involving the VideoLLaVA and CLIP models, caused by the latest transformers upgrade.

  • Adapt modeling_video_llava to the transformers upgrade by adding GaudiVideoLlavaModel.forward
  • Fix mismatched matrix sizes in CLIP attention.
  • Fix access to non-existing attribute in GaudiGenerationMixin.

This fixes the failing README example here: https://github.com/huggingface/optimum-habana/tree/main/examples/video-comprehension

@github-actions

The code quality check failed; please run make style.

@ugolowic ugolowic force-pushed the video-comprehension-fix branch from 5572c8f to f7cd918 Compare October 20, 2025 10:31
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@astachowiczhabana
Collaborator

LGTM

@karol-brejna-i karol-brejna-i self-assigned this Oct 21, 2025
elif cache_position is None:
    past_length = past_key_values[0][0].shape[2] if past_key_values is not None else 0
    cache_position = torch.arange(past_length, input_ids.shape[1], dtype=torch.long, device=input_ids.device)
model_inputs["cache_position"] = cache_position
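The quoted snippet can be illustrated in isolation. The sketch below (with a hypothetical helper name, not the actual optimum-habana code) shows how cache_position is derived: past_length is the number of tokens already held in the legacy-format KV cache, and the resulting range covers only the positions of the not-yet-cached tokens.

```python
import torch

def compute_cache_position(past_key_values, input_ids):
    # Legacy cache format: past_key_values[layer][0] is the key tensor of
    # shape (batch, heads, cached_seq_len, head_dim), so dim 2 is the
    # number of tokens already cached.
    past_length = past_key_values[0][0].shape[2] if past_key_values is not None else 0
    # Positions of the tokens that still need to be processed.
    return torch.arange(past_length, input_ids.shape[1], dtype=torch.long, device=input_ids.device)

# Prefill: no cache yet, 5 input tokens -> positions 0..4
print(compute_cache_position(None, torch.zeros((1, 5), dtype=torch.long)).tolist())
# [0, 1, 2, 3, 4]

# 3 tokens already cached, 5 tokens total -> positions 3..4
pkv = ((torch.zeros(1, 1, 3, 4), torch.zeros(1, 1, 3, 4)),)
print(compute_cache_position(pkv, torch.zeros((1, 5), dtype=torch.long)).tolist())
# [3, 4]
```

The fallback exists because newer transformers releases expect cache_position to always be present in model_inputs; the reviewer's concern below is whether synthesizing it this way is safe for all models.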
Collaborator

Are we sure this doesn't break generation for other models?

Collaborator Author

Good point.
This change is in line with the transformers upgrade, but I'm running the text generation slow tests to confirm. They look fine so far; I'll update when they're done.

Collaborator

Hi @ugolowic, can you submit the results?

Collaborator Author

I ran the text generation slow tests on the main branch and on this change rebased onto main; the results are exactly the same.

This commit fixes errors in the video-comprehension example
involving the VideoLLaVA and CLIP models, caused by the latest
transformers upgrade.
* Adapt modeling_video_llava to the transformers upgrade
  by adding GaudiVideoLlavaModel.forward
* Fix mismatched matrix sizes in CLIP attention.
* Fix access to non-existing attribute in GaudiGenerationMixin.

Signed-off-by: Urszula <[email protected]>
@ugolowic ugolowic force-pushed the video-comprehension-fix branch from f7cd918 to 316601d Compare November 5, 2025 08:55

@regisss regisss left a comment


LGTM!

@regisss regisss merged commit 33536f1 into huggingface:main Nov 6, 2025
3 of 5 checks passed
5 participants