
Conversation

@molbap (Contributor) commented Nov 3, 2025

What does this PR do?

As per title. Linked to huggingface/peft#2880.
Follows the existing implementations for idefics2/idefics3 and smolvlm fairly closely, trying to cover several types of VLMs (the vision modules are named differently across the library).
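
(For context, a minimal sketch of the kind of setup the linked issue is about; the model name, checkpointing kwargs, and LoRA config below are illustrative placeholders, not taken from peft#2880.)

# Illustrative sketch only: LoRA + reentrant gradient checkpointing on a VLM.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForImageTextToText

model = AutoModelForImageTextToText.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
model.gradient_checkpointing_enable(gradient_checkpointing_kwargs={"use_reentrant": True})
# Without a hook on the vision tower's entry point, adapter modules inside it
# can end up with zero grad norms under reentrant checkpointing.
model.enable_input_require_grads()
peft_model = get_peft_model(model, LoraConfig(target_modules="all-linear"))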

github-actions bot commented Nov 3, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: qwen2_vl

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment on lines +427 to +429

if vision_module is not None:
    for parameter in vision_module.parameters():
        parameter.requires_grad = True
Contributor

From my understanding of peft#2880, the problem is mainly that the entry point of the model doesn't require gradients (it's not a trainable parameter, this is just for gradient checkpointing), so targeting modules after that point doesn't work with reentrant gradient checkpointing. Isn't setting all vision parameters to requires_grad=True masking the changes done by enable_input_require_grads, so the check passes regardless of what that helper function does? Maybe targeting something that is clearly not an input, something resembling an attention layer for example, would be better?
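
(For reference, a simplified sketch of the hook pattern behind enable_input_require_grads; the standalone helper name below is just for illustration.)

# Simplified sketch: mark the entry-point module's output as requiring grad so
# that reentrant gradient checkpointing has a differentiable input to save.
def make_inputs_require_grads(module, inputs, output):
    output.requires_grad_(True)

def attach_entry_point_hook(model):
    # get_input_embeddings() returns the text embedding layer; a vision tower
    # has its own entry point that this hook does not cover.
    return model.get_input_embeddings().register_forward_hook(make_inputs_require_grads)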

Contributor Author (molbap)

I see, hmm; I followed the implementation of idefics2/smolvlm since I remembered they faced this issue at the time. You're right that this isn't necessary: we register twice. The lowest-module trick should work though, and I'm not sure targeting an attention layer would work either. Currently @BenjaminBossan's script outputs grad norms properly with gradient checkpointing enabled and PEFT disabled on this branch, so it seems to do the trick?

no GC

{'loss': 9.4971, 'grad_norm': 23.421083450317383, 'learning_rate': 2e-05, 'epoch': 0.33}
{'loss': 7.9526, 'grad_norm': 675.1868896484375, 'learning_rate': 1.866666666666667e-05, 'epoch': 0.67}

with GC

{'loss': 9.4971, 'grad_norm': 23.421083450317383, 'learning_rate': 2e-05, 'epoch': 0.33}
{'loss': 7.9526, 'grad_norm': 675.1868896484375, 'learning_rate': 1.866666666666667e-05, 'epoch': 0.67}

In either case, I agree the double registering is useless, will remove!

Contributor

Yeah, I think the implementation is fine. I'm just worried that the test is masking the behavior of the fix and is therefore not honest enough. Sorry if I didn't make that clear.

Contributor Author (molbap)

No that's fair, I'll revamp the test for a narrower scope!
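
(One hypothetical shape for a narrower check, with a placeholder model and dummy inputs: freeze everything, enable checkpointing plus the helper, and assert that the forward output is still attached to a graph, instead of flipping requires_grad on whole submodules.)

# Hypothetical narrower test sketch; model and dummy_inputs are placeholders.
def check_entry_point_hook(model, dummy_inputs):
    for param in model.parameters():
        param.requires_grad = False  # freeze the base model, as PEFT would

    model.train()
    model.gradient_checkpointing_enable()
    model.enable_input_require_grads()

    outputs = model(**dummy_inputs)
    # With every parameter frozen, the logits only require grad if the
    # entry-point hook marked the embeddings' output as requiring grad.
    assert outputs.logits.requires_grad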

@molbap requested a review from zucchini-nlp on November 6, 2025, 21:20
@zucchini-nlp (Member) left a comment

I think this solution works only for VLMs and also depends a lot on how the vision model is named. I'm sure we listed all possible names, but new models can get creative with it.

So I'm thinking that we could potentially make it work out of the box for all MLLMs (audio/vision/omni) by checking for each PreTrainedModel within the model and then setting grads on that model's inputs (model.get_input_embeddings()).

We use a similar trick when setting attention implementations and checking for PreTrainedModel instances, so it could be a good option. WDYT?
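
(A rough sketch of what that could look like; the function name, the modules() walk, and the error handling below are assumptions for illustration, not the final implementation.)

# Illustrative sketch: walk every PreTrainedModel submodule (text, vision,
# audio, ...) and hook its input embeddings so each sub-model's entry point
# produces outputs that require grad.
from transformers import PreTrainedModel

def make_inputs_require_grads(module, inputs, output):
    output.requires_grad_(True)

def enable_input_require_grads_everywhere(model):
    handles = []
    for submodule in model.modules():
        if isinstance(submodule, PreTrainedModel):
            try:
                embeddings = submodule.get_input_embeddings()
            except NotImplementedError:
                continue  # some sub-models don't expose input embeddings
            if embeddings is not None:
                handles.append(embeddings.register_forward_hook(make_inputs_require_grads))
    return handles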

@molbap (Contributor Author) commented Nov 7, 2025

Thanks, yes, it's a far less brittle option. There are a few modules (really just a few, and hopefully zero after v5) that are plain nn.Modules instead of PreTrainedModel, so they would be off the hook; other than these few exceptions it should work out well. I'll push something like that today.
