Fix GPT-2 Flash Attention 2 generation with left-padding #41966
Conversation
Force-pushed from 2877ab9 to 4f20a1f.

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@vasqu, ready for review :)
Makes sense, fa should know that it's indeed causal! cc @vasqu here as well!
However, I think we can even completely remove the shape check: sdpa checks it internally to make sure of the alignment, and fa should already align the mask correctly, if I remember correctly.
vasqu left a comment
I'm with Cyril here, let's simplify the forward: essentially remove the whole is_causal logic during the forward and only set it in the init based on whether we are cross attn or not.
All the is_causal logic is correctly handled by the integrations; we just need to make sure we define it correctly on init.
Makes sense. Thanks @vasqu, @Cyrilvallez. Pushing the changes now.
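For illustration, a minimal sketch of the pattern being discussed: decide causality once in `__init__` (self-attention is causal, cross-attention is not) and let the attention backend combine that flag with the padding mask, instead of recomputing `is_causal` from the mask shape in `forward`. The class and function names here are hypothetical, not the actual GPT-2 code in transformers.

```python
import torch
from torch import nn


class GPT2LikeAttention(nn.Module):
    """Illustrative attention block: `is_causal` is fixed once, at init time."""

    def __init__(self, embed_dim: int, num_heads: int, is_cross_attention: bool = False):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.q_proj = nn.Linear(embed_dim, embed_dim)
        self.k_proj = nn.Linear(embed_dim, embed_dim)
        self.v_proj = nn.Linear(embed_dim, embed_dim)
        self.o_proj = nn.Linear(embed_dim, embed_dim)
        # Decoder self-attention is causal; cross-attention is not.
        # The forward pass never flips this based on the attention mask shape.
        self.is_causal = not is_cross_attention

    def forward(self, hidden_states, key_value_states=None, attention_mask=None, attn_fn=None):
        # For cross-attention, keys/values come from the encoder states.
        kv = key_value_states if key_value_states is not None else hidden_states
        b, q_len, _ = hidden_states.shape
        kv_len = kv.shape[1]
        q = self.q_proj(hidden_states).view(b, q_len, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(kv).view(b, kv_len, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(kv).view(b, kv_len, self.num_heads, self.head_dim).transpose(1, 2)
        # The attention backend (eager, SDPA, FA2, ...) receives both the padding
        # mask and the static `is_causal` flag and combines them itself.
        attn_fn = attn_fn if attn_fn is not None else eager_attention
        out = attn_fn(q, k, v, attention_mask, is_causal=self.is_causal)
        return self.o_proj(out.transpose(1, 2).reshape(b, q_len, -1))


def eager_attention(q, k, v, attention_mask, is_causal):
    """Reference backend: applies the causal mask and the padding mask independently."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    if is_causal:
        t = q.shape[-2]
        causal = torch.tril(torch.ones(t, t, dtype=torch.bool, device=q.device))
        scores = scores.masked_fill(~causal, torch.finfo(scores.dtype).min)
    if attention_mask is not None:  # additive mask: 0 to keep, large negative to drop
        scores = scores + attention_mask
    return torch.softmax(scores, dim=-1) @ v
```

With this split, a padding (left-padding) mask never turns causal masking off; each backend simply applies both constraints.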
Force-pushed from 015bba0 to 2edb434.

[For maintainers] Suggested jobs to run (before merge): run-slow: decision_transformer, gpt2
run-slow: decision_transformer, gpt2

This comment contains models: ["models/decision_transformer", "models/gpt2"]
CI Results: ✅ No failing test specific to this PR 🎉!
vasqu left a comment
Thx a lot, looking good now!
@Abdennacer-Badaoui can you run the FA tests locally and confirm that they work? Seems like our CI doesn't have FA installed anymore, so just wanna double check.
@vasqu, I ran the failing FA tests for which I opened the PR; they're all passing.
Thx a lot ❤️
What does this PR do?
Fixes Flash Attention 2 generation with left-padding in GPT-2 by ensuring is_causal=True is set even when padding masks are provided. Flash Attention 2 handles causal masking and padding independently, but the original code incorrectly disabled causal masking whenever an attention mask was present.
Fixes:
test_flash_attn_2_generate_padding_left
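A rough reproduction sketch of the scenario this fixes, not taken from the PR itself: batched generation with prompts of different lengths, left-padding, and the Flash Attention 2 backend. It assumes a CUDA GPU with flash-attn installed; the prompts and generation settings are just examples.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2", padding_side="left")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
).to("cuda")

# Batched prompts of different lengths -> left-padding plus an attention_mask.
inputs = tokenizer(
    ["Hello, my dog is", "The quick brown fox jumps over the"],
    padding=True,
    return_tensors="pt",
).to("cuda")

# Before the fix, the presence of the padding mask disabled causal masking on
# the FA2 path; with is_causal kept True, left-padded generation behaves like
# the other attention implementations.
outputs = model.generate(**inputs, max_new_tokens=20, pad_token_id=tokenizer.pad_token_id)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```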