Fix attention sink implementation in flex attention (huggingface#41083)
* Fix attention sink implementation in flex attention
* fix dim
* fix
* Remove print
* raise error when return_lse is False yet s_aux is provided
* Clean test files for merge
* Update src/transformers/integrations/flex_attention.py
Co-authored-by: Arthur <[email protected]>
* force return lse
* Add to doc
---------
Co-authored-by: Arthur <[email protected]>
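
As a rough illustration of the `return_lse` check mentioned in the commit message, the following is a hypothetical sketch (the function name and error message are illustrative, not taken from the PR): sink logits can only be folded in if flex attention also returns the log-sum-exp.

```python
# Hypothetical guard illustrating the check described above: sink logits (s_aux)
# require the log-sum-exp from flex attention, so return_lse must be True.
def validate_sink_inputs(s_aux, return_lse):
    if s_aux is not None and not return_lse:
        raise ValueError(
            "s_aux was provided but return_lse is False; the log-sum-exp from "
            "flex attention is required to renormalize the outputs for attention sinks."
        )
```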
docs/source/en/model_doc/gpt_oss.md
2 lines changed: 2 additions & 0 deletions
```diff
@@ -35,6 +35,8 @@ The abstract from the paper is the following:
 *<INSERTPAPERABSTRACTHERE>*
 
 Tips:
+
+- **Attention Sinks with Flex Attention**: When using flex attention, attention sinks require special handling. Unlike standard attention implementations, where sinks can be added directly to the attention scores, flex attention's `score_mod` function operates on individual score elements rather than the full attention matrix. The sink renormalization therefore has to be applied after the flex attention computation, by rescaling the outputs with the log-sum-exp (LSE) values returned by flex attention.
```
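
The following is a minimal sketch of such a post-hoc renormalization, not the actual transformers implementation. The wrapper name `flex_attention_with_sinks` and the per-head sink logits `s_aux` are illustrative, and it assumes the LSE returned by `flex_attention` is the natural-log log-sum-exp of the scaled attention scores.

```python
import torch
from torch.nn.attention.flex_attention import flex_attention


def flex_attention_with_sinks(query, key, value, s_aux):
    """Sketch: apply attention-sink renormalization after flex attention.

    `s_aux` is assumed to hold one sink logit per attention head, shape (num_heads,).
    """
    # Ask flex attention for the log-sum-exp of the attention scores per query.
    # out: (batch, heads, q_len, head_dim), lse: (batch, heads, q_len)
    out, lse = flex_attention(query, key, value, return_lse=True)

    # Adding a sink logit s_aux to the softmax denominator rescales the output to
    # out * exp(lse) / (exp(lse) + exp(s_aux)), i.e. out * sigmoid(lse - s_aux).
    sink_scale = torch.sigmoid(lse - s_aux[None, :, None]).unsqueeze(-1).to(out.dtype)
    return out * sink_scale


# Usage sketch (shapes and values are illustrative):
if __name__ == "__main__":
    b, h, s, d = 1, 4, 16, 32
    q = torch.randn(b, h, s, d)
    k = torch.randn(b, h, s, d)
    v = torch.randn(b, h, s, d)
    sinks = torch.zeros(h)  # one sink logit per head
    o = flex_attention_with_sinks(q, k, v, sinks)
    print(o.shape)  # torch.Size([1, 4, 16, 32])
```

Because the sink only adds a constant term to the softmax denominator, the per-query LSE is all that is needed to correct the outputs after the fact, which is why the PR forces `return_lse=True` whenever sink logits are supplied.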