Add Nystromformer #14659
Merged
Changes from 55 commits
62 commits
4ea35ec
Initial commit
novice03 c3161cf
Config and modelling changes
novice03 c585c49
Modelling and test changes
novice03 b2d4d43
Code quality fixes
novice03 226069d
Modeling changes and conversion script
novice03 b4058a0
Minor modeling changes and conversion script
novice03 a890a24
Modeling changes
novice03 99b56a5
Correct modeling, add tests and documentation
novice03 bc4cbec
Code refactor
novice03 fcab712
Remove tokenizers
novice03 f39df6c
Merge branch 'add_nystromformer' of https://github.com/novice03/trans…
novice03 28db895
Code refactor
novice03 1e5c17d
Update __init__.py
novice03 6670757
Fix bugs
novice03 519329f
Update src/transformers/__init__.py
novice03 e3579da
Update src/transformers/__init__.py
novice03 d7444f1
Update src/transformers/models/nystromformer/__init__.py
novice03 8740d6d
Update docs/source/model_doc/nystromformer.mdx
novice03 4438125
Update src/transformers/models/nystromformer/configuration_nystromfor…
novice03 577511e
Update src/transformers/models/nystromformer/configuration_nystromfor…
novice03 b13c5dd
Update src/transformers/models/nystromformer/configuration_nystromfor…
novice03 bd0da9e
Update src/transformers/models/nystromformer/configuration_nystromfor…
novice03 51b2a93
Update src/transformers/models/nystromformer/convert_nystromformer_or…
novice03 a4efb44
Update src/transformers/models/nystromformer/configuration_nystromfor…
novice03 eac563c
Update modeling and test_modeling
novice03 163702c
Code refactor
novice03 ceb83f2
Merge branch 'master' into add_nystromformer
novice03 089945f
.rst to .mdx
novice03 091c7ad
doc changes
novice03 2553d19
Doc changes
novice03 0129631
Update modeling_nystromformer.py
novice03 9c74356
Doc changes
novice03 fd9f168
Fix copies
novice03 4332b82
Apply suggestions from code review
novice03 016cb9d
Apply suggestions from code review
novice03 d208c78
Update configuration_nystromformer.py
novice03 b394eea
Fix copies
novice03 05d2d9b
Update tests/test_modeling_nystromformer.py
novice03 9d648fa
Update test_modeling_nystromformer.py
novice03 552cc12
Merge branch 'master' into add_nystromformer
novice03 ca3e934
Merge branch 'huggingface:master' into add_nystromformer
novice03 c362660
Merge branch 'huggingface:master' into add_nystromformer
novice03 875a0bc
Apply suggestions from code review
novice03 748092a
Fix code style
novice03 55edee2
Update modeling_nystromformer.py
novice03 6fc762d
Update modeling_nystromformer.py
novice03 c6a8d35
Fix code style
novice03 3e45dba
Merge branch 'huggingface:master' into add_nystromformer
novice03 f971632
Reformat modeling file
novice03 e891e05
Merge branch 'huggingface:master' into add_nystromformer
novice03 b678f5c
Update modeling_nystromformer.py
novice03 7268ec3
Merge branch 'huggingface:master' into add_nystromformer
novice03 02fe323
Modify NystromformerForMultipleChoice
novice03 b4770d7
Merge branch 'add_nystromformer' of https://github.com/novice03/trans…
novice03 5f3a389
Fix code quality
novice03 e0e4be6
Apply suggestions from code review
novice03 55cbd32
Code style changes and torch.no_grad()
novice03 5a0b891
Merge branch 'huggingface:master' into add_nystromformer
novice03 7e9e50a
make style
novice03 7da8e1c
Merge branch 'huggingface:master' into add_nystromformer
novice03 bd77cfe
Merge branch 'add_nystromformer' of https://github.com/novice03/trans…
novice03 9a5663b
Apply suggestions from code review
novice03
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,69 @@ | ||
| <!--Copyright 2022 The HuggingFace Team. All rights reserved. | ||
|
|
||
| Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with | ||
| the License. You may obtain a copy of the License at | ||
|
|
||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
|
|
||
| Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on | ||
| an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the | ||
| specific language governing permissions and limitations under the License. | ||
| --> | ||
|
|
||
| # Nyströmformer | ||
|
|
||
| ## Overview | ||
|
|
||
| The Nyströmformer model was proposed in [*Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention*](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn | ||
| Fung, Yin Li, and Vikas Singh. | ||
|
|
||
| The abstract from the paper is the following: | ||
|
|
||
| *Transformers have emerged as a powerful tool for a broad range of natural language processing tasks. A key component | ||
| that drives the impressive performance of Transformers is the self-attention mechanism that encodes the influence or | ||
| dependence of other tokens on each specific token. While beneficial, the quadratic complexity of self-attention on the | ||
| input sequence length has limited its application to longer sequences -- a topic being actively studied in the | ||
| community. To address this limitation, we propose Nyströmformer -- a model that exhibits favorable scalability as a | ||
| function of sequence length. Our idea is based on adapting the Nyström method to approximate standard self-attention | ||
| with O(n) complexity. The scalability of Nyströmformer enables application to longer sequences with thousands of | ||
| tokens. We perform evaluations on multiple downstream tasks on the GLUE benchmark and IMDB reviews with standard | ||
| sequence length, and find that our Nyströmformer performs comparably, or in a few cases, even slightly better, than | ||
| standard self-attention. On longer sequence tasks in the Long Range Arena (LRA) benchmark, Nyströmformer performs | ||
| favorably relative to other efficient self-attention methods. Our code is available at this https URL.* | ||
|
|
||
| This model was contributed by [novice03](https://huggingface.co/novice03). The original code can be found [here](https://github.com/mlpen/Nystromformer). | ||
|
|
||
| ## NystromformerConfig | ||
|
|
||
| [[autodoc]] NystromformerConfig | ||
|
|
||
| ## NystromformerModel | ||
|
|
||
| [[autodoc]] NystromformerModel | ||
| - forward | ||
|
|
||
| ## NystromformerForMaskedLM | ||
|
|
||
| [[autodoc]] NystromformerForMaskedLM | ||
| - forward | ||
|
|
||
| ## NystromformerForSequenceClassification | ||
|
|
||
| [[autodoc]] NystromformerForSequenceClassification | ||
| - forward | ||
|
|
||
| ## NystromformerForMultipleChoice | ||
|
|
||
| [[autodoc]] NystromformerForMultipleChoice | ||
| - forward | ||
|
|
||
| ## NystromformerForTokenClassification | ||
|
|
||
| [[autodoc]] NystromformerForTokenClassification | ||
| - forward | ||
|
|
||
| ## NystromformerForQuestionAnswering | ||
|
|
||
| [[autodoc]] NystromformerForQuestionAnswering | ||
| - forward | ||
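
The abstract quoted in the doc above describes approximating softmax self-attention with the Nyström method. As a rough illustration only, the idea can be sketched in NumPy as follows. This is not the code added in this PR: the function names `softmax` and `nystrom_attention` are hypothetical, landmarks are taken as segment means (which assumes the sequence length is divisible by the number of landmarks), and `np.linalg.pinv` stands in for whatever pseudo-inverse scheme the real implementation uses.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def nystrom_attention(q, k, v, num_landmarks):
    """Illustrative Nystrom-style approximation of softmax(q k^T / sqrt(d)) v.

    q, k, v: arrays of shape (n, d). Assumes n is divisible by num_landmarks.
    """
    n, d = q.shape
    assert n % num_landmarks == 0, "sketch assumes n divisible by num_landmarks"
    scale = 1.0 / np.sqrt(d)

    # Landmarks: mean of each contiguous segment of queries / keys.
    q_land = q.reshape(num_landmarks, n // num_landmarks, d).mean(axis=1)
    k_land = k.reshape(num_landmarks, n // num_landmarks, d).mean(axis=1)

    # Three small softmax kernels replace the full (n, n) attention matrix.
    f = softmax(q @ k_land.T * scale)       # (n, m)
    a = softmax(q_land @ k_land.T * scale)  # (m, m)
    b = softmax(q_land @ k.T * scale)       # (m, n)

    # Multiply right-to-left so no (n, n) matrix is ever formed.
    # np.linalg.pinv is an assumption made for brevity in this sketch.
    return f @ (np.linalg.pinv(a) @ (b @ v))


rng = np.random.default_rng(0)
q = rng.standard_normal((8, 4))
k = rng.standard_normal((8, 4))
v = rng.standard_normal((8, 4))
approx = nystrom_attention(q, k, v, num_landmarks=4)
```

With the number of landmarks `m` held fixed, each matrix product above costs a number of operations proportional to `n`, which is where the linear scaling in sequence length comes from. Setting `num_landmarks` equal to `n` makes every token its own landmark, in which case the sketch reduces to exact softmax attention up to floating-point error.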
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -76,6 +76,7 @@ | ||
| mobilebert, | ||
| mpnet, | ||
| mt5, | ||
| nystromformer, | ||
| openai, | ||
| pegasus, | ||
| perceiver, | ||
|
|
||