Merged
Commits
62 commits
4ea35ec
Initial commit
novice03 Dec 7, 2021
c3161cf
Config and modelling changes
novice03 Dec 7, 2021
c585c49
Modelling and test changes
novice03 Dec 7, 2021
b2d4d43
Code quality fixes
novice03 Dec 7, 2021
226069d
Modeling changes and conversion script
novice03 Dec 10, 2021
b4058a0
Minor modeling changes and conversion script
novice03 Dec 30, 2021
a890a24
Modeling changes
novice03 Jan 1, 2022
99b56a5
Correct modeling, add tests and documentation
novice03 Jan 2, 2022
bc4cbec
Code refactor
novice03 Jan 2, 2022
fcab712
Remove tokenizers
novice03 Jan 2, 2022
f39df6c
Merge branch 'add_nystromformer' of https://github.com/novice03/trans…
novice03 Jan 2, 2022
28db895
Code refactor
novice03 Jan 2, 2022
1e5c17d
Update __init__.py
novice03 Jan 2, 2022
6670757
Fix bugs
novice03 Jan 2, 2022
519329f
Update src/transformers/__init__.py
novice03 Jan 2, 2022
e3579da
Update src/transformers/__init__.py
novice03 Jan 2, 2022
d7444f1
Update src/transformers/models/nystromformer/__init__.py
novice03 Jan 2, 2022
8740d6d
Update docs/source/model_doc/nystromformer.mdx
novice03 Jan 2, 2022
4438125
Update src/transformers/models/nystromformer/configuration_nystromfor…
novice03 Jan 2, 2022
577511e
Update src/transformers/models/nystromformer/configuration_nystromfor…
novice03 Jan 2, 2022
b13c5dd
Update src/transformers/models/nystromformer/configuration_nystromfor…
novice03 Jan 2, 2022
bd0da9e
Update src/transformers/models/nystromformer/configuration_nystromfor…
novice03 Jan 2, 2022
51b2a93
Update src/transformers/models/nystromformer/convert_nystromformer_or…
novice03 Jan 2, 2022
a4efb44
Update src/transformers/models/nystromformer/configuration_nystromfor…
novice03 Jan 2, 2022
eac563c
Update modeling and test_modeling
novice03 Jan 2, 2022
163702c
Code refactor
novice03 Jan 2, 2022
ceb83f2
Merge branch 'master' into add_nystromformer
novice03 Jan 2, 2022
089945f
.rst to .mdx
novice03 Jan 2, 2022
091c7ad
doc changes
novice03 Jan 2, 2022
2553d19
Doc changes
novice03 Jan 2, 2022
0129631
Update modeling_nystromformer.py
novice03 Jan 2, 2022
9c74356
Doc changes
novice03 Jan 2, 2022
fd9f168
Fix copies
novice03 Jan 2, 2022
4332b82
Apply suggestions from code review
novice03 Jan 2, 2022
016cb9d
Apply suggestions from code review
novice03 Jan 2, 2022
d208c78
Update configuration_nystromformer.py
novice03 Jan 2, 2022
b394eea
Fix copies
novice03 Jan 2, 2022
05d2d9b
Update tests/test_modeling_nystromformer.py
novice03 Jan 3, 2022
9d648fa
Update test_modeling_nystromformer.py
novice03 Jan 3, 2022
552cc12
Merge branch 'master' into add_nystromformer
novice03 Jan 3, 2022
ca3e934
Merge branch 'huggingface:master' into add_nystromformer
novice03 Jan 5, 2022
c362660
Merge branch 'huggingface:master' into add_nystromformer
novice03 Jan 10, 2022
875a0bc
Apply suggestions from code review
novice03 Jan 10, 2022
748092a
Fix code style
novice03 Jan 10, 2022
55edee2
Update modeling_nystromformer.py
novice03 Jan 10, 2022
6fc762d
Update modeling_nystromformer.py
novice03 Jan 10, 2022
c6a8d35
Fix code style
novice03 Jan 10, 2022
3e45dba
Merge branch 'huggingface:master' into add_nystromformer
novice03 Jan 10, 2022
f971632
Reformat modeling file
novice03 Jan 10, 2022
e891e05
Merge branch 'huggingface:master' into add_nystromformer
novice03 Jan 10, 2022
b678f5c
Update modeling_nystromformer.py
novice03 Jan 10, 2022
7268ec3
Merge branch 'huggingface:master' into add_nystromformer
novice03 Jan 10, 2022
02fe323
Modify NystromformerForMultipleChoice
novice03 Jan 10, 2022
b4770d7
Merge branch 'add_nystromformer' of https://github.com/novice03/trans…
novice03 Jan 10, 2022
5f3a389
Fix code quality
novice03 Jan 10, 2022
e0e4be6
Apply suggestions from code review
novice03 Jan 10, 2022
55cbd32
Code style changes and torch.no_grad()
novice03 Jan 10, 2022
5a0b891
Merge branch 'huggingface:master' into add_nystromformer
novice03 Jan 10, 2022
7e9e50a
make style
novice03 Jan 10, 2022
7da8e1c
Merge branch 'huggingface:master' into add_nystromformer
novice03 Jan 10, 2022
bd77cfe
Merge branch 'add_nystromformer' of https://github.com/novice03/trans…
novice03 Jan 10, 2022
9a5663b
Apply suggestions from code review
novice03 Jan 11, 2022
1 change: 1 addition & 0 deletions README.md
@@ -285,6 +285,7 @@ Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[Nyströmformer](https://huggingface.co/docs/transformers/master/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
1 change: 1 addition & 0 deletions README_ko.md
@@ -264,6 +264,7 @@ Flax, PyTorch, TensorFlow 설치 페이지에서 이들을 conda로 설치하는
1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[Nyströmformer](https://huggingface.co/docs/transformers/master/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
1 change: 1 addition & 0 deletions README_zh-hans.md
@@ -288,6 +288,7 @@ conda install -c huggingface transformers
1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (来自 Studio Ousia) 伴随论文 [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) 由 Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka 发布。
1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (来自 Microsoft Research) 伴随论文 [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) 由 Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu 发布。
1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (来自 Google AI) 伴随论文 [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) 由 Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel 发布。
1. **[Nyströmformer](https://huggingface.co/docs/transformers/master/model_doc/nystromformer)** (来自 the University of Wisconsin - Madison) 伴随论文 [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) 由 Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh 发布。
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (来自 Google) 伴随论文 [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) 由 Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu 发布。
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (来自 Deepmind) 伴随论文 [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) 由 Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira 发布。
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (来自 VinAI Research) 伴随论文 [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) 由 Dat Quoc Nguyen and Anh Tuan Nguyen 发布。
1 change: 1 addition & 0 deletions README_zh-hant.md
@@ -300,6 +300,7 @@ conda install -c huggingface transformers
1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[Nyströmformer](https://huggingface.co/docs/transformers/master/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
2 changes: 2 additions & 0 deletions docs/source/_toctree.yml
@@ -214,6 +214,8 @@
title: MPNet
- local: model_doc/mt5
title: MT5
- local: model_doc/nystromformer
title: Nyströmformer
- local: model_doc/openai-gpt
title: OpenAI GPT
- local: model_doc/gpt2
2 changes: 2 additions & 0 deletions docs/source/index.mdx
@@ -145,6 +145,7 @@ conversion utilities for the following models.
1. **[Megatron-GPT2](model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[MPNet](model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[Nyströmformer](model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
1. **[Pegasus](model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[Perceiver IO](model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[PhoBERT](model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
@@ -235,6 +236,7 @@ Flax), PyTorch, and/or TensorFlow.
| MobileBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| MPNet | ✅ | ✅ | ✅ | ✅ | ❌ |
| mT5 | ✅ | ✅ | ✅ | ✅ | ✅ |
| Nystromformer | ❌ | ❌ | ✅ | ❌ | ❌ |
| OpenAI GPT | ✅ | ✅ | ✅ | ✅ | ❌ |
| OpenAI GPT-2 | ✅ | ✅ | ✅ | ✅ | ✅ |
| Pegasus | ✅ | ✅ | ✅ | ✅ | ✅ |
69 changes: 69 additions & 0 deletions docs/source/model_doc/nystromformer.mdx
@@ -0,0 +1,69 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Nyströmformer

## Overview

The Nyströmformer model was proposed in [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902)
by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, and Vikas Singh.

The abstract from the paper is the following:

*Transformers have emerged as a powerful tool for a broad range of natural language processing tasks. A key component
that drives the impressive performance of Transformers is the self-attention mechanism that encodes the influence or
dependence of other tokens on each specific token. While beneficial, the quadratic complexity of self-attention on the
input sequence length has limited its application to longer sequences -- a topic being actively studied in the
community. To address this limitation, we propose Nyströmformer -- a model that exhibits favorable scalability as a
function of sequence length. Our idea is based on adapting the Nyström method to approximate standard self-attention
with O(n) complexity. The scalability of Nyströmformer enables application to longer sequences with thousands of
tokens. We perform evaluations on multiple downstream tasks on the GLUE benchmark and IMDB reviews with standard
sequence length, and find that our Nyströmformer performs comparably, or in a few cases, even slightly better, than
standard self-attention. On longer sequence tasks in the Long Range Arena (LRA) benchmark, Nyströmformer performs
favorably relative to other efficient self-attention methods. Our code is available at this https URL.*
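
The approximation is easy to sketch: instead of materializing the full n × n attention matrix, the queries and keys are summarized by a small set of landmark (segment-mean) vectors, and softmax attention is rebuilt from three much smaller kernels plus an m × m pseudoinverse. Below is a minimal single-head PyTorch sketch of the idea only (no masking or multi-head handling; the paper additionally approximates the pseudoinverse iteratively and adds a depthwise-convolution residual, both omitted here):

```python
import torch

def nystrom_attention(q, k, v, num_landmarks=64):
    """Nyström approximation of softmax attention (single head, no masking).

    q, k, v: (batch, seq_len, head_dim); seq_len is assumed divisible by num_landmarks.
    """
    b, n, d = q.shape
    scale = d ** -0.5
    # Landmarks are segment means of the queries and keys: (batch, m, head_dim)
    q_land = q.reshape(b, num_landmarks, n // num_landmarks, d).mean(dim=2)
    k_land = k.reshape(b, num_landmarks, n // num_landmarks, d).mean(dim=2)
    # Three small softmax kernels stand in for the full (n x n) attention matrix
    kernel_1 = torch.softmax(q @ k_land.transpose(-1, -2) * scale, dim=-1)       # (b, n, m)
    kernel_2 = torch.softmax(q_land @ k_land.transpose(-1, -2) * scale, dim=-1)  # (b, m, m)
    kernel_3 = torch.softmax(q_land @ k.transpose(-1, -2) * scale, dim=-1)       # (b, m, n)
    # Only an (m x m) pseudoinverse is needed, so the cost grows linearly in n
    return kernel_1 @ torch.linalg.pinv(kernel_2) @ (kernel_3 @ v)

out = nystrom_attention(torch.randn(2, 512, 64), torch.randn(2, 512, 64), torch.randn(2, 512, 64))
print(out.shape)  # torch.Size([2, 512, 64])
```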

This model was contributed by [novice03](https://huggingface.co/novice03). The original code can be found [here](https://github.com/mlpen/Nystromformer).
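
A quick usage sketch in the style of other model docs follows; the checkpoint name `uw-madison/nystromformer-512` is an assumption about where the converted weights will live on the Hub, so substitute whichever identifier the conversion script in this PR ultimately targets:

```python
import torch
from transformers import AutoTokenizer, NystromformerForMaskedLM

# Hypothetical checkpoint identifier; replace with the published one
tokenizer = AutoTokenizer.from_pretrained("uw-madison/nystromformer-512")
model = NystromformerForMaskedLM.from_pretrained("uw-madison/nystromformer-512")

inputs = tokenizer("Paris is the [MASK] of France.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Decode the highest-scoring token at the masked position
mask_positions = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
print(tokenizer.decode(logits[0, mask_positions].argmax(dim=-1)))
```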

## NystromformerConfig

[[autodoc]] NystromformerConfig

## NystromformerModel

[[autodoc]] NystromformerModel
- forward

## NystromformerForMaskedLM

[[autodoc]] NystromformerForMaskedLM
- forward

## NystromformerForSequenceClassification

[[autodoc]] NystromformerForSequenceClassification
- forward

## NystromformerForMultipleChoice

[[autodoc]] NystromformerForMultipleChoice
- forward

## NystromformerForTokenClassification

[[autodoc]] NystromformerForTokenClassification
- forward

## NystromformerForQuestionAnswering

[[autodoc]] NystromformerForQuestionAnswering
- forward
29 changes: 29 additions & 0 deletions src/transformers/__init__.py
@@ -241,6 +241,10 @@
"models.mobilebert": ["MOBILEBERT_PRETRAINED_CONFIG_ARCHIVE_MAP", "MobileBertConfig", "MobileBertTokenizer"],
"models.mpnet": ["MPNET_PRETRAINED_CONFIG_ARCHIVE_MAP", "MPNetConfig", "MPNetTokenizer"],
"models.mt5": ["MT5Config"],
"models.nystromformer": [
"NYSTROMFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP",
"NystromformerConfig",
],
"models.openai": ["OPENAI_GPT_PRETRAINED_CONFIG_ARCHIVE_MAP", "OpenAIGPTConfig", "OpenAIGPTTokenizer"],
"models.pegasus": ["PEGASUS_PRETRAINED_CONFIG_ARCHIVE_MAP", "PegasusConfig", "PegasusTokenizer"],
"models.perceiver": ["PERCEIVER_PRETRAINED_CONFIG_ARCHIVE_MAP", "PerceiverConfig", "PerceiverTokenizer"],
@@ -1122,6 +1126,19 @@
]
)
_import_structure["models.mt5"].extend(["MT5EncoderModel", "MT5ForConditionalGeneration", "MT5Model"])
_import_structure["models.nystromformer"].extend(
[
"NYSTROMFORMER_PRETRAINED_MODEL_ARCHIVE_LIST",
"NystromformerForMaskedLM",
"NystromformerForMultipleChoice",
"NystromformerForQuestionAnswering",
"NystromformerForSequenceClassification",
"NystromformerForTokenClassification",
"NystromformerLayer",
"NystromformerModel",
"NystromformerPreTrainedModel",
]
)
_import_structure["models.openai"].extend(
[
"OPENAI_GPT_PRETRAINED_MODEL_ARCHIVE_LIST",
@@ -2305,6 +2322,7 @@
from .models.mobilebert import MOBILEBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, MobileBertConfig, MobileBertTokenizer
from .models.mpnet import MPNET_PRETRAINED_CONFIG_ARCHIVE_MAP, MPNetConfig, MPNetTokenizer
from .models.mt5 import MT5Config
from .models.nystromformer import NYSTROMFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP, NystromformerConfig
from .models.openai import OPENAI_GPT_PRETRAINED_CONFIG_ARCHIVE_MAP, OpenAIGPTConfig, OpenAIGPTTokenizer
from .models.pegasus import PEGASUS_PRETRAINED_CONFIG_ARCHIVE_MAP, PegasusConfig, PegasusTokenizer
from .models.perceiver import PERCEIVER_PRETRAINED_CONFIG_ARCHIVE_MAP, PerceiverConfig, PerceiverTokenizer
@@ -3038,6 +3056,17 @@
MPNetPreTrainedModel,
)
from .models.mt5 import MT5EncoderModel, MT5ForConditionalGeneration, MT5Model
from .models.nystromformer import (
NYSTROMFORMER_PRETRAINED_MODEL_ARCHIVE_LIST,
NystromformerForMaskedLM,
NystromformerForMultipleChoice,
NystromformerForQuestionAnswering,
NystromformerForSequenceClassification,
NystromformerForTokenClassification,
NystromformerLayer,
NystromformerModel,
NystromformerPreTrainedModel,
)
from .models.openai import (
OPENAI_GPT_PRETRAINED_MODEL_ARCHIVE_LIST,
OpenAIGPTDoubleHeadsModel,
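Once these `_import_structure` entries (and the mirrored eager imports below them) are in place, the new classes are importable straight from the top level of `transformers`; a small sanity check, assuming a build from this branch and the BERT-style config field names:

```python
from transformers import NystromformerConfig, NystromformerForSequenceClassification

# Tiny randomly initialized model, just to confirm the public API is wired up
config = NystromformerConfig(
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
    num_labels=3,
)
model = NystromformerForSequenceClassification(config)
print(sum(p.numel() for p in model.parameters()))
```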
1 change: 1 addition & 0 deletions src/transformers/models/__init__.py
@@ -76,6 +76,7 @@
mobilebert,
mpnet,
mt5,
nystromformer,
openai,
pegasus,
perceiver,
3 changes: 3 additions & 0 deletions src/transformers/models/auto/configuration_auto.py
@@ -30,6 +30,7 @@
CONFIG_MAPPING_NAMES = OrderedDict(
[
# Add configs here
("nystromformer", "NystromformerConfig"),
("imagegpt", "ImageGPTConfig"),
("qdqbert", "QDQBertConfig"),
("vision-encoder-decoder", "VisionEncoderDecoderConfig"),
@@ -116,6 +117,7 @@
CONFIG_ARCHIVE_MAP_MAPPING_NAMES = OrderedDict(
[
# Add archive maps here
("nystromformer", "NYSTROMFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("imagegpt", "IMAGEGPT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("qdqbert", "QDQBERT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("fnet", "FNET_PRETRAINED_CONFIG_ARCHIVE_MAP"),
@@ -190,6 +192,7 @@
MODEL_NAMES_MAPPING = OrderedDict(
[
# Add full (and cased) model names here
("nystromformer", "Nystromformer"),
("imagegpt", "ImageGPT"),
("qdqbert", "QDQBert"),
("vision-encoder-decoder", "Vision Encoder decoder"),
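With the model type registered in these mappings (together with the matching entries in `modeling_auto.py`, which are not part of this hunk), the Auto classes can resolve it from the `"nystromformer"` model-type string; a minimal sketch:

```python
from transformers import AutoConfig, AutoModel

# Build a small config by model type and let AutoModel dispatch to NystromformerModel
config = AutoConfig.for_model(
    "nystromformer", hidden_size=64, num_hidden_layers=2, num_attention_heads=2, intermediate_size=128
)
model = AutoModel.from_config(config)
print(type(model).__name__)  # NystromformerModel
```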