
Commit b971c76

younesbelkada, patrickvonplaten, ArthurZucker, sgugger, and LysandreJik authored
Add OPT (#17088)
* First version - OPT model
* Final changes - putting use cache to False
* few changes - remove commented block
* few changes - remove unecessary files
* fix style issues
* few changes - remove a test file - added the logits test
* Update src/transformers/models/auto/tokenization_auto.py Co-authored-by: Patrick von Platen <[email protected]>
* add gen tests
* few changes - rm mask filling example on docstring
* few changes - remove useless args
* some changes - more tests should pass now - needs to clean more - documentation still needs to be done
* fix code quality
* major changes - change attention architecture to BART-like - modify some tests - style fix
* rm useless classes - remove opt for: - QA - cond generation - seq classif
* Removed autodoc calls to non-existant classes TOkenizers are not implemented
* Update src/transformers/__init__.py Co-authored-by: Arthur <[email protected]>
* Update src/transformers/__init__.py Co-authored-by: Arthur <[email protected]>
* Update src/transformers/models/auto/modeling_tf_auto.py Co-authored-by: Arthur <[email protected]>
* Replaced OPTTokeniser with GPT2 tokenizer
* added GPT2Tokenizer.from_pretrained("patrickvonplaten/opt_gpt2_tokenizer")
* Removed OPTTokenizer
* make style
* Make style replaces ``` ...).unsqueeze(``` by ``` >>>).unsqueeze(```
* make repo consistency
* Removed PretrainedOPTModel
* fix opt.mdx removed other heads
* fix init, removed 3 heads
* removed heads
* finished cleaning head
* removed seauence classif and question answering
* removed unused imports
* removed useless dummy object for QA, SC and CG
* removed tests for removed useless dummy object for QA, SC and CG
* Removed head_mask using encoder layers which don't exist
* fixed test
* fix line
* added OPT to toctree
* Updated model path with pushed weigths
* fix model path
* fixed code quality
* fixed embeddings and generation tests
* update paths
* clean comments
* removed OPTClassificationHead for sentence classification
* renamed hidden layer
* renamed num layers to standard num_hidden_layers
* num_attention_heads fix
* changes for 125m
* add first version for 125m
* add first version - flax
* add new version
* causal LM output
* replace output type with BaseModelOutputWithPastAndCrossAttentions
* revert working config from 150m to 350m
* clean
* removed decoder input ids
* fixed embed dim
* more embed_dim issues
* make style + removed enc_dec test
* update falx model
* removed troublesome copy
* added is_encoder_decoder=False to config
* added set_input emb fuinction to model class
* requires torch on embed test
* use head mask instead of decoder head mask input param solves a test
* 8 test remaining, update
* Updated create_and_check_decoder_model_past_large_inputs
* Make style
* update op tokenizer with condition
* make style
* See if I can push
* some clean up
* remove linear head hack
* save intermediate
* save correct attention
* add copied from from bart
* Update src/transformers/models/opt/modeling_opt.py Co-authored-by: Patrick von Platen <[email protected]>
* fix part of the reviewss Co-authored-by: Patrick von Platen <[email protected]>
* same changes in naming / conversion
* correct mask
* more fixes
* delete FlaxOPT and TfOPT
* clean traces of Flax and Tf
* fix mask
* fixed positionnal embedding length when past key value is provoded
* get 125m, 6.7b to work
* Added do_layer_norm
* solved mismatch in load dictionnary
* clean up preapre opt input dict
* fixed past key value as bool
* fix previus
* fixed return dict False tuple issue
* All tests are passing
* Make style
* Ignore OPTDecoder non tested
* make fix-copies
* make repo consistency
* small fix
* removed uselss @torch.no_grad decorator
* make styl;e
* fix previous opt test
* style
* make style
* added opt documentation
* update OPT_PRETRAINED_MODEL_ARCHIVE_LIST
* up
* more fixes
* model & config work
* Update src/transformers/models/opt/modeling_opt.py Co-authored-by: Patrick von Platen <[email protected]>
* Update src/transformers/models/opt/modeling_opt.py Co-authored-by: Patrick von Platen <[email protected]>
* Update src/transformers/models/opt/modeling_opt.py Co-authored-by: Patrick von Platen <[email protected]>
* added comment on padding hack (+2)
* cleaup
* review update
* docstring for missing arg
* Update docs/source/en/model_doc/opt.mdx Co-authored-by: Patrick von Platen <[email protected]>
* Update docs/source/en/model_doc/opt.mdx Co-authored-by: Patrick von Platen <[email protected]>
* Update docs/source/en/model_doc/opt.mdx Co-authored-by: Patrick von Platen <[email protected]>
* Update src/transformers/models/opt/__init__.py Co-authored-by: Patrick von Platen <[email protected]>
* update pretrained map
* update path and tests
* make style
* styling
* make consistency
* add gpt2 tok new
* more tok fixes
* Update src/transformers/models/auto/tokenization_auto.py
* Update docs/source/en/model_doc/opt.mdx Co-authored-by: Sylvain Gugger <[email protected]>
* Update docs/source/en/model_doc/opt.mdx Co-authored-by: Sylvain Gugger <[email protected]>
* Update docs/source/en/model_doc/opt.mdx Co-authored-by: Sylvain Gugger <[email protected]>
* Update src/transformers/models/opt/modeling_opt.py Co-authored-by: Sylvain Gugger <[email protected]>
* Update tests/models/opt/test_modeling_opt.py Co-authored-by: Sylvain Gugger <[email protected]>
* Update src/transformers/models/opt/modeling_opt.py Co-authored-by: Sylvain Gugger <[email protected]>
* Update src/transformers/models/opt/modeling_opt.py Co-authored-by: Sylvain Gugger <[email protected]>
* Update src/transformers/models/opt/modeling_opt.py Co-authored-by: Sylvain Gugger <[email protected]>
* Update src/transformers/models/opt/modeling_opt.py Co-authored-by: Sylvain Gugger <[email protected]>
* Update src/transformers/models/opt/modeling_opt.py Co-authored-by: Sylvain Gugger <[email protected]>
* Update based on reviews
* Apply suggestions from code review Co-authored-by: Lysandre Debut <[email protected]>
* make style
* make tokenizer auto tests pass
* apply Lysandre suggestion
* finish tests
* add some good tokenizer tests
* improve docs slighly

Co-authored-by: Patrick von Platen <[email protected]>
Co-authored-by: Arthur <[email protected]>
Co-authored-by: ArthurZucker <[email protected]>
Co-authored-by: Sylvain Gugger <[email protected]>
Co-authored-by: Lysandre Debut <[email protected]>
1 parent 8c7481f commit b971c76

22 files changed (+1834 / -1 lines)

README.md

Lines changed: 1 addition & 0 deletions
@@ -294,6 +294,7 @@ Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
 1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
 1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
 1. **[Nyströmformer](https://huggingface.co/docs/transformers/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
+1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
 1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
 1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
 1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.

README_ko.md

Lines changed: 1 addition & 0 deletions
@@ -273,6 +273,7 @@ Flax, PyTorch, TensorFlow 설치 페이지에서 이들을 conda로 설치하는
 1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
 1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
 1. **[Nyströmformer](https://huggingface.co/docs/transformers/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
+1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
 1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
 1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
 1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.

README_zh-hans.md

Lines changed: 1 addition & 0 deletions
@@ -297,6 +297,7 @@ conda install -c huggingface transformers
 1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (来自 Microsoft Research) 伴随论文 [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) 由 Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu 发布。
 1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (来自 Google AI) 伴随论文 [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) 由 Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel 发布。
 1. **[Nyströmformer](https://huggingface.co/docs/transformers/model_doc/nystromformer)** (来自 the University of Wisconsin - Madison) 伴随论文 [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) 由 Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh 发布。
+1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (来自 Meta AI) 伴随论文 [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) 由 Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al 发布。
 1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (来自 Google) 伴随论文 [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) 由 Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu 发布。
 1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (来自 Deepmind) 伴随论文 [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) 由 Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira 发布。
 1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (来自 VinAI Research) 伴随论文 [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) 由 Dat Quoc Nguyen and Anh Tuan Nguyen 发布。

README_zh-hant.md

Lines changed: 1 addition & 0 deletions
@@ -309,6 +309,7 @@ conda install -c huggingface transformers
 1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
 1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
 1. **[Nyströmformer](https://huggingface.co/docs/transformers/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
+1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
 1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
 1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
 1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.

docs/source/en/_toctree.yml

Lines changed: 2 additions & 0 deletions
@@ -270,6 +270,8 @@
       title: Nyströmformer
     - local: model_doc/openai-gpt
       title: OpenAI GPT
+    - local: model_doc/opt
+      title: OPT
     - local: model_doc/gpt2
       title: OpenAI GPT2
     - local: model_doc/gptj

docs/source/en/index.mdx

Lines changed: 2 additions & 0 deletions
@@ -115,6 +115,7 @@ The library currently contains JAX, PyTorch and TensorFlow implementations, pret
 1. **[MPNet](model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
 1. **[MT5](model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
 1. **[Nyströmformer](model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
+1. **[OPT](master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
 1. **[Pegasus](model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
 1. **[Perceiver IO](model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
 1. **[PhoBERT](model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
@@ -231,6 +232,7 @@ Flax), PyTorch, and/or TensorFlow.
 | Nystromformer | | | | | |
 | OpenAI GPT | | | | | |
 | OpenAI GPT-2 | | | | | |
+| OPT | | | | | |
 | Pegasus | | | | | |
 | Perceiver | | | | | |
 | PLBart | | | | | |

docs/source/en/model_doc/opt.mdx

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+<!--Copyright 2022 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# OPT
+
+## Overview
+
+The OPT model was proposed in [Open Pre-trained Transformer Language Models](https://arxiv.org/pdf/2205.01068) by Meta AI.
+OPT is a series of open-sourced large causal language models whose performance is comparable to GPT-3.
+
+
+The abstract from the paper is the following:
+
+*Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning. Given their computational cost, these models are difficult to replicate without significant capital. For the few that are available through APIs, no access is granted to the full model weights, making them difficult to study. We present Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which we aim to fully and responsibly share with interested researchers. We show that OPT-175B is comparable to GPT-3, while requiring only 1/7th the carbon footprint to develop. We are also releasing our logbook detailing the infrastructure challenges we faced, along with code for experimenting with all of the released models.*
+
+Tips:
+- OPT has the same architecture as [`BartDecoder`].
+- Contrary to GPT2, OPT adds the EOS token `</s>` to the beginning of every prompt. **Note**: Make sure to pass `use_fast=False` when loading OPT's tokenizer with [`AutoTokenizer`] to get the correct tokenizer.
+
+This model was contributed by [Arthur Zucker](https://huggingface.co/ArthurZ), [Younes Belkada](https://huggingface.co/ybelkada), and [Patrick Von Platen](https://huggingface.co/patrickvonplaten).
+The original code can be found [here](https://github.com/facebookresearch/metaseq).
+
+
+## OPTConfig
+
+[[autodoc]] OPTConfig
+
+## OPTModel
+
+[[autodoc]] OPTModel
+    - forward
+
+
+## OPTForCausalLM
+
+[[autodoc]] OPTForCausalLM
+    - forward
+
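To make the tokenizer tip in the new opt.mdx concrete, here is a minimal generation sketch using the classes added in this commit. The checkpoint name `facebook/opt-350m` and the prompt are assumptions for illustration only; the real hub paths are whichever names the final weights were pushed under.

```python
# Hedged usage sketch for the new OPTForCausalLM; the checkpoint name is an assumption.
from transformers import AutoTokenizer, OPTForCausalLM

model = OPTForCausalLM.from_pretrained("facebook/opt-350m")
# Per the tip above, pass use_fast=False so AutoTokenizer returns the correct (slow) tokenizer.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m", use_fast=False)

prompt = "Hey, are you conscious? Can you talk to me?"
inputs = tokenizer(prompt, return_tensors="pt")  # the tokenizer prepends </s> to the prompt

generated_ids = model.generate(**inputs, max_length=30)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
```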

src/transformers/__init__.py

Lines changed: 12 additions & 1 deletion
@@ -247,6 +247,7 @@
         "NystromformerConfig",
     ],
     "models.openai": ["OPENAI_GPT_PRETRAINED_CONFIG_ARCHIVE_MAP", "OpenAIGPTConfig", "OpenAIGPTTokenizer"],
+    "models.opt": ["OPTConfig"],
     "models.pegasus": ["PEGASUS_PRETRAINED_CONFIG_ARCHIVE_MAP", "PegasusConfig", "PegasusTokenizer"],
     "models.perceiver": ["PERCEIVER_PRETRAINED_CONFIG_ARCHIVE_MAP", "PerceiverConfig", "PerceiverTokenizer"],
     "models.phobert": ["PhobertTokenizer"],
@@ -1323,6 +1324,14 @@
             "load_tf_weights_in_openai_gpt",
         ]
     )
+    _import_structure["models.opt"].extend(
+        [
+            "OPT_PRETRAINED_MODEL_ARCHIVE_LIST",
+            "OPTForCausalLM",
+            "OPTModel",
+            "OPTPreTrainedModel",
+        ]
+    )
     _import_structure["models.pegasus"].extend(
         ["PegasusForCausalLM", "PegasusForConditionalGeneration", "PegasusModel", "PegasusPreTrainedModel"]
     )
@@ -2373,7 +2382,6 @@
             "FlaxBartPreTrainedModel",
         ]
     )
-
     _import_structure["models.beit"].extend(
         [
             "FlaxBeitForImageClassification",
@@ -2382,6 +2390,7 @@
             "FlaxBeitPreTrainedModel",
         ]
     )
+
     _import_structure["models.bert"].extend(
         [
             "FlaxBertForCausalLM",
@@ -2718,6 +2727,7 @@
     from .models.mt5 import MT5Config
     from .models.nystromformer import NYSTROMFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP, NystromformerConfig
     from .models.openai import OPENAI_GPT_PRETRAINED_CONFIG_ARCHIVE_MAP, OpenAIGPTConfig, OpenAIGPTTokenizer
+    from .models.opt import OPTConfig
     from .models.pegasus import PEGASUS_PRETRAINED_CONFIG_ARCHIVE_MAP, PegasusConfig, PegasusTokenizer
     from .models.perceiver import PERCEIVER_PRETRAINED_CONFIG_ARCHIVE_MAP, PerceiverConfig, PerceiverTokenizer
     from .models.phobert import PhobertTokenizer
@@ -3630,6 +3640,7 @@
         OpenAIGPTPreTrainedModel,
         load_tf_weights_in_openai_gpt,
     )
+    from .models.opt import OPT_PRETRAINED_MODEL_ARCHIVE_LIST, OPTForCausalLM, OPTModel, OPTPreTrainedModel
     from .models.pegasus import (
         PegasusForCausalLM,
         PegasusForConditionalGeneration,
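The reason the same OPT names show up twice in this diff is that the top-level `__init__.py` builds a lazy module: entries in `_import_structure` drive deferred imports at runtime, while the mirrored imports under the type-checking branch keep the names visible to static analyzers. The snippet below is a simplified, self-contained sketch of that pattern, not the library's actual lazy-module implementation.

```python
# Simplified sketch of the lazy-import pattern this diff extends; illustrative only.
import importlib
import types

_import_structure = {
    "models.opt": ["OPTConfig", "OPTForCausalLM", "OPTModel", "OPTPreTrainedModel"],
}


class LazyModule(types.ModuleType):
    """Module that defers importing submodules until an attribute is first accessed."""

    def __init__(self, name, import_structure):
        super().__init__(name)
        # Map every exported name to the submodule that actually defines it.
        self._name_to_module = {
            attr: mod for mod, attrs in import_structure.items() for attr in attrs
        }

    def __getattr__(self, attr):
        if attr not in self._name_to_module:
            raise AttributeError(f"module {self.__name__!r} has no attribute {attr!r}")
        submodule = importlib.import_module("." + self._name_to_module[attr], self.__name__)
        value = getattr(submodule, attr)
        setattr(self, attr, value)  # cache so later lookups skip __getattr__
        return value
```

With this scheme, adding OPT to `_import_structure` keeps `import transformers` cheap: `modeling_opt.py` is only imported the first time someone touches `OPTModel` or a related name.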

src/transformers/models/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -87,6 +87,7 @@
     mt5,
     nystromformer,
     openai,
+    opt,
     pegasus,
     perceiver,
     phobert,

src/transformers/models/auto/configuration_auto.py

Lines changed: 3 additions & 0 deletions
@@ -98,6 +98,7 @@
         ("megatron-bert", "MegatronBertConfig"),
         ("mpnet", "MPNetConfig"),
         ("bart", "BartConfig"),
+        ("opt", "OPTConfig"),
         ("blenderbot", "BlenderbotConfig"),
         ("reformer", "ReformerConfig"),
         ("longformer", "LongformerConfig"),
@@ -190,6 +191,7 @@
         ("blenderbot-small", "BLENDERBOT_SMALL_PRETRAINED_CONFIG_ARCHIVE_MAP"),
         ("bert", "BERT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
         ("bart", "BART_PRETRAINED_CONFIG_ARCHIVE_MAP"),
+        ("opt", "OPT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
         ("blenderbot", "BLENDERBOT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
         ("mbart", "MBART_PRETRAINED_CONFIG_ARCHIVE_MAP"),
         ("openai-gpt", "OPENAI_GPT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
@@ -301,6 +303,7 @@
         ("mbart", "mBART"),
         ("megatron-bert", "MegatronBert"),
         ("bart", "BART"),
+        ("opt", "OPT"),
         ("reformer", "Reformer"),
         ("longformer", "Longformer"),
         ("roberta", "RoBERTa"),
