Add OPT #17088
New file (OPT documentation page):

@@ -0,0 +1,47 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# OPT

## Overview

The OPT model was proposed in [Open Pre-trained Transformer Language Models](https://arxiv.org/pdf/2205.01068) by Meta AI.
OPT is a series of open-sourced large causal language models whose performance is similar to that of GPT-3.

The abstract from the paper is the following:

*Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning. Given their computational cost, these models are difficult to replicate without significant capital. For the few that are available through APIs, no access is granted to the full model weights, making them difficult to study. We present Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which we aim to fully and responsibly share with interested researchers. We show that OPT-175B is comparable to GPT-3, while requiring only 1/7th the carbon footprint to develop. We are also releasing our logbook detailing the infrastructure challenges we faced, along with code for experimenting with all of the released models.*

Tips:

If you want to use the `opt-350m` checkpoint, you need to set the `do_layer_norm_before` parameter to `False`; see the example below.
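For illustration, here is a minimal sketch of that setting. The checkpoint id `facebook/opt-350m` is an assumption for illustration and is not fixed by this diff:

```python
from transformers import OPTModel

# Assumed checkpoint id for the 350m weights; adjust to wherever the weights are published.
# The 350m variant uses post-layer-norm, hence the flag is disabled here.
model = OPTModel.from_pretrained("facebook/opt-350m", do_layer_norm_before=False)
```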
This model was contributed by [Arthur Zucker](https://huggingface.co/ArthurZ), [Younes Belkada](https://huggingface.co/ybelkada), and [Patrick von Platen](https://huggingface.co/patrickvonplaten).
The original code can be found [here](https://github.com/facebookresearch/metaseq).
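Since OPT reuses the GPT-2 byte-level BPE tokenizer (see the tokenizer changes further down in this diff), a basic generation loop can be sketched as follows; the checkpoint id is again an assumption for illustration:

```python
from transformers import GPT2Tokenizer, OPTForCausalLM

# Assumed checkpoint id; OPT pairs the GPT-2 tokenizer with an explicit BOS/pad setup.
tokenizer = GPT2Tokenizer.from_pretrained("facebook/opt-350m")
model = OPTForCausalLM.from_pretrained("facebook/opt-350m")

inputs = tokenizer("Hello, my dog is cute and", return_tensors="pt")
generated_ids = model.generate(**inputs, max_length=30)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
```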
## OPTConfig

[[autodoc]] OPTConfig

## OPTModel

[[autodoc]] OPTModel
    - forward

## OPTForCausalLM

[[autodoc]] OPTForCausalLM
    - forward
Update to the list of model submodules:

@@ -86,6 +86,7 @@
    mt5,
    nystromformer,
    openai,
    opt,
    pegasus,
    perceiver,
    phobert,
GPT-2 tokenizer changes (new `pad_token` and `add_bos_token` arguments):

@@ -162,20 +162,26 @@ def __init__(
        unk_token="<|endoftext|>",
        bos_token="<|endoftext|>",
        eos_token="<|endoftext|>",
        pad_token=None,
Contributor: Need to add a good test here, but locally it worked fine. The correct tokenizer currently lives here: https://huggingface.co/patrickvonplaten/opt-30b. Will move after review is 👍

Contributor: Also checked that GPT2 is not affected, but it would be great if you could double-check @LysandreJik @sgugger
        add_prefix_space=False,
        add_bos_token=False,
        **kwargs
    ):
        bos_token = AddedToken(bos_token, lstrip=False, rstrip=False) if isinstance(bos_token, str) else bos_token
        eos_token = AddedToken(eos_token, lstrip=False, rstrip=False) if isinstance(eos_token, str) else eos_token
        unk_token = AddedToken(unk_token, lstrip=False, rstrip=False) if isinstance(unk_token, str) else unk_token
        pad_token = AddedToken(pad_token, lstrip=False, rstrip=False) if isinstance(pad_token, str) else pad_token
        super().__init__(
            errors=errors,
            unk_token=unk_token,
            bos_token=bos_token,
            eos_token=eos_token,
            pad_token=pad_token,
            add_prefix_space=add_prefix_space,
            add_bos_token=add_bos_token,
            **kwargs,
        )
        self.add_bos_token = add_bos_token

        with open(vocab_file, encoding="utf-8") as vocab_handle:
            self.encoder = json.load(vocab_handle)

@@ -242,6 +248,19 @@ def bpe(self, token):
        self.cache[token] = word
        return word

    def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=None):
        if self.add_bos_token:
            bos_token_ids = [self.bos_token_id]
        else:
            bos_token_ids = []

        output = bos_token_ids + token_ids_0

        if token_ids_1 is None:
            return output

        return output + bos_token_ids + token_ids_1

    def _tokenize(self, text):
        """Tokenize a string."""
        bpe_tokens = []
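To illustrate the effect of the new arguments, here is a minimal sketch using the stock `gpt2` vocabulary (the token choices are illustrative, not mandated by this diff):

```python
from transformers import GPT2Tokenizer

# Illustrative setup: enable BOS prepending and reuse an existing token for padding.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2", add_bos_token=True, pad_token="<|endoftext|>")

ids = tokenizer("Hello world").input_ids
assert ids[0] == tokenizer.bos_token_id  # build_inputs_with_special_tokens prepends the BOS id

# Batch padding now works because a pad token is defined.
batch = tokenizer(["Hello world", "Hi"], padding=True, return_tensors="pt")
print(batch.input_ids.shape)
```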
New file (`opt` submodule init with lazy imports):

@@ -0,0 +1,46 @@
# flake8: noqa
# There's no way to ignore "F401 '...' imported but unused" warnings in this
# module, but to preserve other warnings. So, don't check this module at all.

# Copyright 2022 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING

from ...utils import _LazyModule, is_tokenizers_available, is_torch_available


_import_structure = {
    "configuration_opt": ["OPT_PRETRAINED_CONFIG_ARCHIVE_MAP", "OPTConfig"],
}

if is_torch_available():
    _import_structure["modeling_opt"] = [
        "OPT_PRETRAINED_MODEL_ARCHIVE_LIST",
        "OPTForCausalLM",
        "OPTModel",
        "OPTPretrainedModel",
    ]


if TYPE_CHECKING:
    from .configuration_opt import OPT_PRETRAINED_CONFIG_ARCHIVE_MAP, OPTConfig

    if is_torch_available():
        from .modeling_opt import OPT_PRETRAINED_MODEL_ARCHIVE_LIST, OPTForCausalLM, OPTModel, OPTPretrainedModel

else:
    import sys

    sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure, module_spec=__spec__)
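The `_LazyModule` indirection keeps `import transformers` cheap: the torch-backed classes are only imported on first attribute access. A rough sketch of what that means in practice (the `transformers.models.opt` path is assumed from the submodule list update above):

```python
# Importing the submodule only installs the _LazyModule proxy; nothing heavy is loaded yet.
from transformers.models import opt

# First attribute access triggers the real import of configuration_opt.
config = opt.OPTConfig()
print(config.model_type)
```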