
Commit 68d9251

Merge branch 'master' into master
2 parents 7480ded + 5cd7086 commit 68d9251

File tree

1 file changed: 103 additions, 0 deletions


docs/source/model_summary.mdx

Lines changed: 103 additions & 0 deletions
@@ -70,8 +70,12 @@ that at each position, the model can only look at the tokens before the attentio
 <a href="model_doc/openai-gpt">
 <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-openai--gpt-blueviolet">
 </a>
+<a href="https://huggingface.co/spaces/akhaliq/openai-gpt">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
 </div>
 
+
 [Improving Language Understanding by Generative Pre-Training](https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf), Alec Radford et al.
 
 The first autoregressive model based on the transformer architecture, pretrained on the Book Corpus dataset.
@@ -88,8 +92,12 @@ classification.
 <a href="model_doc/gpt2">
 <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-gpt2-blueviolet">
 </a>
+<a href="https://huggingface.co/spaces/akhaliq/gpt2">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
 </div>
 
+
 [Language Models are Unsupervised Multitask Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf),
 Alec Radford et al.
 
@@ -108,8 +116,12 @@ classification.
 <a href="model_doc/ctrl">
 <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-ctrl-blueviolet">
 </a>
+<a href="https://huggingface.co/spaces/akhaliq/tiny-ctrl">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
 </div>
 
+
 [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858),
 Nitish Shirish Keskar et al.
 
@@ -128,8 +140,12 @@ The library provides a version of the model for language modeling only.
 <a href="model_doc/transfo-xl">
 <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-transfo--xl-blueviolet">
 </a>
+<a href="https://huggingface.co/spaces/akhaliq/transfo-xl-wt103">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
 </div>
 
+
 [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860), Zihang
 Dai et al.
 
@@ -158,8 +174,12 @@ The library provides a version of the model for language modeling only.
 <a href="model_doc/reformer">
 <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-reformer-blueviolet">
 </a>
+<a href="https://huggingface.co/spaces/akhaliq/reformer-crime-and-punishment">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
 </div>
 
+
 [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451), Nikita Kitaev et al .
 
 An autoregressive transformer model with lots of tricks to reduce memory footprint and compute time. Those tricks
@@ -195,8 +215,12 @@ The library provides a version of the model for language modeling only.
 <a href="model_doc/xlnet">
 <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-xlnet-blueviolet">
 </a>
+<a href="https://huggingface.co/spaces/akhaliq/xlnet-base-cased">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
 </div>
 
+
 [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237), Zhilin
 Yang et al.
 
@@ -229,6 +253,9 @@ corrupted versions.
 <a href="model_doc/bert">
 <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-bert-blueviolet">
 </a>
+<a href="https://huggingface.co/spaces/akhaliq/bert-base-uncased">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
 </div>
 
 [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805),
@@ -257,8 +284,12 @@ token classification, sentence classification, multiple choice classification an
 <a href="model_doc/albert">
 <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-albert-blueviolet">
 </a>
+<a href="https://huggingface.co/spaces/akhaliq/albert-base-v2">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
 </div>
 
+
 [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942),
 Zhenzhong Lan et al.
 
@@ -285,8 +316,12 @@ classification, multiple choice classification and question answering.
 <a href="model_doc/roberta">
 <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-roberta-blueviolet">
 </a>
+<a href="https://huggingface.co/spaces/akhaliq/roberta-base">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
 </div>
 
+
 [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692), Yinhan Liu et al.
 
 Same as BERT with better pretraining tricks:
@@ -309,8 +344,12 @@ classification, multiple choice classification and question answering.
 <a href="model_doc/distilbert">
 <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-distilbert-blueviolet">
 </a>
+<a href="https://huggingface.co/spaces/akhaliq/distilbert-base-uncased">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
 </div>
 
+
 [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108),
 Victor Sanh et al.
 
@@ -333,8 +372,12 @@ and question answering.
 <a href="model_doc/convbert">
 <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-convbert-blueviolet">
 </a>
+<a href="https://huggingface.co/spaces/akhaliq/conv-bert-base">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
 </div>
 
+
 [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496), Zihang Jiang,
 Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
 
@@ -362,8 +405,12 @@ and question answering.
 <a href="model_doc/xlm">
 <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-xlm-blueviolet">
 </a>
+<a href="https://huggingface.co/spaces/akhaliq/xlm-mlm-en-2048">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
 </div>
 
+
 [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291), Guillaume Lample and Alexis Conneau
 
 A transformer model trained on several languages. There are three different type of training for this model and the
@@ -395,8 +442,12 @@ question answering.
 <a href="model_doc/xlm-roberta">
 <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-xlm--roberta-blueviolet">
 </a>
+<a href="https://huggingface.co/spaces/akhaliq/xlm-roberta-base">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
 </div>
 
+
 [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116), Alexis Conneau et
 al.
 
@@ -416,8 +467,12 @@ classification, multiple choice classification and question answering.
 <a href="model_doc/flaubert">
 <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-flaubert-blueviolet">
 </a>
+<a href="https://huggingface.co/spaces/akhaliq/flaubert_small_cased">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
 </div>
 
+
 [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372), Hang Le et al.
 
 Like RoBERTa, without the sentence ordering prediction (so just trained on the MLM objective).
@@ -433,8 +488,12 @@ The library provides a version of the model for language modeling and sentence c
 <a href="model_doc/electra">
 <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-electra-blueviolet">
 </a>
+<a href="https://huggingface.co/spaces/akhaliq/electra_large_discriminator_squad2_512">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
 </div>
 
+
 [ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators](https://arxiv.org/abs/2003.10555),
 Kevin Clark et al.
 
@@ -456,8 +515,12 @@ classification.
 <a href="model_doc/funnel">
 <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-funnel-blueviolet">
 </a>
+<a href="https://huggingface.co/spaces/akhaliq/funnel-transformer-small">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
 </div>
 
+
 [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236), Zihang Dai et al.
 
 Funnel Transformer is a transformer model using pooling, a bit like a ResNet model: layers are grouped in blocks, and
@@ -488,8 +551,12 @@ classification, multiple choice classification and question answering.
 <a href="model_doc/longformer">
 <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-longformer-blueviolet">
 </a>
+<a href="https://huggingface.co/spaces/akhaliq/longformer-base-4096-finetuned-squadv1">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
 </div>
 
+
 [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150), Iz Beltagy et al.
 
 A transformer model replacing the attention matrices by sparse matrices to go faster. Often, the local context (e.g.,
@@ -526,8 +593,12 @@ As mentioned before, these models keep both the encoder and the decoder of the o
 <a href="model_doc/bart">
 <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-bart-blueviolet">
 </a>
+<a href="https://huggingface.co/spaces/akhaliq/bart-large-mnli">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
 </div>
 
+
 [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461), Mike Lewis et al.
 
 Sequence-to-sequence model with an encoder and a decoder. Encoder is fed a corrupted version of the tokens, decoder is
@@ -551,8 +622,12 @@ The library provides a version of this model for conditional generation and sequ
 <a href="model_doc/pegasus">
 <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-pegasus-blueviolet">
 </a>
+<a href="https://huggingface.co/spaces/akhaliq/pegasus_paraphrase">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
 </div>
 
+
 [PEGASUS: Pre-training with Extracted Gap-sentences forAbstractive Summarization](https://arxiv.org/pdf/1912.08777.pdf), Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu on Dec 18, 2019.
 
 Sequence-to-sequence model with the same encoder-decoder model architecture as BART. Pegasus is pre-trained jointly on
@@ -580,8 +655,12 @@ The library provides a version of this model for conditional generation, which s
 <a href="model_doc/marian">
 <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-marian-blueviolet">
 </a>
+<a href="https://huggingface.co/spaces/akhaliq/opus-mt-zh-en">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
 </div>
 
+
 [Marian: Fast Neural Machine Translation in C++](https://arxiv.org/abs/1804.00344), Marcin Junczys-Dowmunt et al.
 
 A framework for translation models, using the same models as BART
@@ -598,8 +677,12 @@ The library provides a version of this model for conditional generation.
 <a href="model_doc/t5">
 <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-t5-blueviolet">
 </a>
+<a href="https://huggingface.co/spaces/akhaliq/t5-base">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
 </div>
 
+
 [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683), Colin Raffel et al.
 
 Uses the traditional transformer model (with a slight change in the positional embeddings, which are learned at each
@@ -629,8 +712,12 @@ The library provides a version of this model for conditional generation.
 <a href="model_doc/mt5">
 <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-mt5-blueviolet">
 </a>
+<a href="https://huggingface.co/spaces/akhaliq/mt5-small-finetuned-arxiv-cs-finetuned-arxiv-cs-full">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
 </div>
 
+
 [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934), Linting Xue
 et al.
 
@@ -649,8 +736,12 @@ The library provides a version of this model for conditional generation.
 <a href="model_doc/mbart">
 <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-mbart-blueviolet">
 </a>
+<a href="https://huggingface.co/spaces/akhaliq/mbart-large-50-one-to-many-mmt">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
 </div>
 
+
 [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu,
 Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
 
@@ -677,8 +768,12 @@ finetuning.
 <a href="model_doc/prophetnet">
 <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-prophetnet-blueviolet">
 </a>
+<a href="https://huggingface.co/spaces/akhaliq/prophetnet-large-uncased">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
 </div>
 
+
 [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training,](https://arxiv.org/abs/2001.04063) by
 Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou.
 
@@ -701,8 +796,12 @@ summarization.
 <a href="model_doc/xlm-prophetnet">
 <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-xprophetnet-blueviolet">
 </a>
+<a href="https://huggingface.co/spaces/akhaliq/xprophetnet-large-wiki100-cased-xglue-ntg">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
 </div>
 
+
 [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training,](https://arxiv.org/abs/2001.04063) by
 Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou.
 
@@ -753,8 +852,12 @@ Some models use documents retrieval during (pre)training and inference for open-
 <a href="model_doc/dpr">
 <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-dpr-blueviolet">
 </a>
+<a href="https://huggingface.co/spaces/akhaliq/dpr-question_encoder-bert-base-multilingual">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
 </div>
 
+
 [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906), Vladimir Karpukhin et
 al.
 
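The Spaces badges added in this diff link to hosted, interactive demos of the models described in the surrounding text. As a minimal sketch of how those same checkpoints can be tried locally with the library's `pipeline` API (this is not part of the commit; it assumes `transformers` and a backend such as PyTorch are installed, and it uses `gpt2` and `facebook/bart-large-mnli`, the canonical Hub checkpoints behind two of the linked demos):

```python
from transformers import pipeline

# Autoregressive text generation with GPT-2 (cf. the gpt2 Spaces badge above).
generator = pipeline("text-generation", model="gpt2")
print(generator("The first autoregressive transformer model was", max_length=30)[0]["generated_text"])

# Zero-shot classification with BART fine-tuned on MNLI (cf. the bart-large-mnli badge above).
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
print(classifier(
    "Transformers provides pretrained models for natural language processing.",
    candidate_labels=["machine learning", "cooking", "sports"],
))
```

Swapping the `model` argument for any other checkpoint named in the badges (e.g. `distilbert-base-uncased`, `t5-base`) follows the same pattern, provided a pipeline task exists for that model type.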