@@ -70,8 +70,12 @@ that at each position, the model can only look at the tokens before the attentio
<a href="model_doc/openai-gpt">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-openai--gpt-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/openai-gpt">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[Improving Language Understanding by Generative Pre-Training](https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf), Alec Radford et al.

The first autoregressive model based on the transformer architecture, pretrained on the Book Corpus dataset.
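As a quick pointer for anyone trying the model behind the new Space link, here is a minimal, illustrative `transformers` sketch of the autoregressive generation described in the hunk above (it assumes a local install of `transformers` with a PyTorch backend; the `openai-gpt` checkpoint name mirrors the badge target):

```python
# Illustrative sketch only: generate text with the openai-gpt checkpoint
# referenced by the documentation and Spaces badges above.
from transformers import pipeline

generator = pipeline("text-generation", model="openai-gpt")
print(generator("Language modeling lets us", max_length=30, num_return_sequences=1))
```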
@@ -88,8 +92,12 @@ classification.
<a href="model_doc/gpt2">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-gpt2-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/gpt2">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[Language Models are Unsupervised Multitask Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf),
Alec Radford et al.

@@ -108,8 +116,12 @@ classification.
<a href="model_doc/ctrl">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-ctrl-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/tiny-ctrl">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858),
Nitish Shirish Keskar et al.

@@ -128,8 +140,12 @@ The library provides a version of the model for language modeling only.
<a href="model_doc/transfo-xl">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-transfo--xl-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/transfo-xl-wt103">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860), Zihang
Dai et al.

@@ -158,8 +174,12 @@ The library provides a version of the model for language modeling only.
<a href="model_doc/reformer">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-reformer-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/reformer-crime-and-punishment">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451), Nikita Kitaev et al.

An autoregressive transformer model with lots of tricks to reduce memory footprint and compute time. Those tricks
@@ -195,8 +215,12 @@ The library provides a version of the model for language modeling only.
<a href="model_doc/xlnet">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-xlnet-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/xlnet-base-cased">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237), Zhilin
Yang et al.

@@ -229,6 +253,9 @@ corrupted versions.
<a href="model_doc/bert">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-bert-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/bert-base-uncased">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

[BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805),
@@ -257,8 +284,12 @@ token classification, sentence classification, multiple choice classification an
<a href="model_doc/albert">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-albert-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/albert-base-v2">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942),
Zhenzhong Lan et al.

@@ -285,8 +316,12 @@ classification, multiple choice classification and question answering.
<a href="model_doc/roberta">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-roberta-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/roberta-base">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692), Yinhan Liu et al.

Same as BERT with better pretraining tricks:
@@ -309,8 +344,12 @@ classification, multiple choice classification and question answering.
<a href="model_doc/distilbert">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-distilbert-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/distilbert-base-uncased">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108),
Victor Sanh et al.

@@ -333,8 +372,12 @@ and question answering.
<a href="model_doc/convbert">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-convbert-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/conv-bert-base">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496), Zihang Jiang,
Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.

@@ -362,8 +405,12 @@ and question answering.
<a href="model_doc/xlm">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-xlm-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/xlm-mlm-en-2048">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291), Guillaume Lample and Alexis Conneau

A transformer model trained on several languages. There are three different type of training for this model and the
@@ -395,8 +442,12 @@ question answering.
<a href="model_doc/xlm-roberta">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-xlm--roberta-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/xlm-roberta-base">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116), Alexis Conneau et
al.

@@ -416,8 +467,12 @@ classification, multiple choice classification and question answering.
<a href="model_doc/flaubert">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-flaubert-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/flaubert_small_cased">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372), Hang Le et al.

Like RoBERTa, without the sentence ordering prediction (so just trained on the MLM objective).
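A hedged sketch of the MLM-only usage noted above; it assumes the `flaubert/flaubert_small_cased` checkpoint (the model behind the linked Space) exposes a masked-LM head that the `fill-mask` pipeline can load:

```python
# Illustrative sketch only: masked-token prediction with a FlauBERT checkpoint.
# Assumption: flaubert/flaubert_small_cased is loadable by the fill-mask pipeline.
from transformers import pipeline

fill = pipeline("fill-mask", model="flaubert/flaubert_small_cased")
masked = f"Le camembert est {fill.tokenizer.mask_token} !"
for candidate in fill(masked, top_k=3):
    print(candidate["token_str"], round(candidate["score"], 3))
```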
@@ -433,8 +488,12 @@ The library provides a version of the model for language modeling and sentence c
<a href="model_doc/electra">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-electra-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/electra_large_discriminator_squad2_512">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators](https://arxiv.org/abs/2003.10555),
Kevin Clark et al.

@@ -456,8 +515,12 @@ classification.
<a href="model_doc/funnel">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-funnel-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/funnel-transformer-small">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236), Zihang Dai et al.

Funnel Transformer is a transformer model using pooling, a bit like a ResNet model: layers are grouped in blocks, and
@@ -488,8 +551,12 @@ classification, multiple choice classification and question answering.
<a href="model_doc/longformer">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-longformer-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/longformer-base-4096-finetuned-squadv1">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150), Iz Beltagy et al.

A transformer model replacing the attention matrices by sparse matrices to go faster. Often, the local context (e.g.,
@@ -526,8 +593,12 @@ As mentioned before, these models keep both the encoder and the decoder of the o
<a href="model_doc/bart">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-bart-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/bart-large-mnli">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461), Mike Lewis et al.

Sequence-to-sequence model with an encoder and a decoder. Encoder is fed a corrupted version of the tokens, decoder is
@@ -551,8 +622,12 @@ The library provides a version of this model for conditional generation and sequ
<a href="model_doc/pegasus">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-pegasus-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/pegasus_paraphrase">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[PEGASUS: Pre-training with Extracted Gap-sentences forAbstractive Summarization](https://arxiv.org/pdf/1912.08777.pdf), Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu on Dec 18, 2019.

Sequence-to-sequence model with the same encoder-decoder model architecture as BART. Pegasus is pre-trained jointly on
@@ -580,8 +655,12 @@ The library provides a version of this model for conditional generation, which s
<a href="model_doc/marian">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-marian-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/opus-mt-zh-en">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[Marian: Fast Neural Machine Translation in C++](https://arxiv.org/abs/1804.00344), Marcin Junczys-Dowmunt et al.

A framework for translation models, using the same models as BART
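For illustration, a minimal sketch of the translation workflow this section describes, assuming the `Helsinki-NLP/opus-mt-zh-en` checkpoint that backs the linked Space:

```python
# Illustrative sketch only: Chinese-to-English translation with a MarianMT checkpoint.
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-zh-en"  # assumed checkpoint behind the linked Space
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

batch = tokenizer(["你好,世界"], return_tensors="pt", padding=True)
generated = model.generate(**batch)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```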
@@ -598,8 +677,12 @@ The library provides a version of this model for conditional generation.
<a href="model_doc/t5">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-t5-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/t5-base">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683), Colin Raffel et al.

Uses the traditional transformer model (with a slight change in the positional embeddings, which are learned at each
@@ -629,8 +712,12 @@ The library provides a version of this model for conditional generation.
<a href="model_doc/mt5">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-mt5-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/mt5-small-finetuned-arxiv-cs-finetuned-arxiv-cs-full">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934), Linting Xue
et al.

@@ -649,8 +736,12 @@ The library provides a version of this model for conditional generation.
<a href="model_doc/mbart">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-mbart-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/mbart-large-50-one-to-many-mmt">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu,
Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.

@@ -677,8 +768,12 @@ finetuning.
<a href="model_doc/prophetnet">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-prophetnet-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/prophetnet-large-uncased">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training,](https://arxiv.org/abs/2001.04063) by
Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou.

@@ -701,8 +796,12 @@ summarization.
<a href="model_doc/xlm-prophetnet">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-xprophetnet-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/xprophetnet-large-wiki100-cased-xglue-ntg">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training,](https://arxiv.org/abs/2001.04063) by
Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou.

@@ -753,8 +852,12 @@ Some models use documents retrieval during (pre)training and inference for open-
<a href="model_doc/dpr">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-dpr-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/dpr-question_encoder-bert-base-multilingual">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906), Vladimir Karpukhin et
al.

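To make the retrieval idea concrete, a hedged sketch of DPR-style passage scoring; the `facebook/dpr-*-single-nq-base` checkpoints are an assumption for illustration (the badge above points at a multilingual question encoder instead):

```python
# Illustrative sketch only: score candidate passages against a question with DPR.
import torch
from transformers import (
    DPRContextEncoder,
    DPRContextEncoderTokenizer,
    DPRQuestionEncoder,
    DPRQuestionEncoderTokenizer,
)

q_name = "facebook/dpr-question_encoder-single-nq-base"  # assumed checkpoints
c_name = "facebook/dpr-ctx_encoder-single-nq-base"
q_tok = DPRQuestionEncoderTokenizer.from_pretrained(q_name)
q_enc = DPRQuestionEncoder.from_pretrained(q_name)
c_tok = DPRContextEncoderTokenizer.from_pretrained(c_name)
c_enc = DPRContextEncoder.from_pretrained(c_name)

question = "Who wrote Crime and Punishment?"
passages = [
    "Fyodor Dostoevsky wrote Crime and Punishment.",
    "BERT is a transformer encoder pretrained with masked language modeling.",
]

# Embed the question and passages, then rank passages by dot-product similarity.
q_emb = q_enc(**q_tok(question, return_tensors="pt")).pooler_output
c_emb = c_enc(**c_tok(passages, return_tensors="pt", padding=True)).pooler_output
print(torch.matmul(q_emb, c_emb.T))  # higher score = more relevant passage
```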