@@ -70,8 +70,12 @@ that at each position, the model can only look at the tokens before the attentio
<a href="model_doc/openai-gpt">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-openai--gpt-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/openai-gpt">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[Improving Language Understanding by Generative Pre-Training](https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf), Alec Radford et al.

The first autoregressive model based on the transformer architecture, pretrained on the Book Corpus dataset.
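As a quick pointer for anyone trying the model behind the new Space link, here is a minimal, illustrative `transformers` sketch of the autoregressive generation described in the hunk above (it assumes a local install of `transformers` with a PyTorch backend; the `openai-gpt` checkpoint name mirrors the badge target):

```python
# Illustrative sketch only: generate text with the openai-gpt checkpoint
# referenced by the documentation and Spaces badges above.
from transformers import pipeline

generator = pipeline("text-generation", model="openai-gpt")
print(generator("Language modeling lets us", max_length=30, num_return_sequences=1))
```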
@@ -88,8 +92,12 @@ classification.
<a href="model_doc/gpt2">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-gpt2-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/gpt2">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[Language Models are Unsupervised Multitask Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf),
Alec Radford et al.

@@ -108,8 +116,12 @@ classification.
<a href="model_doc/ctrl">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-ctrl-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/tiny-ctrl">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858),
Nitish Shirish Keskar et al.

@@ -128,8 +140,12 @@ The library provides a version of the model for language modeling only.
<a href="model_doc/transfo-xl">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-transfo--xl-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/transfo-xl-wt103">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860), Zihang
Dai et al.

@@ -158,8 +174,12 @@ The library provides a version of the model for language modeling only.
<a href="model_doc/reformer">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-reformer-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/reformer-crime-and-punishment">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451), Nikita Kitaev et al.

An autoregressive transformer model with lots of tricks to reduce memory footprint and compute time. Those tricks
@@ -195,8 +215,12 @@ The library provides a version of the model for language modeling only.
<a href="model_doc/xlnet">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-xlnet-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/xlnet-base-cased">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237), Zhilin
Yang et al.

@@ -229,6 +253,9 @@ corrupted versions.
<a href="model_doc/bert">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-bert-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/bert-base-uncased">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

[BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805),
@@ -257,8 +284,12 @@ token classification, sentence classification, multiple choice classification an
<a href="model_doc/albert">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-albert-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/albert-base-v2">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942),
Zhenzhong Lan et al.

@@ -285,8 +316,12 @@ classification, multiple choice classification and question answering.
<a href="model_doc/roberta">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-roberta-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/roberta-base">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692), Yinhan Liu et al.

Same as BERT with better pretraining tricks:
@@ -309,8 +344,12 @@ classification, multiple choice classification and question answering.
<a href="model_doc/distilbert">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-distilbert-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/distilbert-base-uncased">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108),
Victor Sanh et al.

@@ -333,8 +372,12 @@ and question answering.
<a href="model_doc/convbert">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-convbert-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/conv-bert-base">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496), Zihang Jiang,
Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.

@@ -362,8 +405,12 @@ and question answering.
<a href="model_doc/xlm">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-xlm-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/xlm-mlm-en-2048">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291), Guillaume Lample and Alexis Conneau

A transformer model trained on several languages. There are three different type of training for this model and the
@@ -395,8 +442,12 @@ question answering.
<a href="model_doc/xlm-roberta">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-xlm--roberta-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/xlm-roberta-base">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116), Alexis Conneau et
al.

@@ -416,8 +467,12 @@ classification, multiple choice classification and question answering.
<a href="model_doc/flaubert">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-flaubert-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/flaubert_small_cased">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372), Hang Le et al.

Like RoBERTa, without the sentence ordering prediction (so just trained on the MLM objective).
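A hedged sketch of the MLM-only usage noted above; it assumes the `flaubert/flaubert_small_cased` checkpoint (the model behind the linked Space) exposes a masked-LM head that the `fill-mask` pipeline can load:

```python
# Illustrative sketch only: masked-token prediction with a FlauBERT checkpoint.
# Assumption: flaubert/flaubert_small_cased is loadable by the fill-mask pipeline.
from transformers import pipeline

fill = pipeline("fill-mask", model="flaubert/flaubert_small_cased")
masked = f"Le camembert est {fill.tokenizer.mask_token} !"
for candidate in fill(masked, top_k=3):
    print(candidate["token_str"], round(candidate["score"], 3))
```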
@@ -433,8 +488,12 @@ The library provides a version of the model for language modeling and sentence c
<a href="model_doc/electra">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-electra-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/electra_large_discriminator_squad2_512">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators](https://arxiv.org/abs/2003.10555),
Kevin Clark et al.

@@ -456,8 +515,12 @@ classification.
<a href="model_doc/funnel">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-funnel-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/funnel-transformer-small">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236), Zihang Dai et al.

Funnel Transformer is a transformer model using pooling, a bit like a ResNet model: layers are grouped in blocks, and
@@ -488,8 +551,12 @@ classification, multiple choice classification and question answering.
<a href="model_doc/longformer">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-longformer-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/longformer-base-4096-finetuned-squadv1">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150), Iz Beltagy et al.

A transformer model replacing the attention matrices by sparse matrices to go faster. Often, the local context (e.g.,
@@ -526,8 +593,12 @@ As mentioned before, these models keep both the encoder and the decoder of the o
<a href="model_doc/bart">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-bart-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/bart-large-mnli">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461), Mike Lewis et al.

Sequence-to-sequence model with an encoder and a decoder. Encoder is fed a corrupted version of the tokens, decoder is
@@ -551,8 +622,12 @@ The library provides a version of this model for conditional generation and sequ
<a href="model_doc/pegasus">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-pegasus-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/pegasus_paraphrase">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[PEGASUS: Pre-training with Extracted Gap-sentences forAbstractive Summarization](https://arxiv.org/pdf/1912.08777.pdf), Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu on Dec 18, 2019.

Sequence-to-sequence model with the same encoder-decoder model architecture as BART. Pegasus is pre-trained jointly on
@@ -580,8 +655,12 @@ The library provides a version of this model for conditional generation, which s
<a href="model_doc/marian">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-marian-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/opus-mt-zh-en">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[Marian: Fast Neural Machine Translation in C++](https://arxiv.org/abs/1804.00344), Marcin Junczys-Dowmunt et al.

A framework for translation models, using the same models as BART
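For illustration, a minimal sketch of the translation workflow this section describes, assuming the `Helsinki-NLP/opus-mt-zh-en` checkpoint that backs the linked Space:

```python
# Illustrative sketch only: Chinese-to-English translation with a MarianMT checkpoint.
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-zh-en"  # assumed checkpoint behind the linked Space
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

batch = tokenizer(["你好,世界"], return_tensors="pt", padding=True)
generated = model.generate(**batch)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```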
@@ -598,8 +677,12 @@ The library provides a version of this model for conditional generation.
<a href="model_doc/t5">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-t5-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/t5-base">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683), Colin Raffel et al.

Uses the traditional transformer model (with a slight change in the positional embeddings, which are learned at each
@@ -629,8 +712,12 @@ The library provides a version of this model for conditional generation.
<a href="model_doc/mt5">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-mt5-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/mt5-small-finetuned-arxiv-cs-finetuned-arxiv-cs-full">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934), Linting Xue
et al.

@@ -649,8 +736,12 @@ The library provides a version of this model for conditional generation.
<a href="model_doc/mbart">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-mbart-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/mbart-large-50-one-to-many-mmt">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu,
Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.

@@ -677,8 +768,12 @@ finetuning.
<a href="model_doc/prophetnet">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-prophetnet-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/prophetnet-large-uncased">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training,](https://arxiv.org/abs/2001.04063) by
Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou.

@@ -701,8 +796,12 @@ summarization.
<a href="model_doc/xlm-prophetnet">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-xprophetnet-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/xprophetnet-large-wiki100-cased-xglue-ntg">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training,](https://arxiv.org/abs/2001.04063) by
Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou.

@@ -753,8 +852,12 @@ Some models use documents retrieval during (pre)training and inference for open-
<a href="model_doc/dpr">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-dpr-blueviolet">
</a>
+<a href="https://huggingface.co/spaces/akhaliq/dpr-question_encoder-bert-base-multilingual">
+<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
+</a>
</div>

+
[Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906), Vladimir Karpukhin et
al.

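To make the retrieval idea concrete, a hedged sketch of DPR-style passage scoring; the `facebook/dpr-*-single-nq-base` checkpoints are an assumption for illustration (the badge above points at a multilingual question encoder instead):

```python
# Illustrative sketch only: score candidate passages against a question with DPR.
import torch
from transformers import (
    DPRContextEncoder,
    DPRContextEncoderTokenizer,
    DPRQuestionEncoder,
    DPRQuestionEncoderTokenizer,
)

q_name = "facebook/dpr-question_encoder-single-nq-base"  # assumed checkpoints
c_name = "facebook/dpr-ctx_encoder-single-nq-base"
q_tok = DPRQuestionEncoderTokenizer.from_pretrained(q_name)
q_enc = DPRQuestionEncoder.from_pretrained(q_name)
c_tok = DPRContextEncoderTokenizer.from_pretrained(c_name)
c_enc = DPRContextEncoder.from_pretrained(c_name)

question = "Who wrote Crime and Punishment?"
passages = [
    "Fyodor Dostoevsky wrote Crime and Punishment.",
    "BERT is a transformer encoder pretrained with masked language modeling.",
]

# Embed the question and passages, then rank passages by dot-product similarity.
q_emb = q_enc(**q_tok(question, return_tensors="pt")).pooler_output
c_emb = c_enc(**c_tok(passages, return_tensors="pt", padding=True)).pooler_output
print(torch.matmul(q_emb, c_emb.T))  # higher score = more relevant passage
```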