@pyup-bot

This PR pins transformers to the latest release 4.21.2.

Changelog

4.21.2

Fix a regression in the TableQA pipeline: [18428](https://github.com/huggingface/transformers/pull/18428)

4.21.1

Fix a regression in Trainer checkpoint loading: 18470

4.21.0

TensorFlow XLA Text Generation

The TensorFlow text generation method can now be wrapped with `tf.function` and compiled to XLA. You should be able to achieve up to 100x speedup this way. See our blog post and [our benchmarks](https://huggingface.co/spaces/joaogante/tf_xla_generate_benchmarks). You can also see XLA generation in action in our [example notebooks](https://huggingface.co/docs/transformers/notebooks), particularly for [summarization](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/summarization-tf.ipynb) and [translation](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/translation-tf.ipynb).

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = TFAutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Main changes with respect to the original generate workflow: `tf.function` and `pad_to_multiple_of`
xla_generate = tf.function(model.generate, jit_compile=True)
tokenization_kwargs = {"pad_to_multiple_of": 32, "padding": True, "return_tensors": "tf"}

# The first prompt will be slow (compiling), the others will be very fast!
input_prompts = [
    f"translate English to {language}: I have four cats and three dogs."
    for language in ["German", "French", "Romanian"]
]
for input_prompt in input_prompts:
    tokenized_inputs = tokenizer([input_prompt], **tokenization_kwargs)
    generated_text = xla_generate(**tokenized_inputs, max_new_tokens=32)
    print(tokenizer.decode(generated_text[0], skip_special_tokens=True))
```


* Generate: deprecate default `max_length`  by gante in 18018
* TF: GPT-J compatible with XLA generation  by gante in 17986
* TF: T5 can now handle a padded past (i.e. XLA generation)  by gante in 17969
* TF: XLA beam search + most generation-compatible models are now also XLA-generate-compatible  by gante in 17857
* TF: generate without `tf.TensorArray`  by gante in 17801
* TF: BART compatible with XLA generation  by gante in 17479
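
The `pad_to_multiple_of=32` setting in the example matters because a `tf.function` compiled with XLA is traced and compiled separately for each new input shape. Rounding sequence lengths up to a fixed multiple caps the number of distinct shapes, and therefore the number of slow first calls. The pure-Python sketch below only counts shapes to illustrate this; the prompt lengths are made up.

```python
import math

def padded_length(seq_len: int, multiple: int) -> int:
    """Round seq_len up to the next multiple of `multiple`, as pad_to_multiple_of does."""
    return math.ceil(seq_len / multiple) * multiple

def distinct_shapes(lengths, multiple=None):
    """Number of distinct sequence lengths an XLA-compiled function would see."""
    if multiple is None:
        return len(set(lengths))
    return len({padded_length(n, multiple) for n in lengths})

# Hypothetical tokenized prompt lengths, for illustration only.
prompt_lengths = [11, 12, 13, 17, 25, 31, 33, 40]
print(distinct_shapes(prompt_lengths))      # 8 shapes -> up to 8 compilations
print(distinct_shapes(prompt_lengths, 32))  # 2 shapes (32 and 64)
```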

New model additions

OwlViT

The OWL-ViT model (short for Vision Transformer for Open-World Localization) was proposed in [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby. OWL-ViT is an open-vocabulary object detection network trained on a variety of (image, text) pairs. It can be used to query an image with one or multiple text queries to search for and detect target objects described in text.

* Add OWL-ViT model for zero-shot object detection  by alaradirik in 17938
* Fix OwlViT tests  by sgugger in 18253
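
Conceptually, open-vocabulary detection in the style of OWL-ViT scores every predicted box against every text query and keeps the boxes whose best match clears a threshold. The sketch below illustrates only that matching step, using toy 3-dimensional embeddings and cosine similarity. The real model produces its box and text embeddings with its vision and text encoders and computes its logits differently, so treat the numbers and the hypothetical `detect` helper as illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equally sized vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def detect(box_embeddings, query_embeddings, threshold=0.5):
    """Return (box_index, query_index, score) for each box whose best query scores >= threshold."""
    detections = []
    for box_index, box in enumerate(box_embeddings):
        scores = [cosine(box, query) for query in query_embeddings]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            detections.append((box_index, best, scores[best]))
    return detections

# Toy embeddings: three predicted boxes and two text queries
# (imagine "a photo of a cat" and "a photo of a dog").
boxes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.1], [0.5, 0.5, 0.5]]
queries = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.0]]
# The third, ambiguous box falls below the 0.8 threshold and is dropped.
for box_index, query_index, score in detect(boxes, queries, threshold=0.8):
    print(f"box {box_index} matches query {query_index} (score {score:.3f})")
```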

NLLB

The NLLB model was presented in [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) by Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula, Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, Pierre Andrews, Necip Fazil Ayan, Shruti Bhosale, Sergey Edunov, Angela Fan, Cynthia Gao, Vedanuj Goswami, Francisco Guzmán, Philipp Koehn, Alexandre Mourachko, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, and Jeff Wang. No Language Left Behind (NLLB) is a model capable of delivering high-quality translations directly between any pair of 200+ languages — including low-resource languages like Asturian, Luganda, Urdu and more.

* [M2M100] update conversion script  by patil-suraj in 17916
* NLLB tokenizer  by LysandreJik in 18126

MobileViT

The MobileViT model was proposed in [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https://arxiv.org/abs/2110.02178) by Sachin Mehta and Mohammad Rastegari. MobileViT introduces a new layer that replaces local processing in convolutions with global processing using transformers.

* add MobileViT model  by hollance in 17354

Nezha

The Nezha model was proposed in [NEZHA: Neural Contextualized Representation for Chinese Language Understanding](https://arxiv.org/abs/1909.00204) by Junqiu Wei et al. NEZHA is a language model based on BERT with a collection of proven improvements, which include Functional Relative Positional Encoding as an effective positional encoding scheme, the Whole Word Masking strategy, Mixed Precision Training, and the LAMB optimizer for training the models.

* Nezha Pytorch implementation  by sijunhe in 17776
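
Functional Relative Positional Encoding replaces learned position embeddings with fixed sinusoids of the relative distance between two tokens, which NEZHA uses inside self-attention. A minimal sketch, assuming the sin/cos formulation over `j - i` and an even model dimension; the optional clipping distance is an illustrative parameter, not a value taken from these release notes.

```python
import math

def relative_position_encoding(i, j, d_model, max_distance=None):
    """Sinusoidal encoding of the relative distance j - i (NEZHA-style FRPE sketch).

    Assumes d_model is even: dimension 2k holds sin(distance / 10000**(2k/d_model))
    and dimension 2k + 1 holds the matching cosine.
    """
    distance = j - i
    if max_distance is not None:
        # Optional clipping so very distant tokens share one encoding (illustrative).
        distance = max(-max_distance, min(max_distance, distance))
    encoding = []
    for k in range(d_model // 2):
        angle = distance / (10000 ** (2 * k / d_model))
        encoding.extend([math.sin(angle), math.cos(angle)])
    return encoding

# Only the offset matters: positions (3, 5) and (10, 12) get identical encodings.
print(relative_position_encoding(3, 5, 8) == relative_position_encoding(10, 12, 8))  # True
print([round(x, 3) for x in relative_position_encoding(0, 1, 4)])  # [0.841, 0.54, 0.01, 1.0]
```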

GroupViT

The GroupViT model was proposed in [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://arxiv.org/abs/2202.11094) by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, and Xiaolong Wang. Inspired by [CLIP](https://huggingface.co/docs/transformers/v4.21.0/en/model_doc/clip), GroupViT is a vision-language model that can perform zero-shot semantic segmentation on any given vocabulary categories.

* Adding GroupViT Models  by xvjiarui in 17313

MVP

The MVP model was proposed in [MVP: Multi-task Supervised Pre-training for Natural Language Generation](https://arxiv.org/abs/2206.12131) by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen. MVP is a generative language model, pre-trained on a labeled pre-training corpus from 45 datasets over seven generation tasks. For each task, the model is further pre-trained using task-specific soft prompts to stimulate its capacity to perform that task.

* Add MVP model  by StevenTang1998 in 17787

CodeGen

The CodeGen model was proposed in [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, and Caiming Xiong. CodeGen is an autoregressive language model for program synthesis trained sequentially on [The Pile](https://pile.eleuther.ai/), BigQuery, and BigPython.

* Add CodeGen model  by rooa in 17443
* [CodeGen] support device_map="auto" for sharded checkpoints  by patil-suraj in 17871

UL2

The UL2 model was presented in [Unifying Language Learning Paradigms](https://arxiv.org/pdf/2205.05131v1.pdf) by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, and Donald Metzler. UL2 is a unified framework for pretraining models that are universally effective across datasets and setups. UL2 uses Mixture-of-Denoisers (MoD), a pre-training objective that combines diverse pre-training paradigms. UL2 introduces a notion of mode switching, wherein downstream fine-tuning is associated with specific pre-training schemes.

* Add UL2 (just docs)  by patrickvonplaten in 17740

Custom pipelines

This adds support for custom pipelines on the Hub, so they can be shared with everyone else. As with the code-on-the-Hub feature for models, tokenizers, etc., users have to pass `trust_remote_code=True` to use them. Beyond that, the best way to get familiar with the feature is to read the [added documentation](https://huggingface.co/docs/transformers/v4.21.0/en/add_new_pipeline#share-your-pipeline-on-the-hub).

* Custom pipeline  by sgugger in 18079

PyTorch to TensorFlow CLI utility

This adds a CLI to convert PT weights into TF weights, validate them, and (optionally) open a PR.

* CLI: tool to convert PT into TF weights and open hub PR by gante in https://github.com/huggingface/transformers/pull/17497
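
These notes don't detail how the CLI validates converted weights. A common approach, sketched below under that assumption, runs the PyTorch and TensorFlow models on the same input and checks that the largest element-wise difference between their outputs stays under a tolerance. The flattened output lists and the `5e-5` tolerance are assumptions for illustration.

```python
def max_abs_difference(reference, candidate):
    """Largest element-wise absolute difference between two equally sized outputs."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")
    return max((abs(a - b) for a, b in zip(reference, candidate)), default=0.0)

def outputs_match(pt_output, tf_output, tolerance=5e-5):
    """True if the converted (TF) model reproduces the PyTorch output within tolerance."""
    return max_abs_difference(pt_output, tf_output) <= tolerance

# Hypothetical flattened hidden states from the two frameworks for the same input.
pt_hidden = [0.12345, -1.5, 3.25]
tf_hidden = [0.12346, -1.5, 3.25]
print(outputs_match(pt_hidden, tf_hidden))  # True: the max difference is about 1e-5
```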

TensorFlow-specific improvements

The following models have been ported to be used in TensorFlow: SegFormer, DeiT, ResNet and RegNet. 

* [SegFormer] TensorFlow port  by sayakpaul in 17910
* Add TF DeiT implementation  by amyeroberts in 17806
* Add TF ResNet model  by amyeroberts in 17427
* TF implementation of RegNets  by ariG23498 in 17554

Additionally, our TF models now support loading sharded checkpoints:

* TF Sharded  by ArthurZucker in 17713
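
A sharded checkpoint splits the weights over several files and ships an index that maps each weight name to the file holding it, so a loader can open each shard once. The sketch below assumes a JSON index with a `weight_map` field, as used by Transformers' sharded checkpoints; the weight names, shard file names and `total_size` are made up, and the `group_by_shard` helper is hypothetical.

```python
import json
from collections import defaultdict

def group_by_shard(index_json: str) -> dict:
    """Map each shard file to the weight names it stores, so every shard is opened once."""
    index = json.loads(index_json)
    by_shard = defaultdict(list)
    for weight_name, shard_file in index["weight_map"].items():
        by_shard[shard_file].append(weight_name)
    return dict(by_shard)

# Hypothetical index for a two-shard TF checkpoint (names and sizes are made up).
example_index = json.dumps({
    "metadata": {"total_size": 1234},
    "weight_map": {
        "embeddings/weight": "tf_model-00001-of-00002.h5",
        "encoder/layer_0/kernel": "tf_model-00001-of-00002.h5",
        "encoder/layer_1/kernel": "tf_model-00002-of-00002.h5",
    },
})
for shard_file, weight_names in group_by_shard(example_index).items():
    print(shard_file, weight_names)
```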

Flax-specific improvements

The following models have been ported to be used in JAX:

* Flax t5 Encoder  by crystina-z in 17784

Additionally, our JAX models now support loading sharded checkpoints:

* Flax sharded  by ArthurZucker in 17760

Additional model heads

The following models now have a brand new head for new tasks:

* Add ViltForTokenClassification e.g. for Named-Entity-Recognition (NER)  by gilad19 in 17924
* Adding OPTForSeqClassification class  by oneraghavan in 18123

ONNX support

A continued community effort provides ONNX converters for an increasing number of models.

* add ONNX support for LeVit  by gcheron in 18154
* add ONNX support for BLOOM  by NouamaneTazi in 17961
* Add ONNX support for LayoutLMv3  by regisss in 17953
* Mrbean/codegen onnx  by sam-h-bean in 17903
* Add ONNX support for DETR  by regisss in 17904
* add onnx support for deberta and debertav2  by sam-h-bean in 17617

Documentation translation

A community effort to translate the documentation into several languages continues.

Portuguese

* Added translation of index.mdx to Portuguese Issue 16824  by rzimmerdev in 17565

Spanish

* Add Spanish translation of custom_models.mdx  by donelianc in 17807

Italian

* Add Italian translation of sharing_custom_models.mdx  by Xpiri in 17631
* Add Italian translation of converting_tensorflow_models.mdx  by Xpiri in 18283
* Add Italian translation of create_model.mdx  and serialization.mdx   by F02934 in 17640
* Italian/accelerate  by mfumanelli in 17698
* Italian/model sharing  by mfumanelli in 17828
* Italian translation of run_scripts.mdx gh-17459  by lorenzobalzani in 17642
* Translation/debugging  by nickprock in 18230
* Translation/training: italian translation training.mdx  by nickprock in 17662
* Translation italian: multilingual.mdx  by nickprock in 17768
* Added preprocessing.mdx italian translation  by nickprock in 17600

Improvements and bugfixes

* [EncoderDecoder] Improve docs  by NielsRogge in 18271
* [DETR] Improve code examples  by NielsRogge in 18262
* patch for smddp import  by carolynwang in 18244
* Fix Sylvain's nits on the original KerasMetricCallback PR  by Rocketknight1 in 18300
* Add PYTEST_TIMEOUT for CircleCI test jobs  by ydshieh in 18251
* Add PyTorch 1.11 to past CI  by ydshieh in 18302
* Raise a TF-specific error when importing Torch classes  by Rocketknight1 in 18280
* [ create_a_model.mdx ] translate to pt  by Fellip15 in 18098
* Update translation.mdx  by gorkemozkaya in 18169
* Add TFAutoModelForImageClassification to pipelines.py  by ydshieh in 18292
* Adding type hints of TF:OpenAIGPT  by Mathews-Tom in 18263
* Adding type hints of TF:CTRL  by Mathews-Tom in 18264
* Replace false parameter by a buffer  by sgugger in 18259
* Fix ORTTrainer failure on gpt2 fp16 training  by JingyaHuang in 18017
* Owlvit docs test  by alaradirik in 18257
* Good difficult issue override for the stalebot  by LysandreJik in 18094
* Fix dtype of input_features in docstring  by ydshieh in 18258
* Fix command of doc tests for local testing  by oneraghavan in 18236
* Fix TF bad words filter with XLA  by Rocketknight1 in 18286
* Allows `KerasMetricCallback` to use XLA generation  by Rocketknight1 in 18265
* Skip passes report for `--make-reports`  by ydshieh in 18250
* Update serving code to enable `saved_model=True`  by amyeroberts in 18153
* Change how `take_along_axis` is computed in DeBERTa to stop confusing XLA  by Rocketknight1 in 18256
* Fix torch version check in Vilt  by ydshieh in 18260
* change bloom parameters to 176B  by muhammad-ahmed-ghani in 18235
* TF: use the correct config with `(...)EncoderDecoder` models  by gante in 18097
* Fix `no_trainer` CI  by muellerzr in 18242
* Update notification service  by ydshieh in 17921
* Make errors for loss-less models more user-friendly  by sgugger in 18233
* Fix TrainingArguments help section  by sgugger in 18232
* Better messaging and fix for incorrect shape when collating data.  by CakeCrusher in 18119
* Add support for Sagemaker Model Parallel >= 1.10 new checkpoint API  by viclzhu in 18221
* Update add_new_pipeline.mdx  by zh-zheng in 18224
* Add custom config to quicktour  by stevhliu in 18115
* skip some test_multi_gpu_data_parallel_forward  by ydshieh in 18188
* Change to FlavaProcessor in PROCESSOR_MAPPING_NAMES  by ydshieh in 18213
* Fix `LayoutXLM` docstrings  by qqaatw in 17038
* update cache to v0.5  by ydshieh in 18203
* Reduce console spam when using the KerasMetricCallback  by Rocketknight1 in 18202
* TF: Add missing cast to GPT-J  by gante in 18201
* Use next-gen CircleCI convenience images  by ydshieh in 18197
* Typo in readme  by flozi00 in 18195
* [From pretrained] Allow download from subfolder inside model repo  by patrickvonplaten in 18184
* Update docs README with instructions on locally previewing docs  by snehankekre in 18196
* bugfix: div-->dim  by orgoro in 18135
* Add vision example to README  by sgugger in 18194
* Remove use_auth_token from the from_config method  by duongna21 in 18192
* FSDP integration enhancements and fixes  by pacman100 in 18134
* BLOOM minor fixes small test  by younesbelkada in 18175
* fix typo inside bloom documentation  by SaulLu in 18187
* Better default for offload_state_dict in from_pretrained  by sgugger in 18183
* Fix template for new models in README  by sgugger in 18182
* FIX: Typo  by ayansengupta17 in 18156
* Update TF(Vision)EncoderDecoderModel PT/TF equivalence tests  by ydshieh in 18073
* Fix expected loss values in some (m)T5 tests  by ydshieh in 18177
* [HPO] update to sigopt new experiment api  by sywangyi in 18147
* Fix incorrect type hint for lang  by JohnGiorgi in 18161
* Fix check for falsey inputs in run_summarization  by JohnGiorgi in 18155
* Adding support for `device_map` directly in `pipeline(..)` function.  by Narsil in 17902
* Fixing a hard to trigger bug for `text-generation` pipeline.  by Narsil in 18131
* Enable torchdynamo with torch_tensorrt(fx path)  by frank-wei in 17765
* Make sharded checkpoints work in offline mode  by sgugger in 18125
* add dataset split and config to model-index in TrainingSummary.from_trainer  by loicmagne in 18064
* Add summarization name mapping for MultiNews  by JohnGiorgi in 18117
* supported python versions reference  by CakeCrusher in 18116
* TF: unpack_inputs decorator independent from main_input_name  by gante in 18110
* TF: remove graph mode distinction when processing boolean options  by gante in 18102
* Fix BLOOM dtype  by Muennighoff in 17995
* CLI: reenable `pt_to_tf` test   by gante in 18108
* Report value for a step instead of epoch.  by zhawe01 in 18095
* speed up test  by sijunhe in 18106
* Enhance IPEX integration in Trainer  by jianan-gu in 18072
* Bloom Optimize operations  by younesbelkada in 17866
* Add filename to info displayed when downloading things in from_pretrained  by sgugger in 18099
* Fix image segmentation and object detection pipeline tests  by sgugger in 18100
* Fix RESOURCE_EXHAUSTED error when dealing with large datasets in Flax example scripts  by duongna21 in 18069
* Fix torchscript tests for GPT-NeoX  by ydshieh in 18012
* Fix some typos.  by Yulv-git in 17560
* [bloom] fix alibi device placement  by stas00 in 18087
* Make predict() close progress bars after finishing  by neverix in 17952
* Update localized READMES when template is filled.  by sgugger in 18062
* Fix type issue in using bucketing with Trainer  by seopbo in 18051
* Fix slow CI by pinning resampy  by sgugger in 18077
* Drop columns after loading samples in prepare_tf_dataset  by Rocketknight1 in 17967
* [Generate Tests] Make sure no tokens are force-generated  by patrickvonplaten in 18053
* Added Command for windows VENV activation in installation docs  by darthvader2 in 18008
* Sort doc toc  by sgugger in 18034
* Place inputs on device when include_inputs_for_metrics is True  by sgugger in 18046
* Doc to dataset  by sgugger in 18037
* Protect `TFGenerationMixin.seed_generator` so it's not created at import  by Rocketknight1 in 18044
* Fix T5 incorrect weight decay in Trainer and official summarization example  by ADAning in 18002
* Squash commits  by NielsRogge in 17981
* Enable Past CI  by ydshieh in 17919
* Fix T5/mT5 tests  by Rocketknight1 in 18029
* [Flax] Bump to v0.4.1  by sanchit-gandhi in 17966
* Update expected values in DecisionTransformerModelIntegrationTest  by ydshieh in 18016
* fixed calculation of ctc loss in TFWav2Vec2ForCTC  by Sreyan88 in 18014
* Return scalar losses instead of per-sample means  by Rocketknight1 in 18013
* sort list of models  by hollance in 18011
* Replace BloomTokenizer by BloomTokenizerFast in doc  by regisss in 18005
* Fix typo in error message in generation_utils  by regisss in 18000
* Refactor to inherit from nn.Module instead of nn.ModuleList  by amyeroberts in 17501
* Add link to existing documentation  by LysandreJik in 17931
* only a stupid typo, but it can lead to confusion  by Dobatymo in 17930
* Exclude Databricks from notebook env only if the runtime is below 11.0  by davidheryanto in 17988
* Shifting labels for causal LM when using label smoother  by seungeunrho in 17987
* Restore original task in test_warning_logs  by ydshieh in 17985
* Ensure PT model is in evaluation mode and lightweight forward pass done  by amyeroberts in 17970
* XLA train step fixes  by Rocketknight1 in 17973
* [Flax] Add remat (gradient checkpointing)  by sanchit-gandhi in 17843
* higher atol to avoid flaky trainer test failure  by ydshieh in 17979
* Fix FlaxBigBirdEmbeddings  by ydshieh in 17842
* fixing fsdp autowrap functionality  by pacman100 in 17922
* fix `bias` keyword argument in TFDebertaEmbeddings  by WissamAntoun in 17940
* Update expected values in CodeGen tests  by ydshieh in 17888
* Fix typo in perf_train_gpu_one.mdx  by aliencaocao in 17983
* skip some gpt_neox tests that require 80G RAM  by ydshieh in 17923
* feat: add pipeline registry abstraction  by aarnphm in 17905
* skip some ipex tests until it works with torch 1.12  by ydshieh in 17964
* Fix number of examples for iterable dataset in distributed training  by sgugger in 17951
* [Pipelines] Add revision tag to all default pipelines  by patrickvonplaten in 17667
* Unifying training argument type annotations  by jannisborn in 17934
* Fix GPT-NeoX-20B past handling, attention computation  by zphang in 17811
* Fix 17893, removed dead code  by clefourrier in 17917
* Fix prepare_tf_dataset when drop_remainder is not supplied  by Rocketknight1 in 17950
* ExplicitEnum subclass str (JSON dump compatible)  by BramVanroy in 17933
* PyTorch 1.12.0 for scheduled CI  by ydshieh in 17949
* OPT - Fix Softmax NaN in half precision mode  by younesbelkada in 17437
* Use explicit torch version in deepspeed CI  by ydshieh in 17942
* fix regexes with escape sequence  by stas00 in 17943
* Fix all is_torch_tpu_available issues  by muellerzr in 17936
* Fix img seg tests (load checkpoints from `hf-internal-testing`)  by mishig25 in 17939
* Remove imports and use forward references in ONNX feature  by sgugger in 17926
* Fix job links in Slack report  by ydshieh in 17892
* Add missing comment quotes  by leondz in 17379
* Remove render tags  by NielsRogge in 17897
* Fix the Conda package build  by bryant1410 in 16737
* Remove DT_DOUBLE from the T5 graph  by szutenberg in 17891
* Compute min_resolution in prepare_image_inputs  by ydshieh in 17915
* Fixing a regression with `return_all_scores` introduced in 17606  by Narsil in 17906
* In `group_texts` function, drop last block if smaller than `block_size`  by billray0259 in 17908
* Move logic into pixelshuffle layer  by amyeroberts in 17899
* Fix loss computation in TFBertForPreTraining  by Rocketknight1 in 17898
* Pin black to 22.3.0 to benefit from a stable --preview flag  by LysandreJik in 17918
* Fix PyTorch/TF Auto tests  by ydshieh in 17895
* Fix `test_number_of_steps_in_training_with_ipex`  by ydshieh in 17889
* Update expected values in constrained beam search tests  by ydshieh in 17887
* Fix bug in gpt2's (from-scratch) special scaled weight initialization   by karpathy in 17877
* Update README_zh-hans.md  by mmdjiji in 17861
* bert: add conversion script for BERT Token Dropping TF2 checkpoints  by stefan-it in 17142
* Fix add new model like frameworks  by sgugger in 17869
* Add type annotations for RoFormer models  by donelianc in 17878
* fix  by ydshieh in 17890
* fix mask  by younesbelkada in 17837
* Add a TF in-graph tokenizer for BERT  by Rocketknight1 in 17701
* Fix TF GPT2 test_onnx_runtime_optimize  by ydshieh in 17874
* CLI: handle multimodal inputs  by gante in 17839
* Properly get tests deps in test_fetcher  by sgugger in 17870
* Fix `test_inference_instance_segmentation_head`  by ydshieh in 17872
* Skip `test_multi_gpu_data_parallel_forward` for `MaskFormer`  by ydshieh in 17864
* Use higher value for hidden_size in Flax BigBird test  by ydshieh in 17822
* Fix: torch.utils.checkpoint import error.  by kumapo in 17849
* Add type hints for gptneox models  by willtai in 17858
* Fix Splinter test  by ydshieh in 17854
* [tests/VisionEncoderDecoder] import to_2tuple from test utils  by patil-suraj in 17865
* Fix Constrained beam search duplication and weird output issue  by boy2000-007man in 17814
* Improve encoder decoder model docs  by Threepointone4 in 17815
* Improve vision models  by NielsRogge in 17731
* Auto-build Docker images before on-merge if setup.py was changed  by muellerzr in 17573
* Properly calculate the total train iterations and recalculate num epochs in no_trainer scripts  by muellerzr in 17856
* Index RNG states by global rank in saves  by sgugger in 17852
* Change no trainer image_classification test  by muellerzr in 17635
* Update modeling_cvt.py  by F02934 in 17846
* Fix broken test for models with batchnorm  by Rocketknight1 in 17841
* BLOOM minor changes on tokenizer  by younesbelkada in 17823
* Improve performance docs  by lvwerra in 17750
* Fix an error message in BigBird  by ydshieh in 17840
* Fix properties of unset special tokens in non verbose mode  by guillaumekln in 17797
* change message  by SaulLu in 17836
* Add missing type hints for QDQBertModel  by willtai in 17783
* Update type hints modeling_yoso.py   by F02934 in 17827
* add doctests for DETR  by qherreros in 17786
* Fix push CI artifact path  by ydshieh in 17788
* Offload fixes  by sgugger in 17810
* CLI: use hub's `create_commit`  by gante in 17755
* initial commit  by ArthurZucker in 17818
* Add logits_processor parameter, used by `generate`, to `Seq2SeqTrainer` methods `evaluate` and `predict`  by eranhirs in 17805
* Fix `top_k_top_p_filtering` having unexpected behavior  by unifyh in 17744
* Remove duplicate code  by lkm2835 in 17708
* CLI: convert sharded PT models  by gante in 17959
* Improve error message Union not allowed  by BramVanroy in 17769
* Add final_layer_norm to OPT model  by thomasw21 in 17785
* Properly check for a TPU device  by muellerzr in 17802
* Fix test for BF16 detection  by sgugger in 17803
* Use 5e-5 For BigBird PT/Flax equivalence tests  by ydshieh in 17780
* Prepare transformers for v0.8.0 huggingface-hub release  by LysandreJik in 17716
* Fix forward reference imports in DeBERTa configs  by sgugger in 17800
* Fix Automatic Download of Pretrained Weights in DETR  by AnugunjNaman in 17712
* [ViTMAE] Fix docstrings and variable names  by NielsRogge in 17710
* Add link to notebook  by NielsRogge in 17791
* [CodeParrot] Near-deduplication with jaccard similarity  by liyongsea in 17054
* Update modeling_longt5.py  by bjascob in 17777
* Not use -1e4 as attn mask  by ydshieh in 17306
* Fix cache for GPT-Neo-X  by sgugger in 17764
* deprecate is_torch_bf16_available  by stas00 in 17738
* Attempt to change Push CI to workflow_run  by ydshieh in 17753
* Save huggingface checkpoint as artifact in mlflow callback  by swethmandava in 17686
* Migrate HFDeepSpeedConfig from trfrs to accelerate  by pacman100 in 17623
* feat: add num_workers arg to DataLoader  by greg2451 in 17751
* Enable PyTorch nightly build CI  by ydshieh in 17335

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* donelianc
 * Add Spanish translation of custom_models.mdx (17807)
 * Add type annotations for RoFormer models (17878)
* Xpiri
 * Add Italian translation of sharing_custom_models.mdx (17631)
 * Add Italian translation of converting_tensorflow_models.mdx (18283)
* F02934
 * Add Italian translation of create_model.mdx  and serialization.mdx  (17640)
 * Update modeling_cvt.py (17846)
 * Update type hints modeling_yoso.py  (17827)
* sayakpaul
 * [SegFormer] TensorFlow port (17910)
* mfumanelli
 * Italian/accelerate (17698)
 * Italian/model sharing (17828)
* nickprock
 * Translation/debugging (18230)
 * Translation/training: italian translation training.mdx (17662)
 * Translation italian: multilingual.mdx (17768)
 * Added preprocessing.mdx italian translation (17600)
* sijunhe
 * speed up test (18106)
 * Nezha Pytorch implementation (17776)
* StevenTang1998
 * Add MVP model (17787)
* ariG23498
 * TF implementation of RegNets (17554)
* xvjiarui
 * Adding GroupViT Models (17313)
* rooa
 * Add CodeGen model (17443)

4.20.1

This patch release fixes a bug in the OPT models and makes Transformers compatible with `huggingface_hub` version 0.8.1.

* Add final_layer_norm to OPT model 17785
* Prepare transformers for v0.8.0 huggingface-hub release 17716

4.20.0

Big model inference

You can now use the big model inference of Accelerate directly in any call to `from_pretrained` by specifying `device_map="auto"` (or your own `device_map`). The model is loaded onto your GPU(s) first; whatever doesn't fit is offloaded to CPU RAM, and then to the hard drive if RAM runs out as well. The model can then be used for inference as usual, with nothing else to do.

```py
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained(
    "bigscience/T0pp", revision="sharded", device_map="auto"
)
```


* Use Accelerate in `from_pretrained` for big model inference  by sgugger in 17341
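
The placement order described above (GPUs first, then CPU RAM, then disk) can be illustrated with a simplified greedy sketch. This is not Accelerate's actual algorithm, which also accounts for modules that must not be split and for memory needed at runtime; the layer names, sizes and budgets are hypothetical, and the GPU index / `"cpu"` / `"disk"` values only mirror the shape of a `device_map`.

```python
def assign_devices(layer_sizes, gpu_budgets, cpu_budget):
    """Greedy sketch: fill GPUs in order, then CPU RAM, then spill the rest to disk.

    layer_sizes: dict of layer name -> size (any unit, e.g. GB), in model order.
    gpu_budgets: free memory per GPU, in the same unit.
    """
    tiers = [(gpu_index, budget) for gpu_index, budget in enumerate(gpu_budgets)]
    tiers.append(("cpu", cpu_budget))
    device_map = {}
    tier, used = 0, 0.0
    for name, size in layer_sizes.items():
        # Layers keep model order: once a tier is full, move on and never go back.
        while tier < len(tiers) and used + size > tiers[tier][1]:
            tier, used = tier + 1, 0.0
        if tier < len(tiers):
            device_map[name] = tiers[tier][0]
            used += size
        else:
            device_map[name] = "disk"  # disk is treated as unbounded
    return device_map

layers = {"embed": 2, "block.0": 4, "block.1": 4, "block.2": 4, "lm_head": 2}
print(assign_devices(layers, gpu_budgets=[8], cpu_budget=6))
# {'embed': 0, 'block.0': 0, 'block.1': 'cpu', 'block.2': 'disk', 'lm_head': 'disk'}
```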

BLOOM

The BLOOM model has been proposed in various versions through the [BigScience Workshop](https://bigscience.huggingface.co/). The architecture of BLOOM is essentially similar to GPT-3 (an auto-regressive model for next-token prediction), but it has been trained on 46 natural languages and 13 programming languages.

* BLOOM   by younesbelkada in 17474

CvT

The Convolutional vision Transformer (CvT) improves the Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs.

* Add CvT  by NielsRogge and AnugunjNaman in 17299

GPT Neo-X

GPT-NeoX-20B is a 20 billion parameter autoregressive language model trained on the Pile, whose weights are made freely and openly available to the public through a permissive license. GPT-NeoX-20B is a particularly powerful few-shot reasoner and gains far more in performance when evaluated five-shot than similarly sized GPT-3 and FairSeq models.

* Adding GPT-NeoX-20B  by zphang in 16659

LayoutLMv3

LayoutLMv3 simplifies LayoutLMv2 by using patch embeddings (as in ViT) instead of leveraging a CNN backbone, and pre-trains the model on 3 objectives: masked language modeling (MLM), masked image modeling (MIM) and word-patch alignment (WPA).

* Add LayoutLMv3  by NielsRogge in 17060
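
Patch embeddings, as in ViT, start by cutting the image into non-overlapping square patches that are then flattened and linearly projected into tokens. The sketch below shows only the cutting and flattening on a single-channel image stored as nested lists; the projection is omitted, and the 224 x 224 image with 16 x 16 patches is an assumed, ViT-typical setting.

```python
def patchify(image, patch_size):
    """Split an H x W image (nested lists) into flattened patch_size x patch_size patches.

    Patches are returned in row-major order (left to right, then top to bottom).
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patches.append([image[r][c]
                            for r in range(top, top + patch_size)
                            for c in range(left, left + patch_size)])
    return patches

toy = [[4 * r + c for c in range(4)] for r in range(4)]
print(patchify(toy, 2)[0])  # [0, 1, 4, 5]

blank_page = [[0] * 224 for _ in range(224)]
print(len(patchify(blank_page, 16)))  # 196 patches, i.e. (224 / 16) ** 2 image tokens
```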

LeViT

LeViT improves the Vision Transformer (ViT) in performance and efficiency through a few architectural changes, such as activation maps with decreasing resolution in the Transformer and an attention bias that integrates positional information.

* Adding LeViT Model by Facebook  by AnugunjNaman in 17466

LongT5

LongT5 is an extension of T5 that can use one of two efficient attention mechanisms: (1) local attention or (2) transient-global attention. It can handle input sequences of up to 16,384 tokens.

* Add `LongT5` model  by stancld in 16792
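
Local attention restricts each token to a window of neighbors within a fixed radius, so the cost grows linearly with sequence length rather than quadratically. The sketch below builds a dense boolean mask only to show which token pairs are allowed; an efficient implementation would avoid materializing the full matrix, and the radius here is a toy value.

```python
def local_attention_mask(seq_len, radius):
    """mask[i][j] is True when token i may attend to token j, i.e. |i - j| <= radius."""
    return [[abs(i - j) <= radius for j in range(seq_len)] for i in range(seq_len)]

for row in local_attention_mask(5, radius=1):
    print("".join("1" if allowed else "0" for allowed in row))
# 11000
# 11100
# 01110
# 00111
# 00011
```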

M-CTC-T

The M-CTC-T model is a 1B-param transformer encoder, with a CTC head over 8065 character labels and a language identification head over 60 language ID labels. It is trained on Common Voice (version 6.1, December 2020 release) and VoxPopuli. After training on Common Voice and VoxPopuli, the model is trained on Common Voice only. The labels are unnormalized character-level transcripts (punctuation and capitalization are not removed). The model takes as input Mel filterbank features from a 16 kHz audio signal.

* M-CTC-T Model  by cwkeam in 16402
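
A CTC head like M-CTC-T's predicts one label per audio frame, including a special blank label. The simplest way to turn those frame predictions into text is greedy (best-path) decoding: take the most likely label per frame, merge consecutive repeats, then drop blanks. A minimal sketch of the merge-and-drop step, assuming the per-frame labels are already chosen and using `_` as a stand-in blank symbol (the model's own blank token and vocabulary differ):

```python
def ctc_greedy_decode(frame_labels, blank="_"):
    """Collapse consecutive repeated labels, then remove blanks."""
    output = []
    previous = None
    for label in frame_labels:
        # A label is emitted only when it differs from the previous frame and is not blank;
        # a blank between two identical labels lets the label be emitted twice.
        if label != previous and label != blank:
            output.append(label)
        previous = label
    return "".join(output)

print(ctc_greedy_decode(list("hh_ee_ll_ll_oo")))  # hello
```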

Trajectory Transformer

This Transformer is used for deep reinforcement learning. To use it, you need to create sequences from actions, states and rewards from all previous timesteps. This model will treat all these elements together as one big sequence (a trajectory).

* Add trajectory transformer  by CarlCochet in 17141
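
Treating a trajectory as one sequence amounts to concatenating, timestep by timestep, the state, the action and the reward into a single flat list. The sketch below shows just that ordering with toy values; the actual Trajectory Transformer also discretizes each dimension into tokens and includes additional quantities such as returns, which is omitted here.

```python
def flatten_trajectory(states, actions, rewards):
    """Interleave per-timestep (state, action, reward) into one flat sequence."""
    if not (len(states) == len(actions) == len(rewards)):
        raise ValueError("states, actions and rewards need one entry per timestep")
    sequence = []
    for state, action, reward in zip(states, actions, rewards):
        sequence.extend(state)   # every state dimension becomes its own element
        sequence.extend(action)  # followed by every action dimension
        sequence.append(reward)  # and the scalar reward for that timestep
    return sequence

print(flatten_trajectory([[1, 2], [3, 4]], [[10], [20]], [0.5, 1.0]))
# [1, 2, 10, 0.5, 3, 4, 20, 1.0]
```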

Wav2Vec2-Conformer

The Wav2Vec2-Conformer was added to an updated version of fairseq S2T: Fast Speech-to-Text Modeling with fairseq. It requires more parameters than Wav2Vec2, but also yields an improved word error rate.

* [Wav2Vec2Conformer] Official release  by patrickvonplaten in 17709
* Add Wav2Vec2Conformer  by patrickvonplaten in 16812

TensorFlow implementations

Data2VecVision for semantic segmentation, OPT and Swin are now available in TensorFlow.
* Add TFData2VecVision for semantic segmentation  by sayakpaul in 17271
* Opt in flax and tf  by ArthurZucker in 17388
* Add Tensorflow Swin model  by amyeroberts in 16988

Flax implementations

OPT is now available in Flax.
* Opt in flax and tf  by ArthurZucker in 17388

Documentation translation in Italian and Portuguese

A community effort has been started to translate the documentation in two new languages: Italian and Portuguese.

* Translation/italian: added pipeline_tutorial.mdx [Issue: 17459]  by nickprock in 17507
* Add installation.mdx Italian translation  by mfumanelli in 17530
* Setup for Italian translation and add quicktour.mdx translation  by mfumanelli in 17472
* Adding the Portuguese version of the tasks/token_classification.mdx documentation  by jonatasgrosman in 17492
* Adding the Portuguese version of the tasks/sequence_classification.mdx documentation  by jonatasgrosman in 17352
* [ fast_tokenizers.mdx ] - Added translation to portuguese to tutorial  by Fellip15 in 17076
* Added translation of installation.mdx to Portuguese Issue 16824  by rzimmerdev in 16979

Improvements and bugfixes

* Sort the model doc Toc Alphabetically  by sgugger in 17723
* normalize keys_to_ignore  by stas00 in 17722
* CLI: Add flag to push TF weights directly into main  by gante in 17720
* Update requirements.txt  by jeffra in 17719
* Revert "Change push CI to run on workflow_run event (17692)"  by ydshieh
* Documentation: RemBERT fixes  by stefan-it in 17641
* Change push CI to run on workflow_run event  by ydshieh in 17692
* fix tolerance for a bloom slow test  by younesbelkada in 17634
* [LongT5] disable model parallel test  by patil-suraj in 17702
* FX function refactor  by michaelbenayoun in 17625
* Add `BloomForSequenceClassification` and `BloomForTokenClassification` classes  by haileyschoelkopf in 17639
* Swin main layer  by amyeroberts in 17693
* Include a comment to reflect Amy's contributions  by sayakpaul in 17689
* Rag end2end new  by shamanez in 17650
* [LongT5] Rename checkpoints  by patrickvonplaten in 17700
* Extend Transformers Trainer Class to Enable PyTorch Torchscript for Inference  by jianan-gu in 17153
* Fix doc builder Dockerfile  by ydshieh in 17435
* Add FP16 Support for SageMaker Model Parallel  by haohanchen-yagao in 17386
* enable cpu distribution training using mpirun  by sywangyi in 17570
* Add Ray's scope to training arguments  by BramVanroy in 17629
* Update modeling_gpt_neox.py  by willfrey in 17575
* Fix dtype getter  by sgugger in 17668
* explicitly set utf8 for Windows  by BramVanroy in 17664
* Fixed documentation typo, parameter name is evaluation_strategy, not eval_strategy  by sainttttt in 17669
* Add Visual Question Answering (VQA) pipeline  by sijunhe in 17286
* Fix typo in adding_a_new_model README  by ayushtues in 17679
* Avoid GPU OOM for a TF Rag test  by ydshieh in 17638
* fix typo from emtpy to empty  by domenicrosati in 17643
* [Generation Test] Make fast test actually fast  by patrickvonplaten in 17661
* [Data2Vec] Speed up test  by patrickvonplaten in 17660
* [BigBirdFlaxTests] Make tests slow  by patrickvonplaten in 17658
* update README.md  by loubnabnl in 17657
* 🐛 Properly raise `RepoNotFoundError` when not authenticated  by SBrandeis in 17651
* Fixes 17128  by mygithubid1 in 17356
* Fix dtype getters  by sgugger in 17656
* Add skip logic for attentions test - Levit  by amyeroberts in 17633
* Enable crop_center method to handle (W, H, C) images  by alaradirik in 17626
* Move Clip image utils to image_utils.py  by alaradirik in 17628
* Skip tests until bug is fixed.  by sgugger in 17646
* Translation/autoclass  by mfumanelli in 17615
* didn't exist in pt-1.9  by stas00 in 17644
* convert assertion to raised exception in debertav2  by sam-h-bean in 17619
* Pre-build DeepSpeed  by ydshieh in 17607
* [modeling_utils] torch_dtype/auto floating dtype fixes  by stas00 in 17614
* Running a pipeline of `float16`.  by Narsil in 17637
* fix use_amp rename after pr 17138  by stas00 in 17636
* Fix very long job failure text in Slack report  by ydshieh in 17630
* Adding `top_k` argument to `text-classification` pipeline.  by Narsil in 17606
* Mention in the doc we drop support for fairscale  by sgugger in 17610
* Use shape_list to safely get shapes for Swin  by amyeroberts in 17591
* Add ONNX support for ConvNeXT  by regisss in 17627
* Add ONNX support for ResNet  by regisss in 17585
* has_attentions - consistent test skipping logic and tf tests  by amyeroberts in 17495
* CLI: Print all different tensors on exception  by gante in 17612
* TF: Merge PT and TF behavior for Bart when no decoder_input_ids are passed  by gante in 17593
* Fix telemetry URL  by sgugger in 17608
* CLI: Properly detect encoder-decoder models  by gante in 17605
* Fix link for community notebooks  by ngoquanghuy99 in 17602
* Extend Transformers Trainer Class to Enable CPU AMP and Integrate Intel Extension for PyTorch  by jianan-gu in 17138
* fix `train_new_from_iterator` in the case of byte-level tokenizers  by SaulLu in 17549
* Explicit versions in docker files  by ydshieh in 17586
* CLI: add stricter automatic checks to `pt-to-tf`  by gante in 17588
* fix  by ydshieh in 17589
* quicktour.mdx en -> pt translation  by vitorfrois in 17074
* Fx support for Deberta-v[1-2], Hubert and LXMERT  by michaelbenayoun in 17539
* Add examples telemetry  by sgugger in 17552
* Fix gendered sentence in Spanish translation by omarespejel in 17558
* Fix circular import in onnx.utils  by sgugger in 17577
* Use latest stable PyTorch/DeepSpeed for Push & Scheduled CI  by ydshieh in 17417
* Remove circular imports in layoutlm/__init__.py  by regisss in 17576
* Add magic method to our TF models to convert datasets with column inference  by Rocketknight1 in 17160
* [deepspeed / testing] reset global state  by stas00 in 17553
* Remove RuntimeErrors for NaN-checking in 20B  by zphang in 17563
* fix integration test levit  by AnugunjNaman in 17555
* [deepspeed] fix load_best_model test  by stas00 in 17550
* Update index.mdx  by BritneyMuller in 17547
* Clean imports to fix test_fetcher  by sgugger in 17531
* Update run_glue_no_trainer.py  by bofenghuang in 17546
* Fix all offload and MP tests  by sgugger in 17533
* Fix bug - layer names and activation from previous refactor  by amyeroberts in 17524
* Add support for Perceiver ONNX export  by deutschmn in 17213
* Allow from transformers import TypicalLogitsWarper  by teticio in 17477
* Add Gated-SiLU to T5  by DanielHesslow in 17420
* Update URL for Hub PR docs  by lewtun in 17532
* fix OPT-Flax CI tests   by ArthurZucker in 17512
* [trainer/deepspeed] load_best_model (reimplement re-init)  by stas00 in 17151
* Implemented loss for training AudioFrameClassification  by MorenoLaQuatra in 17513
* Update configuration_auto.py  by kamalkraj in 17527
* Check list of models in the main README and sort it  by sgugger in 17517
* Fix  when Accelerate is not installed  by sgugger in 17518
* Clean README in post release job as well.  by sgugger in 17519
* Fix CI tests hang forever  by ydshieh in 17471
* Print more library versions in CI  by ydshieh in 17384
* Split push CI into 2 workflows  by ydshieh in 17369
* Fix Tapas tests  by ydshieh in 17510
* CLI: tool to convert PT into TF weights and open hub PR  by gante in 17497
* Fix flakey no-trainer test  by muellerzr in 17515
* Deal with the error when task is regression  by fireindark707 in 16330
* Fix CTRL tests  by ydshieh in 17508
* Fix LayoutXLMProcessorTest  by ydshieh in 17506
* Debug LukeForMaskedLM  by Ryou0634 in 17499
* Fix MP and CPU offload tests for Funnel and GPT-Neo  by sgugger in 17503
* Exclude Databricks from notebook env  by sgugger in 17496
* Fix `tokenizer` type annotation in `pipeline(...)`  by willfrey in 17500
* Refactor classes to inherit from nn.Module instead of nn.Sequential  by amyeroberts in 17493
* Fix wav2vec2 export onnx model with attention_mask error  by nilboy in 16004
* Add warning when using older version of torch for ViltFeatureExtractor  by xhluca in 16756
* Fix typo of variable names for key and query projection layer  by Kyeongpil in 17155
* Fixed wrong error message for missing weight file  by 123jimin in 17216
* Add OnnxConfig for SqueezeBert iss17314  by Ruihua-Fang in 17315
* [GPT2Tokenizer] Fix GPT2 with bos token  by patrickvonplaten in 17498
* [Json configs] Make json prettier for all saved tokenizer files & ensure same json format for all processors (tok + feat_extract)  by patrickvonplaten in 17457
* Accumulate tokens into batches in `PreTrainedTokenizerBase.add_tokens()`  by Witiko in 17119
* Add HF.co for PRs / Issues regarding specific model checkpoints  by patrickvonplaten in 17485
* Fix checkpoint name  by ydshieh in 17484
* Docker image build in parallel  by ydshieh in 17434
* Added XLM onnx config  by nandwalritik in 17030
* Disk offload fix  by sgugger in 17428
* TF: GPT-2 generation supports left-padding  by gante in 17426
* Fix ViTMAEModelTester  by ydshieh in 17470
* [Generate] Fix output scores greedy search  by patrickvonplaten in 17442
* Fix nits  by omarespejel in 17349
* Fx support for multiple model architectures  by michaelbenayoun in 17393
* typo IBERT in __repr__ quant_mode  by scratchmex in 17398
* Fix typo (remove parenthesis)  by mikcnt in 17415
* Improve notrainer examples  by pacman100 in 17449
* [OPT] Fix bos token id default  by patrickvonplaten in 17441
* Fix model parallelism test  by sgugger in 17439
* Pin protobuf that breaks TensorBoard in PyTorch  by sgugger in 17440
* Spanish translation of the file preprocessing.mdx  by yharyarias in 16299
* Spanish translation of the files sagemaker.mdx and image_classification.mdx  by SimplyJuanjo in 17262
* Added es version of bertology.mdx doc  by jQuinRivero in 17255
* Wav2vec2 finetuning shared file system  by patrickvonplaten in 17423
* fix link in performance docs  by lvwerra in 17419
* Add link to Hub PR docs in model cards  by lewtun in 17421
* Upd AutoTokenizer.from_pretrained doc examples  by c00k1ez in 17416
* Support compilation via Torchdynamo, AOT Autograd, NVFuser  by anijain2305 in 17308
* Add test for new model parallelism features  by sgugger in 17401
* Make check_init script more robust and clean inits  by sgugger in 17408
* Fix README localizer script  by sgugger in 17407
* Fix expected value for OPT test `test_inference_no_head`  by ydshieh in 17395
* Clean up CLIP tests  by NielsRogge in 17380
* Enabling `imageGPT` auto feature extractor.  by Narsil in 16871
* Add support for `device_map="auto"` to OPT  by sgugger in 17382
* OPTForCausalLM lm_head input size should be config.word_embed_proj_dim  by vfbd in 17225
* Traced models serialization and torchscripting fix  by michaelbenayoun in 17206
* Fix Comet ML integration  by mxschmdt in 17381
* Fix cvt docstrings  by AnugunjNaman in 17367
* Correct & Improve Doctests for LayoutLMv2  by gnolai in 17168
* Fix CodeParrot training script  by loubnabnl in 17291
* Fix a typo relative_postion_if_large -> relative_position_if_large  by stancld in 17366
* Pin dill to fix examples  by sgugger in 17368
* [Test OPT] Add batch generation test opt  by patrickvonplaten in 17359
* Fix bug in Wav2Vec2 pretrain example  by ddobokki in 17326
* fix for 17292  by nadahlberg in 17293
* [Generation] Fix Transition probs  by patrickvonplaten in 17311
* [OPT] Run test in lower precision on GPU  by patrickvonplaten in 17353
* Adding `batch_size` test to QA pipeline.  by Narsil in 17330
* [BC] Fixing usage of text pairs  by Narsil in 17324
* [tests] fix copy-n-paste error  by stas00 in 17312
* Fix ci_url might be None  by ydshieh in 17332
* fix  by ydshieh in 17337
* Fix metric calculation in examples and setup tests to run on multi-gpu for no_trainer scripts  by muellerzr in 17331
* docs for typical decoding  by jadermcs in 17186
* Not send successful report  by ydshieh in 17329
* Fix test_t5_decoder_model_past_large_inputs  by ydshieh in 17320
* Add onnx export cuda support  by JingyaHuang in 17183
* Add Information Gain Filtration algorithm  by mraunak in 16953
* Fix typo  by kamalkraj in 17328
* remove  by ydshieh in 17325
* Accepting real pytorch device as arguments.  by Narsil in 17318
* Updating the docs for `max_seq_len` in QA pipeline  by Narsil in 17316
* [T5] Fix init in TF and Flax for pretraining  by patrickvonplaten in 17294
* Add type hints for ProphetNet (Pytorch)  by jQuinRivero in 17223
* fix  by patrickvonplaten in 17310
* [LED] fix global_attention_mask not being passed for generation and docs clarification about grad checkpointing  by caesar-one in 17112
* Add support for pretraining recurring span selection to Splinter  by jvcop in 17247
* Add PR author in CI report + merged by info  by ydshieh in 17298
* Fix dummy creation script  by sgugger in 17304
* Doctest longformer  by KMFODA in 16441
* [Test] Fix W2V-Conformer integration test  by patrickvonplaten in 17303
* Improve mismatched sizes management when loading a pretrained model  by regisss in 17257
* correct opt  by patrickvonplaten in 17301
* Rewrite TensorFlow train_step and test_step  by Rocketknight1 in 17057
* Fix tests of mixed precision now that experimental is deprecated  by Rocketknight1 in 17300
* fix retribert's `test_torch_encode_plus_sent_to_model`  by SaulLu in 17231
* [ConvNeXT] Fix drop_path_rate  by NielsRogge in 17280
* Fix wrong PT/TF categories in CI report  by ydshieh in 17272
* Fix missing job action button in CI report   by ydshieh in 17270
* Fix test_model_parallelization  by lkm2835 in 17249
* [Tests] Fix slow opt tests  by patrickvonplaten in 17282
* docs(transformers): fix typo  by k-zehnder in 17263
* logging documentation update  by sanderland in 17174
* Use the PR URL in CI report  by ydshieh in 17269
* Fix FlavaForPreTrainingIntegrationTest CI test  by ydshieh in 17232
* Better error in the Auto API when a dep is missing  by sgugger in 17289
* Make TrainerHyperParameterSigOptIntegrationTest slow test  by ydshieh in 17288
* Automatically sort auto mappings  by sgugger in 17250
* Mlflowcallback fix nonetype error  by orieg in 17171
* Align logits and labels in OPT  by MichelBartels in 17237
* Remove next sentence prediction from supported ONNX tasks  by lewtun in 17276
* CodeParrot data pretokenization  by loubnabnl in 16932
* Update codeparrot data preprocessing  by loubnabnl in 16944
* Updated checkpoint support for Sagemaker Model Parallel  by cavdard in 17219
* fixed bug in run_mlm_flax_stream.py  by KennethEnevoldsen in 17203
* [doc] performance/scalability revamp  by stas00 in 15723
* TF - Fix convnext classification example  by gante in 17261
* Fix obvious typos in flax decoder impl  by cloudhan in 17279
* Guide to create custom models in Spanish  by ignacioct in 17158
* Translated version of model_sharing.mdx doc to spanish  by Gerard-170 in 16184
* Add PR title to push CI report  by ydshieh in 17246
* Fix push CI channel  by ydshieh in 17242
* install dev. version of accelerate  by ydshieh in 17243
* Fix Trainer for Datasets that don't have dict items  by sgugger in 17239
* Handle copyright in add-new-model-like  by sgugger in 17218
* fix --gpus option for docker  by ydshieh in 17235
* Update self-push workflow  by ydshieh in 17177
* OPT - fix docstring and improve tests slightly  by patrickvonplaten in 17228
* OPT-fix  by younesbelkada in 17229
* Fix typo in bug report template  by fxmarty in 17178
* Black preview  by sgugger in 17217
* update BART docs  by patil-suraj in 17212
* Add test to ensure models can take int64 inputs  by Rocketknight1 in 17210

Significant community contributions

The following contributors have made significant changes to the library since the last release:

* sayakpaul
 * Include a comment to reflect Amy's contributions (17689)
 * Add TFData2VecVision for semantic segmentation (17271)
* jianan-gu
 * Extend Transformers Trainer Class to Enable PyTorch Torchscript for Inference (17153)
 * Extend Transformers Trainer Class to Enable CPU AMP and Integrate Intel Extension for PyTorch (17138)
* stancld
 * Add `LongT5` model (16792)
 * Fix a typo relative_postion_if_large -> relative_position_if_large (17366)
* mfumanelli
 * Translation/autoclass (17615)
 * Add installation.mdx Italian translation (17530)
 * Setup for Italian translation and add quicktour.mdx translation (17472)
* cwkeam
 * M-CTC-T Model (16402)
* zphang
 * Remove RuntimeErrors for NaN-checking in 20B (17563)
 * Adding GPT-NeoX-20B (16659)
* AnugunjNaman
 * fix integration test levit (17555)
 * Adding LeViT Model by Facebook (17466)
 * Fix cvt docstrings (17367)
* yharyarias
 * Spanish translation of the file preprocessing.mdx (16299)
* mraunak
 * Add Information Gain Filtration algorithm (16953)
* rzimmerdev
 * Added translation of installation.mdx to Portuguese Issue 16824 (16979)

4.19.4

Fixes the error message raised when trying to access a repo that does not exist (it started to break due to changes in the Hub API).

- 🐛 Properly raise `RepoNotFoundError` when not authenticated 17651

4.19.3

This patch release fixes the installation of protobuf when a user runs `pip install transformers[sentencepiece]`.

- Pin protobuf that breaks TensorBoard in PyTorch 17440

4.19.2

Patch release for the following PRs/commits:

- [OPT-fix 17229](https://github.com/huggingface/transformers/pull/17229)
- [OPT - fix docstring and improve tests slightly 17228](https://github.com/huggingface/transformers/pull/17228)
- [Align logits and labels in OPT 17237](https://github.com/huggingface/transformers/pull/17237)

4.19.1

Fix Trainer for Datasets that don't have dict items 17239

4.19.0

*Disclaimer*: this is the first release without Python 3.6 support.

OPT

The OPT model was proposed in [Open Pre-trained Transformer Language Models](https://arxiv.org/pdf/2205.01068) by Meta AI. OPT is a series of open-source large causal language models with performance similar to GPT-3.

* Add OPT  by younesbelkada in 17088

FLAVA

The FLAVA model was proposed in [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela and is accepted at CVPR 2022.

The paper aims to create a single unified foundation model that works across vision, language, and vision-and-language multimodal tasks.

* [feat] Add FLAVA model  by apsdehal in 16654

YOLOS

The YOLOS model was proposed in [You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection](https://arxiv.org/abs/2106.00666) by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu. Inspired by DETR, YOLOS proposes using the plain [Vision Transformer (ViT)](https://huggingface.co/docs/transformers/main/en/model_doc/vit) for object detection. It turns out that a base-sized encoder-only Transformer can achieve 42 AP on COCO, similar to DETR and to much more complex frameworks such as Faster R-CNN.

* Add YOLOS  by NielsRogge in 16848

RegNet

The RegNet model was proposed in [Designing Network Design Spaces](https://arxiv.org/abs/2003.13678) by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.

The authors design search spaces for Neural Architecture Search (NAS). They start from a high-dimensional search space and iteratively reduce it by empirically applying constraints based on the best-performing models sampled from the current search space.

* RegNet  by FrancescoSaverioZuppichini in 16188
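The design space the authors arrive at constrains per-block widths to a quantized linear function of the block index: a continuous width u_j = w_0 + w_a·j is snapped to w_0 times the nearest integer power of w_m, and consecutive blocks with the same width form a stage. The plain-Python sketch below illustrates that rule. The function names and example parameter values are hypothetical, rounding widths to a multiple of 8 is an assumption borrowed from common practice, and none of this is taken from the Transformers RegNet code.

```python
import math
from itertools import groupby
from typing import List, Tuple


def regnet_block_widths(depth: int, w_0: float, w_a: float, w_m: float, q: int = 8) -> List[int]:
    """Per-block widths from a quantized linear rule (illustrative sketch)."""
    widths = []
    for j in range(depth):
        u_j = w_0 + w_a * j  # continuous width grows linearly with the block index
        # Integer exponent s_j so that w_0 * w_m ** s_j is as close as possible to u_j (in log space).
        s_j = round(math.log(u_j / w_0) / math.log(w_m))
        w_j = w_0 * w_m ** s_j  # quantized width
        widths.append(int(round(w_j / q) * q))  # assumption: keep widths divisible by q
    return widths


def group_into_stages(widths: List[int]) -> List[Tuple[int, int]]:
    """Consecutive blocks sharing a width form one stage: (width, number_of_blocks)."""
    return [(w, len(list(blocks))) for w, blocks in groupby(widths)]


widths = regnet_block_widths(depth=6, w_0=32, w_a=40, w_m=2.0)
print(widths)                     # [32, 64, 128, 128, 256, 256]
print(group_into_stages(widths))  # [(32, 1), (64, 1), (128, 2), (256, 2)]
```

Because widths only change at powers of w_m, the network naturally falls into a few stages of constant width, which is what makes the reduced design space easy to describe with a handful of parameters.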

TAPEX

The TAPEX model was proposed in [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou. TAPEX pre-trains a BART model to execute synthetic SQL queries, after which it can be fine-tuned to answer natural-language questions about tabular data and to perform table fact checking.

* Add TAPEX  by NielsRogge in 16473
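Because TAPEX is built on BART, a sequence-to-sequence model, a table has to be flattened into text before it can be fed to the encoder together with the question. The sketch below shows one such linearization in plain Python. The `col : ... row 1 : ...` layout approximates the format used by the TAPEX tokenizer, but the helper names are hypothetical, and the exact format (including lowercasing) is an assumption to check against the `TapexTokenizer` documentation before relying on it.

```python
from typing import List


def linearize_table(header: List[str], rows: List[List[str]]) -> str:
    """Flatten a table as 'col : h1 | h2 row 1 : v1 | v2 ...' (approximate TAPEX layout)."""
    parts = ["col : " + " | ".join(header)]
    for index, row in enumerate(rows, start=1):
        parts.append(f"row {index} : " + " | ".join(str(cell) for cell in row))
    return " ".join(parts)


def build_model_input(question: str, header: List[str], rows: List[List[str]]) -> str:
    """Prepend the question to the flattened table; lowercasing is an assumption."""
    return f"{question} {linearize_table(header, rows)}".lower()


print(build_model_input(
    "Which city has more inhabitants?",
    ["city", "population"],
    [["Paris", "2.1M"], ["Lyon", "0.5M"]],
))
# which city has more inhabitants? col : city | population row 1 : paris | 2.1m row 2 : lyon | 0.5m
```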

Data2Vec: vision

The Data2Vec model was proposed in [data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/pdf/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu and Michael Auli. Data2Vec proposes a unified framework for self-supervised learning across different data modalities: text, audio, and images. Importantly, predicted targets for pre-training are contextualized latent representations of the inputs, rather than modality-specific, context-independent targets.

The vision model is added in v4.19.0.

* [Data2Vec] Add data2vec vision  by patrickvonplaten in 16760
* Add Data2Vec for Vision in TF  by sayakpaul in 17008
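In data2vec, those contextualized targets come from a teacher network that encodes the unmasked input: the outputs of its top K blocks are normalized and averaged, and the teacher's weights track the student's through an exponential moving average (EMA). The plain-Python sketch below illustrates both steps for a single position, with hidden states as lists of floats. The function names, the parameter-free normalization, and the example numbers are illustrative assumptions, not the Transformers implementation.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector, eps: float = 1e-5) -> Vector:
    """Zero-mean, unit-variance normalization over the feature dimension (no learned scale/shift)."""
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]


def data2vec_target(teacher_layer_outputs: List[Vector], top_k: int) -> Vector:
    """Average the normalized outputs of the teacher's last `top_k` layers."""
    top_layers = [normalize(layer) for layer in teacher_layer_outputs[-top_k:]]
    return [sum(values) / top_k for values in zip(*top_layers)]


def ema_update(teacher: Vector, student: Vector, tau: float) -> Vector:
    """Teacher weights drift slowly toward the student: tau * teacher + (1 - tau) * student."""
    return [tau * t + (1.0 - tau) * s for t, s in zip(teacher, student)]


# Hidden states of one position from a 3-layer teacher (illustrative numbers).
layers = [[0.0, 0.0], [1.0, 3.0], [2.0, 6.0]]
print(data2vec_target(layers, top_k=2))             # approximately [-1.0, 1.0]
print(ema_update([1.0, 1.0], [0.0, 2.0], tau=0.9))  # approximately [0.9, 1.1]
```

The student, which only sees a masked version of the input, is then trained to regress these targets at the masked positions, so the same objective applies whether the input is text, audio, or image patches.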

FSDP integration in Trainer

PyTorch recently upstreamed Fairscale's FSDP (Fully Sharded Data Parallel) into PyTorch Distributed with additional optimizations. This release integrates it into the Trainer API.

FSDP enables distributed training at scale: it wraps a module and shards its parameters across data-parallel workers, an approach inspired by Xu et al. and by ZeRO Stage 3 from DeepSpeed.
PyTorch FSDP will focus more on production readiness and long-term support, including better ecosystem integration and improvements in performance, usability, reliability, debuggability, and composability.

* PyTorch FSDP integration in Trainer  by pacman100 in 17136
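The core idea behind this kind of sharding can be shown without any distributed machinery. The toy, single-process sketch below simulates N data-parallel workers in plain Python: each worker stores only one shard of the flattened parameters, the full parameters are rebuilt with an all-gather when they are needed, and averaged gradients are reduce-scattered so that each worker updates only its own shard. All function names are hypothetical; this is a conceptual illustration, not PyTorch's FSDP API or the Trainer integration.

```python
from typing import List

Shard = List[float]


def shard_params(flat: List[float], world_size: int) -> List[Shard]:
    """Split a flattened parameter vector into equal shards, one per worker (zero-padded)."""
    shard_size = -(-len(flat) // world_size)  # ceiling division
    padded = flat + [0.0] * (shard_size * world_size - len(flat))
    return [padded[r * shard_size:(r + 1) * shard_size] for r in range(world_size)]


def all_gather(shards: List[Shard], numel: int) -> List[float]:
    """Rebuild the full parameter vector from every worker's shard, dropping the padding."""
    return [x for shard in shards for x in shard][:numel]


def reduce_scatter(full_grads_per_worker: List[List[float]]) -> List[Shard]:
    """Average the workers' full gradients, then hand every worker only its own slice."""
    world_size = len(full_grads_per_worker)
    averaged = [sum(column) / world_size for column in zip(*full_grads_per_worker)]
    return shard_params(averaged, world_size)


def sgd_step(shards: List[Shard], grad_shards: List[Shard], lr: float) -> List[Shard]:
    """Each worker applies a plain SGD update to its own shard only."""
    return [[p - lr * g for p, g in zip(ps, gs)] for ps, gs in zip(shards, grad_shards)]


params = [1.0, 2.0, 3.0, 4.0, 5.0]
shards = shard_params(params, world_size=2)  # [[1.0, 2.0, 3.0], [4.0, 5.0, 0.0]]
assert all_gather(shards, len(params)) == params

# Each worker computed a gradient for the full (gathered) parameters on its own batch.
grads = [[2.0, 2.0, 2.0, 2.0, 2.0], [0.0, 0.0, 0.0, 0.0, 0.0]]
new_shards = sgd_step(shards, reduce_scatter(grads), lr=0.5)
print(all_gather(new_shards, len(params)))  # [0.5, 1.5, 2.5, 3.5, 4.5]
```

The memory saving comes from the fact that, outside the brief all-gather, each worker only keeps 1/N of the parameters (and, in a real setup, of the gradients and optimizer state).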

Training scripts

New example scripts were added for image classification and semantic segmentation. Both now have versions that leverage the Trainer API and Accelerate.

* Add image classification script, no trainer  by NielsRogge in 16727
* Add semantic script no trainer, v2  by NielsRogge in 16788
* Add semantic script, trainer  by NielsRogge in 16834

Documentation in Spanish

To continue democratizing good machine learning, we're making the Transformers documentation more accessible to non-English speakers, starting with Spanish (572M speakers worldwide).

- Added es version of language_modeling.mdx doc by jQuinRivero in 17021
- Spanish translation of the file philosophy.mdx by jkmg in 16922
- Documentation: Spanish translation of fast_tokenizers.mdx by jloayza10 in 16882
- Translate index.mdx (to ES) and add Spanish models to quicktour.mdx examples by omarespejel in 16685
- Spanish translation of the file multilingual.mdx by SimplyJuanjo in 16329
- Added spanish translation of autoclass_tutorial. by Duedme in 17069
- Fix style error in Spanish docs by osanseviero in 17197

Improvements and bugfixes

* [modeling_utils] rearrange text  by stas00 in 16632
* Added Annotations for PyTorch models  by anmolsjoshi in 16619
* Allow the same config in the auto mapping  by sgugger in 16631
* Update no_trainer scripts with new Accelerate functionalities  by muellerzr in 16617
* Fix doc example  by NielsRogge in 16448
* Add inputs vector to calculate metric method  by lmvasque in 16461
* [megatron-bert-uncased-345m] fix conversion  by stas00 in 16639
* Remove parent/child tests in auto model tests  by sgugger in 16653
* Updated _load_pretrained_model_low_mem to check if keys are in the state_dict  by FrancescoSaverioZuppichini in 16643
* Update Support image on README.md  by BritneyMuller in 16615
* bert: properly mention deprecation of TF2 conversion script  by stefan-it in 16171
* add vit tf doctest with add_code_sample_docstrings  by johko in 16636
* Fix error in doc of `DataCollatorWithPadding`  by secsilm in 16662
* Fix QA sample  by ydshieh in 16648
* TF generate refactor - Beam Search  by gante in 16374
*  Add tests for no_trainer and fix existing examples  by muellerzr in 16656
* only load state dict when the checkpoint is not None  by laurahanu in 16673
* [Trainer] tf32 arg doc  by stas00 in 16674
* Update audio examples with MInDS-14  by stevhliu in 16633
* add a warning in `SpmConverter` for sentencepiece's model using the byte fallback feature   by SaulLu in 16629
* Fix some doc examples in task summary  by ydshieh in 16666
* Jia multi gpu eval  by liyongsea in 16428
* Generate: min length can't be larger than max length  by gante in 16668
* fixed crash when deleting older checkpoint and a file f"{checkpoint_prefix}-*" exist  by sadransh in 16686
* [Doctests] Correct task summary  by patrickvonplaten in 16644
* Add Doc Test for BERT  by vumichien in 16523
* Fix t5 shard on TPU Pods  by agemagician in 16527
* update decoder_vocab_size when resizing embeds  by patil-suraj in 16700
* Fix TF_MASKED_LM_SAMPLE  by ydshieh in 16698
* Rename the method test_torchscript  by ydshieh in 16693
* Reduce memory leak in _create_and_check_torchscript  by ydshieh in 16691
* Enable more test_torchscript  by ydshieh in 16679
* Don't push checkpoints to hub in `no_trainer` scripts  by muellerzr in 16703
* Private repo TrainingArgument  by nbroad1881 in 16707
* Handle image_embeds in ViltModel  by ydshieh in 16696
* Improve PT/TF equivalence test  by ydshieh in 16557
* Fix example logs repeating themselves  by muellerzr in 16669
* [Bart] correct doc test  by patrickvonplaten in 16722
* Add Doc Test GPT-2  by ArEnSc in 16439
* Only call get_output_embeddings when tie_word_embeddings is set  by smelm in 16667
* Update run_translation_no_trainer.py  by raki-1203 in 16652
* Qdqbert example add benchmark script with ORT-TRT  by shangz-ai in 16592
* Replace assertion with exception  by anmolsjoshi in 16720
* Change the chunk_iter function to handle  by Narsil in 16730
* Remove duplicate header  by sgugger in 16732
* Moved functions to pytorch_utils.py  by anmolsjoshi in 16625
* TF: remove set_tensor_by_indices_to_value  by gante in 16729
* Add Doc Tests for Reformer PyTorch  by hiromu166 in 16565
* [FlaxSpeechEncoderDecoder] Fix input shape bug in weights init  by sanchit-gandhi in 16728
* [FlaxWav2Vec2Model] Fix bug in attention mask  by sanchit-gandhi in 16725
* add Bigbird ONNX config  by vumichien in 16427
* TF generate: handle case without cache in beam search  by gante in 16704
* Fix decoding score comparison when using logits processors or warpers  by bryant1410 in 10638
* [Doctests] Fix all T5 doc tests  by patrickvonplaten in 16646
* Fix 16660 (tokenizers setters of ids of special tokens)  by davidleonfdez in 16661
* [from_pretrained] refactor find_mismatched_keys  by stas00 in 16706
* Add Doc Test for GPT-J  by ArEnSc in 16507
* Fix and improve CTRL doctests  by jeremyadamsfisher in 16573
* [modeling_utils] better explanation of ignore keys  by stas00 in 16741
* CI: setup-dependent pip cache  by gante in 16751
* Reduce Funnel PT/TF diff  by ydshieh in 16744
* Add defensive check for config num_labels and id2label  by sgugger in 16709
* Add self training code for text classification  by tuvuumass in 16738
* [self-scheduled ci] explain where dependencies are  by stas00 in 16757
* Fixup no_trainer examples scripts and add more tests  by muellerzr in 16765
* [Doctest] added doctest changes for electra  by bhadreshpsavani in 16675
* Enabling `Tapex` in table question answering pipeline.  by Narsil in 16663
* [Flax `.from_pretrained`] Raise a warning if model weights are not in float32  by sanchit-gandhi in 16762
* Fix batch size in evaluation loop  by sgugger in 16763
* Make nightly install dev accelerate  by muellerzr in 16783
* [deepspeed / m2m_100] make deepspeed zero-3 work with layerdrop  by stas00 in 16717
* Kill async pushes when calling push_to_hub with blocking=True  by sgugger in 16755
* Improve image classification example  by NielsRogge in 16585
* [SpeechEncoderDecoderModel] Fix bug in reshaping labels  by sanchit-gandhi in 16748
* Fix issue avoid-missing-comma found at https://codereview.doctor  by code-review-doctor in 16768
* [trainer / deepspeed] fix hyperparameter_search  by stas00 in 16740
* [modeling utils] revamp `from_pretrained(..., low_cpu_mem_usage=True)` + tests  by stas00 in 16657
* Fix PT TF ViTMAE  by ydshieh in 16766
* Update README.md  by NielsRogge in 16797
* Pin Jax to last working release  by sgugger in 16808
* CI: non-remote GH Actions now use a python venv  by gante in 16789
* TF generate refactor - XLA sample  by gante in 16713
* Raise error and suggestion when using custom optimizer with Fairscale or Deepspeed  by allanj in 16786
* Create empty venv on cache miss  by gante in 16816
* [ViT, BEiT, DeiT, DPT] Improve code  by NielsRogge in 16799
* [Quicktour Audio] Improve && remove ffmpeg dependency  by patrickvonplaten in 16723
* fix megatron bert convert state dict naming  by Codle in 15820
* use base_version to check torch version in torch_less_than_1_11  by nbroad1881 in 16806
* Allow passing encoder_outputs as tuple to EncoderDecoder Models  by jsnfly in 16814
* Refactor issues with yaml  by LysandreJik in 16772
* fix _setup_devices in case where there is no torch.distributed package in build  by dlwh in 16821
* Clean up semantic segmentation tests  by NielsRogge in 16801
* Fix `LayoutLMv2` tokenization docstrings  by qqaatw in 16187
* Wav2Vec2 phoneme CTC tokenizer optimisation  by ArthurZucker in 16817
* [Flax] improve large model init and loading   by patil-suraj in 16148
* Some tests misusing assertTrue for comparisons fix  by code-review-doctor in 16771
* Type hints added for TFMobileBert  by Dahlbomii in 16505
* fix `run_clm.py` seeking text column name twice  by dandelin in 16624
* Add onnx export of models with a multiple choice classification head  by echarlaix in 16758
* [ASR Pipeline] Correct init docs  by patrickvonplaten in 16833
* Add doc about `attention_mask` on gpt2   by wiio12 in 16829
* TF: Add sigmoid activation function  by gante in 16819
* Correct Logging of Eval metric to Tensorboard  by Jeevesh8 in 16825
* replace `Speech2TextTokenizer` by `Speech2TextFeatureExtractor` in some docstrings  by SaulLu in 16835
* Type hints added to Speech to Text  by Dahlbomii in 16506
* Improve test_pt_tf_model_equivalence on PT side  by ydshieh in 16731
* Add support for bitsandbytes  by manuelciosici in 15622
* [Typo] Fix typo in modeling utils  by patrickvonplaten in 16840
* add DebertaV2 fast tokenizer  by mingboiz in 15529
* Fixing return type tensor with `num_return_sequences>1`.  by Narsil in 16828
* [modeling_utils] use less cpu memory with sharded checkpoint loading  by stas00 in 16844
* [docs] fix url  by stas00 in 16860
* Fix custom init sorting script  by sgugger in 16864
* Fix multiproc metrics in no_trainer examples  by muellerzr in 16865
* Long QuestionAnsweringPipeline fix.  by Narsil in 16778
* t5: add conversion script for T5X to FLAX  by stefan-it in 16853
* tiny tweak to allow BatchEncoding.token_to_char when token doesn't correspond to chars  by ghlai9665 in 15901
* Adding support for `array` key in raw dictionaries in ASR pipeline.  by Narsil in 16827
* Return input_ids in ImageGPT feature extractor  by sgugger in 16872
* Use ACT2FN to fetch ReLU activation  by eldarkurtic in 16874
* Fix GPT-J onnx conversion  by ChainYo in 16780
* Fix doctest list  by ydshieh in 16878
* New features for CodeParrot training script  by loubnabnl in 16851
* Add missing entries in mappings  by ydshieh in 16857
* TF: rework XLA generate tests  by gante in 16866
* Minor fixes/improvements in `convert_file_size_to_int`  by mariosasko in 16891
* Add doc tests for Albert and Bigbird  by vumichien in 16774
* Add OnnxConfig for ConvBERT  by ChainYo in 16859
* TF: XLA repetition penalty  by gante in 16879
* Changes in create_optimizer to support tensor parallelism with SMP  by cavdard in 16880
* [DocTests] Fix some doc tests  by patrickvonplaten in 16889
* add bigbird typo fixes  by ChainYo in 16897
* Fix doc test quicktour dataset  by patrickvonplaten in 16929
* Add missing ckpt in config docs  by ydshieh in 16900
* Fix PyTorch RAG tests GPU OOM  by ydshieh in 16881
* Fix RemBertTokenizerFast  by ydshieh in 16933
* TF: XLA logits processors - minimum length, forced eos, and forced bos  by gante in 16912
* TF: XLA Logits Warpers  by gante in 16899
* added deit onnx config  by rushic24 in 16887
* TF: XLA stable softmax  by gante in 16892
* Replace deprecated logger.warn with warning  by sanchit-gandhi in 16876
* Fix issue probably-meant-fstring found at https://codereview.doctor  by code-review-doctor in 16913
* Limit the use of PreTrainedModel.device  by sgugger in 16935
* apply torch int div to layoutlmv2  by ManuelFay in 15457
* Fix iterations for decoder  by agemagician in 16934
* Add onnx config for RoFormer  by skrsna in 16861
* documentation: some minor clean up  by mingboiz in 16850
* Fix RuntimeError message format  by ftnext in 16906
* use original loaded keys to find mismatched keys  by tricktreat in 16920
* [Research] Speed up evaluation for XTREME-S  by anton-l in 16785
* Fix HubertRobustTest PT/TF equivalence test on GPU  by ydshieh in 16943
* Misc. fixes for Pytorch QA examples:  by searchivarius in 16958
* [HF Argparser] Fix parsing of optional boolean arguments  by NielsRogge in 16946
* Fix `distributed_concat` with scalar tensor  by Yard1 in 16963
* Update custom_models.mdx  by mishig25 in 16964
* Fix add-new-model-like when model doesn't support all frameworks  by sgugger in 16966
* Fix multiple deletions of the same files in save_pretrained  by sgugger in 16947
* Fixup no_trainer save logic  by muellerzr in 16968
* Fix doc notebooks links  by sgugger in 16969
* Fix check_all_models_are_tested  by ydshieh in 16970
* Add -e flag to some GH workflow yml files  by ydshieh in 16959
* Update tokenization_bertweet.py  by datquocnguyen in 16941
* Update check_models_are_tested to deal with Windows path  by ydshieh in 16973
* Add parameter --config_overrides for run_mlm_wwm.py  by conan1024hao in 16961
* Rename a class to reflect framework pattern AutoModelXxx -> TFAutoModelXxx  by amyeroberts in 16993
* set eos_token_id to None to generate until max length  by ydshieh in 16989
* Fix savedir for by epoch  by muellerzr in 16996
* Update README to latest release  by sgugger in 16997
* use scale=1.0 in floats_tensor called in speech model testers  by ydshieh in 17007
* Update all require decorators to use skipUnless when possible  by muellerzr in 16999
* TF: XLA bad words logits processor and list of processors  by gante in 16974
* Make create_extended_attention_mask_for_decoder static method  by pbelevich in 16893
* Update README_zh-hans.md  by tarzanwill in 16977
* Updating variable names.  by Narsil in 16445
* Revert "Updating variable names." (16445)  by Narsil
* Replace dict/BatchEncoding instance checks by Mapping  by sgugger in 17014
* Result of new doc style with fixes  by sgugger in 17015
* Add a check on config classes docstring checkpoints  by ydshieh in 17012
* Add translating guide  by omarespejel in 17004
* update docs of length_penalty  by manandey in 17022
* [FlaxGenerate] Fix bug in decoder_start_token_id  by sanchit-gandhi in 17035
* Fx with meta  by michaelbenayoun in 16836
* [Flax(Speech)EncoderDecoder] Fix bug in `decoder_module`  by sanchit-gandhi in 17036
* Fix typo 
