Conversation

@pyup-bot (Contributor)

This PR pins transformers to the latest release 4.20.1.

Changelog

4.20.1

This patch release fixes a bug in the OPT models and makes Transformers compatible with `huggingface_hub` version 0.8.1.

* Add final_layer_norm to OPT model 17785
* Prepare transformers for v0.8.0 huggingface-hub release 17716

4.20.0

Big model inference

You can now use the big model inference of Accelerate directly in any call to `from_pretrained` by specifying `device_map="auto"` (or your own `device_map`). The model will automatically be loaded across your GPU(s), with whatever doesn't fit offloaded to CPU RAM, or even to the hard drive if you run out of RAM. The model can then be used normally for inference with nothing else to do.

```py
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained(
    "bigscience/T0pp", revision="sharded", device_map="auto"
)
```


* Use Accelerate in `from_pretrained` for big model inference  by sgugger in 17341

BLOOM

The BLOOM model has been proposed with its various versions through the [BigScience Workshop](https://bigscience.huggingface.co/). The architecture of BLOOM is essentially similar to GPT-3 (an auto-regressive model for next-token prediction), but it has been trained on 46 different languages, including code.

* BLOOM   by younesbelkada in 17474
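
As a quick illustration (a minimal sketch that is not part of the PR above; the small `bigscience/bloom-560m` checkpoint is assumed here to keep the example lightweight), BLOOM checkpoints work with the standard causal-LM auto classes:

```py
from transformers import AutoTokenizer, AutoModelForCausalLM

# Assumes the small bigscience/bloom-560m checkpoint; larger BLOOM variants load the same way.
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

inputs = tokenizer("BLOOM was trained on 46 languages, including", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```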

CvT

The Convolutional vision Transformer (CvT) improves the Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs.

* Add CvT  by NielsRogge and AnugunjNaman in 17299

GPT-NeoX

GPT-NeoX-20B is a 20 billion parameter autoregressive language model trained on the Pile, whose weights are made freely and openly available to the public through a permissive license. GPT-NeoX-20B is a particularly powerful few-shot reasoner and gains far more in performance when evaluated five-shot than similarly sized GPT-3 and FairSeq models.

* Adding GPT-NeoX-20B  by zphang in 16659
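
As a hedged sketch (the `EleutherAI/gpt-neox-20b` checkpoint name is an assumption, and the full 20B weights are tens of GB, so loading them requires substantial memory), the model can be used through the causal-LM auto classes:

```py
from transformers import AutoTokenizer, AutoModelForCausalLM

# Sketch only: the 20B checkpoint is very large; consider low_cpu_mem_usage=True or
# device_map="auto" (see the big model inference feature above) where supported.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b")

inputs = tokenizer("GPT-NeoX-20B is a 20 billion parameter model that", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```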

LayoutLMv3

LayoutLMv3 simplifies LayoutLMv2 by using patch embeddings (as in ViT) instead of leveraging a CNN backbone, and pre-trains the model on 3 objectives: masked language modeling (MLM), masked image modeling (MIM) and word-patch alignment (WPA).

* Add LayoutLMv3  by NielsRogge in 17060

LeViT

LeViT improves the Vision Transformer (ViT) in performance and efficiency by a few architectural differences such as activation maps with decreasing resolutions in Transformers and the introduction of an attention bias to integrate positional information.

* Adding LeViT Model by Facebook  by AnugunjNaman in 17466

LongT5

The LongT5 model is an extension of the T5 model that enables one of two efficient attention mechanisms: (1) local attention, or (2) transient-global attention. It is capable of handling input sequences of up to 16,384 tokens.

* Add `LongT5` model  by stancld in 16792
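
A minimal usage sketch, assuming the `google/long-t5-tglobal-base` checkpoint (the transient-global attention variant); LongT5 is otherwise used like any other T5-style seq2seq model:

```py
from transformers import AutoTokenizer, LongT5ForConditionalGeneration

# Assumes the google/long-t5-tglobal-base checkpoint (transient-global attention).
tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
model = LongT5ForConditionalGeneration.from_pretrained("google/long-t5-tglobal-base")

# Long inputs (up to 16,384 tokens) are fed in the same way as regular T5 inputs.
inputs = tokenizer("A very long document ... " * 200, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```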

M-CTC-T

The M-CTC-T model is a 1B-param transformer encoder, with a CTC head over 8065 character labels and a language identification head over 60 language ID labels. It is trained on Common Voice (version 6.1, December 2020 release) and VoxPopuli. After training on Common Voice and VoxPopuli, the model is trained on Common Voice only. The labels are unnormalized character-level transcripts (punctuation and capitalization are not removed). The model takes as input Mel filterbank features from a 16 kHz audio signal.

* M-CTC-T Model  by cwkeam in 16402

Trajectory Transformer

This Transformer is used for deep reinforcement learning. To use it, you need to create sequences from actions, states and rewards from all previous timesteps. This model will treat all these elements together as one big sequence (a trajectory).

* Add trajectory transformer  by CarlCochet in 17141

Wav2Vec2-Conformer

The Wav2Vec2-Conformer was added in an updated version of fairseq S2T: Fast Speech-to-Text Modeling with fairseq. It requires more parameters than Wav2Vec2, but also yields an improved word error rate.

* [Wav2Vec2Conformer] Official release  by patrickvonplaten in 17709
* Add Wav2Vec2Conformer  by patrickvonplaten in 16812

TensorFlow implementations

Data2VecVision for semantic segmentation, OPT and Swin are now available in TensorFlow.
* Add TFData2VecVision for semantic segmentation  by sayakpaul in 17271
* Opt in flax and tf  by ArthurZucker in 17388
* Add Tensorflow Swin model  by amyeroberts in 16988

Flax implementations

OPT is now available in Flax.
* Opt in flax and tf  by ArthurZucker in 17388

Documentation translation in Italian and Portuguese

A community effort has been started to translate the documentation into two new languages: Italian and Portuguese.

* Translation/italian: added pipeline_tutorial.mdx [Issue: 17459]  by nickprock in 17507
* Add installation.mdx Italian translation  by mfumanelli in 17530
* Setup for Italian translation and add quicktour.mdx translation  by mfumanelli in 17472
* Adding the Portuguese version of the tasks/token_classification.mdx documentation  by jonatasgrosman in 17492
* Adding the Portuguese version of the tasks/sequence_classification.mdx documentation  by jonatasgrosman in 17352
* [ fast_tokenizers.mdx ] - Added translation to portuguese to tutorial  by Fellip15 in 17076
* Added translation of installation.mdx to Portuguese Issue 16824  by rzimmerdev in 16979

Improvements and bugfixes

* Sort the model doc Toc Alphabetically  by sgugger in 17723
* normalize keys_to_ignore  by stas00 in 17722
* CLI: Add flag to push TF weights directly into main  by gante in 17720
* Update requirements.txt  by jeffra in 17719
* Revert "Change push CI to run on workflow_run event  by ydshieh in 17692)"
* Documentation: RemBERT fixes  by stefan-it in 17641
* Change push CI to run on workflow_run event  by ydshieh in 17692
* fix tolerance for a bloom slow test  by younesbelkada in 17634
* [LongT5] disable model parallel test  by patil-suraj in 17702
* FX function refactor  by michaelbenayoun in 17625
* Add `BloomForSequenceClassification` and `BloomForTokenClassification` classes  by haileyschoelkopf in 17639
* Swin main layer  by amyeroberts in 17693
* Include a comment to reflect Amy's contributions  by sayakpaul in 17689
* Rag end2end new  by shamanez in 17650
* [LongT5] Rename checkpoitns  by patrickvonplaten in 17700
* Extend Transformers Trainer Class to Enable PyTorch Torchscript for Inference  by jianan-gu in 17153
* Fix doc builder Dockerfile  by ydshieh in 17435
* Add FP16 Support for SageMaker Model Parallel  by haohanchen-yagao in 17386
* enable cpu distribution training using mpirun  by sywangyi in 17570
* Add Ray's scope to training arguments  by BramVanroy in 17629
* Update modeling_gpt_neox.py  by willfrey in 17575
* Fix dtype getter  by sgugger in 17668
* explicitly set utf8 for Windows  by BramVanroy in 17664
* Fixed documentation typo, parameter name is evaluation_strategy, not eval_strategy  by sainttttt in 17669
* Add Visual Question Answering (VQA) pipeline  by sijunhe in 17286
* Fix typo in adding_a_new_model README  by ayushtues in 17679
* Avoid GPU OOM for a TF Rag test  by ydshieh in 17638
* fix typo from emtpy to empty  by domenicrosati in 17643
* [Generation Test] Make fast test actually fast  by patrickvonplaten in 17661
* [Data2Vec] Speed up test  by patrickvonplaten in 17660
* [BigBirdFlaxTests] Make tests slow  by patrickvonplaten in 17658
* update README.md  by loubnabnl in 17657
* 🐛 Properly raise `RepoNotFoundError` when not authenticated  by SBrandeis in 17651
* Fixes 17128 .  by mygithubid1 in 17356
* Fix dtype getters  by sgugger in 17656
* Add skip logic for attentions test - Levit  by amyeroberts in 17633
* Enable crop_center method to handle (W, H, C) images  by alaradirik in 17626
* Move Clip image utils to image_utils.py  by alaradirik in 17628
* Skip tests until bug is fixed.  by sgugger in 17646
* Translation/autoclass  by mfumanelli in 17615
* didn't exist in pt-1.9  by stas00 in 17644
* convert assertion to raised exception in debertav2  by sam-h-bean in 17619
* Pre-build DeepSpeed  by ydshieh in 17607
* [modeling_utils] torch_dtype/auto floating dtype fixes  by stas00 in 17614
* Running a pipeline of `float16`.  by Narsil in 17637
* fix use_amp rename after pr 17138  by stas00 in 17636
* Fix very long job failure text in Slack report  by ydshieh in 17630
* Adding `top_k` argument to `text-classification` pipeline.  by Narsil in 17606
* Mention in the doc we drop support for fairscale  by sgugger in 17610
* Use shape_list to safely get shapes for Swin  by amyeroberts in 17591
* Add ONNX support for ConvNeXT  by regisss in 17627
* Add ONNX support for ResNet  by regisss in 17585
* has_attentions - consistent test skipping logic and tf tests  by amyeroberts in 17495
* CLI: Print all different tensors on exception  by gante in 17612
* TF: Merge PT and TF behavior for Bart when no decoder_input_ids are passed  by gante in 17593
* Fix telemetry URL  by sgugger in 17608
* CLI: Properly detect encoder-decoder models  by gante in 17605
* Fix link for community notebooks  by ngoquanghuy99 in 17602
* Extend Transformers Trainer Class to Enable CPU AMP and Integrate Intel Extension for PyTorch  by jianan-gu in 17138
* fix `train_new_from_iterator` in the case of byte-level tokenizers  by SaulLu in 17549
* Explicit versions in docker files  by ydshieh in 17586
* CLI: add stricter automatic checks to `pt-to-tf`  by gante in 17588
* fix  by ydshieh in 17589
* quicktour.mdx en -> pt translation  by vitorfrois in 17074
* Fx support for Deberta-v[1-2], Hubert and LXMERT  by michaelbenayoun in 17539
* Add examples telemetry  by sgugger in 17552
* Fix gendered sentence in Spanish translation by omarespejel in 17558
* Fix circular import in onnx.utils  by sgugger in 17577
* Use latest stable PyTorch/DeepSpeed for Push & Scheduled CI  by ydshieh in 17417
* Remove circular imports in layoutlm/__init__.py  by regisss in 17576
* Add magic method to our TF models to convert datasets with column inference  by Rocketknight1 in 17160
* [deepspeed / testing] reset global state  by stas00 in 17553
* Remove RuntimeErrors for NaN-checking in 20B  by zphang in 17563
* fix integration test levit  by AnugunjNaman in 17555
* [deepspeed] fix load_best_model test  by stas00 in 17550
* Update index.mdx  by BritneyMuller in 17547
* Clean imports to fix test_fetcher  by sgugger in 17531
* Update run_glue_no_trainer.py  by bofenghuang in 17546
* Fix all offload and MP tests  by sgugger in 17533
* Fix bug - layer names and activation from previous refactor  by amyeroberts in 17524
* Add support for Perceiver ONNX export  by deutschmn in 17213
* Allow from transformers import TypicalLogitsWarper  by teticio in 17477
* Add Gated-SiLU to T5  by DanielHesslow in 17420
* Update URL for Hub PR docs  by lewtun in 17532
* fix OPT-Flax CI tests   by ArthurZucker in 17512
* [trainer/deepspeed] load_best_model (reimplement re-init)  by stas00 in 17151
* Implemented loss for training AudioFrameClassification  by MorenoLaQuatra in 17513
* Update configuration_auto.py  by kamalkraj in 17527
* Check list of models in the main README and sort it  by sgugger in 17517
* Fix  when Accelerate is not installed  by sgugger in 17518
* Clean README in post release job as well.  by sgugger in 17519
* Fix CI tests hang forever  by ydshieh in 17471
* Print more library versions in CI  by ydshieh in 17384
* Split push CI into 2 workflows  by ydshieh in 17369
* Fix Tapas tests  by ydshieh in 17510
* CLI: tool to convert PT into TF weights and open hub PR  by gante in 17497
* Fix flakey no-trainer test  by muellerzr in 17515
* Deal with the error when task is regression  by fireindark707 in 16330
* Fix CTRL tests  by ydshieh in 17508
* Fix LayoutXLMProcessorTest  by ydshieh in 17506
* Debug LukeForMaskedLM  by Ryou0634 in 17499
* Fix MP and CPU offload tests for Funnel and GPT-Neo  by sgugger in 17503
* Exclude Databricks from notebook env  by sgugger in 17496
* Fix `tokenizer` type annotation in `pipeline(...)`  by willfrey in 17500
* Refactor classes to inherit from nn.Module instead of nn.Sequential  by amyeroberts in 17493
* Fix wav2vec2 export onnx model with attention_mask error  by nilboy in 16004
* Add warning when using older version of torch for ViltFeatureExtractor  by xhluca in 16756
* Fix typo of variable names for key and query projection layer  by Kyeongpil in 17155
* Fixed wrong error message for missing weight file  by 123jimin in 17216
* Add OnnxConfig for SqueezeBert iss17314  by Ruihua-Fang in 17315
* [GPT2Tokenizer] Fix GPT2 with bos token  by patrickvonplaten in 17498
* [Json configs] Make json prettier for all saved tokenizer files & ensure same json format for all processors (tok + feat_extract)  by patrickvonplaten in 17457
* Accumulate tokens into batches in `PreTrainedTokenizerBase.add_tokens()`  by Witiko in 17119
* Add HF.co for PRs / Issues regarding specific model checkpoints  by patrickvonplaten in 17485
* Fix checkpoint name  by ydshieh in 17484
* Docker image build in parallel  by ydshieh in 17434
* Added XLM onnx config  by nandwalritik in 17030
* Disk offload fix  by sgugger in 17428
* TF: GPT-2 generation supports left-padding  by gante in 17426
* Fix ViTMAEModelTester  by ydshieh in 17470
* [Generate] Fix output scores greedy search  by patrickvonplaten in 17442
* Fix nits  by omarespejel in 17349
* Fx support for multiple model architectures  by michaelbenayoun in 17393
* typo IBERT in __repr__ quant_mode  by scratchmex in 17398
* Fix typo (remove parenthesis)  by mikcnt in 17415
* Improve notrainer examples  by pacman100 in 17449
* [OPT] Fix bos token id default  by patrickvonplaten in 17441
* Fix model parallelism test  by sgugger in 17439
* Pin protobouf that breaks TensorBoard in PyTorch  by sgugger in 17440
* Spanish translation of the file preprocessing.mdx  by yharyarias in 16299
* Spanish translation of the files sagemaker.mdx and image_classification.mdx  by SimplyJuanjo in 17262
* Added es version of bertology.mdx doc  by jQuinRivero in 17255
* Wav2vec2 finetuning shared file system  by patrickvonplaten in 17423
* fix link in performance docs  by lvwerra in 17419
* Add link to Hub PR docs in model cards  by lewtun in 17421
* Upd AutoTokenizer.from_pretrained doc examples  by c00k1ez in 17416
* Support compilation via Torchdynamo, AOT Autograd, NVFuser  by anijain2305 in 17308
* Add test for new model parallelism features  by sgugger in 17401
* Make check_init script more robust and clean inits  by sgugger in 17408
* Fix README localizer script  by sgugger in 17407
* Fix expected value for OPT test `test_inference_no_head`  by ydshieh in 17395
* Clean up CLIP tests  by NielsRogge in 17380
* Enabling `imageGPT` auto feature extractor.  by Narsil in 16871
* Add support for `device_map="auto"` to OPT  by sgugger in 17382
* OPTForCausalLM lm_head input size should be config.word_embed_proj_dim  by vfbd in 17225
* Traced models serialization and torchscripting fix  by michaelbenayoun in 17206
* Fix Comet ML integration  by mxschmdt in 17381
* Fix cvt docstrings  by AnugunjNaman in 17367
* Correct & Improve Doctests for LayoutLMv2  by gnolai in 17168
* Fix CodeParrot training script  by loubnabnl in 17291
* Fix a typo relative_postion_if_large -> relative_position_if_large  by stancld in 17366
* Pin dill to fix examples  by sgugger in 17368
* [Test OPT] Add batch generation test opt  by patrickvonplaten in 17359
* Fix bug in Wav2Vec2 pretrain example  by ddobokki in 17326
* fix for 17292  by nadahlberg in 17293
* [Generation] Fix Transition probs  by patrickvonplaten in 17311
* [OPT] Run test in lower precision on GPU  by patrickvonplaten in 17353
* Adding `batch_size` test to QA pipeline.  by Narsil in 17330
* [BC] Fixing usage of text pairs  by Narsil in 17324
* [tests] fix copy-n-paste error  by stas00 in 17312
* Fix ci_url might be None  by ydshieh in 17332
* fix  by ydshieh in 17337
* Fix metric calculation in examples and setup tests to run on multi-gpu for no_trainer scripts  by muellerzr in 17331
* docs for typical decoding  by jadermcs in 17186
* Not send successful report  by ydshieh in 17329
* Fix test_t5_decoder_model_past_large_inputs  by ydshieh in 17320
* Add onnx export cuda support  by JingyaHuang in 17183
* Add Information Gain Filtration algorithm  by mraunak in 16953
* Fix typo  by kamalkraj in 17328
* remove  by ydshieh in 17325
* Accepting real pytorch device as arguments.  by Narsil in 17318
* Updating the docs for `max_seq_len` in QA pipeline  by Narsil in 17316
* [T5] Fix init in TF and Flax for pretraining  by patrickvonplaten in 17294
* Add type hints for ProphetNet (Pytorch)  by jQuinRivero in 17223
* fix  by patrickvonplaten in 17310
* [LED] fix global_attention_mask not being passed for generation and docs clarification about grad checkpointing  by caesar-one in 17112
* Add support for pretraining recurring span selection to Splinter  by jvcop in 17247
* Add PR author in CI report + merged by info  by ydshieh in 17298
* Fix dummy creation script  by sgugger in 17304
* Doctest longformer  by KMFODA in 16441
* [Test] Fix W2V-Conformer integration test  by patrickvonplaten in 17303
* Improve mismatched sizes management when loading a pretrained model  by regisss in 17257
* correct opt  by patrickvonplaten in 17301
* Rewrite TensorFlow train_step and test_step  by Rocketknight1 in 17057
* Fix tests of mixed precision now that experimental is deprecated  by Rocketknight1 in 17300
* fix retribert's `test_torch_encode_plus_sent_to_model`  by SaulLu in 17231
* [ConvNeXT] Fix drop_path_rate  by NielsRogge in 17280
* Fix wrong PT/TF categories in CI report  by ydshieh in 17272
* Fix missing job action button in CI report   by ydshieh in 17270
* Fix test_model_parallelization  by lkm2835 in 17249
* [Tests] Fix slow opt tests  by patrickvonplaten in 17282
* docs(transformers): fix typo  by k-zehnder in 17263
* logging documentation update  by sanderland in 17174
* Use the PR URL in CI report  by ydshieh in 17269
* Fix FlavaForPreTrainingIntegrationTest CI test  by ydshieh in 17232
* Better error in the Auto API when a dep is missing  by sgugger in 17289
* Make TrainerHyperParameterSigOptIntegrationTest slow test  by ydshieh in 17288
* Automatically sort auto mappings  by sgugger in 17250
* Mlflowcallback fix nonetype error  by orieg in 17171
* Align logits and labels in OPT  by MichelBartels in 17237
* Remove next sentence prediction from supported ONNX tasks  by lewtun in 17276
* CodeParrot data pretokenization  by loubnabnl in 16932
* Update codeparrot data preprocessing  by loubnabnl in 16944
* Updated checkpoint support for Sagemaker Model Parallel  by cavdard in 17219
* fixed bug in run_mlm_flax_stream.py  by KennethEnevoldsen in 17203
* [doc] performance/scalability revamp  by stas00 in 15723
* TF - Fix convnext classification example  by gante in 17261
* Fix obvious typos in flax decoder impl  by cloudhan in 17279
* Guide to create custom models in Spanish  by ignacioct in 17158
* Translated version of model_sharing.mdx doc to spanish  by Gerard-170 in 16184
* Add PR title to push CI report  by ydshieh in 17246
* Fix push CI channel  by ydshieh in 17242
* install dev. version of accelerate  by ydshieh in 17243
* Fix Trainer for Datasets that don't have dict items  by sgugger in 17239
* Handle copyright in add-new-model-like  by sgugger in 17218
* fix --gpus option for docker  by ydshieh in 17235
* Update self-push workflow  by ydshieh in 17177
* OPT - fix docstring and improve tests slighly  by patrickvonplaten in 17228
* OPT-fix  by younesbelkada in 17229
* Fix typo in bug report template  by fxmarty in 17178
* Black preview  by sgugger in 17217
* update BART docs  by patil-suraj in 17212
* Add test to ensure models can take int64 inputs  by Rocketknight1 in 17210

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* sayakpaul
 * Include a comment to reflect Amy's contributions (17689)
 * Add TFData2VecVision for semantic segmentation (17271)
* jianan-gu
 * Extend Transformers Trainer Class to Enable PyTorch Torchscript for Inference (17153)
 * Extend Transformers Trainer Class to Enable CPU AMP and Integrate Intel Extension for PyTorch (17138)
* stancld
 * Add `LongT5` model (16792)
 * Fix a typo relative_postion_if_large -> relative_position_if_large (17366)
* mfumanelli
 * Translation/autoclass (17615)
 * Add installation.mdx Italian translation (17530)
 * Setup for Italian translation and add quicktour.mdx translation (17472)
* cwkeam
 * M-CTC-T Model (16402)
* zphang
 * Remove RuntimeErrors for NaN-checking in 20B (17563)
 * Adding GPT-NeoX-20B (16659)
* AnugunjNaman
 * fix integration test levit (17555)
 * Adding LeViT Model by Facebook (17466)
 * Fix cvt docstrings (17367)
* yharyarias
 * Spanish translation of the file preprocessing.mdx (16299)
* mraunak
 * Add Information Gain Filtration algorithm (16953)
* rzimmerdev
 * Added translation of installation.mdx to Portuguese Issue 16824 (16979)

4.19.4

Fixes the error message shown when trying to access a repo that does not exist (this started to break due to changes in the Hub API).

- 🐛 Properly raise `RepoNotFoundError` when not authenticated 17651

4.19.3

This patch release fixes the install of protobuf when a user wants to do `pip install transformers[sentencepiece]`.

- Pin protobouf that breaks TensorBoard in PyTorch 17440

4.19.2

Patch release for the following PRs/commits:

- [OPT-fix 17229](https://github.com/huggingface/transformers/pull/17229)
- [OPT - fix docstring and improve tests slighly 17228](https://github.com/huggingface/transformers/pull/17228)
- [Align logits and labels in OPT 17237](https://github.com/huggingface/transformers/pull/17237)

4.19.1

Fix Trainer for Datasets that don't have dict items 17239

4.19.0

*Disclaimer*: this release is the first release with no Python 3.6 support.

OPT

The OPT model was proposed in [Open Pre-trained Transformer Language Models](https://arxiv.org/pdf/2205.01068) by Meta AI. OPT is a series of open-sourced large causal language models which perform similarly to GPT-3.

* Add OPT  by younesbelkada in 17088

FLAVA

The FLAVA model was proposed in [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela and is accepted at CVPR 2022.

The paper aims at creating a single unified foundation model which can work across vision, language as well as vision-and-language multimodal tasks.

* [feat] Add FLAVA model  by apsdehal in 16654

YOLOS

The YOLOS model was proposed in [You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection](https://arxiv.org/abs/2106.00666) by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu. YOLOS proposes to just leverage the plain [Vision Transformer (ViT)](https://huggingface.co/docs/transformers/main/en/model_doc/vit) for object detection, inspired by DETR. It turns out that a base-sized encoder-only Transformer can also achieve 42 AP on COCO, similar to DETR and much more complex frameworks such as Faster R-CNN.

* Add YOLOS  by NielsRogge in 16848

RegNet

The RegNet model was proposed in [Designing Network Design Spaces](https://arxiv.org/abs/2003.13678) by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.

The authors design search spaces to perform Neural Architecture Search (NAS). They first start from a high dimensional search space and iteratively reduce the search space by empirically applying constraints based on the best-performing models sampled by the current search space.

* RegNet  by FrancescoSaverioZuppichini in 16188

TAPEX

The TAPEX model was proposed in [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou. TAPEX pre-trains a BART model to solve synthetic SQL queries, after which it can be fine-tuned to answer natural language questions related to tabular data, as well as performing table fact checking.

* Add TAPEX  by NielsRogge in 16473
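
A minimal usage sketch, assuming the `microsoft/tapex-base-finetuned-wtq` checkpoint (fine-tuned for table question answering): the TAPEX tokenizer flattens a pandas table together with the question, and the underlying BART model generates the answer.

```py
import pandas as pd

from transformers import BartForConditionalGeneration, TapexTokenizer

# Assumes the microsoft/tapex-base-finetuned-wtq checkpoint (table question answering).
tokenizer = TapexTokenizer.from_pretrained("microsoft/tapex-base-finetuned-wtq")
model = BartForConditionalGeneration.from_pretrained("microsoft/tapex-base-finetuned-wtq")

table = pd.DataFrame({"city": ["Paris", "Lyon"], "population": ["2,100,000", "500,000"]})
encoding = tokenizer(table=table, query="Which city has the larger population?", return_tensors="pt")
outputs = model.generate(**encoding)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```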

Data2Vec: vision

The Data2Vec model was proposed in [data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/pdf/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu and Michael Auli. Data2Vec proposes a unified framework for self-supervised learning across different data modalities - text, audio and images. Importantly, predicted targets for pre-training are contextualized latent representations of the inputs, rather than modality-specific, context-independent targets.

The vision model is added in v4.19.0.

* [Data2Vec] Add data2vec vision  by patrickvonplaten in 16760
* Add Data2Vec for Vision in TF  by sayakpaul in 17008

FSDP integration in Trainer

PyTorch recently upstreamed Fairscale's FSDP into PyTorch Distributed with additional optimizations. This PR integrates it into the Trainer API.

FSDP enables distributed training at scale: it is a wrapper for sharding module parameters across data-parallel workers, inspired by Xu et al. as well as ZeRO Stage 3 from DeepSpeed.
PyTorch FSDP will focus more on production readiness and long-term support, including better integration with the ecosystem and improvements to performance, usability, reliability, debuggability and composability.

* PyTorch FSDP integration in Trainer  by pacman100 in 17136
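
As a rough sketch of how this surfaces in the `Trainer` (the exact option names below, `fsdp` and `fsdp_min_num_params`, are taken as assumptions and should be checked against the Trainer documentation):

```py
from transformers import TrainingArguments

# Sketch only: FSDP options such as "full_shard" or "shard_grad_op" are passed through
# TrainingArguments; training is then launched with torchrun / torch.distributed so that
# parameters, gradients and optimizer states are sharded across data-parallel workers.
training_args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=8,
    fsdp="full_shard",              # assumed flag: shard params, grads and optimizer states
    fsdp_min_num_params=1_000_000,  # assumed flag: auto-wrap layers above this size
)
```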

Training scripts

New example scripts were added for image classification and semantic segmentation. Both now have versions that leverage the Trainer API and Accelerate.

* Add image classification script, no trainer  by NielsRogge in 16727
* Add semantic script no trainer, v2  by NielsRogge in 16788
* Add semantic script, trainer  by NielsRogge in 16834

Documentation in Spanish

To continue democratizing good machine learning, we're making the Transformers documentation more accessible to non-English speakers, starting with Spanish (572M speakers worldwide).

- Added es version of language_modeling.mdx doc by jQuinRivero in 17021
- Spanish translation of the file philosophy.mdx by jkmg in 16922
- Documentation: Spanish translation of fast_tokenizers.mdx by jloayza10 in 16882
- Translate index.mdx (to ES) and add Spanish models to quicktour.mdx examples by omarespejel in 16685
- Spanish translation of the file multilingual.mdx by SimplyJuanjo in 16329
- Added spanish translation of autoclass_tutorial. by Duedme in 17069
- Fix style error in Spanish docs by osanseviero in 17197

Improvements and bugfixes

* [modeling_utils] rearrange text  by stas00 in 16632
* Added Annotations for PyTorch models  by anmolsjoshi in 16619
* Allow the same config in the auto mapping  by sgugger in 16631
* Update no_trainer scripts with new Accelerate functionalities  by muellerzr in 16617
* Fix doc example  by NielsRogge in 16448
* Add inputs vector to calculate metric method  by lmvasque in 16461
* [megatron-bert-uncased-345m] fix conversion  by stas00 in 16639
* Remove parent/child tests in auto model tests  by sgugger in 16653
* Updated _load_pretrained_model_low_mem to check if keys are in the state_dict  by FrancescoSaverioZuppichini in 16643
* Update Support image on README.md  by BritneyMuller in 16615
* bert: properly mention deprecation of TF2 conversion script  by stefan-it in 16171
* add vit tf doctest with add_code_sample_docstrings  by johko in 16636
* Fix error in doc of `DataCollatorWithPadding`  by secsilm in 16662
* Fix QA sample  by ydshieh in 16648
* TF generate refactor - Beam Search  by gante in 16374
*  Add tests for no_trainer and fix existing examples  by muellerzr in 16656
* only load state dict when the checkpoint is not None  by laurahanu in 16673
* [Trainer] tf32 arg doc  by stas00 in 16674
* Update audio examples with MInDS-14  by stevhliu in 16633
* add a warning in `SpmConverter` for sentencepiece's model using the byte fallback feature   by SaulLu in 16629
* Fix some doc examples in task summary  by ydshieh in 16666
* Jia multi gpu eval  by liyongsea in 16428
* Generate: min length can't be larger than max length  by gante in 16668
* fixed crash when deleting older checkpoint and a file f"{checkpoint_prefix}-*" exist  by sadransh in 16686
* [Doctests] Correct task summary  by patrickvonplaten in 16644
* Add Doc Test for BERT  by vumichien in 16523
* Fix t5 shard on TPU Pods  by agemagician in 16527
* update decoder_vocab_size when resizing embeds  by patil-suraj in 16700
* Fix TF_MASKED_LM_SAMPLE  by ydshieh in 16698
* Rename the method test_torchscript  by ydshieh in 16693
* Reduce memory leak in _create_and_check_torchscript  by ydshieh in 16691
* Enable more test_torchscript  by ydshieh in 16679
* Don't push checkpoints to hub in `no_trainer` scripts  by muellerzr in 16703
* Private repo TrainingArgument  by nbroad1881 in 16707
* Handle image_embeds in ViltModel  by ydshieh in 16696
* Improve PT/TF equivalence test  by ydshieh in 16557
* Fix example logs repeating themselves  by muellerzr in 16669
* [Bart] correct doc test  by patrickvonplaten in 16722
* Add Doc Test GPT-2  by ArEnSc in 16439
* Only call get_output_embeddings when tie_word_embeddings is set  by smelm in 16667
* Update run_translation_no_trainer.py  by raki-1203 in 16652
* Qdqbert example add benchmark script with ORT-TRT  by shangz-ai in 16592
* Replace assertion with exception  by anmolsjoshi in 16720
* Change the chunk_iter function to handle  by Narsil in 16730
* Remove duplicate header  by sgugger in 16732
* Moved functions to pytorch_utils.py  by anmolsjoshi in 16625
* TF: remove set_tensor_by_indices_to_value  by gante in 16729
* Add Doc Tests for Reformer PyTorch  by hiromu166 in 16565
* [FlaxSpeechEncoderDecoder] Fix input shape bug in weights init  by sanchit-gandhi in 16728
* [FlaxWav2Vec2Model] Fix bug in attention mask  by sanchit-gandhi in 16725
* add Bigbird ONNX config  by vumichien in 16427
* TF generate: handle case without cache in beam search  by gante in 16704
* Fix decoding score comparison when using logits processors or warpers  by bryant1410 in 10638
* [Doctests] Fix all T5 doc tests  by patrickvonplaten in 16646
* Fix 16660 (tokenizers setters of ids of special tokens)  by davidleonfdez in 16661
* [from_pretrained] refactor find_mismatched_keys  by stas00 in 16706
* Add Doc Test for GPT-J  by ArEnSc in 16507
* Fix and improve CTRL doctests  by jeremyadamsfisher in 16573
* [modeling_utils] better explanation of ignore keys  by stas00 in 16741
* CI: setup-dependent pip cache  by gante in 16751
* Reduce Funnel PT/TF diff  by ydshieh in 16744
* Add defensive check for config num_labels and id2label  by sgugger in 16709
* Add self training code for text classification  by tuvuumass in 16738
* [self-scheduled ci] explain where dependencies are  by stas00 in 16757
* Fixup no_trainer examples scripts and add more tests  by muellerzr in 16765
* [Doctest] added doctest changes for electra  by bhadreshpsavani in 16675
* Enabling `Tapex` in table question answering pipeline.  by Narsil in 16663
* [Flax `.from_pretrained`] Raise a warning if model weights are not in float32  by sanchit-gandhi in 16762
* Fix batch size in evaluation loop  by sgugger in 16763
* Make nightly install dev accelerate  by muellerzr in 16783
* [deepspeed / m2m_100] make deepspeed zero-3 work with layerdrop  by stas00 in 16717
* Kill async pushes when calling push_to_hub with blocking=True  by sgugger in 16755
* Improve image classification example  by NielsRogge in 16585
* [SpeechEncoderDecoderModel] Fix bug in reshaping labels  by sanchit-gandhi in 16748
*  Fix issue avoid-missing-comma found at https://codereview.doctor  by code-review-doctor in #16768
* [trainer / deepspeed] fix hyperparameter_search  by stas00 in 16740
* [modeling utils] revamp `from_pretrained(..., low_cpu_mem_usage=True)` + tests  by stas00 in 16657
* Fix PT TF ViTMAE  by ydshieh in 16766
* Update README.md  by NielsRogge in 16797
* Pin Jax to last working release  by sgugger in 16808
* CI: non-remote GH Actions now use a python venv  by gante in 16789
* TF generate refactor - XLA sample  by gante in 16713
* Raise error and suggestion when using custom optimizer with Fairscale or Deepspeed  by allanj in 16786
* Create empty venv on cache miss  by gante in 16816
* [ViT, BEiT, DeiT, DPT] Improve code  by NielsRogge in 16799
* [Quicktour Audio] Improve && remove ffmpeg dependency  by patrickvonplaten in 16723
* fix megatron bert convert state dict naming  by Codle in 15820
* use base_version to check torch version in torch_less_than_1_11  by nbroad1881 in 16806
* Allow passing encoder_ouputs as tuple to EncoderDecoder Models  by jsnfly in 16814
* Refactor issues with yaml  by LysandreJik in 16772
* fix _setup_devices in case where there is no torch.distributed package in build  by dlwh in 16821
* Clean up semantic segmentation tests  by NielsRogge in 16801
* Fix `LayoutLMv2` tokenization docstrings  by qqaatw in 16187
* Wav2 vec2 phoneme ctc tokenizer optimisation  by ArthurZucker in 16817
* [Flax] improve large model init and loading   by patil-suraj in 16148
* Some tests misusing assertTrue for comparisons fix  by code-review-doctor in 16771
* Type hints added for TFMobileBert  by Dahlbomii in 16505
* fix `rum_clm.py` seeking text column name twice  by dandelin in 16624
* Add onnx export of models with a multiple choice classification head  by echarlaix in 16758
* [ASR Pipeline] Correct init docs  by patrickvonplaten in 16833
* Add doc about `attention_mask` on gpt2   by wiio12 in 16829
* TF: Add sigmoid activation function  by gante in 16819
* Correct Logging of Eval metric to Tensorboard  by Jeevesh8 in 16825
* replace `Speech2TextTokenizer` by `Speech2TextFeatureExtractor` in some docstrings  by SaulLu in 16835
* Type hints added to Speech to Text  by Dahlbomii in 16506
* Improve test_pt_tf_model_equivalence on PT side  by ydshieh in 16731
* Add support for bitsandbytes  by manuelciosici in 15622
* [Typo] Fix typo in modeling utils  by patrickvonplaten in 16840
* add DebertaV2 fast tokenizer  by mingboiz in 15529
* Fixing return type tensor with `num_return_sequences>1`.  by Narsil in 16828
* [modeling_utils] use less cpu memory with sharded checkpoint loading  by stas00 in 16844
* [docs] fix url  by stas00 in 16860
* Fix custom init sorting script  by sgugger in 16864
* Fix multiproc metrics in no_trainer examples  by muellerzr in 16865
* Long QuestionAnsweringPipeline fix.  by Narsil in 16778
* t5: add conversion script for T5X to FLAX  by stefan-it in 16853
* tiny tweak to allow BatchEncoding.token_to_char when token doesn't correspond to chars  by ghlai9665 in 15901
* Adding support for `array` key in raw dictionnaries in ASR pipeline.  by Narsil in 16827
* Return input_ids in ImageGPT feature extractor  by sgugger in 16872
* Use ACT2FN to fetch ReLU activation  by eldarkurtic in 16874
* Fix GPT-J onnx conversion  by ChainYo in 16780
* Fix doctest list  by ydshieh in 16878
* New features for CodeParrot training script  by loubnabnl in 16851
* Add missing entries in mappings  by ydshieh in 16857
* TF: rework XLA generate tests  by gante in 16866
* Minor fixes/improvements in `convert_file_size_to_int`  by mariosasko in 16891
* Add doc tests for Albert and Bigbird  by vumichien in 16774
* Add OnnxConfig for ConvBERT  by ChainYo in 16859
* TF: XLA repetition penalty  by gante in 16879
* Changes in create_optimizer to support tensor parallelism with SMP  by cavdard in 16880
* [DocTests] Fix some doc tests  by patrickvonplaten in 16889
* add bigbird typo fixes  by ChainYo in 16897
* Fix doc test quicktour dataset  by patrickvonplaten in 16929
* Add missing ckpt in config docs  by ydshieh in 16900
* Fix PyTorch RAG tests GPU OOM  by ydshieh in 16881
* Fix RemBertTokenizerFast  by ydshieh in 16933
* TF: XLA logits processors - minimum length, forced eos, and forced bos  by gante in 16912
* TF: XLA Logits Warpers  by gante in 16899
* added deit onnx config  by rushic24 in 16887
* TF: XLA stable softmax  by gante in 16892
* Replace deprecated logger.warn with warning  by sanchit-gandhi in 16876
*  Fix issue probably-meant-fstring found at https://codereview.doctor  by code-review-doctor in #16913
* Limit the use of PreTrainedModel.device  by sgugger in 16935
* apply torch int div to layoutlmv2  by ManuelFay in 15457
* FIx Iterations for decoder  by agemagician in 16934
* Add onnx config for RoFormer  by skrsna in 16861
* documentation: some minor clean up  by mingboiz in 16850
* Fix RuntimeError message format  by ftnext in 16906
* use original loaded keys to find mismatched keys  by tricktreat in 16920
* [Research] Speed up evaluation for XTREME-S  by anton-l in 16785
* Fix HubertRobustTest PT/TF equivalence test on GPU  by ydshieh in 16943
* Misc. fixes for Pytorch QA examples:  by searchivarius in 16958
* [HF Argparser] Fix parsing of optional boolean arguments  by NielsRogge in 16946
* Fix `distributed_concat` with scalar tensor  by Yard1 in 16963
* Update custom_models.mdx  by mishig25 in 16964
* Fix add-new-model-like when model doesn't support all frameworks  by sgugger in 16966
* Fix multiple deletions of the same files in save_pretrained  by sgugger in 16947
* Fixup no_trainer save logic  by muellerzr in 16968
* Fix doc notebooks links  by sgugger in 16969
* Fix check_all_models_are_tested  by ydshieh in 16970
* Add -e flag to some GH workflow yml files  by ydshieh in 16959
* Update tokenization_bertweet.py  by datquocnguyen in 16941
* Update check_models_are_tested to deal with Windows path  by ydshieh in 16973
* Add parameter --config_overrides for run_mlm_wwm.py  by conan1024hao in 16961
* Rename a class to reflect framework pattern AutoModelXxx -> TFAutoModelXxx  by amyeroberts in 16993
* set eos_token_id to None to generate until max length  by ydshieh in 16989
* Fix savedir for by epoch  by muellerzr in 16996
* Update README to latest release  by sgugger in 16997
* use scale=1.0 in floats_tensor called in speech model testers  by ydshieh in 17007
* Update all require decorators to use skipUnless when possible  by muellerzr in 16999
* TF: XLA bad words logits processor and list of processors  by gante in 16974
* Make create_extended_attention_mask_for_decoder static method  by pbelevich in 16893
* Update README_zh-hans.md  by tarzanwill in 16977
* Updating variable names.  by Narsil in 16445
* Revert "Updating variable names.  by Narsil in 16445)" 
* Replace dict/BatchEncoding instance checks by Mapping  by sgugger in 17014
* Result of new doc style with fixes  by sgugger in 17015
* Add a check on config classes docstring checkpoints  by ydshieh in 17012
* Add translating guide  by omarespejel in 17004
* update docs of length_penalty  by manandey in 17022
* [FlaxGenerate] Fix bug in decoder_start_token_id  by sanchit-gandhi in 17035
* Fx with meta  by michaelbenayoun in 16836
* [Flax(Speech)EncoderDecoder] Fix bug in `decoder_module`  by sanchit-gandhi in 17036
* Fix typo in RetriBERT docstring  by mpoemsl in 17018
* add torch.no_grad when in eval mode  by JunnYu in 17020
* Disable Flax GPU tests on push  by sgugger in 17042
* Clean up vision tests  by NielsRogge in 17024
* [Trainer] Move logic for checkpoint loading into separate methods for easy overriding  by calpt in 17043
* Update no_trainer examples to use new logger  by muellerzr in 17044
* Fix no_trainer examples to properly calculate the number of samples  by muellerzr in 17046
* Allow all imports from transformers  by LysandreJik in 17050
* Make the sacremoses dependency optional  by LysandreJik in 17049
* Clean up setup.py  by sgugger in 17045
* [T5 Tokenizer] Model has no fixed position ids - there is no hardcode…  by patrickvonplaten in 16990
* [FlaxBert] Add ForCausalLM  by sanchit-gandhi in 16995
* Move test model folders  by ydshieh in 17034
* Make Trainer compatible with sharded checkpoints  by sgugger in 17053
* Remove Python and use v2 action  by sgugger in 17059
* Fix RNG reload in resume training from epoch checkpoint  by sgugger in 17055
* Remove device parameter from create_extended_attention_mask_for_decoder  by pbelevich in 16894
* Fix hashing for deduplication  by thomasw21 in 17048
* Skip RoFormer ONNX test if rjieba not installed  by lewtun in 16981
* Remove masked image modeling from BEIT ONNX export  by lewtun in 16980
* Make sure telemetry arguments are not returned as unused kwargs  by sgugger in 17063
* Type hint complete Albert model file.  by karthikrangasai in 16682
* Deprecate model templates  by sgugger in 17062
* Update to build via git for accelerate  by muellerzr in 17084
* Allow saved_model export of TFCLIPModel in save_pretrained  by seanmor5 in 16886
* Fix DeBERTa `token_type_ids`  by deutschmn in 17082
*  📝 open fresh PR for pipeline doctests  by stevhliu in 17073
* minor change on TF Data2Vec test  by ydshieh in 17085
* type hints for pytorch models  by robotjellyzone in 17064
* Add type hints for BERTGeneration  by robsmith155 in 17047
* Fix MLflowCallback and add support for MLFLOW_EXPERIMENT_NAME  by orieg in 17091
* Remove torchhub test  by sgugger in 17097
* fix missing "models" in pipeline test module  by ydshieh in 17090
* Fix link to example scripts  by stevhliu in 17103
* Fix self-push CI report path in cat  by ydshieh in 17111
* Added BigBirdPegasus onnx config  by nandwalritik in 17104
* split single_gpu and multi_gpu  by ydshieh in 17083
* LayoutLMv2Processor: ensure 1-to-1 mapping between images and samples in case of overflowing tokens  by ghlai9665 in 17092
* Add type hints for BigBirdPegasus and Data2VecText PyTorch models  by robsmith155 in 17123
* add `mobilebert` onnx configs  by manandey in 17029
* [WIP] Fix Pyright static type checking by replacing if-else imports with try-except  by d-miketa in 16578
* Add the auto_find_batch_size capability from Accelerate into Trainer  by muellerzr in 17068
* Fix MLflowCallback end_run() and add support for tags and nested runs  by orieg in 17130
* Fix all docs for accelerate install directions  by muellerzr in 17145
* LogSumExp trick `question_answering` pipeline.  by Narsil in 17143
* train args defaulting None marked as Optional  by d-miketa in 17156
* [trainer] sharded _load_best_model  by stas00 in 17150
* [Deepspeed] add many more models to the model zoo test   by stas00 in 12695
* Fixing the output of code examples in the preprocessing chapter  by HallerPatrick in 17162
* missing file  by stas00 in 17164
* Add MLFLOW_FLATTEN_PARAMS support in MLflowCallback  by orieg in 17148
* Fix template init  by sgugger in 17163
* MobileBERT tokenizer tests  by leondz in 16896
* [M2M100 doc] remove duplicate example  by patil-suraj in 17175
* Extend Transformers Trainer Class to Enable PyTorch SGD/Adagrad Optimizers for Training  by jianan-gu in 17154
* propagate "attention_mask" dtype for "use_past" in OnnxConfig.generate_dummy_inputs  by arampacha in 17105
* Convert image to rgb for clip model  by hengkuanwee in 17101
* Add missing RetriBERT tokenizer tests  by mpoemsl in 17017
* [WIP] Enable reproducibility for distributed trainings  by hasansalimkanmaz in 16907
* Remove unnecessary columns for all dataset types in `Trainer`  by Yard1 in 17166
* Fix LED documentation  by manuelciosici in 17181
* Ensure tensors are at least 1d for pad and concat  by Yard1 in 17179
* add shift_tokens_right in FlaxMT5  by patil-suraj in 17188
* Remove columns before passing to data collator  by Yard1 in 17187
* Remove duplicated os.path.join  by shijie-wu in 17192
* Fix contents in index.mdx to match docs' sidebar  by omarespejel in 17198
* ViT and Swin symbolic tracing with torch.fx  by michaelbenayoun in 17182
* migrate azure blob for beit checkpoints  by donglixp in 16902
* Update data2vec.mdx to include a Colab Notebook link (that shows fine-tuning)  by sayakpaul in 17194

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* anmolsjoshi
 * Added Annotations for PyTorch models (16619)
 * Replace assertion with exception (16720)
 * Moved functions to pytorch_utils.py (16625)
* vumichien
 * Add Doc Test for BERT (16523)
 * add Bigbird ONNX config (16427)
 * Add doc tests for Albert and Bigbird (16774)
* tuvuumass
 * Add self training code for text classification (16738)
* sayakpaul
 * Add Data2Vec for Vision in TF (17008)
* robotjellyzone
 * type hints for pytorch models (17064)
* d-miketa
 * [WIP] Fix Pyright static type checking by replacing if-else imports with try-except (16578)
 * train args defaulting None marked as Optional (17156)

4.18.0

New model additions

You'll notice that we are starting to add several older vision models. This is because those models are used as backbones in recent architectures. While we could rely on existing libraries for such pretrained models, we will ultimately need support for those backbones in PyTorch, TensorFlow and Jax, and there is currently no library that supports all three frameworks. This is why we are starting to add those models to Transformers directly (here ResNet and VAN).

GLPN

The GLPN model was proposed in [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim. GLPN combines [SegFormer](https://huggingface.co/docs/transformers/main/en/model_doc/segformer)’s hierarchical mix-Transformer with a lightweight decoder for monocular depth estimation. The proposed decoder shows better performance than the previously proposed decoders, with considerably less computational complexity.

* Add GLPN by NielsRogge in https://github.com/huggingface/transformers/pull/16199

ResNet

The ResNet model was proposed in [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) by Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun. Our implementation follows the small changes made by [Nvidia](https://catalog.ngc.nvidia.com/orgs/nvidia/resources/resnet_50_v1_5_for_pytorch): we apply stride=2 for downsampling in the bottleneck’s 3x3 conv rather than in the first 1x1. This is generally known as “ResNet v1.5”.

ResNet introduced residual connections, which allow training networks with an unprecedented number of layers (up to 1000). ResNet won the 2015 ILSVRC & COCO competition, an important milestone in deep computer vision.

* Resnet by FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15770

VAN

The VAN model was proposed in [Visual Attention Network](https://arxiv.org/abs/2202.09741) by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.

This paper introduces a new attention layer based on convolution operations able to capture both local and distant relationships. This is done by combining normal and large kernel convolution layers. The latter uses a dilated convolution to capture distant correlations.

* Visual Attention Network (VAN) by FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/16027

VisionTextDualEncoder

The [VisionTextDualEncoderModel](https://huggingface.co/docs/transformers/main/en/model_doc/vision-text-dual-encoder#transformers.VisionTextDualEncoderModel) can be used to initialize a vision-text dual encoder model with any pretrained vision autoencoding model as the vision encoder (e.g. [ViT](https://huggingface.co/docs/transformers/main/en/model_doc/vit), [BEiT](https://huggingface.co/docs/transformers/main/en/model_doc/beit), [DeiT](https://huggingface.co/docs/transformers/main/en/model_doc/deit)) and any pretrained text autoencoding model as the text encoder (e.g. [RoBERTa](https://huggingface.co/docs/transformers/main/en/model_doc/roberta), [BERT](https://huggingface.co/docs/transformers/main/en/model_doc/bert)). Two projection layers are added on top of both the vision and text encoders to project the output embeddings to a shared latent space. The projection layers are randomly initialized, so the model should be fine-tuned on a downstream task. This model can be used to align the vision-text embeddings with CLIP-like contrastive image-text training, and can then be used for zero-shot vision tasks such as image classification or retrieval.

In [LiT: Zero-Shot Transfer with Locked-image Text Tuning](https://arxiv.org/abs/2111.07991) it is shown how leveraging pre-trained (locked/frozen) image and text models for contrastive learning yields significant improvement on new zero-shot vision tasks such as image classification or retrieval.

* add VisionTextDualEncoder and CLIP fine-tuning script by patil-suraj in https://github.com/huggingface/transformers/pull/15701
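
A minimal construction sketch (the `google/vit-base-patch16-224` and `bert-base-uncased` checkpoints are just example choices):

```py
from transformers import (
    AutoFeatureExtractor,
    AutoTokenizer,
    VisionTextDualEncoderModel,
    VisionTextDualEncoderProcessor,
)

# Build a dual encoder from any pretrained vision and text models; the projection layers
# on top are randomly initialized, so the model should be fine-tuned (e.g. with CLIP-style
# contrastive training) before being used for zero-shot tasks.
model = VisionTextDualEncoderModel.from_vision_text_pretrained(
    "google/vit-base-patch16-224", "bert-base-uncased"
)
processor = VisionTextDualEncoderProcessor(
    AutoFeatureExtractor.from_pretrained("google/vit-base-patch16-224"),
    AutoTokenizer.from_pretrained("bert-base-uncased"),
)
```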

DiT

DiT was proposed in [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei. DiT applies the self-supervised objective of [BEiT](https://huggingface.co/docs/transformers/main/en/model_doc/beit) (BERT pre-training of Image Transformers) to 42 million document images, allowing for state-of-the-art results on tasks including:

- document image classification: the [RVL-CDIP](https://www.cs.cmu.edu/~aharley/rvl-cdip/) dataset (a collection of 400,000 images belonging to one of 16 classes).
- document layout analysis: the [PubLayNet](https://github.com/ibm-aur-nlp/PubLayNet) dataset (a collection of more than 360,000 document images constructed by automatically parsing PubMed XML files).
- table detection: the [ICDAR 2019 cTDaR](https://github.com/cndplab-founder/ICDAR2019_cTDaR) dataset (a collection of 600 training images and 240 testing images).

* Add Document Image Transformer (DiT) by NielsRogge in https://github.com/huggingface/transformers/pull/15984

DPT

The DPT model was proposed in [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun. DPT is a model that leverages the [Vision Transformer (ViT)](https://huggingface.co/docs/transformers/main/en/model_doc/vit) as backbone for dense prediction tasks like semantic segmentation and depth estimation.

* Add DPT by NielsRogge in https://github.com/huggingface/transformers/pull/15991

Checkpoint sharding

Large models are becoming more and more the norm and having a checkpoint in a single file is challenging for several reasons:
- it's tougher to upload/download files bigger than 20/30 GB efficiently
- the whole checkpoint might not fit into RAM even if you have enough GPU memory

That's why the `save_pretrained` method will now automatically shard a checkpoint into several files when you go above a 10GB threshold for PyTorch models. `from_pretrained` will handle such sharded checkpoints as if there was only one file.

* Checkpoint sharding by sgugger in https://github.com/huggingface/transformers/pull/16343
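
A small sketch of the behavior (assuming the `max_shard_size` argument that ships with the sharding feature, which lets you pick a lower threshold than the 10GB default):

```py
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-cased")

# A small max_shard_size forces several shard files plus an index to be written;
# from_pretrained then reloads the sharded checkpoint transparently.
model.save_pretrained("sharded-checkpoint", max_shard_size="200MB")
reloaded = AutoModel.from_pretrained("sharded-checkpoint")
```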

TensorFlow implementations

GPT-J and ViTMAE are now available in TensorFlow.

* Add TF implementation of GPT-J by stancld in https://github.com/huggingface/transformers/pull/15623
* Add TF ViT MAE by sayakpaul in https://github.com/huggingface/transformers/pull/16255

Documentation guides

The documentation information architecture (IA) migration is wrapped up, with a new conceptual guide now available.

* Create concept guide section by stevhliu in https://github.com/huggingface/transformers/pull/16369

Improvements and bugfixes

* Fix doc links in release utils by sgugger in https://github.com/huggingface/transformers/pull/15903
* Fix a TF Vision Encoder Decoder test by ydshieh in https://github.com/huggingface/transformers/pull/15896
* [Fix link in pipeline doc] by patrickvonplaten in https://github.com/huggingface/transformers/pull/15906
* Fix and improve REALM fine-tuning by qqaatw in https://github.com/huggingface/transformers/pull/15297
* Freeze FlaxWav2Vec2 Feature Encoder by sanchit-gandhi in https://github.com/huggingface/transformers/pull/15873
* The tests were not updated after the addition of `torch.diag` by Narsil in https://github.com/huggingface/transformers/pull/15890
* [Doctests] Fix ignore bug and add more doc tests by patrickvonplaten in https://github.com/huggingface/transformers/pull/15911
* Enabling MaskFormer in pipelines by Narsil in https://github.com/huggingface/transformers/pull/15917
* Minor fixes for MaskFormer by FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15916
* Add vision models to doc tests by NielsRogge in https://github.com/huggingface/transformers/pull/15905
* Fix 15898 by davidleonfdez in https://github.com/huggingface/transformers/pull/15928
* Update doc test readme by patrickvonplaten in https://github.com/huggingface/transformers/pull/15926
* Re-enabling all fast pipeline tests. by Narsil in https://github.com/huggingface/transformers/pull/15924
* Support CLIPTokenizerFast for CLIPProcessor by cosmoquester in https://github.com/huggingface/transformers/pull/15913
* Updating the slow tests: by Narsil in https://github.com/huggingface/transformers/pull/15893
* Adding `MODEL_FOR_INSTANCE_SEGMENTATION_MAPPING` by Narsil in https://github.com/huggingface/transformers/pull/15934
* Add missing support for Flax XLM-RoBERTa by versae in https://github.com/huggingface/transformers/pull/15900
* [FlaxT5 Example] Fix flax t5 example pretraining by patrickvonplaten in https://github.com/huggingface/transformers/pull/15835
* Do not change the output from tuple to list - to match PT's version by ydshieh in https://github.com/huggingface/transformers/pull/15918
* Tests for MaskFormerFeatureExtractor's post_process*** methods by FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15929
* Constrained Beam Search [*With* Disjunctive Decoding] by cwkeam in https://github.com/huggingface/transformers/pull/15761
* [LayoutLMv2] Update requires_backends of feature extractor by NielsRogge in https://github.com/huggingface/transformers/pull/15941
* Made MaskFormerModelTest faster by FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15942
* [Bug Fix] Beam search example in docs fails & a fix (integrating `max_length` in `BeamScorer.finalize()`) by cwkeam in https://github.com/huggingface/transformers/pull/15555
* remove re-defination of FlaxWav2Vec2ForCTCModule by patil-suraj in https://github.com/huggingface/transformers/pull/15965
* Support modern list type hints in HfArgumentParser by konstantinjdobler in https://github.com/huggingface/transformers/pull/15951
* Backprop Test for Freeze FlaxWav2Vec2 Feature Encoder by sanchit-gandhi in https://github.com/huggingface/transformers/pull/15938
* Fix Embedding Module Bug in Flax Models by sanchit-gandhi in https://github.com/huggingface/transformers/pull/15920
* Make is_thing_map in Feature Extractor post_process_panoptic_segmentation defaults to all instances by FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15954
* Update training scripts docs by stevhliu in https://github.com/huggingface/transformers/pull/15931
* Set scale_embedding to False in some TF tests by ydshieh in https://github.com/huggingface/transformers/pull/15952
* Fix LayoutLMv2 test by NielsRogge in https://github.com/huggingface/transformers/pull/15939
* [Tests] Fix ViTMAE integration test by NielsRogge in https://github.com/huggingface/transformers/pull/15949
* Returning outputs only when asked for for MaskFormer. by Narsil in https://github.com/huggingface/transformers/pull/15936
* Speedup T5 Flax training by using Numpy instead of JAX for batch shuffling by yhavinga in https://github.com/huggingface/transformers/pull/15963
* Do a pull in case docs were updated during build by sgugger in https://github.com/huggingface/transformers/pull/15922
* Fix TFEncDecModelTest - Pytorch device by ydshieh in https://github.com/huggingface/transformers/pull/15979
* [Env Command] Add hf hub to env version command by patrickvonplaten in https://github.com/huggingface/transformers/pull/15981
* TF: Update multiple choice example by gante in https://github.com/huggingface/transformers/pull/15868
* TF generate refactor - past without encoder outputs by gante in https://github.com/huggingface/transformers/pull/15944
* Seed _get_train_sampler's generator with arg seed to improve reproducibility by dlwh in https://github.com/huggingface/transformers/pull/15961
* Add `ForInstanceSegmentation` models to `image-segmentation` pipelines by Narsil in https://github.com/huggingface/transformers/pull/15937
* [Doctests] Move doctests to new GPU & Fix bugs by patrickvonplaten in https://github.com/huggingface/transformers/pull/15969
* Removed an outdated check about hdf5_version by ydshieh in https://github.com/huggingface/transformers/pull/16011
* Swag example: Update doc format by gante in https://github.com/huggingface/transformers/pull/16014
* Fix github actions comment by LysandreJik in https://github.com/huggingface/transformers/pull/16009
* Simplify release utils by sgugger in https://github.com/huggingface/transformers/pull/15921
* Make `pos` optional in `PerceiverAudioPreprocessor` to avoid crashing `PerceiverModel` operation by basilevh in https://github.com/huggingface/transformers/pull/15972
* Fix MaskFormer failing test on master by FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/16012
* Fix broken code blocks in README.md by upura in https://github.com/huggingface/transformers/pull/15967
* Use tiny models for get_pretrained_model in TFEncoderDecoderModelTest by ydshieh in https://github.com/huggingface/transformers/pull/15989
* Add ONNX export for ViT by lewtun in https://github.com/huggingface/transformers/pull/15658
* Add FlaxBartForCausalLM by sanchit-gandhi in https://github.com/huggingface/transformers/pull/15995
* add doctests for bart like seq2seq models by patil-suraj in https://github.com/huggingface/transformers/pull/15987
* Fix warning message in ElectraForCausalLM by pbelevich in https://github.com/huggingface/transformers/pull/16023
* Freeze Feature Encoder in FlaxSpeechEncoderDecoder by sanchit-gandhi in https://github.com/huggingface/transformers/pull/15997
* Fix dependency error message in ServeCommand by andstor in https://github.com/huggingface/transformers/pull/16033
* [Docs] Improve PyTorch, Flax generate API by patrickvonplaten in https://github.com/huggingface/transformers/pull/15988
* [Tests] Add attentions_option to ModelTesterMixin by NielsRogge in https://github.com/huggingface/transformers/pull/15909
* [README] fix url for Preprocessing tutorial by patil-suraj in https://github.com/huggingface/transformers/pull/16042
* Fix Bug in Flax-Speech-Encoder-Decoder Test by sanchit-gandhi in https://github.com/huggingface/transformers/pull/16041
* Fix TFDebertaV2ConvLayer in TFDebertaV2Model by ydshieh in https://github.com/huggingface/transformers/pull/16031
* Build the doc in a separate folder then move it by sgugger in https://github.com/huggingface/transformers/pull/16020
* Don't compute metrics in LM examples on TPU by sgugger in https://github.com/huggingface/transformers/pull/16029
* TF: Unpack model inputs through a decorator by gante in https://github.com/huggingface/transformers/pull/15907
* Fix Bug in Flax Seq2Seq Models by sanchit-gandhi in https://github.com/huggingface/transformers/pull/16021
* DeBERTa/DeBERTa-v2/SEW Support for torch 1.11 by LysandreJik in https://github.com/huggingface/transformers/pull/16043
* support new marian models by patil-suraj in https://github.com/huggingface/transformers/pull/15831
* Fix duplicate arguments passed to dummy inputs in ONNX export by lewtun in https://github.com/huggingface/transformers/pull/16045
* Fix: update doc/example for fine-tuning for downstream Token Classification by davidsbatista in https://github.com/huggingface/transformers/pull/16063
* Fix a TF test name (LayoutLMModelTest) by ydshieh in https://github.com/huggingface/transformers/pull/16061
* Move QDQBert into the PyTorch-only block by sgugger in https://github.com/huggingface/transformers/pull/16062
* Remove assertion over possible activation functions in DistilBERT by mfuntowicz in https://github.com/huggingface/transformers/pull/16066
* Fix torch-scatter version by LysandreJik in https://github.com/huggingface/transformers/pull/16072
* Add type annotations for BERT and copies by Rocketknight1 in https://github.com/huggingface/transformers/pull/16074
* Adding type hints for TFRoBERTa by Rocketknight1 in https://github.com/huggingface/transformers/pull/16057
* Make sure `'torch.dtype'` has str-type value in config and all nested dicts for JSON serializability by feifang24 in https://github.com/huggingface/transformers/pull/16065
* Run daily doctests without time-out at least once by patrickvonplaten in https://github.com/huggingface/transformers/pull/16077
* Add soft length regulation for sequence generation by kevinpl07 in https://github.com/huggingface/transformers/pull/15245
* Update troubleshoot guide by stevhliu in https://github.com/huggingface/transformers/pull/16001
* Add type annotations for ImageGPT by johnnv1 in https://github.com/huggingface/transformers/pull/16088
* Rebuild deepspeed by LysandreJik in https://github.com/huggingface/transformers/pull/16081
* Add missing type hints for all flavors of RoBERTa PyTorch models. by ChainYo in https://github.com/huggingface/transformers/pull/16086
* [Fix doc example] FSMT by ydshieh in https://github.com/huggingface/transformers/pull/16085
* Audio/vision task guides by stevhliu in https://github.com/huggingface/transformers/pull/15808
* [ZeRO] Fixes issue with embedding resize by jeffra in https://github.com/huggingface/transformers/pull/16093
* [Deepspeed] add support for bf16 mode by stas00 in https://github.com/huggingface/transformers/pull/14569 (config sketch after this list)
* Change unpacking of TF Bart inputs to use decorator by osanseviero in https://github.com/huggingface/transformers/pull/16094
* add unpack_inputs decorator to mbart tf by Abdelrhman-Hosny in https://github.com/huggingface/transformers/pull/16097
* Add type annotations for segformer pytorch by p-mishra1 in https://github.com/huggingface/transformers/pull/16099
* Add unpack_inputs decorator to ViT model by johnnv1 in https://github.com/huggingface/transformers/pull/16102
* Add type hints to XLM model (PyTorch) by jbrry in https://github.com/huggingface/transformers/pull/16108
* Add missing type hints for all flavors of LayoutLMv2 PyTorch models. by ChainYo in https://github.com/huggingface/transformers/pull/16089
* Add TFCamembertForCausalLM and ONNX integration test by lewtun in https://github.com/huggingface/transformers/pull/16073
* Fix and document Zero Shot Image Classification by osanseviero in https://github.com/huggingface/transformers/pull/16079 (pipeline sketch after this list)
* Fix Loading of Flax(Speech)EncoderDecoderModel kwargs from PreTrained Encoder-Decoder Checkpoints by sanchit-gandhi in https://github.com/huggingface/transformers/pull/16056
* Update convert_marian_to_pytorch.py by jorgtied in https://github.com/huggingface/transformers/pull/16124
* Make TF pt-tf equivalence test more aggressive by ydshieh in https://github.com/huggingface/transformers/pull/15839
* Fix ProphetNetTokenizer by ydshieh in https://github.com/huggingface/transformers/pull/16082
* Change unpacking of TF mobilebert inputs to use decorator by vumichien in https://github.com/huggingface/transformers/pull/16110
* Steps strategy fix for PushToHubCallback and changed docstring by merveenoyan in https://github.com/huggingface/transformers/pull/16138
* [ViTMAE] Add copied from statements and fix prefix by NielsRogge in https://github.com/huggingface/transformers/pull/16119
* Spanish translation of the file training.mdx by yharyarias in https://github.com/huggingface/transformers/pull/16047
* Added missing type hints - ELECTRA PyTorch by kamalkraj in https://github.com/huggingface/transformers/pull/16103
* Added missing type hints - Deberta V1 and V2 by kamalkraj in https://github.com/huggingface/transformers/pull/16105
* [Fix doc example] Fix checkpoint name in docstring example by ydshieh in https://github.com/huggingface/transformers/pull/16083
* Better input variable naming for OpenAI (TF) by bhavika in https://github.com/huggingface/transformers/pull/16129
* Improve model variable naming - CLIP [TF]  by bhavika in https://github.com/huggingface/transformers/pull/16128
* Add type hints for TFDistilBert by PepijnBoers in https://github.com/huggingface/transformers/pull/16107
* Choose framework for ONNX export by michaelbenayoun in https://github.com/huggingface/transformers/pull/16018
* Add type hints for Luke in PyTorch by bhavika in https://github.com/huggingface/transformers/pull/16111
* Add type hints for PoolFormer in Pytorch by soomiles in https://github.com/huggingface/transformers/pull/161
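
A few of the entries above are user-facing features worth sketching briefly. The disjunctive constrained beam search PR lets `generate()` require that at least one of several word forms appears in the output via `force_words_ids`. A minimal sketch, assuming the `t5-small` checkpoint; the sentence and forced words are illustrative only:

```py
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Each entry of force_words_ids is a disjunctive constraint: the output must
# contain at least one of the tokenized alternatives in that entry.
force_words_ids = [
    tokenizer(["Regen", "regnerisch"], add_special_tokens=False).input_ids,
]

inputs = tokenizer(
    "translate English to German: The weather was rainy all week.",
    return_tensors="pt",
)
outputs = model.generate(
    **inputs,
    force_words_ids=force_words_ids,
    num_beams=5,  # constrained decoding requires beam search
    max_length=32,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```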
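
The HfArgumentParser change lets dataclass fields use the built-in generics from PEP 585 (Python 3.9+), such as `list[int]`, where previously only `typing.List[int]` was recognized. A minimal sketch; the `TrainConfig` dataclass and its fields are hypothetical:

```py
from dataclasses import dataclass, field

from transformers import HfArgumentParser


@dataclass
class TrainConfig:
    # Hypothetical arguments, for illustration only.
    learning_rate: float = 3e-5
    # Built-in generic syntax (Python 3.9+) is now parsed like typing.List[int].
    layer_sizes: list[int] = field(default_factory=lambda: [256, 128])


parser = HfArgumentParser(TrainConfig)
# e.g. python train.py --learning_rate 1e-4 --layer_sizes 512 256
(config,) = parser.parse_args_into_dataclasses()
print(config)
```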
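
The DeepSpeed bf16 PR lets the Trainer run ZeRO with bfloat16 instead of fp16. A rough sketch of what the configuration could look like; the keys beyond the `bf16` section are illustrative, `deepspeed` accepts either a dict or a path to a JSON file, and running this for real requires DeepSpeed installed plus the usual distributed launcher:

```py
from transformers import TrainingArguments

# Illustrative DeepSpeed config enabling bf16; "auto" lets the HF integration
# fill in values from TrainingArguments.
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 2},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

args = TrainingArguments(
    output_dir="out",
    bf16=True,            # keep the Trainer's own precision flag in sync
    deepspeed=ds_config,  # dict or path to a ds_config.json
)
```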
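
Zero-shot image classification is exposed as a pipeline backed by CLIP-style checkpoints; the fix above also documents it. A minimal sketch, with the checkpoint name and image URL used only as examples:

```py
from transformers import pipeline

classifier = pipeline(
    "zero-shot-image-classification",
    model="openai/clip-vit-base-patch32",
)

result = classifier(
    "http://images.cocodataset.org/val2017/000000039769.jpg",  # URL, path, or PIL image
    candidate_labels=["two cats on a couch", "a dog in the park"],
)
print(result)  # [{"label": ..., "score": ...}, ...] sorted by score
```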
