pyup-bot (Contributor) commented on Dec 1, 2022

This PR pins transformers to the latest release 4.25.1.

Changelog

4.24.0

ESM-2/ESMFold

ESM-2 and ESMFold are new state-of-the-art Transformer protein language and folding models from Meta AI's Fundamental AI Research Team (FAIR). ESM-2 is trained with a masked language modeling objective, and it can be easily transferred to sequence and token classification tasks for proteins. Checkpoints exist in various sizes, [from 8 million parameters up to a huge 15 billion parameter model](https://huggingface.co/models?other=esm).

ESMFold is a state-of-the-art single-sequence protein folding model which produces high-accuracy predictions significantly faster than previous approaches. Unlike previous protein folding tools like AlphaFold2 and `openfold`, ESMFold uses a pretrained protein language model to generate token embeddings that are used as input to the folding model, and so does not require a multiple sequence alignment (MSA) of related proteins as input. As a result, proteins can be folded in a single forward pass of the model without requiring any external databases or search/alignment tools to be present at inference time. This hugely reduces the time and compute requirements for folding.
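
As a quick illustration, here is a minimal sketch of masked amino-acid prediction with ESM-2, assuming the small `facebook/esm2_t6_8M_UR50D` checkpoint and an arbitrary example sequence:

```py
import torch
from transformers import AutoTokenizer, EsmForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t6_8M_UR50D")
model = EsmForMaskedLM.from_pretrained("facebook/esm2_t6_8M_UR50D")

# Mask one residue of a protein sequence and predict it back.
sequence = "MKTAYIAKQR<mask>ISFVKSHFSRQLEERLGLIEVQ"
inputs = tokenizer(sequence, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

mask_positions = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
predicted_ids = logits[0, mask_positions].argmax(-1)
print(tokenizer.decode(predicted_ids))  # most likely residue(s) for the masked position
```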

Transformer protein language models were introduced in the paper [Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118) by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus.

ESMFold was introduced in the paper [Language models of protein sequences at the scale of evolution enable accurate structure prediction](https://www.biorxiv.org/content/10.1101/2022.07.20.500902v1) by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, and Alexander Rives.

* Add ESMFold by Rocketknight1 in 19977
* TF port of ESM  by Rocketknight1 in 19587

LiLT

LiLT makes it possible to combine any pre-trained RoBERTa text encoder with a lightweight Layout Transformer, enabling [LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)-like document understanding for many languages.

It was proposed in [LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding](https://arxiv.org/abs/2202.13669) by Jiapeng Wang, Lianwen Jin, Kai Ding.
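
A rough sketch of running the model, assuming the `SCUT-DLVCLab/lilt-roberta-en-base` checkpoint and dummy bounding boxes (a real pipeline would supply per-token boxes from an OCR step):

```py
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("SCUT-DLVCLab/lilt-roberta-en-base")
model = AutoModel.from_pretrained("SCUT-DLVCLab/lilt-roberta-en-base")

encoding = tokenizer("Invoice total: 42.00", return_tensors="pt")
# LiLT expects one bounding box per token, normalized to a 0-1000 grid; a real
# pipeline would take these from an OCR step. Zero boxes keep this sketch short.
bbox = torch.zeros((1, encoding.input_ids.shape[1], 4), dtype=torch.long)
outputs = model(input_ids=encoding.input_ids, bbox=bbox, attention_mask=encoding.attention_mask)
print(outputs.last_hidden_state.shape)
```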

* Add LiLT  by NielsRogge in 19450

Flan-T5

FLAN-T5 is an enhanced version of T5 that has been finetuned on a mixture of tasks.

It was released in the paper [Scaling Instruction-Finetuned Language Models](https://arxiv.org/pdf/2210.11416.pdf) by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei.
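
For instance, a minimal sketch of instruction-following generation, assuming the `google/flan-t5-small` checkpoint:

```py
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

# FLAN-T5 follows natural-language instructions without task-specific finetuning.
inputs = tokenizer("Translate to German: How old are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```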

* Add `flan-t5` documentation page  by younesbelkada in 19892

Table Transformer

Table Transformer is a DETR-based model that performs table extraction and table structure recognition on unstructured documents.

It was proposed in [PubTables-1M: Towards comprehensive table extraction from unstructured documents](https://arxiv.org/abs/2110.00061) by Brandon Smock, Rohith Pesala, Robin Abraham. 
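
A hedged sketch of table detection, assuming the `microsoft/table-transformer-detection` checkpoint and a hypothetical scanned page:

```py
import torch
from PIL import Image
from transformers import AutoFeatureExtractor, TableTransformerForObjectDetection

feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/table-transformer-detection")
model = TableTransformerForObjectDetection.from_pretrained("microsoft/table-transformer-detection")

image = Image.open("page.png").convert("RGB")  # hypothetical scanned page
inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw outputs into thresholded boxes in the original image coordinates.
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
tables = feature_extractor.post_process_object_detection(outputs, threshold=0.9, target_sizes=target_sizes)[0]
print(tables["boxes"], tables["scores"])
```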

* Add table transformer [v2]  by NielsRogge in 19614

Contrastive search decoding

Contrastive search decoding is a new state-of-the-art generation method that aims to reduce the repetitive patterns generation models often fall into.

It was introduced in [A Contrastive Framework for Neural Text Generation](https://arxiv.org/abs/2202.06417) by Yixuan Su, Tian Lan, Yan Wang, Dani Yogatama, Lingpeng Kong, Nigel Collier.
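
A minimal sketch of triggering contrastive search via `generate` (the GPT-2 checkpoint and prompt are arbitrary examples):

```py
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("DeepMind Company is", return_tensors="pt")
# Passing penalty_alpha (degeneration penalty) together with top_k (candidate
# pool size) switches `generate` to contrastive search.
outputs = model.generate(**inputs, penalty_alpha=0.6, top_k=4, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```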

* Adding the state-of-the-art contrastive search decoding methods for the codebase of generation_utils.py  by gmftbyGMFTBY in 19477

Safety and security

We continue to explore the new Pickle-free serialization format through the [safetensors](https://github.com/huggingface/safetensors) library, this time by adding support for TensorFlow models. More checkpoints have been converted to this format. Support is still experimental.
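
A hedged sketch of using the `safetensors` library directly with TensorFlow tensors (the tensor names and shapes are arbitrary; the transformers-side integration remains experimental):

```py
import tensorflow as tf
from safetensors.tensorflow import save_file, load_file

# Round-trip a dict of named tensors without ever touching Pickle.
tensors = {"embedding": tf.zeros((2, 4)), "dense": tf.ones((4, 4))}
save_file(tensors, "model.safetensors")
loaded = load_file("model.safetensors")
print(loaded["embedding"].shape)
```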

* Safetensors tf  by sgugger in 19900

🚨 Breaking changes

The following changes are bugfixes that we have chosen to fix even if it changes the resulting behavior. We mark them as breaking changes, so if you are using this part of the codebase, we recommend you take a look at the PRs to understand exactly what was changed.

* 🚨🚨🚨  TF: Remove `TFWrappedEmbeddings` (breaking: TF embedding initialization updated for encoder-decoder models)  by gante in 19263
* 🚨🚨🚨 [Breaking change] Deformable DETR intermediate representations  by Narsil in 19678

Bugfixes and improvements

* Enabling custom TF signature draft  by dimitreOliveira in 19249
* Fix whisper for `pipeline`  by ArthurZucker in 19482
* Extend `nested_XXX` functions to mappings/dicts.  by Guillem96 in 19455
* Syntax issues (lines 126, 203)  by kant in 19444
* CLI: add import protection to datasets  by gante in 19470
* Fix `TFGroupViT` CI  by ydshieh in 19461
* Fix doctests for `DeiT` and `TFGroupViT`  by ydshieh in 19466
* Update `WhisperModelIntegrationTests.test_large_batched_generation`  by ydshieh in 19472
* [Swin] Replace hard-coded batch size to enable dynamic ONNX export  by lewtun in 19475
* TF: TFBart embedding initialization  by gante in 19460
* Make LayoutLM tokenizers independent from BertTokenizer  by arnaudstiegler in 19351
* Make `XLMRoberta` model and config independent from `Roberta`  by asofiaoliveira in 19359
* Fix `get_embedding` dtype at init. time  by ydshieh in 19473
* Decouples `XLMProphet` model from `Prophet`   by srhrshr in 19406
* Implement multiple span support for DocumentQuestionAnswering  by ankrgyl in 19204
* Add warning in `generate` & `device_map=auto` & half precision models  by younesbelkada in 19468
* Update TF whisper doc tests  by amyeroberts in 19484
* Make bert_japanese and cpm independent of their inherited modules  by Davidy22 in 19431
* Added tokenize keyword arguments to feature extraction pipeline  by quancore in 19382
* Adding the README_es.md and reference to it in the others files readme  by Oussamaosman02 in 19427
* [CvT] Tensorflow implementation  by mathieujouffroy in 18597
* `python3` instead of `python` in push CI setup job  by ydshieh in 19492
* Update PT to TF CLI for audio models  by amyeroberts in 19465
* New  by IMvision12 in 19481
* Fix `OPTForQuestionAnswering` doctest  by ydshieh in 19479
* Use a dynamic configuration for circleCI tests  by sgugger in 19325
* Add multi-node conditions in trainer_qa.py and trainer_seq2seq.py  by regisss in 19502
* update doc for perf_train_cpu_many  by sywangyi in 19506
* Avoid Push CI failing to report due to many commits being merged  by ydshieh in 19496
* [Doctest] Add `configuration_bert.py` to doctest  by ydshieh in 19485
* Fix whisper doc  by ArthurZucker in 19518
* Syntax issue (line 497, 526) Documentation by kant in 19442
* Fix pytorch seq2seq qa  by FilipposVentirozos in 19258
* Add depth estimation pipeline  by nandwalritik in 18618
* Adding links to pipelines parameters documentation  by AndreaSottana in 19227
* fix MarkupLMProcessor option flag  by davanstrien in 19526
* [Doctest] Bart configuration update  by imarekkus in 19524
* Remove roberta dependency from longformer fast tokenizer  by sirmammingtonham in 19501
* made tokenization_roformer independent of bert  by naveennamani in 19426
* Remove bert fast dependency from electra  by Threepointone4 in 19520
* [Examples] Fix typos in run speech recognition seq2seq  by sanchit-gandhi in 19514
* [X-CLIP] Fix doc tests  by NielsRogge in 19523
* Update Marian config default vocabulary size  by gante in 19464
* Make `MobileBert` tokenizers independent from `Bert`  by 501Good in 19531
* [Whisper] Fix gradient checkpointing  by sanchit-gandhi in 19538
* Syntax issues (paragraphs 122, 130, 147, 155) Documentation: sgugger  by kant in 19437
* using trunc_normal for weight init & cls_token  by mathieujouffroy in 19486
* Remove `MarkupLMForMaskedLM` from `MODEL_WITH_LM_HEAD_MAPPING_NAMES`  by ydshieh in 19534
* Image transforms library  by amyeroberts in 18520
* Add a decorator for flaky tests  by sgugger in 19498
* [Doctest] Add `configuration_yolos.py`  by daspartho in 19539
* Albert config update  by imarekkus in 19541
* [Doctest] `Add configuration_whisper.py`  by daspartho in 19540
* Throw an error if `getattribute_from_module` can't find anything  by ydshieh in 19535
* [Doctest] Beit Config for doctest  by daspartho in 19542
* Create the arange tensor on device for enabling CUDA-Graph for Clip Encoder  by RezaYazdaniAminabadi in 19503
* [Doctest] GPT2 Config for doctest  by daspartho in 19549
* Build Push CI images also in a daily basis  by ydshieh in 19532
* Fix checkpoint used in `MarkupLMConfig`  by ydshieh in 19547
* add a note to whisper docs clarifying support of long-form decoding  by akashmjn in 19497
* [Whisper] Freeze params of encoder  by sanchit-gandhi in 19527
* [Doctest] Fixing the Doctest for imageGPT config  by RamitPahwa in 19556
* [Doctest] Fixing mobile bert configuration doctest  by RamitPahwa in 19557
* [Doctest] Fixing doctest bert_generation configuration  by Threepointone4 in 19558
* [Doctest] DeiT Config for doctest  by daspartho in 19560
* [Doctest] Reformer Config for doctest  by daspartho in 19562
* [Doctest] RoBERTa Config for doctest  by daspartho in 19563
* [Doctest] Add `configuration_vit.py`  by daspartho in 19561
* [Doctest] bloom config update  by imarekkus in 19566
* [Re-submit] Compute true loss Flax examples  by duongna21 in 19504
* Fix fairseq wav2vec2-xls-r pretrained weights conversion scripts  by heatz123 in 19508
* [Doctest] CTRL config  by imarekkus in 19574
* [Doctest] Add configuration_canine.py  by IzicTemi in 19575
* [Doctests] Config files for `ViTMAE` and `YOSO`  by grgkaran03 in 19567
* Added type hints to `DebertaV2ForMultipleChoice` Pytorch  by IMvision12 in 19536
* [WIP] Add type hints for Lxmert (TF)  by elusenji in 19441
* [Doctests] add `configuration_blenderbot.py`  by grgkaran03 in 19577
* [Doctest] adds trajectory_transformer config to Docs test  by SD-13 in 19586
* [Doctests] add `configuration_blenderbot_small.py`  by grgkaran03 in 19589
* [Doctest] Swin V2 Config for doctest  by daspartho in 19595
* [Doctest] Swin Config for doctest  by daspartho in 19594
* [Doctest]  SEW Config for doctest  by daspartho in 19597
* [Doctest] UniSpeech Config for doctest  by daspartho in 19596
* [Doctest] SEW-D Config for doctest  by daspartho in 19598
* [Doctest] fix doc test for megatron bert  by RamitPahwa in 19600
* Adding type hints for TFXLnet  by thliang01 in 19344
* [Doctest] Add `configuration_bigbird_pegasus.py` and `configuration_big_bird.py`  by Xabilahu in 19606
* Cast masks to np.unit8 before converting to PIL.Image.Image  by amyeroberts in 19616
* [Whisper] Don't return attention mask in feat extractor  by sanchit-gandhi in 19521
* [Time Series Transformer] Add doc tests  by NielsRogge in 19607
* fix BLOOM ONNX config  by NouamaneTazi in 19573
* Fix `test_tf_encode_plus_sent_to_model` for `TAPAS`  by ydshieh in 19559
* Allow usage of TF Text BertTokenizer on TFBertTokenizer to make it servable on TF Serving  by piEsposito in 19590
* add gloo backend support for CPU DDP  by sywangyi in 19555
* Fix `ImageToTextPipelineTests.test_small_model_tf`  by ydshieh in 19565
* Fix `FlaubertTokenizer`  by ydshieh in 19552
* Visual Bert config for doctest  by ztjhz in 19605
* GPTTokenizer dependency removed from deberta class  by RamitPahwa in 19551
* xlm roberta config for doctest  by ztjhz in 19609
* Ernie config for doctest  by ztjhz in 19611
*  xlm roberta xl config for doctest  by ztjhz in 19610
* fix: small error  by 0xflotus in 19612
* Improve error messaging for ASR pipeline.  by Narsil in 19570
* [Doctest] LeViT Config for doctest  by daspartho in 19622
* [Doctest] DistilBERT Config for doctest  by daspartho in 19621
* [Whisper] Fix gradient checkpointing (again!)  by sanchit-gandhi in 19548
*  [Doctest] Add `configuration_resnet.py`  by daspartho in 19620
* Fix whisper doc  by ArthurZucker in 19608
* Sharding fails in TF when absolute scope was modified if `.` in layer name  by ArthurZucker in 19124
* [Doctest] Add configuration_vision_text_dual_encoder.py  by SD-13 in 19580
* [Doctest] Add configuration_vision_encoder_decoder.py  by SD-13 in 19583
* [Doctest] Add configuration_time_series_transformer.py  by SD-13 in 19582
* Tokenizer from_pretrained should not use local files named like tokenizer files  by sgugger in 19626
* [Doctest] CodeGen config for doctest  by AymenBer99 in 19633
* [Doctest] Add `configuration_data2vec_text.py`  by daspartho in 19636
* [Doctest] Conditional DETR config for doctest  by AymenBer99 in 19641
* [Doctest] XLNet config for doctest  by AymenBer99 in 19649
* [Doctest] Add `configuration_trocr.py`  by thliang01 in 19658
* Add doctest info in testingmdx  by ArthurZucker in 19623
* Add pillow to layoutlmv3 example requirements.txt  by Spacefish in 19663
* add return types for tf gptj, xlm, and xlnet  by sirmammingtonham in 19638
* Fix pipeline predict transform methods  by s-udhaya in 19657
* Type hints MCTCT  by rchan26 in 19618
* added type hints for Yolos Pytorch model  by WhiteWolf47 in 19545
* A few CI fixes for `DocumentQuestionAnsweringPipeline`  by ankrgyl in 19584
* Removed Bert interdependency from Funnel transformer  by mukesh663 in 19655
* fix warnings in deberta  by sanderland in 19458
* word replacement line 231  by shreem-123 in 19662
* [Doctest] Add configuration_transfo_xl.py  by thliang01 in 19651
* Update perf_train_gpu_one.mdx  by cakiki in 19676
* object-detection instead of object_detection  by Spacefish in 19677
* add return_tensor parameter for feature extraction  by ajsanjoaquin in 19257
* Fix code examples of DETR and YOLOS  by NielsRogge in 19669
* Revert "add return_tensor parameter for feature extraction  by sgugger in 19257)" 
* Fixed the docstring and type hint for forced_decoder_ids option in Ge…  by koreyou in 19640
* Add normalize to image transforms module  by amyeroberts in 19544
* [Doctest] Data2VecAudio Config for doctest  by daspartho in 19635
* Update ESM checkpoints to point to `facebook/`  by Rocketknight1 in 19675
* Removed XLMModel inheritance from FlaubertModel(torch+tf)  by D3xter1922 in 19432
* [Examples] make default preprocessing_num_workers=1  by Yang-YiFan in 19684
* [Doctest] Add configuration_convbert.py  by AymenBer99 in 19643
* [Doctest] Add configuration_realm.py  by ak04p in 19646
* Update CONTRIBUTING.md  by shreem-123 in 19689
*  [Doctest] Add `configuration_data2vec_vision.py`  by daspartho in 19637
* Fix some CI torch device issues for PyTorch 1.13  by ydshieh in 19681
* Fix checkpoint used in `VisualBertConfig` doc example  by ydshieh in 19692
* Fix dtype in radnomly initialized head  by sgugger in 19690
* fix tests  by ArthurZucker in 19670
* fix test whisper with new max length  by ArthurZucker in 19668
* check decoder_inputs_embeds is None before shifting labels  by ArthurZucker in 19671
* Fix docs  by NielsRogge in 19687
* update documentation  by ArthurZucker in 19706
* Improve DETR models  by NielsRogge in 19644
* Small fixes for TF-ESM1b and ESM-1b weight conversions  by Rocketknight1 in 19683
* Fix typo in perf docs  by cakiki in 19705
* Fix redundant normalization of OWL-ViT text embeddings  by alaradirik in 19712
* Allow user-managed Pool in Wav2Vec2ProcessorWithLM.batch_decode  by falcaopetri in 18351
* [Doctest] CVT config for doctest  by AymenBer99 in 19695
* [Doctest] Add configuration_wav2vec2.py to documentation_tests.py  by juancopi81 in 19698
* ]Fixed pegasus config doctest  by mukesh663 in 19722
* fix seq2seqtrainer predict without labels  by IvanSedykh in 19721
* add return_tensors parameter for feature_extraction 2  by Narsil in 19707
* Improving `image-segmentation` pipeline tests.  by Narsil in 19710
* [Doctest] Adding config files for convnext  by soma2000-lang in 19717
* [Doctest] Fixing doctest `configuration_pegasus_x.py`  by mukesh663 in 19725
* Specify TF framework in TF-related pipeline tests  by ydshieh in 19719
* Add docs  by NielsRogge in 19729
* Fix activations being all the same module  by sgugger in 19728
* add `accelerate` support for `Whisper`  by younesbelkada in 19697
* Clean up deprecation warnings  by Davidy22 in 19654
* Repo utils test  by sgugger in 19696
* Add decorator to flaky test  by amyeroberts in 19674
* [Doctest] Add doctest for `FlavaConfig` and `FNetConfig`  by ndrohith09 in 19724
* Update contribution guide  by stevhliu in 19700
* [Doctest] Add wav2vec2_conformer for doctest  by juancopi81 in 19734
* [Doctest] XLM Config for doctest  by AymenBer99 in 19685
* [Doctest] Add `configuration_clip.py`  by daspartho in 19647
* [Doctest] GPTNeoConfig , GPTNeoXConfig , GPTNeoXJapaneseConfig  by ndrohith09 in 19741
* Update modeling_markuplm.py  by IMvision12 in 19723
* Fix issue 19300  by raghavanone in 19483
* [Doctest] Add `configuration_wavlm.py`  by juancopi81 in 19749
* Specify TF framework explicitly in more pipeline tests  by ydshieh in 19748
* Fix cache version file creation  by sgugger in 19750
* Image transforms add center crop  by amyeroberts in 19718
* [Doctest] Add `configuration_decision_transformer.py`  by Xabilahu in 19751
* [Doctest] Add `configuration_detr.py`  by Xabilahu in 19752
* Fixed spacing errors  by shreya24ag in 19754
* All broken links were fixed in contributing file  by mdfaizanahmed786 in 19760
* [Doctest] SpeechToTextTransformer Config for doctest  by daspartho in 19757
* [Doctest] SqueezeBERT Config for doctest  by daspartho in 19758
* [Doctest] SpeechToTextTransformer2 Config for doctest  by daspartho in 19756
* [Doctest] OpenAIGPTConfig and OPTConfig  by ndrohith09 in 19763
* `image-segmentation` pipeline: re-enable `small_model_pt` test.  by Narsil in 19716
* Update modeling_layoutlmv3.py  by IMvision12 in 19753
* adding key pair dataset  by rohit1998 in 19765
* Fix exception thrown using MishActivation  by chinoll in 19739
* [FLAX] Add dtype to embedding for gpt2 model  by merrymercy in 18462
* TF: sample generation compatible with XLA and dynamic batch sizes  by gante in 19773
* Install tf2onnx dev version  by ydshieh in 19755
* Fix docker image build  by ydshieh in 19759
* PT <-> TF for composite models  by ydshieh in 19732
* Add warning about restarting runtime to import errors  by Rocketknight1 in 19774
* Added support for multivariate independent emission heads  by kashif in 19453
* Update `ImageToTextPipelineTests.test_small_model_tf`  by ydshieh in 19785
* Make public versions of private tensor utils  by sgugger in 19775
* Update training.mdx  by ftorres16 in 19791
* [ custom_models.mdx ] - Translated to Portuguese the custom models tutorial.  by davialvb in 19779
* Add sentencepiece to BertJapaneseTokenizer  by conan1024hao in 19769
* Fix CTRL `test_torchscrip_xxx` CI by updating `_create_and_check_torchscript`  by ydshieh in 19786
* Fix nightly test setup  by sgugger in 19792
* Fix image segmentation pipeline errors, resolve backward compatibility issues  by alaradirik in 19768
* Fix error/typo in docstring of TokenClassificationPipeline  by pchr8 in 19798
* Use None to detect if truncation was unset  by sgugger in 19794
* Generate: contrastive search test updates  by gante in 19787
* Run some TF Whisper tests in subprocesses to avoid GPU OOM  by ydshieh in 19772
* Added translation of run_scripts.mdx to Portuguese Issue 16824  by davialvb in 19800
* Generate: minor docstring fix  by gante in 19801
* [Doctest] `MaskFormerConfig` doctest  by sha016 in 19817
* [Doctest] Add `configuration_plbart.py`  by ayaka14732 in 19809
* [Doctest] Add `configuration_poolformer.py`  by ayaka14732 in 19808
* [Doctest] Add `configuration_electra.py`  by ayaka14732 in 19807
* [Doctest] Add `configuration_nezha.py`  by ayaka14732 in 19810
* Display the number of trainable parameters when lauching a training  by regisss in 19835
* replace reference to Datasets in metrics deprecation with Evaluate  by angus-lherrou in 19812
* Fix OOM in Config doctest  by ydshieh in 19840
* fix broken links in testing.mdx  by XFFXFF in 19820
* fix image2test args forwarding  by kventinel in 19648
* Added translation of  converting_tensorflow_models.mdx to Portuguese Issue 16824  by davialvb in 19824
* Fix nightly CircleCI  by ydshieh in 19837
* fixed typo in fp16 training section for perf_train_gpu_one  by dsingal0 in 19736
* Update `LEDModelIntegrationTests` expected values  by ydshieh in 19841
* Improve check copies  by kventinel in 19829
* Fix doctest for `MarkupLM`  by ydshieh in 19845
* add small updates only  by stevhliu in 19847
* Refactor conversion function  by sgugger in 19799
* Spanish translation of multiple_choice.mdx, question_answering.mdx.  by alceballosa in 19821
* Fix doctest for `GenerationMixin.contrastive_search`  by ydshieh in 19863
* Add missing lang tokens in M2M100Tokenizer.get_vocab  by guillaumekln in 18416
* Added translation of serialization.mdx to Portuguese Issue 16824  by davialvb in 19869
* Generate: contrastive search cosmetic tweaks  by gante in 19871
* [Past CI] Vilt only supports PT >= v1.10  by LysandreJik in 19851
* Fix incorrect model<->tokenizer mapping in tokenization testing  by ydshieh in 19872
* Update doc for revision and token  by sgugger in 19793
* Factored out some code in the `image-segmentation` pipeline.  by Narsil in 19727
* [DOCTEST] Config doctest for `MCTCT`, `MBart` and `LayoutLM`  by Revanth2002 in 19889
* Fix LR  by regisss in 19875
* Correct README image text  by KayleeDavisGitHub in 19883
* No conv bn folding in ipex to avoid warning  by sanderland in 19870
* Add missing information on token_type_ids for roberta model  by raghavanone in 19766
* Change the import of kenlm from github to pypi  by raghavanone in 19770
* Update `max_diff` in `test_save_load_fast_init_to_base`  by ydshieh in 19849
* Allow flax subfolder  by patrickvonplaten in 19902
* `accelerate` support for `RoBERTa` family  by younesbelkada in 19906
* Add checkpoint links in a few config classes  by ydshieh in 19910
* Generate: contrastive search uses existing abstractions and conventions  by gante in 19896
* Convert None logits processor/stopping criteria to empty list.  by ccmaymay in 19880
* Some fixes regarding auto mappings and test class names  by ydshieh in 19923
* Fix bug in Wav2Vec2's GPU tests  by falcaopetri in 19803
* Fix warning when collating list of numpy arrays  by sgugger in 19846
* Add type hints to TFPegasusModel  by EdAbati in 19858
* Remove embarrassing debug print() in save_pretrained  by Rocketknight1 in 19922
* Add `accelerate` support for M2M100  by younesbelkada in 19912
* Add RoBERTa resources  by stevhliu in 19911
* Add T5 resources  by stevhliu in 19878
* Add BLOOM resources  by stevhliu in 19881
* Add GPT2 resources  by stevhliu in 19879
* Let inputs of fast tokenizers be tuples as well as lists  by sgugger in 19898
* Add `accelerate` support for BART-like models  by younesbelkada in 19927
* Create dummy models  by ydshieh in 19901
* Support segformer fx  by dwlim-nota in 19924
* Use self._trial  to generate trial_name for Trainer.  by reyoung in 19874
* Add Onnx Config for ImageGPT   by RaghavPrabhakar66 in 19868
* Update Code of Conduct to Contributor Covenant v2.1  by pankali in 19935
* add resources for bart  by stevhliu in 19928
* add resources for distilbert  by stevhliu in 19930
* Add wav2vec2 resources  by stevhliu in 19931
* [Conditional, Deformable DETR] Add postprocessing methods  by NielsRogge in 19709
* Fix ONNX tests for ONNX Runtime v1.13.1  by lewtun in 19950
* donut -> donut-swin  by ydshieh in 19920
* [Doctest] Add configuration_deberta.py  by Saad135 in 19968
* gradient checkpointing for GPT-NeoX  by chiaolun in 19946
* [modelcard] Update for ASR  by sanchit-gandhi in 19985
* [ASR] Update 'tasks' for model card  by sanchit-gandhi in 19986
* Tranformers documentation translation to Italian 17459  by draperkm in 19988
* Pin torch to < 1.13 temporarily  by ydshieh in 19989
* Add support for gradient checkpointing  by NielsRogge in 19990

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* arnaudstiegler
 * Make LayoutLM tokenizers independent from BertTokenizer (19351)
* asofiaoliveira
 * Make `XLMRoberta` model and config independent from `Roberta` (19359)
* srhrshr
 * Decouples `XLMProphet` model from `Prophet`  (19406)
* Davidy22
 * Make bert_japanese and cpm independent of their inherited modules (19431)
 * Clean up deprecation warnings (19654)
* mathieujouffroy
 * [CvT] Tensorflow implementation (18597)
 * using trunc_normal for weight init & cls_token (19486)
* IMvision12
 * New (19481)
 * Added type hints to `DebertaV2ForMultipleChoice` Pytorch (19536)
 * Update modeling_markuplm.py (19723)
 * Update modeling_layoutlmv3.py (19753)
* 501Good
 * Make `MobileBert` tokenizers independent from `Bert` (19531)
* mukesh663
 * Removed Bert interdependency from Funnel transformer (19655)
 * ]Fixed pegasus config doctest (19722)
 * [Doctest] Fixing doctest `configuration_pegasus_x.py` (19725)
* D3xter1922
 * Removed XLMModel inheritance from FlaubertModel(torch+tf) (19432)
* falcaopetri
 * Allow user-managed Pool in Wav2Vec2ProcessorWithLM.batch_decode (18351)
 * Fix bug in Wav2Vec2's GPU tests (19803)
* gmftbyGMFTBY
 * Adding the state-of-the-art contrastive search decoding methods for the codebase of generation_utils.py (19477)
* davialvb
 * [ custom_models.mdx ] - Translated to Portuguese the custom models tutorial. (19779)
 * Added translation of run_scripts.mdx to Portuguese Issue 16824 (19800)
 * Added translation of  converting_tensorflow_models.mdx to Portuguese Issue 16824 (19824)
 * Added translation of serialization.mdx to Portuguese Issue 16824 (19869)
* alceballosa
 * Spanish translation of multiple_choice.mdx, question_answering.mdx. (19821)

4.23.1

Fixes a revert introduced by mistake that broke the `"automatic-speech-recognition"` pipeline for Whisper.

- Fix whisper for pipeline by ArthurZucker in 19482

4.23.0

Whisper

The Whisper model was proposed in [Robust Speech Recognition via Large-Scale Weak Supervision](https://cdn.openai.com/papers/whisper.pdf) by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever.

Whisper is an encoder-decoder Transformer trained on 680,000 hours of labeled (transcribed) audio. The model shows impressive zero-shot performance and robustness across multiple languages.
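
A minimal sketch of zero-shot transcription via the ASR pipeline, assuming the smallest released checkpoint and a hypothetical local audio file:

```py
from transformers import pipeline

transcriber = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")
print(transcriber("sample.flac"))  # hypothetical local audio file; returns {"text": ...}
```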

* Add WhisperModel to transformers  by ArthurZucker in 19166
* Add TF whisper  by amyeroberts in 19378

Deformable DETR

The Deformable DETR model was proposed in [Deformable DETR: Deformable Transformers for End-to-End Object Detection](https://arxiv.org/abs/2010.04159) by Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai.

Deformable DETR mitigates the slow convergence issues and limited feature spatial resolution of the original [DETR](https://huggingface.co/docs/transformers/model_doc/detr) by leveraging a new deformable attention module which only attends to a small set of key sampling points around a reference.
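
A hedged sketch of running detection, assuming the `SenseTime/deformable-detr` checkpoint and a hypothetical input image:

```py
import torch
from PIL import Image
from transformers import AutoFeatureExtractor, DeformableDetrForObjectDetection

feature_extractor = AutoFeatureExtractor.from_pretrained("SenseTime/deformable-detr")
model = DeformableDetrForObjectDetection.from_pretrained("SenseTime/deformable-detr")

image = Image.open("street.jpg").convert("RGB")  # hypothetical input image
inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.logits.shape, outputs.pred_boxes.shape)  # per-query class logits and boxes
```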

* Add Deformable DETR by NielsRogge in 17281
* [fix] Add DeformableDetrFeatureExtractor by NielsRogge in 19140

Conditional DETR

The Conditional DETR model was proposed in [Conditional DETR for Fast Training Convergence](https://arxiv.org/abs/2108.06152) by Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang.

Conditional DETR presents a conditional cross-attention mechanism for fast DETR training. Conditional DETR converges 6.7× to 10× faster than [DETR](https://huggingface.co/docs/transformers/model_doc/detr).

* Add support for conditional detr  by DeppMeng in 18948
* Improve conditional detr docs  by NielsRogge in 19154

Time Series Transformer

The Time Series Transformer model is a vanilla encoder-decoder Transformer for time series forecasting.

The model is trained in a similar way to how one would train an encoder-decoder Transformer (like T5 or BART) for machine translation; i.e. teacher forcing is used. At inference time, one can autoregressively generate samples, one time step at a time.

:warning: This is a recently introduced model and modality, so the API hasn't been tested extensively. There may be some bugs, or slight breaking changes in the future to fix them. If you see something strange, file a [Github Issue](https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title).

* time series forecasting model  by kashif in 17965

Masked Siamese Networks

The ViTMSN model was proposed in [Masked Siamese Networks for Label-Efficient Learning](https://arxiv.org/abs/2204.07141) by Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas. 

MSN (masked siamese networks) consists of a joint-embedding architecture to match the prototypes of masked patches with those of the unmasked patches. With this setup, the method yields excellent performance in the low-shot and extreme low-shot regimes for image classification, outperforming other self-supervised methods such as DINO. For instance, with 1% of ImageNet-1K labels, the method achieves 75.7% top-1 accuracy.

* MSN (Masked Siamese Networks) for ViT  by sayakpaul in 18815

MarkupLM

The MarkupLM model was proposed in [MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding](https://arxiv.org/abs/2110.08518) by Junlong Li, Yiheng Xu, Lei Cui, Furu Wei.

MarkupLM is BERT, but applied to HTML pages instead of raw text documents. The model incorporates additional embedding layers to improve performance, similar to [LayoutLM](https://huggingface.co/docs/transformers/main/en/model_doc/layoutlm).

The model can be used for tasks like question answering on web pages or information extraction from web pages. It obtains state-of-the-art results on 2 important benchmarks: [WebSRC](https://x-lance.github.io/WebSRC/) and [SWDE](https://www.researchgate.net/publication/221299838_From_one_tree_to_a_forest_a_unified_solution_for_structured_web_data_extraction).
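
A hedged sketch of encoding an HTML page, assuming the `microsoft/markuplm-base` checkpoint (the processor needs `beautifulsoup4` to parse HTML):

```py
from transformers import MarkupLMProcessor, MarkupLMModel

processor = MarkupLMProcessor.from_pretrained("microsoft/markuplm-base")
model = MarkupLMModel.from_pretrained("microsoft/markuplm-base")

html = "<html><body><h1>Hello</h1><p>Welcome to MarkupLM.</p></body></html>"
# The processor parses the HTML into nodes and xpaths before tokenizing.
encoding = processor(html, return_tensors="pt")
outputs = model(**encoding)
print(outputs.last_hidden_state.shape)
```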

* Add MarkupLM by NielsRogge in 19198

Security & safety

We explore a new serialization format not based on Pickle that we can then leverage in the three frameworks we support: PyTorch, TensorFlow, and JAX. We leverage the [safetensors](https://github.com/huggingface/safetensors) library for that.

Support is for PyTorch models only at this stage, and still experimental.
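
A minimal sketch of the PyTorch integration, assuming the experimental `safe_serialization` flag added with the proof of concept (the checkpoint name is an arbitrary example):

```py
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")
# Writes model.safetensors instead of pytorch_model.bin (experimental flag).
model.save_pretrained("local-bert", safe_serialization=True)
reloaded = AutoModel.from_pretrained("local-bert")
```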

* Poc to use safetensors  by sgugger in 19175

Computer vision post-processing methods overhaul

The processors for computer vision have been overhauled to ensure they have consistent naming, input arguments and outputs.
:warning: The existing methods that are superseded by the introduced methods `post_process_object_detection`, `post_process_semantic_segmentation`, `post_process_instance_segmentation`, `post_process_panoptic_segmentation` are now deprecated.
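
A hedged sketch of the unified API using semantic segmentation with SegFormer (checkpoint name and input image are arbitrary examples):

```py
import torch
from PIL import Image
from transformers import AutoFeatureExtractor, SegformerForSemanticSegmentation

feature_extractor = AutoFeatureExtractor.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")
model = SegformerForSemanticSegmentation.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")

image = Image.open("scene.jpg").convert("RGB")  # hypothetical input image
inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Upscale logits to the original resolution and take the per-pixel argmax.
segmentation = feature_extractor.post_process_semantic_segmentation(
    outputs, target_sizes=[image.size[::-1]]
)[0]
print(segmentation.shape)  # (height, width) map of class ids
```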

* Improve DETR post-processing methods  by alaradirik in 19205
* Beit postprocessing  by alaradirik in 19099
* Fix BeitFeatureExtractor postprocessing  by alaradirik in 19119
* Add post_process_semantic_segmentation method to SegFormer  by alaradirik in 19072
* Add post_process_semantic_segmentation method to DPTFeatureExtractor  by alaradirik in 19107
* Add semantic segmentation post-processing method to MobileViT  by alaradirik in 19105
* Detr preprocessor fix  by alaradirik in 19007
* Improve and fix ImageSegmentationPipeline  by alaradirik in 19367
* Restructure DETR post-processing, return prediction scores  by alaradirik in 19262
* Maskformer post-processing fixes and improvements  by alaradirik in 19172
* Fix MaskFormer failing postprocess tests  by alaradirik in 19354
* Fix DETR segmentation postprocessing output  by alaradirik in 19363
* fix docs example, add object_detection to DETR docs  by alaradirik in 19377

🚨 Breaking changes

The following changes are bugfixes that we have chosen to fix even if it changes the resulting behavior. We mark them as breaking changes, so if you are using this part of the codebase, we recommend you take a look at the PRs to understand exactly what was changed.

Breaking change for ViT parameter initialization

* 🚨🚨🚨 Fix ViT parameter initialization  by alaradirik in 19341

Breaking change for the `top_p` argument of the `TopPLogitsWarper` of the `generate` method.

* 🚨🚨🚨 Optimize Top P Sampler and fix edge case  by ekagra-ranjan in 18984

Model head additions

OPT and BLOOM now have question answering heads available.

* Add `OPTForQuestionAnswering`  by clementapa in 19402
* Add `BloomForQuestionAnswering`  by younesbelkada in 19310

Pipelines

There is now a zero-shot object detection pipeline.
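
A minimal sketch of the new pipeline, assuming an OWL-ViT checkpoint and a hypothetical local image:

```py
from transformers import pipeline

detector = pipeline("zero-shot-object-detection", model="google/owlvit-base-patch32")
# Labels are free-form text; the image path is a hypothetical local file.
print(detector("cats.jpg", candidate_labels=["cat", "remote control"]))
```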

* Add ZeroShotObjectDetectionPipeline  by sahamrit in 18445

TensorFlow architectures

The GroupViT model is now available in TensorFlow.

* [TensorFlow] Adding GroupViT  by ariG23498 in 18020

Bugfixes and improvements

* Fix a broken link for deepspeed ZeRO inference in the docs  by nijkah in 19001
* [doc] debug: fix import  by stas00 in 19042
* [bnb] Small improvements on utils  by younesbelkada in 18646
* Update image segmentation pipeline test  by amyeroberts in 18731
* Fix `test_save_load` for `TFViTMAEModelTest`  by ydshieh in 19040
* Pin minimum PyTorch version for BLOOM ONNX export  by lewtun in 19046
* Update serving signatures and make sure we actually use them  by Rocketknight1 in 19034
* Move cache: expand error message  by sgugger in 19051
* Fixing OPT fast tokenizer option.  by Narsil in 18753
* Fix custom tokenizers test  by sgugger in 19052
* Run `torchdynamo` tests  by ydshieh in 19056
* [fix] Add DeformableDetrFeatureExtractor  by NielsRogge in 19140
* fix arg name in BLOOM testing and remove unused arg document  by shijie-wu in 18843
* Adds package and requirement spec output to version check exception  by colindean in 18702
* fix `use_cache`  by younesbelkada in 19060
* FX support for ConvNext, Wav2Vec2 and ResNet  by michaelbenayoun in 19053
* [doc] Fix link in PreTrainedModel documentation  by tomaarsen in 19065
* Add FP32 cast in ConvNext LayerNorm to prevent rounding errors with FP16 input  by jimypbr in 18746
* Organize test jobs  by sgugger in 19058
* Automatically tag CLIP repos as zero-shot-image-classification  by osanseviero in 19064
* Fix `LeViT` checkpoint  by ydshieh in 19069
* TF: tests for (de)serializable models with resized tokens  by gante in 19013
* Add type hints for PyTorch UniSpeech, MPNet and Nystromformer  by daspartho in 19039
* replace logger.warn by logger.warning  by fxmarty in 19068
* Fix tokenizer load from one file  by sgugger in 19073
* Note about developer mode  by LysandreJik in 19075
* german autoclass  by flozi00 in 19049
* Add tests for legacy load by url and fix bugs  by sgugger in 19078
* Add runner availability check  by ydshieh in 19054
* fix working dir  by ydshieh in 19101
* Added type hints for TFConvBertModel  by kishore-s-15 in 19088
* Added Type hints for VIT MAE  by kishore-s-15 in 19085
* Add type hints for TF MPNet models  by kishore-s-15 in 19089
* Added type hints to ResNetForImageClassification  by kishore-s-15 in 19084
* added type hints  by daspartho in 19076
* Improve vision models docs  by NielsRogge in 19103
* correct spelling in README  by flozi00 in 19092
* Don't warn of move if cache is empty  by sgugger in 19109
* HPO: keep the original logic if there's only one process, pass the trial to trainer  by sywangyi in 19096
* Add documentation of Trainer.create_model_card  by sgugger in 19110
* Added type hints for YolosForObjectDetection  by kishore-s-15 in 19086
* Fix the wrong schedule  by ydshieh in 19117
* Change document question answering pipeline to always return an array  by ankrgyl in 19071
* german processing  by flozi00 in 19121
* Fix: update ltp word segmentation call in mlm_wwm  by xyh1756 in 19047
* Add a missing space in a script arg documentation  by bryant1410 in 19113
* Skip `test_export_to_onnx` for `LongT5` if `torch` < 1.11  by ydshieh in 19122
* Fix GLUE MNLI when using `max_eval_samples`  by lvwerra in 18722
* [BugFix] Fix fsdp option on shard_grad_op.  by ZHUI in 19131
* Fix FlaxPretTrainedModel pt weights check  by mishig25 in 19133
* suppoer deps from github  by lhoestq in 19141
* Fix dummy creation for multi-frameworks objects  by sgugger in 19144
* Allowing users to use the latest `tokenizers` release !  by Narsil in 19139
* Add some tests for check_dummies  by sgugger in 19146
* Fixed typo in generation_utils.py  by nbalepur in 19145
* Add `accelerate` support for ViLT  by younesbelkada in 18683
* TF: check embeddings range  by gante in 19102
* Reduce LR for TF MLM example test  by Rocketknight1 in 19156
* update perf_train_cpu_many doc  by sywangyi in 19151
* fix: ckpt paths.  by sayakpaul in 19159
* Fix TrainingArguments documentation  by sgugger in 19162
* fix HPO DDP GPU problem  by sywangyi in 19168
* [WIP] Trainer supporting evaluation on multiple datasets  by timbmg in 19158
* Add doctests to Perceiver examples  by stevenmanton in 19129
* Add offline runners info in the Slack report  by ydshieh in 19169
* Fix incorrect comments about atten mask for pytorch backend  by lygztq in 18728
* Fixed type hint for pipelines/check_task  by Fei-Wang in 19150
* Update run_clip.py  by enze5088 in 19130
* german training, accelerate and model sharing  by flozi00 in 19171
* Separate Push CI images from Scheduled CI  by ydshieh in 19170
* Remove pos arg from Perceiver's Pre/Postprocessors  by aielawady in 18602
* Use `assertAlmostEqual` in `BloomEmbeddingTest.test_logits`  by ydshieh in 19200
* Move the model type check  by ankrgyl in 19027
* Use repo_type instead of deprecated datasets repo IDs  by sgugger in 19202
* Updated hf_argparser.py  by IMvision12 in 19188
* Add warning for torchaudio <= 0.10 in MCTCTFeatureExtractor  by ydshieh in 19203
* Fix cached_file in offline mode for cached non-existing files  by sgugger in 19206
* Remove unused `cur_len` in generation_utils.py  by ekagra-ranjan in 18874
* add wav2vec2_alignment  by arijitx in 16782
* add doc for hyperparameter search  by sywangyi in 19192
* Add a use_parallel_residual argument to control the residual computing way  by NinedayWang in 18695
* translated add_new_pipeline  by nickprock in 19215
* More tests for regression in cached non existence  by sgugger in 19216
* Use `math.pi` instead of `torch.pi` in `MaskFormer`  by ydshieh in 19201
* Added tests for yaml and json parser  by IMvision12 in 19219
* Fix small use_cache typo in the docs  by ankrgyl in 19191
* Generate: add warning when left padding should be used  by gante in 19067
* Fix deprecation warning for return_all_scores  by ogabrielluiz in 19217
* Fix doctest for `TFDeiTForImageClassification`  by ydshieh in 19173
* Document and validate typical_p in generation  by mapmeld in 19128
* Fix trainer seq2seq qa.py evaluate log and ft script  by iamtatsuki05 in 19208
* Fix cache names in CircleCI jobs  by ydshieh in 19223
* Move AutoClasses under Main Classes  by stevhliu in 19163
* Focus doc around preprocessing classes  by stevhliu in 18768
* Fix confusing working directory in Push CI  by ydshieh in 19234
* XGLM - Fix Softmax NaNs when using FP16  by gsarti in 18057
* Add a getattr method, which replaces _module_getattr in torch.fx.Tracer from PyTorch 1.13+  by michaelbenayoun in 19233
* Fix `m2m_100.mdx` doc example missing `labels`  by Mustapha-AJEGHRIR in 19149
* Fix opt softmax small nit  by younesbelkada in 19243
* Use `hf_raise_for_status` instead of deprecated `_raise_for_status`  by Wauplin in 19244
* Fix TrainingArgs argument serialization  by atturaioe in 19239
* Fix test fetching for examples  by sgugger in 19237
* Cast TF generate() inputs  by Rocketknight1 in 19232
* Skip pipeline tests  by sgugger in 19248
* Add job names in Past CI artifacts  by ydshieh in 19235
* Update Past CI report script  by ydshieh in 19228
* [Wav2Vec2] Fix None loss in doc examples  by rbsteinm in 19218
* Catch `HFValidationError` in `TrainingSummary`  by ydshieh in 19252
* Add expected output to the sample code for `ViTMSNForImageClassification`  by sayakpaul in 19183
* Add stop sequence to text generation pipeline  by KMFODA in 18444
* Add notebooks  by JingyaHuang in 19259
* Add `beautifulsoup4` to the dependency list  by ydshieh in 19253
* Fix Encoder-Decoder testing issue about repo. names  by ydshieh in 19250
* Fix cached lookup filepath on windows for hub  by kjerk in 19178
* Docs - Guide to add a new TensorFlow model  by gante in 19256
* Update no_trainer script for summarization  by divyanshugit in 19277
* Don't automatically add bug label  by sgugger in 19302
* Breakup export guide  by stevhliu in 19271
* Update Protobuf dependency version to fix known vulnerability  by qthequartermasterman in 19247
* Update README.md  by ShubhamJagtap2000 in 19309
* [Docs] Fix link  by patrickvonplaten in 19313
* Fix for sequence regression fit() in TF  by Rocketknight1 in 19316
* Added Type hints for LED TF  by IMvision12 in 19315
* Added type hints for TF: rag model  by debjit-bw in 19284
* alter retrived to retrieved  by gouqi666 in 18863
* ci(stale.yml): upgrade actions/setup-python to v4  by oscard0m in 19281
* ci(workflows): update actions/checkout to v3  by oscard0m in 19280
* wrap forward passes with torch.no_grad()  by daspartho in 19279
* wrap forward passes with torch.no_grad()  by daspartho in 19278
* wrap forward passes with torch.no_grad()  by daspartho in 19274
* wrap forward passes with torch.no_grad()  by daspartho in 19273
* Removing BertConfig inheritance from LayoutLMConfig  by arnaudstiegler in 19307
* docker-build: Update actions/checkout to v3  by Sushrut1101 in 19288
* Clamping hidden state values to allow FP16  by SSamDav in 19229
* Remove interdependency from OpenAI tokenizer  by E-Aho in 19327
* removing XLMConfig inheritance from FlaubertConfig  by D3xter1922 in 19326 
* Removed interdependency of BERT's Tokenizer in tokenization of prophetnet  by divyanshugit in 19331
* Remove bert interdependency from clip tokenizer  by shyamsn97 in 19332
* [WIP]remove XLMTokenizer inheritance from FlaubertTokenizer  by D3xter1922 in 19330
* Making camembert independent from roberta, clean  by Mustapha-AJEGHRIR in 19337
* Add sudachi and jumanpp tokenizers for bert_japanese  by r-terada in 19043
* Frees LongformerTokenizer of the Roberta dependency  by srhrshr in 19346
* Change `BloomConfig` docstring  by younesbelkada in 19336
* Test failing test while we resolve the issue.  by sgugger in 19355
* Call _set_save_spec() when creating TF models  by Rocketknight1 in 19321
* correct typos in README  by paulaxisabel in 19304
* Removes Roberta and Bert config dependencies from Longformer  by srhrshr in 19343
* Fix gather for metrics  by muellerzr in 19360
* Fix pipeline tests for Roberta-like tokenizers  by sgugger in 19365
* Change link of repojacking vulnerable link  by Ilaygoldman in 19393
* Making `ConvBert Tokenizer` independent from `bert Tokenizer`  by IMvision12 in 19347
* Fix gather for metrics  by muellerzr in 19389
* Added Type hints for XLM TF  by IMvision12 in 19333
* add ONNX support for swin transformer  by bibhabasumohapatra in 19390
* removes prophet config dependencies from xlm-prophet  by srhrshr in 19400
* Added type hints for TF: TransfoXL  by thliang01 in 19380
* HF <-> megatron checkpoint reshaping and conversion for GPT  by pacman100 in 19317
* Remove unneded words from audio-related feature extractors  by osanseviero in 19405
* edit: cast attention_mask to long in DataCollatorCTCWithPadding  by ddobokki in 19369
* Copy BertTokenizer dependency into retribert tokenizer  by Davidy22 in 19371
* Export TensorFlow models to ONNX with dynamic input shapes  by dwyatte in 19255
* update attention mask handling  by ArthurZucker in 19385
* Remove dependency of Bert from Squeezebert tokenizer  by rchan26 in 19403
* Removed Bert and XML Dependency from Herbert  by harry7337 in 19410
* Clip device map  by patrickvonplaten in 19409
* Remove Dependency between Bart and LED (slow/fast)  by Infrared1029 in 19408
* Removed `Bert` interdependency in `tokenization_electra.py`  by OtherHorizon in 19356
* Make `Camembert` TF version independent from `Roberta`  by Mustapha-AJEGHRIR in 19364
* Removed Bert dependency from BertGeneration code base.  by Threepointone4 in 19370
* Rework pipeline tests  by sgugger in 19366
* Fix `ViTMSNForImageClassification` doctest  by ydshieh in 19275
* Skip `BloomEmbeddingTest.test_embeddings` for PyTorch < 1.10  by ydshieh in 19261
* remove RobertaConfig inheritance from MarkupLMConfig  by D3xter1922 in 19404
* Backtick fixed (paragraph 68)  by kant in 19440
* Fixed duplicated line (paragraph 83) Documentation: sgugger  by kant in 19436
* fix marianMT convertion to onnx  by kventinel in 19287
* Fix typo in image-classification/README.md  by zhawe01 in 19424
* Stop relying on huggingface_hub's private methods  by LysandreJik in 19392
* Add onnx support for VisionEncoderDecoder  by mht-sharma in 19254
* Remove dependency of Roberta in Blenderbot  by rchan26 in 19411
* fix: renamed variable name  by ariG23498 in 18850
* Fix the error message in run_t5_mlm_flax.py  by yangky11 in 19282
* Add Italian translation for `add_new_model.mdx`  by Steboss89 in 18713
* Fix momentum and epsilon values  by amyeroberts in 19454
* Generate: corrected exponential_decay_length_penalty type hint  by ShivangMishra in 19376
* Fix misspelled word in docstring  by Bearnardd in 19415
* Fixed a non-working hyperlink in the README.md file  by MikailINTech in 19434
* fix  by ydshieh in 19469
* wrap forward passes with torch.no_grad()  by daspartho in 19439
* wrap forward passes with torch.no_grad()  by daspartho in 19438
* wrap forward passes with torch.no_grad()  by daspartho in 19416
* wrap forward passes with torch.no_grad()  by daspartho in 19414
* wrap forward passes with torch.no_grad()  by daspartho in 19413
* wrap forward passes with torch.no_grad()  by daspartho in 19412

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* flozi00
 * german autoclass (19049)
 * correct spelling in README (19092)
 * german processing (19121)
 * german training, accelerate and model sharing (19171)
* DeppMeng
 * Add support for conditional detr (18948)
* sayakpaul
 * MSN (Masked Siamese Networks) for ViT (18815)
 * fix: ckpt paths. (19159)
 * Add expected output to the sample code for `ViTMSNForImageClassification` (19183)
* IMvision12
 * Updated hf_argparser.py (19188)
 * Added tests for yaml and json parser (19219)
 * Added Type hints for LED TF (19315)
 * Making `ConvBert Tokenizer` independent from `bert Tokenizer` (19347)
 * Added Type hints for XLM TF (19333)
* ariG23498
 * [TensorFlow] Adding GroupViT (18020)
 * fix: renamed variable name (18850)
* Mustapha-AJEGHRIR
 * Fix `m2m_100.mdx` doc example missing `labels` (19149)
 * Making camembert independent from roberta, clean (19337)
 * Make `Camembert` TF version independent from `Roberta` (19364)
* D3xter1922
 * removing XLMConfig inheritance from FlaubertConfig (19326)
 * [WIP]remove XLMTokenizer inheritance from FlaubertTokenizer (19330)
 * remove RobertaConfig inheritance from MarkupLMConfig (19404)
* srhrshr
 * Frees LongformerTokenizer of the Roberta dependency (19346)
 * Removes Roberta and Bert config dependencies from Longformer (19343)
 * removes prophet config dependencies from xlm-prophet (19400)
* sahamrit
 * [WIP] Add ZeroShotObjectDetectionPipeline (18445) (18930)
* Davidy22
 * Copy BertTokenizer dependency into retribert tokenizer (19371)
* rchan26
 * Remove dependency of Bert from Squeezebert tokenizer (19403)
 * Remove dependency of Roberta in Blenderbot (19411)
* harry7337
 * Removed Bert and XML Dependency from Herbert (19410)
* Infrared1029
 * Remove Dependency between Bart and LED (slow/fast) (19408)
* Steboss89
 * Add Italian translation for `add_new_model.mdx` (18713)

4.22.2

Fixes a bug where a cached tokenizer/model was no longer accessible offline (either when forcing offline mode or because of an internet issue).

- More tests for regression in cached non existence by sgugger in 19216 
- Fix cached_file in offline mode for cached non-existing files by sgugger in 19206
- Don't warn of move if cache is empty by sgugger in 19109

4.22.1

Patch release for the following PRs:

- [Add tests for legacy load by url and fix bugs (#19078)](https://github.com/huggingface/transformers/commit/654c584f388ac160db83071d751e9dead4887d82)
- [Note about developer mode (#19075)](https://github.com/huggingface/transformers/commit/6d034d58c583dcf4299c8a34f949ace046ac0208)
- [Fix tokenizer load from one file (#19073)](https://github.com/huggingface/transformers/commit/af20bbb3188a6ffeaa126fa5118c9cabb529c26a)
- [Fixing OPT fast tokenizer option. (#18753)](https://github.com/huggingface/transformers/commit/1504b5311a3ee62bd820ac31b4ec2feffb2845f3)
- [Move cache: expand error message (#19051)](https://github.com/huggingface/transformers/commit/defd039bae9f44f6c7a847ed8f5d3609f6667540)

4.22.0

Swin Transformer v2

The Swin Transformer V2 model was proposed in [Swin Transformer V2: Scaling Up Capacity and Resolution](https://arxiv.org/abs/2111.09883) by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo.

Swin Transformer v2 improves the original [Swin Transformer](https://huggingface.co/docs/transformers/main/en/model_doc/swin) using 3 main techniques: 1) a residual-post-norm method combined with cosine attention to improve training stability; 2) a log-spaced continuous position bias method to effectively transfer models pre-trained using low-resolution images to downstream tasks with high-resolution inputs; 3) a self-supervised pre-training method, SimMIM, to reduce the need for vast amounts of labeled images.

* Add swin transformer v2 by nandwalritik in 17469

VideoMAE

The VideoMAE model was proposed in [VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training](https://arxiv.org/abs/2203.12602) by Zhan Tong, Yibing Song, Jue Wang, Limin Wang. VideoMAE extends masked autoencoders ([ViTMAE](https://huggingface.co/docs/transformers/main/en/model_doc/vit_mae)) to video, claiming state-of-the-art performance on several video classification benchmarks.

* Add VideoMAE by NielsRogge in 17821

Donut

The Donut model was proposed in [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park. Donut consists of an image Transformer encoder and an autoregressive text Transformer decoder to perform document understanding tasks such as document image classification, form understanding and visual question answering.

* Add Donut by NielsRogge in 18488

Pegasus-X

The PEGASUS-X model was proposed in [Investigating Efficiently Extending Transformers for Long Input Summarization](https://arxiv.org/abs/2208.04347) by Jason Phang, Yao Zhao and Peter J. Liu.

PEGASUS-X (PEGASUS eXtended) extends the PEGASUS models for long input summarization through additional long input pretraining and using staggered block-local attention with global tokens in the encoder.

* PEGASUS-X by zphang in 18551

X-CLIP

The X-CLIP model was proposed in [Expanding Language-Image Pretrained Models for General Video Recognition](https://arxiv.org/abs/2208.02816) by Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling. X-CLIP is a minimal extension of [CLIP](https://huggingface.co/docs/transformers/main/en/model_doc/clip) for video-language understanding. The model consists of a text encoder, a cross-frame vision encoder, a multi-frame integration Transformer, and a video-specific prompt generator.

* Add X-CLIP by NielsRogge in 18852

ERNIE

ERNIE is a series of powerful models proposed by Baidu, performing especially well on Chinese tasks, and includes [ERNIE1.0](https://arxiv.org/abs/1904.09223), [ERNIE2.0](https://ojs.aaai.org/index.php/AAAI/article/view/6428), [ERNIE3.0](https://arxiv.org/abs/2107.02137), [ERNIE-Gram](https://arxiv.org/abs/2010.12148), [ERNIE-health](https://arxiv.org/abs/2110.07244), etc.
These models are contributed by nghuyong, and the official code can be found in PaddleNLP (in PaddlePaddle).

* ERNIE-2.0 and ERNIE-3.0 models by nghuyong in 18686

TensorFlow models

MobileViT and LayoutLMv3 are now available in TensorFlow.

* TensorFlow MobileViT by sayakpaul in 18555
* [LayoutLMv3] Add TensorFlow implementation by ChrisFugl in 18678

New task-specific architectures

A new question answering head was added for the LayoutLM model.

* Add LayoutLMForQuestionAnswering model by ankrgyl in 18407

New pipelines

Two new pipelines are available in `transformers`: a document question answering pipeline, as well as an image-to-text generation pipeline.
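
A hedged sketch of both pipelines; the checkpoint names are common community models rather than mandated defaults, and the image paths are hypothetical local files:

```py
from transformers import pipeline

# Document question answering over an image of a document (needs an OCR backend
# such as pytesseract for this checkpoint).
doc_qa = pipeline("document-question-answering", model="impira/layoutlm-document-qa")
print(doc_qa(image="invoice.png", question="What is the total amount?"))

# Image-to-text (captioning).
captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")
print(captioner("photo.jpg"))
```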

* Add DocumentQuestionAnswering pipeline by ankrgyl in 18414
* Add Image To Text Generation pipeline by OlivierDehaene in 18821

M1 support

`transformers` now supports PyTorch on Apple's M1 chips, both in pipelines and in the Trainer.
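
A minimal sketch of placing a pipeline on Apple silicon, assuming a PyTorch build with MPS support:

```py
from transformers import pipeline

# Any pipeline can now be placed on Apple silicon by passing the device string.
classifier = pipeline("text-classification", device="mps")
print(classifier("transformers now runs on Apple silicon!"))
```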

* `pipeline` support for `device="mps"` (or any other string) by julien-c in 18494
* mac m1 `mps` integration by pacman100 in 18598

Backend version compatibility

Starting from version v4.22.0, we'll now officially support PyTorch and TensorFlow versions that were released up to two years ago.
Versions more than two years old will not be supported going forward.

We're making this change as we begin actively testing transformers compatibility on older versions.
This project can be followed [here](https://github.com/huggingface/transformers/issues/18817).

* PyTorch >= 1.7.0 and TensorFlow >= 2.4.0 by sgugger in 19016

Generate method updates

The `generate` method now starts enforcing stronger validation in order to ensure proper usage.

* Generate: validate `model_kwargs` (and catch typos in generate arguments) by gante in 18261
* Generate: validate `model_kwargs` on TF (and catch typos in generate arguments) by gante in 18651
* Generate: add model class validation by gante in 18902

API changes

The `as_target_tokenizer` and `as_target_processor` context managers have been deprecated. The new API is to use the call method of the tokenizer/processor with keyword arguments. For instance:
```py
with tokenizer.as_target_tokenizer():
    encoded_labels = tokenizer(labels, padding=True)
```

becomes

```py
encoded_labels = tokenizer(text_target=labels, padding=True)
```

* Replace `as_target` context managers by direct calls by sgugger in 18325

Bits and bytes integration

The `bitsandbytes` library is now integrated within transformers. This feature can reduce the size of large models by up to a factor of 2, with little loss in precision.
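
A hedged sketch of 8-bit loading (requires the `bitsandbytes` package and a CUDA GPU; the BLOOM checkpoint is an arbitrary example):

```py
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-1b7",
    device_map="auto",   # dispatch weights automatically across available devices
    load_in_8bit=True,   # quantize linear layers to int8 via bitsandbytes
)
print(model.get_memory_footprint())  # roughly half the fp16 footprint
```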

* Supporting seq2seq models for `bitsandbytes` integration by younesbelkada in 18579
* `bitsandbytes` - `Linear8bitLt` integration into `transformers` models by younesbelkada in 17901

Large model support

Models that have sharded checkpoints in PyTorch can be loaded in Flax.

* Load sharded pt to flax by ArthurZucker in 18419
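
A sketch of what this looks like (the checkpoint name stands in for any model with a sharded PyTorch state dict on the Hub):

```py
from transformers import FlaxAutoModel

# `from_pt=True` converts the PyTorch weights (sharded or not) to Flax on the fly.
model = FlaxAutoModel.from_pretrained("gpt2", from_pt=True)
```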

TensorFlow improvements

The TensorFlow examples have been rewritten to support all recent features developed in the past months.

* TF Examples Rewrite by Rocketknight1 in 18451

DeBERTa-v2 is now trainable with XLA.

* TF: XLA-trainable DeBERTa v2 by gante in 18546

Documentation changes

* Split model list on modality by stevhliu in 18328

Improvements and bugfixes

* sentencepiece shouldn't be required for the fast LayoutXLM tokenizer by LysandreJik in 18320
* Fix sacremoses soft dependency for Transformers XL by sgugger in 18321
* Owlvit test fixes by alaradirik in 18303
* [Flax] Fix incomplete batches in example scripts by sanchit-gandhi in 17863
* start from 1.12, torch_ccl is renamed as oneccl_bindings_for_pytorch … by sywangyi in 18229
* Update feature extractor docs by stevhliu in 18324
* fixed typo by banda-larga in 18331
* updated translation by banda-larga in 18333
* Updated _toctree.yml by nickprock in 18337
* Update automatic_speech_recognition.py by bofenghuang in 18339
* Fix codeparrot deduplication - ignore whitespaces by loubnabnl in 18023
* Remove Flax OPT from doctest for now by ydshieh in 18338
* Include tensorflow-aarch64 as a candidate by ankrgyl in 18345
* [BLOOM] Deprecate `position_ids` by thomasw21 in 18342
* Migrate metric to Evaluate library for tensorflow examples  by VijayKalmath in 18327
* Migrate metrics used in flax examples to Evaluate by VijayKalmath in 18348
* [Docs] Fix Speech Encoder Decoder doc sample by sanchit-gandhi in 18346
* Fix OwlViT torchscript tests by ydshieh in 18347
* Fix some doctests by ydshieh in 18359
* [FX] Symbolic trace for Bloom by michaelbenayoun in 18356
* Fix TFSegformerForSemanticSegmentation doctest by ydshieh in 18362
* fix FSDP ShardedGradScaler by pacman100 in 18358
* Migrate metric to Evaluate in Pytorch examples by atturaioe in 18369
* Correct the spelling of bleu metric by ToluClassics in 18375
* Remove pt-like calls on tf tensor by amyeroberts in 18393
* Fix from_pretrained kwargs passing by YouJiacheng in 18387
* Add a check regarding the number of occurrences of  by ydshieh in 18389
* Add evaluate to test dependencies by sgugger in 18396
* Fix OPT doc tests by ArthurZucker in 18365
* Fix doc tests by NielsRogge in 18397
* Add balanced strategies for device_map in from_pretrained by sgugger in 18349
* Fix docs by NielsRogge in 18399
* Adding fine-tuning models to LUKE by ikuyamada in 18353
* Fix ROUGE add example check and update README by sgugger in 18398
* Add Flax BART pretraining script by duongna21 in 18297
* Rewrite push_to_hub to use upload_files by sgugger in 18366
* Layoutlmv2 tesseractconfig by kelvinAI in 17733
* fix: create a copy for tokenizer object by YBooks in 18408
* Fix uninitialized parameter in conformer relative attention. by PiotrDabkowski in 18368
* Fix the hub user name in a longformer doctest checkpoint by ydshieh in 18418
* Change audio kwarg to images in TROCR processor by ydshieh in 18421
* update maskformer docs by alaradirik in 18423
* Fix `test_load_default_pipelines_tf` test error by ydshieh in 18422
* fix run_clip README by ydshieh in 18332
* Improve `generate` docstring by JoaoLages in 18198
* Accept `trust_remote_code` and ignore it in `PreTrainedModel.from_pretrained` by ydshieh in 18428
* Update pipeline word heuristic to work with whitespace in token offsets by davidbenton in 18402
* Add programming languages by cakiki in 18434
* fixing error when using sharded ddp by pacman100 in 18435
* Update _toctree.yml by stevhliu in 18440
* support ONNX export of XDropout in deberta{,_v2} and sew_d by garymm in 17502
* Add Spanish translation of run_scripts.mdx by donelianc in 18415
* Update no trainer scripts for language modeling and image classification examples by nandwalritik in 18443
* Update pinned hhub version by osanseviero in 18448
* Fix failing tests for XLA generation in TF by dsuess in 18298
* add zero-shot obj detection notebook to docs by alaradirik in 18453
* fix: keras fit tests for segformer tf and minor refactors. by sayakpaul in 18412
* Fix torch version comparisons by LSinev in 18460
* [BLOOM] Clean modeling code by thomasw21 in 18344
* change shape to support dynamic batch input in tf.function XLA generate for tf serving by nlpcat in 18372
* HFTracer.trace can now take callables and torch.nn.Module by michaelbenayoun in 18457
* Update no trainer scripts for multiple-choice by kiansierra in 18468
* Fix load of model checkpoints in the Trainer by sgugger in 18470
* Add FX support for torch.baddbmm andd torch.Tensor.baddbmm by thomasw21 in 18363
* Add machine type in the artifact of Examples directory job by ydshieh in 18459
* Update no trainer examples for QA and Semantic Segmentation by kiansierra in 18474
* Add `TF_MODEL_FOR_SEMANTIC_SEGMENTATION_MAPPING` by ydshieh in 18469
* Fixing issue where generic model types wouldn't load properly with the pipeline by Narsil in 18392
* Fix TFSwinSelfAttention to have relative position index as non-trainable weight by harrydrippin in 18226
* Refactor `TFSwinLayer` to increase serving compatibility by harrydrippin in 18352
* Add TF prefix to TF-Res test class by ydshieh in 18481
* Remove py.typed by sgugger in 18485
* Fix pipeline tests by sgugger in 18487
* Use new huggingface_hub tools for download models by sgugger in 18438
* Fix `test_dbmdz_english` by updating expected values by ydshieh in 18482
* Move cache folder to huggingface/hub for consistency with hf_hub by sgugger in 18492
* Update some expected values in `quicktour.mdx` for `resampy 0.3.0` by ydshieh in 18484
* disable Onnx test for google/long-t5-tglobal-base by ydshieh in 18454
* Typo reported by Joel Grus on TWTR by julien-c in 18493
* Just re-reading the whole doc every couple of months 😬 by julien-c in 18489
* `transformers-cli login` => `huggingface-cli login` by julien-c in 18490
* Add seed setting to image classification example by regisss in 18519
* [DX fix] Fixing QA pipeline streaming a dataset. by Narsil in 18516
* Clean up hub by sgugger in 18497
* update fsdp docs by pacman100 in 18521
* Fix compatibility with 1.12 by sgugger in 17925
* Specify en in doc-builder README example by ankrgyl in 18526
* New cache fixes: add safeguard before looking in folders by sgugger in 18522
* unpin resampy by ydshieh in 18527
* ✨ update to use interlibrary links instead of Markdown by stevhliu in 18500
* Add example of multimodal usage to pipeline tutorial by stevhliu in 18498
* [VideoMAE] Add model to doc tests by NielsRogge in 18523
* Update perf_train_gpu_one.mdx by mishig25 in 18532
* Update no_trainer.py scripts to include accelerate gradient accumulation wrapper by Rasmusafj in 18473
* Add Spanish translation of converting_tensorflow_models.mdx by donelianc in 18512
* Spanish translation of summarization.mdx by AguilaCudicio in 15947
* Let's not cast them all by younesbelkada in 18471
* fix: data2vec-vision Onnx ready-made configuration. by NikeNano in 18427
* Add mt5 onnx config by ChainYo in 18394
* Minor update of `run_call_with_unpacked_inputs` by ydshieh in 18541
* BART - Fix attention mask device issue on copied models by younesbelkada in 18540
* Adding a new `align_to_words` param to qa pipeline. by Narsil in 18010
* 📝 update metric with evaluate by stevhliu in 18535
* Restore _init_weights value in no_init_weights by YouJiacheng in 18504
* 📝 update documentation build section by stevhliu in 18548
* Preserve hub-related kwargs in AutoModel.from_pretrained by sgugger in 18545
* Use commit hash to look in cache instead of calling head by sgugger in 18534
* Update philosophy to include other preprocessing classes by stevhliu in 18550
* Properly move cache when it is not in default path by sgugger in 18563
* Adds CLIP to models exportable with ONNX by unography in 18515
* raise atol for MT5OnnxConfig by ydshieh in 18560
* fix string by mrwyattii in 18568
* Segformer TF: fix output size in documentation by joihn in 18572
* Fix resizing bug in OWL-ViT by alaradirik in 18573
* Fix LayoutLMv3 documentation by pocca2048 in 17932
* Change BartLearnedPositionalEmbedding's forward method signature to support Opacus training by donebydan in 18486
* german docs translation by flozi00 in 18544
* Deberta V2: Fix critical trace warnings to allow ONNX export by iiLaurens in 18272
* [FX] _generate_dummy_input supports audio-classifi
