* Add flag to run inference with partial dataset (huggingface#1420)
* Add peft generation example (huggingface#1427)
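  For context, PEFT-adapter generation in the example follows the standard `peft` pattern; the sketch below uses placeholder model and adapter names, not necessarily the ones from the example:
  ```python
  # Minimal sketch of generation with a PEFT adapter (names are placeholders).
  from peft import PeftModel
  from transformers import AutoModelForCausalLM, AutoTokenizer

  base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
  tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
  model = PeftModel.from_pretrained(base, "path/to/lora-adapter")  # hypothetical adapter path

  inputs = tokenizer("Hello, Gaudi!", return_tensors="pt")
  outputs = model.generate(**inputs, max_new_tokens=32)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))
  ```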
* Upgrade to SynapseAI 1.18.0 (huggingface#1418)
* Simplify HQT config files (huggingface#1219)
* Add unify_measurements.py script support to unify PCQ 70B 8x (huggingface#1322)
* Add misc. training args (huggingface#1346)
* Add quantization config for low bs case (huggingface#1377)
* Remove HQT from OHF (huggingface#1257)
Co-authored-by: Adam Stachowicz <astachowicz@habana.ai>
Co-authored-by: Adam Stachowicz <105052242+astachowiczhabana@users.noreply.github.com>
Co-authored-by: Yeonsil Yoon <yyoon@habana.ai>
* Load INC GPTQ checkpoint & rename params (huggingface#1364)
Co-authored-by: Yaser Afshar <yaser.afshar@intel.com>
Co-authored-by: Harish Subramony <81822986+hsubramony@users.noreply.github.com>
Co-authored-by: Yeonsil Yoon <yyoon@habana.ai>
* Enable FusedSDPA fp8 in Llama FT (huggingface#1388)
Co-authored-by: Yaser Afshar <yaser.afshar@intel.com>
Co-authored-by: Harish Subramony <81822986+hsubramony@users.noreply.github.com>
* Valid sequence length for sdpa (huggingface#1183)
Co-authored-by: Harish <hsubramony@habana.ai>
Co-authored-by: Libin Tang <litang@habana.ai>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
* Multiple fixes (dynamo graph break, qwen-moe, multicard) (huggingface#1410)
* Downgrade datasets version to 2.21.0 (huggingface#1413)
* Update ci sentence_transformer.sh (huggingface#1424)
* Fix INC load weights compile error due to Transformers 4.45 upgrade (huggingface#1421)
* Update language-modeling README.md, add trust_remote_code for flan-t5-xl (huggingface#1422)
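  The equivalent Python call is shown below as a sketch; `trust_remote_code=True` is the standard `transformers` argument the README now passes for this model:
  ```python
  # Sketch: pass trust_remote_code explicitly when loading flan-t5-xl,
  # per the README update above.
  from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

  tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl", trust_remote_code=True)
  model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xl", trust_remote_code=True)
  ```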
* Update unify_measurements.py support info (huggingface#1425)
* GPT2 torch.compile fix (huggingface#1434)
* Added missing allocate_kv_cache() call in CausalLM class (huggingface#1431)
* Fix merge error and update text-to-speech readme (huggingface#1436)
* Fix OOM error for code llama (huggingface#1437)
* Fix error on 4bit checkpoint load with run_lm_eval on Transformers 4.45.2 (huggingface#1439)
* Fix scoped linear all-reduce for starcoder model (huggingface#1432)
* Fixed recursion error in SentenceTransformer (huggingface#1428)
* Fix Llama 3.1 generation (huggingface#1444)
* Update text-gen README.md to add auto-gptq fork install steps (huggingface#1442)
* Added gemma specific fp8 quantization file (huggingface#1445)
* Remove cache folder from image data folder (huggingface#1446)
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
* Bump dev version
* Enable DeepSpeed for image-to-text example (huggingface#1455)
* Fix bug when loading 4bit checkpoint quantized in INC (huggingface#1447)
* Fixes 'Tokenizer does not have padding token' introduced by huggingface#1444 for Llama 3.1 (huggingface#1457)
* Fix facebook/hf-seamless-m4t-medium crash (huggingface#1433)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* Fix bias update in scoped all reduce (huggingface#1456)
* Added skip for unsupported tests for mistral/mixtral (huggingface#1462)
* Update sentence transformer to v3.2.1 (huggingface#1470)
* Optimized inference of Cohere model on HPU (huggingface#1329)
Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
* Idefics2 (huggingface#1270)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* Remove deprecated Mixed precision flags (huggingface#1471)
Change-Id: I1c2e2460dc2072ba7b311f239441b304694918c8
* Optimized inference of XGLM model on HPU (huggingface#1323)
Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
* Add mllama support (huggingface#1419)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* Enable flash attention for gemma (huggingface#1454)
* Readme: replace tabs with spaces (huggingface#1485)
* Move fast tests to Gaudi2 (huggingface#1498)
* Support loading 4 bit Qwen2 (huggingface#1476)
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
* Add textual inversion XL for Gaudi (huggingface#868)
Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Co-authored-by: Iman Gohari <s.m.iman.gohari@intel.com>
* Remove torch req from LM example (huggingface#1491)
* Remove keep_input_mutations (huggingface#1492)
* Fix trust_remote_code (huggingface#1493)
* Upgrade ViT README with torch.compile (huggingface#1494)
* Tests for text gen output text (huggingface#1411)
* Corrected Throughput measure for GaudiDDPMPipeline (huggingface#1460)
* Fix text generation test
* Add G3 in T5-L README (huggingface#1523)
* Fix tuple object error (huggingface#1354)
* Add warmup time and compile time log for the eval/prediction. (huggingface#1489)
* Fix style
* Enable `paligemma` model for image-to-text example (huggingface#1407)
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
* Add support for MLPERF optimized pipeline from example (huggingface#1465)
Co-authored-by: sushil dubey <sdubey@habana.ai>
* Enable Gemma2 Inference on Gaudi (huggingface#1504)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Co-authored-by: billishyahao <yahao.he@intel.com>
Co-authored-by: Harish Subramony <81822986+hsubramony@users.noreply.github.com>
Co-authored-by: Yeonsil Yoon <yyoon@habana.ai>
Co-authored-by: Seunghyuk Park (shepark) <separk@habana.ai>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
Co-authored-by: Sun Choi <schoi@habana.ai>
Co-authored-by: xinhe <xin3.he@intel.com>
Co-authored-by: Mohit Deopujari <mdeopujari@habana.ai>
Co-authored-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: Soila Kavulya <soila.p.kavulya@intel.com>
Co-authored-by: Iman Gohari <s.m.iman.gohari@intel.com>
Co-authored-by: ZhengHongming888 <hongming.zheng@intel.com>
Co-authored-by: XinyuYe-Intel <xinyu.ye@intel.com>
Co-authored-by: Vivek Goel <vgoel@habana.ai>
Co-authored-by: Akihiro Takahashi <akihiro.takahashi@intel.com>
Co-authored-by: Miroslav Goncharenko <miroslav.goncharenko@intel.com>
Co-authored-by: Wang, Mengni <mengni.wang@intel.com>
Co-authored-by: Daniel Socek <daniel.socek@intel.com>
Co-authored-by: Adam Stachowicz <105052242+astachowiczhabana@users.noreply.github.com>
Co-authored-by: Vidya Galli <vidya.s.galli@intel.com>
Co-authored-by: deepak-gowda-narayana <140652370+deepak-gowda-narayana@users.noreply.github.com>
* Add check_neural_compressor_min_version for 4 bit behavior (huggingface#1500)
Signed-off-by: Xin <xin3.he@intel.com>
Signed-off-by: xinhe3 <xinhe3@habana.ai>
Co-authored-by: xinhe3 <xinhe3@habana.ai>
* Fixed Gemma FP8 flash_attention lower throughput issue (huggingface#1510)
* Pass "lazy_mode" arg to GaudiLlamaModel GaudiTrainer (huggingface#1515)
Co-authored-by: Marcin Łapiński <mlapinskix@habana.ai>
* Removed workaround for NaN bug causing graph break. (huggingface#1516)
Co-authored-by: Marcin Łapiński <mlapinskix@habana.ai>
* Disable default sdpa in Albert (#22) (huggingface#1517)
Co-authored-by: Urszula Golowicz <urszula.golowicz@intel.com>
* Implement fused sdpa for wav2vec2 (#18) (huggingface#1520)
* Memory optimization for gpt_bitcode (#4) (huggingface#1513)
Co-authored-by: Urszula Golowicz <urszula.golowicz@intel.com>
* text_generation: improve parameters check (huggingface#1527)
* transformers: fixed some typos (huggingface#1528)
* Update DeepSpeed CI baselines
* Update FSDP CI baseline
* Optimum-Habana docs re-org (huggingface#1488)
Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Co-authored-by: Greg Serochi <greg.serochi@intel.com>
Co-authored-by: Kiangpeng Lau <kiangpeng.lau@intel.com>
Co-authored-by: Seethong Vang <seethong.vang@intel.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
Co-authored-by: Anastasia Uvarova <anastasia.uvarova@intel.com>
Co-authored-by: Mohit Deopujari <mohit.deopujari@intel.com>
Co-authored-by: Chen Levkovich <chen.levkovich@intel.com>
Co-authored-by: Libin Tang <libin.tang@intel.com>
* Makes the with_stack of the profiler changeable (huggingface#1497)
* FLUX with diffusers 0.31.0 (huggingface#1450)
Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Co-authored-by: Baochen Yang <baochen.yang@intel.com>
Co-authored-by: Huijuan Zhou <huijuan.zhou@intel.com>
Co-authored-by: Sergey Plotnikov <sergey.plotnikov@intel.com>
Co-authored-by: Deepak Narayana <deepak.narayana@intel.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
* Fix some CI baselines
* Add split runners to CI (2 devices per runner for fast tests)
* Fix fast CI to work with split runners (huggingface#1534)
* Fix dtype issue with valid sequence length in torch.compile bs=1 (huggingface#1532)
* Support beam search with reuse_cache and bucket_internal (huggingface#1472)
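  A minimal sketch of how these options combine, assuming the `GaudiGenerationConfig` fields used by the text-generation example (the bucket size here is illustrative):
  ```python
  # Hedged sketch: beam search together with reuse_cache and bucket_internal.
  from optimum.habana.transformers.generation import GaudiGenerationConfig

  generation_config = GaudiGenerationConfig(
      num_beams=4,           # beam search
      reuse_cache=True,      # reuse the KV-cache allocation across steps
      bucket_internal=True,  # bucket the KV-cache internally instead of re-padding inputs
      bucket_size=128,       # illustrative value
      max_new_tokens=64,
  )
  # outputs = model.generate(**inputs, generation_config=generation_config)
  ```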
* Add mixtral trl sft (huggingface#1349)
* Enable tiiuae/falcon-11B-vlm in image_to_text example (huggingface#1490)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* Add Llama 3.1 ft to CI (huggingface#1529)
* Migrate OH CLIP (roberta-clip) training to torch.compile (huggingface#1507)
* test_text_generation: fix non-Gaudi2 case (huggingface#1530)
* text-generation: improve output printing (huggingface#1486)
* Text-generation, model set-up: torch.compile for attributes instead of models' types (huggingface#1452)
* FLUX Fine-Tuning for Gaudi (huggingface#1482)
Signed-off-by: Daniel Socek <daniel.socek@intel.com>
* Enable fusedsdpa kernel for vision part of mllama (huggingface#1531)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* Minicpm enabling (huggingface#1342)
Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
* Fix bridgetower example (#312) (huggingface#1481)
* Migrate OH Wav2Vec2-AC training to torch.compile - README update (huggingface#1537)
Co-authored-by: Chaojun Zhang <chzhang@habana.ai>
* Flux Image-To-Image pipeline (huggingface#1524)
Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Co-authored-by: Iman Gohari <s.m.iman.gohari@intel.com>
* Enable Falcon-mamba (huggingface#1480)
Signed-off-by: yuanwu <yuan.wu@intel.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
* Enable dynamic compile for mpi(training) (huggingface#1509)
* Migrate OH T5-large training to torch.compile (huggingface#1506)
* Add support for Baichuan2 (huggingface#1479)
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Co-authored-by: Jianqian Zhou <jianqian.zhou@intel.com>
Co-authored-by: Wei Lin <wei2.lin@intel.com>
* trainer: fixed spelling (huggingface#1538)
* Create CI Eager/Lazy for Language Modeling (huggingface#1448)
* Fixes for llava-next test failures in 1.19 (huggingface#1535)
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
* Enable DeepSeek-V2 (huggingface#1475)
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* Refactor Qwen2 Family (huggingface#1541)
* Add support for optimized SDXL pipeline (huggingface#1519)
* Make style
* Add the checkout parameters of falcon-mamba pytest (huggingface#1540)
Signed-off-by: yuanwu <yuan.wu@intel.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
* Avoid negative values in eval metrics (huggingface#1533)
* Fixes in unify_measurements (huggingface#1496)
Co-authored-by: yan tomsinsky <ytomsinsky@habana.ai>
Co-authored-by: Eran Geva <egeva@habana.ai>
* Fix lm_eval script for starcoder and gemma (huggingface#1463)
* Add option to use bf16 in PT sdp (#5) (huggingface#1514)
Co-authored-by: Urszula Golowicz <urszula.golowicz@intel.com>
* Fix tests.test_peft_inference failure (huggingface#1543)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* [wav2vec2] Remove tensor.item and dynamic slicing operations in the loop that cause graph break (huggingface#1508)
* Update lm_eval version (huggingface#1473)
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
* Fix bad import in Baichuan code (huggingface#1547)
* Restore performance in generate (huggingface#1546)
Signed-off-by: Urszula Golowicz <urszula.golowicz@intel.com>
Co-authored-by: Marcin Łapiński <mlapinskix@habana.ai>
Co-authored-by: Adam Stachowicz <105052242+astachowiczhabana@users.noreply.github.com>
* Enable PyTorch Image Models (TIMM) with HPUs (huggingface#1459)
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
* Add HF login for 8x Gaudi2 CI
* Add support for Context Parallelism using DeepSpeed's DistributedAttention (huggingface#1501)
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
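  A hedged sketch of the DeepSpeed API in question (Ulysses-style sequence parallelism): `DistributedAttention` wraps a per-rank local attention callable and exchanges sequence/head shards over a sequence-parallel process group. The group setup is omitted and the callable is an assumption, not OH code:
  ```python
  # Requires an initialized torch.distributed process group to actually run.
  import torch.nn.functional as F
  from deepspeed.sequence.layer import DistributedAttention

  def local_attn(q, k, v):
      # Plain scaled-dot-product attention as the per-rank "local" attention.
      return F.scaled_dot_product_attention(q, k, v)

  # sp_group: sequence-parallel process group (creation omitted here)
  # dist_attn = DistributedAttention(local_attn, sp_group)
  # context = dist_attn(query, key, value)
  ```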
* Fix Llama CI
* Add DynamicMoE support for Mixtral (huggingface#1511)
Co-authored-by: Adam Stachowicz <105052242+astachowiczhabana@users.noreply.github.com>
* Fix for llava models not generating text with test failures in 1.19 (huggingface#1548)
* Refactor KV cache and RoPE, reduce common code (huggingface#1148)
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
* Adjust Qwen2-7B test case (huggingface#1551)
* [run_lm_eval.py] Fixed excessive JSON dump printing (huggingface#1553)
Signed-off-by: Focus Luo <focus.luo@intel.com>
* Fix for single_card llama7b and falcon40b CI errors (huggingface#1549)
* Implemented fusedSDPA for stable diffusion (#36) (huggingface#1545)
Co-authored-by: Yixiu Chen <yixiu.chen@intel.com>
Co-authored-by: Libin Tang <litang@habana.ai>
* Apply --sdp_on_bf16 to image-to-text examples (huggingface#1557)
* Fix accuracy regression in Gemma (huggingface#1556)
* Fix FusedSDPA wrapper from TransformerEngine (huggingface#1562)
* Run albert-xxlarge-v1 CI as torch.compile mode (huggingface#1563)
* Update README commands for the models to use --sdp_on_bf16 (huggingface#1566)
* Minicpm patch (huggingface#1567)
Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
* Updated gemma_2b_it CI (huggingface#1561)
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
* Fixed Adalora Test for OH 1.15 (huggingface#1564)
* Fixed LORACP Test for OH 1.15 (huggingface#1568)
* Add requirements.txt
* Update the baseline for 1.18 to reflect performance in 1.19 (huggingface#1571)
* Fix prefix llama ci failure (huggingface#1570)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* FusedSDPA for Stable Diffusion XL (huggingface#1565)
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
* Add sdp_on_bf16 to tests,text-gen (huggingface#1559)
* Fix mllama test (huggingface#1569)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* Fix lazy_mode assignment (huggingface#1558)
Co-authored-by: Yaser Afshar <yaser.afshar@intel.com>
* Fix diffusers import (huggingface#1574)
* Update README commands for more models to use --sdp_on_bf16 (huggingface#1575)
Co-authored-by: Libin Tang <litang@habana.ai>
* Generation utils update (minor) (huggingface#1468)
* style: removed tabs (huggingface#1577)
* Add chatglm (huggingface#1478)
Co-authored-by: Wei Lin <wei2.lin@intel.com>
Co-authored-by: Jianqian Zhou <jianqian.zhou@intel.com>
Co-authored-by: Leo Zhao <leo.zhao@intel.com>
* Enable num_return_sequences in beam search (huggingface#1536)
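  Standard `transformers` semantics, for reference (generic model, not the OH test): with beam search, `num_return_sequences <= num_beams` returns that many top-scoring beams per prompt:
  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer

  tokenizer = AutoTokenizer.from_pretrained("gpt2")
  model = AutoModelForCausalLM.from_pretrained("gpt2")
  inputs = tokenizer("The Gaudi accelerator", return_tensors="pt")

  # 4 beams, keep the 2 best-scoring sequences per prompt
  outputs = model.generate(**inputs, num_beams=4, num_return_sequences=2, max_new_tokens=16)
  for seq in outputs:
      print(tokenizer.decode(seq, skip_special_tokens=True))
  ```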
* gpt_bigcode: added internal bucketing fix (huggingface#1526)
* Update the Gaudi trainer with transformers 4.45.2 (huggingface#1398)
* Revert "add check_neural_compressor_min_version for 4 bit behavior" (huggingface#1578)
* Revert PR huggingface#1473 (huggingface#1582)
* Remove deprecated env variables
* Add sdp_on_bf16 argument to CI for run_image2text_lora_finetune and a… (huggingface#1585)
* Remove unnecessary neural compressor fix for 1.19 release (huggingface#1584)
* Make style
* Fixed spelling (huggingface#1576)
* Update docs for baichuan2 training (huggingface#1586)
* Adjust bert and roberta targets (huggingface#1588)
* Update text-gen readme for autogptq (huggingface#1589)
* Update README to Include Information on Performance Degradation and Mitigation Options (huggingface#1555)
* Fix Accuracy Calculation Issue in GPT-NeoX (huggingface#1591)
* Readme update for llama-405B (huggingface#1587)
Co-authored-by: Mohit Sinha <msinha@habana.ai>
Co-authored-by: Seunghyuk Park (shepark) <separk@habana.ai>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
* Add WA flag for falcon-180b to resolve text-gen critical reset error during tests (huggingface#1590)
* Add sdp_on_bf16 option to diffusers and image/audio classification tests (huggingface#1592)
* Update transformers tests generation util v4.45.2 (huggingface#1441)
Co-authored-by: Gustavo Malkomes <gustavo.malkomes@intel.com>
Co-authored-by: Yaser Afshar <yaser.afshar@intel.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
* Update README.md (huggingface#1595)
* Limit position embeddings in inference (huggingface#1598)
Co-authored-by: Adam Stachowicz <105052242+astachowiczhabana@users.noreply.github.com>
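  A hypothetical illustration of the kind of guard this implies (not the actual OH code): cap requested new tokens so the total sequence stays within `max_position_embeddings`:
  ```python
  def clamp_max_new_tokens(config, input_len, requested_new_tokens):
      # Hypothetical helper: keep input_len + new tokens within the model limit.
      limit = getattr(config, "max_position_embeddings", None)
      if limit is None:
          return requested_new_tokens
      return max(0, min(requested_new_tokens, limit - input_len))
  ```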
* Verify model output is provided when check_output is enabled (huggingface#1597)
* Fix scikit-learn to 1.5.2 to fix f1 evaluation crash in 1.6.0 (huggingface#1596)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* Revert common KVCache to not check token_idx (huggingface#1594)
* Update language-modeling README file (huggingface#1599)
Co-authored-by: Libin Tang <litang@habana.ai>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
* Update readme for audio-classification example (huggingface#1602)
* SDPA flag update - static code analysis (huggingface#1601)
* Remove unwanted merged changes in SD pipeline
* Revert LlamaKVCache due to memory increase (huggingface#1605)
* Check rope_scaling attr (huggingface#1609)
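  A hypothetical sketch of the guard this describes (not the actual OH code): read `rope_scaling` fields only if the config defines them:
  ```python
  def get_rope_type(config):
      # rope_scaling is absent on many configs; fall back to the default scheme.
      rope_scaling = getattr(config, "rope_scaling", None) or {}
      return rope_scaling.get("rope_type", rope_scaling.get("type", "default"))
  ```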
* skip certain tests for G1 with empty param list (huggingface#1613)
* Revert "Update transformers tests generation util v4.45.2 (huggingface#1441)" (huggingface#1614)
This reverts commit 2ba520a.
* audio classification readme update (huggingface#1604)
* Fix readme cmds for clip-roberta; comments and cleanup (huggingface#1603)
* Fix run_generation test commands for TRL out usage example (huggingface#1624)
* Add arbitrary scales (#15) (huggingface#1625)
Co-authored-by: Linoy Buchnik <linoybu@gmail.com>
* Modify Qwen2 TRL command to avoid OOM: add --use_flash_attention (huggingface#1630)
* Replace the UNET custom attention processors (huggingface#1608)
Co-authored-by: Iman Gohari <s.m.iman.gohari@intel.com>
* Falcon Model Support (huggingface#1612)
Co-authored-by: leopck <sckphoong@habana.ai>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
* Update sdp_on_bf16 option for ST example (huggingface#1615)
* Update save lora weights for diffusers with text_encoder_2 layers (huggingface#1626)
* Fix `save_lora_weights` in `pipeline_utils.py` (huggingface#1643)
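  For context, the diffusers-style API involved takes separate LoRA layer dicts for both SDXL text encoders; this is a hedged sketch with placeholder state dicts, not the OH implementation:
  ```python
  from diffusers import StableDiffusionXLPipeline

  # Placeholders: in practice these are the trained LoRA layer state dicts.
  unet_lora_layers, te1_lora_layers, te2_lora_layers = ..., ..., ...

  StableDiffusionXLPipeline.save_lora_weights(
      save_directory="./lora_out",
      unet_lora_layers=unet_lora_layers,
      text_encoder_lora_layers=te1_lora_layers,
      text_encoder_2_lora_layers=te2_lora_layers,  # second text encoder, the subject of the fix
  )
  ```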
* Refactor mixtral moe block. (huggingface#1635)
* speech-recognition: downgrade datasets version (huggingface#1646)
* Add sdp_on_bf16 to controlnet: pass sdp_on_bf16 to controlnet_pipeline and update text_to_image_generation.py (huggingface#1631)
* Quick fix for quantization/custom op list loading (huggingface#1657)
Signed-off-by: Daniel Socek <daniel.socek@intel.com>
* Update multi-node test dockerfile (huggingface#1662)
* Fixes on OH 1.15 pre release (huggingface#1661)
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
* Fix distributed issue for ST Trainer (huggingface#1649)
* Fix distributed issue for timm (huggingface#1653)
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
* Added missing parameter for llama function call (huggingface#1663)
Co-authored-by: Libin Tang <litang@habana.ai>
* Add reuse_cache for llama3-405b measurement (huggingface#1664)
* Update EFA dockerfile to SynapseAI 1.19.0 (huggingface#1665)
Co-authored-by: Libin Tang <litang@habana.ai>
* Fix bug for GaudiMixtralAttentionLongSequence forward (huggingface#1650)
Signed-off-by: kaixuanliu <kaixuan.liu@intel.com>
* Update to SynapseAI v1.19
* Release: v1.15.0
* Fix style
* save_model - incorrect conflict resolution
* Fix style
---------
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
Signed-off-by: Xin <xin3.he@intel.com>
Signed-off-by: xinhe3 <xinhe3@habana.ai>
Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
Signed-off-by: yuanwu <yuan.wu@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
Signed-off-by: Urszula Golowicz <urszula.golowicz@intel.com>
Signed-off-by: Focus Luo <focus.luo@intel.com>
Signed-off-by: kaixuanliu <kaixuan.liu@intel.com>
Co-authored-by: Pramod Kumar <144990617+pramodkumar-habanalabs@users.noreply.github.com>
Co-authored-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
Co-authored-by: Roi Tiefenbrunn <roi.tief97@gmail.com>
Co-authored-by: Yan Tomsinsky <73292515+Yantom1@users.noreply.github.com>
Co-authored-by: Konrad Drozd <konrad.drozd@intel.com>
Co-authored-by: Uri Livne <ulivne@habana.ai>
Co-authored-by: Yeonsil Yoon <yyoon@habana.ai>
Co-authored-by: Danny Semiat <dsemiat@habana.ai>
Co-authored-by: Yaser Afshar <yaser.afshar@intel.com>
Co-authored-by: Harish Subramony <81822986+hsubramony@users.noreply.github.com>
Co-authored-by: Piotr Bielak <pbielak@users.noreply.github.com>
Co-authored-by: Sayantan Sarkar <supersarkar@gmail.com>
Co-authored-by: Harish <hsubramony@habana.ai>
Co-authored-by: Libin Tang <litang@habana.ai>
Co-authored-by: ZhengHongming888 <hongming.zheng@intel.com>
Co-authored-by: Jimin Ha <jha@habana.ai>
Co-authored-by: Seunghyuk Park (shepark) <separk@habana.ai>
Co-authored-by: Dmitry <dmitry.smertin@intel.com>
Co-authored-by: Soila Kavulya <soila.p.kavulya@intel.com>
Co-authored-by: Sun Choi <schoi@habana.ai>
Co-authored-by: xinhe <xin3.he@intel.com>
Co-authored-by: Mohit Deopujari <mdeopujari@habana.ai>
Co-authored-by: Iman Gohari <s.m.iman.gohari@intel.com>
Co-authored-by: XinyuYe-Intel <xinyu.ye@intel.com>
Co-authored-by: Vivek Goel <vgoel@habana.ai>
Co-authored-by: Akihiro Takahashi <akihiro.takahashi@intel.com>
Co-authored-by: Miroslav Goncharenko <miroslav.goncharenko@intel.com>
Co-authored-by: Wang, Mengni <mengni.wang@intel.com>
Co-authored-by: Daniel Socek <daniel.socek@intel.com>
Co-authored-by: Vidya Galli <vidya.s.galli@intel.com>
Co-authored-by: deepak-gowda-narayana <140652370+deepak-gowda-narayana@users.noreply.github.com>
Co-authored-by: Supreet Singh <100715017+SupreetSinghPalne@users.noreply.github.com>
Co-authored-by: kaixuanliu <kaixuan.liu@intel.com>
Co-authored-by: ANSHUMAN TRIPATHY <a.tripathy87@gmail.com>
Co-authored-by: sushil dubey <sdubey@habana.ai>
Co-authored-by: Luca Calabria <luca.calabria@intel.com>
Co-authored-by: billishyahao <yahao.he@intel.com>
Co-authored-by: xinhe3 <xinhe3@habana.ai>
Co-authored-by: KP (Edwin) Lau <kiangpeng.lau@intel.com>
Co-authored-by: Marcin Łapiński <mlapinskix@habana.ai>
Co-authored-by: Urszula Golowicz <urszula.golowicz@intel.com>
Co-authored-by: Greg Serochi <greg.serochi@intel.com>
Co-authored-by: Seethong Vang <seethong.vang@intel.com>
Co-authored-by: Anastasia Uvarova <anastasia.uvarova@intel.com>
Co-authored-by: Mohit Deopujari <mohit.deopujari@intel.com>
Co-authored-by: Chen Levkovich <chen.levkovich@intel.com>
Co-authored-by: Libin Tang <libin.tang@intel.com>
Co-authored-by: ranzhejiang <zhejiang.ran@intel.com>
Co-authored-by: Baochen Yang <baochen.yang@intel.com>
Co-authored-by: Huijuan Zhou <huijuan.zhou@intel.com>
Co-authored-by: Sergey Plotnikov <sergey.plotnikov@intel.com>
Co-authored-by: Deepak Narayana <deepak.narayana@intel.com>
Co-authored-by: Witold Szczurek <152967125+wszczurekhabana@users.noreply.github.com>
Co-authored-by: Wei Lin <forever871001@163.com>
Co-authored-by: lkk <33276950+lkk12014402@users.noreply.github.com>
Co-authored-by: Chaojun Zhang <chzhang@habana.ai>
Co-authored-by: Daniel Huang <daniel1.huang@intel.com>
Co-authored-by: Yuan Wu <yuan.wu@intel.com>
Co-authored-by: Xiang, Haihao <haihao.xiang@intel.com>
Co-authored-by: Jianqian Zhou <jianqian.zhou@intel.com>
Co-authored-by: Wei Lin <wei2.lin@intel.com>
Co-authored-by: Thanaji Rao Thakkalapelli <tthakkalapelli@habana.ai>
Co-authored-by: Yao Matrix <yaoweifeng0301@126.com>
Co-authored-by: yan tomsinsky <ytomsinsky@habana.ai>
Co-authored-by: Eran Geva <egeva@habana.ai>
Co-authored-by: Alexey Belyakov <alexey.belyakov@intel.com>
Co-authored-by: Bhargav <beede@habana.ai>
Co-authored-by: Krzysztof Wiśniewski <krzysztof2.wisniewski@intel.com>
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
Co-authored-by: FocusLuo <focus.luo@intel.com>
Co-authored-by: Yixiu Chen <yixiu.chen@intel.com>
Co-authored-by: Nariman Piroozan <87953329+npiroozan@users.noreply.github.com>
Co-authored-by: Edward Mascarenhas <edward.mascarenhas@intel.com>
Co-authored-by: Shiv Kaul <skaul@habana.ai>
Co-authored-by: bmengke <mengkejiergeli.ba@intel.com>
Co-authored-by: Leo Zhao <leo.zhao@intel.com>
Co-authored-by: Mohit Sinha <msinha@habana.ai>
Co-authored-by: Harshvardhan Chauhan <hchauhan@habana.ai>
Co-authored-by: Gustavo Malkomes <gustavo.malkomes@intel.com>
Co-authored-by: Linoy Buchnik <linoybu@gmail.com>
Co-authored-by: Alexey Fadeev <alexey.fadeev@intel.com>
Co-authored-by: leopck <sckphoong@habana.ai>
Add missing flash attention flags to gemma