Tests for text gen output text #1411

Merged
regisss merged 6 commits into huggingface:main from vidyasiv:functional_test_text_gen on Nov 21, 2024

vidyasiv (Contributor) commented Oct 10, 2024

What does this PR do?

Fixes # (issue)

  • Fix to run_generation.py so results.json stores all outputs instead of just the last when batch_size > 1
  • Functional verification of select text gen models to be run for every PR (fast tests)
  • Duration of added fast tests on Gaudi2: 7 passed, 1 warning in 1991.08s (0:33:11)
  • Functional verification added to slow test test_text_generation_example.py for select BF16 1HPU cases
  • Found a discrepancy in bigcode/starcoder output: the fast test passes, but the slow test fails due to bucketing options. Will file a ticket to fix that.
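
The first bullet (keeping every output rather than only the last one when batch_size > 1) can be illustrated with a minimal sketch. This is not the actual change to run_generation.py: `fake_generate`, `run_batches`, `save_results`, and the `"throughput"` / `"output"` keys are hypothetical names chosen for illustration.

```python
import json
import tempfile
from pathlib import Path


def fake_generate(batch):
    """Hypothetical stand-in for model generation plus decoding: one string per prompt."""
    return [f"{prompt} [generated]" for prompt in batch]


def run_batches(prompts, batch_size):
    """Generate over all prompts and keep every decoded output."""
    all_outputs = []
    for start in range(0, len(prompts), batch_size):
        batch = prompts[start:start + batch_size]
        # Extend the running list; plain assignment here would keep only the last batch.
        all_outputs.extend(fake_generate(batch))
    return all_outputs


def save_results(path, outputs, throughput):
    """Write a results file whose "output" field holds the full list (assumed layout)."""
    results = {"throughput": throughput, "output": outputs}
    Path(path).write_text(json.dumps(results, indent=2))


if __name__ == "__main__":
    outputs = run_batches(["prompt 1", "prompt 2", "prompt 3"], batch_size=2)
    with tempfile.TemporaryDirectory() as tmp:
        results_path = Path(tmp) / "results.json"
        save_results(results_path, outputs, throughput=0.0)
        print(json.loads(results_path.read_text())["output"])
```

Storing a list keeps one entry per prompt, so a test can compare any of them against a reference string.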

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

vidyasiv changed the title from "initial commit" to "Sanity test text gen output" on Oct 10, 2024
vidyasiv force-pushed the functional_test_text_gen branch from d58c4d6 to e4e96e9 on October 14, 2024 19:59
vidyasiv (Contributor, Author) commented

@regisss, these tests add 30 min to the fast tests. Let me know if you'd like me to cut down on the models.

vidyasiv marked this pull request as ready for review on October 16, 2024 20:33
vidyasiv requested a review from regisss as a code owner on October 16, 2024 20:33
jiminha (Contributor) commented Oct 22, 2024

@vidyasiv this is good, but could we also add this accuracy check to test_text_generation_example.py? We already run many model tests there for token generation, so I think comparing the first or last token outputs alongside the throughput check will not add much time.

vidyasiv (Contributor, Author) commented

> @vidyasiv this is good, but could we also add this accuracy check to test_text_generation_example.py? We already run many model tests there for token generation, so I think comparing the first or last token outputs alongside the throughput check will not add much time.

Then it won't be part of the checks run on every PR, i.e. the fast tests.

jiminha (Contributor) commented Oct 23, 2024

> > @vidyasiv this is good, but could we also add this accuracy check to test_text_generation_example.py? We already run many model tests there for token generation, so I think comparing the first or last token outputs alongside the throughput check will not add much time.
>
> Then it won't be part of the checks run on every PR, i.e. the fast tests.

Got it. Then can we keep this as it is for the fast tests, and also add similar logic to the existing text-gen test?

vidyasiv changed the title from "Sanity test text gen output" to "Tests for text gen output text" on Oct 23, 2024
vidyasiv force-pushed the functional_test_text_gen branch from 6a78e95 to 28eaf8f on October 23, 2024 23:47
("Qwen/Qwen2-7B", 512, False, 9669.45787, True),
("Qwen/Qwen1.5-MoE-A2.7B", 1, True, 44.25834541569395, False),
("EleutherAI/gpt-neo-2.7B", 1, False, 257.2476416844122, False),
],
Contributor:

So only the models marked "True" will have their output tested? If we want to add more output checks later on, all we need to do is change this last value to True and add the model's 1st token output to the "MODEL_OUTPUTS" table below?

Contributor Author:

Yes, that is right.
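
The mechanism confirmed in this exchange can be sketched roughly as follows. This is an illustrative miniature, not the code in test_text_generation_example.py: the model names, the `MODELS_TO_TEST` name, the meaning of the tuple fields other than the last, and the `output_matches` helper are all assumptions.

```python
# Hypothetical miniature of the test table. Assumed field order:
# (model_name, batch_size, reuse_cache, baseline_throughput, check_output)
MODELS_TO_TEST = [
    ("example/model-a", 1, False, 100.0, True),
    ("example/model-b", 1, True, 50.0, False),
]

# Reference text for every model whose last tuple value is True.
MODEL_OUTPUTS = {
    "example/model-a": "Hello world",
}


def output_matches(model_name, check_output, generated_text):
    """Return True when no check is requested or the text equals the reference."""
    if not check_output:
        return True  # throughput-only entry, nothing to compare
    if model_name not in MODEL_OUTPUTS:
        # Enabling the flag without adding a reference is a configuration error.
        raise KeyError(f"No reference output registered for {model_name}")
    return generated_text == MODEL_OUTPUTS[model_name]


if __name__ == "__main__":
    for name, _bs, _reuse, _baseline, check in MODELS_TO_TEST:
        print(name, output_matches(name, check, "Hello world"))
```

Under this layout, enabling the check for another model takes two edits: flip its last tuple value to True and add its reference string to `MODEL_OUTPUTS`.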

("gpt2-xl", 1, False, 51.61471298016438),
],
}
MODEL_OUTPUTS = {
Contributor:

Can this also be used for the fp8 tests?

Contributor Author:

I have not tried, but perhaps it can be a later effort.

Contributor:

Yeah, I think we should add an output check for fp8 as well as int4. Maybe in another PR.

jiminha (Contributor) commented Oct 29, 2024

@vidyasiv Does this time include the model download time?

> Duration of added fast tests on Gaudi2: 7 passed, 1 warning in 1991.08s (0:33:11)

@regisss Could you please review this? Vidya updated the existing text-gen slow test and also added a fast test that checks the output of 7 models.

vidyasiv (Contributor, Author) commented Oct 30, 2024

@jiminha, I deleted the cache and kicked off a run; I will let you know.
Update: 7 passed, 1 warning in 2894.18s (0:48:14)

MODEL_OUTPUTS = {
"bigcode/starcoder": 'def print_hello_world():\n print("Hello World")\n\ndef print_hello_world_twice():\n print_hello_world()\n print_hello_world()\n\ndef print_hello_world_thrice():\n print_hello_world()\n print_hello_world()\n print_hello_world()\n\ndef print_hello_world_four_times():\n print_hello_world()\n print_hello_world()\n print_hello_world()\n ',
"bigcode/starcoder2-3b": 'def print_hello_world():\n print("Hello World")\n\ndef print_hello_world_with_name(name):\n print("Hello World, " + name)\n\ndef print_hello_world_with_name_and_age(name, age):\n print("Hello World, " + name + ", " + str(age))\n\ndef print_hello_world_with_name_and_age_and_gender(name, age, gender):\n print("Hello',
"google/gemma-7b": "DeepSpeed is a machine learning framework that enables training of large-scale models on commodity hardware. It is designed to be a drop-in replacement for PyTorch, and it is compatible with the existing PyTorch ecosystem. DeepSpeed is designed to be easy to use, and it provides a number of features that make it easy to train large-scale models.\n\nDeepSpeed is a machine learning framework that enables training of large-scale models on commodity hardware. It is designed to be a drop-in replacement for PyTorch, and",
Contributor Author:

@regisss, a common theme I observe in the outputs (except starcoder/starcoder2) is repeated sentences. I'm not sure whether that is some other bug or expected.

Collaborator:

I'm not surprised: greedy search tends to produce very repetitive outputs, as it only looks for the most likely token to generate in each iteration. Tweaking some generation parameters (for example, to more strongly penalize sequences of tokens that are already part of the output) and/or using sampling (to consider several candidates and not only the most likely one) usually helps to give the model more flexibility and to get more realistic outputs.
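
To make the point about greedy search concrete, here is a toy sketch in pure Python, not tied to any model in this PR. The bigram table `TOY_LOGITS` and both helper functions are made up for illustration; the penalty rule (divide positive scores of already-generated tokens, multiply negative ones) is one common formulation and is only assumed here to resemble what generation libraries do.

```python
# Made-up next-token scores: TOY_LOGITS[previous_token][candidate_token]
TOY_LOGITS = {
    "A": {"A": 0.1, "B": 2.0, "C": 1.5, "D": 0.5},
    "B": {"A": 2.0, "B": 0.1, "C": 1.5, "D": 0.5},
    "C": {"A": 1.0, "B": 0.5, "C": 0.1, "D": 2.0},
    "D": {"A": 2.0, "B": 1.0, "C": 0.5, "D": 0.1},
}


def apply_repetition_penalty(logits, generated, penalty):
    """Down-weight every token that already appears in `generated`."""
    adjusted = dict(logits)
    for token in set(generated):
        if token in adjusted:
            score = adjusted[token]
            adjusted[token] = score / penalty if score > 0 else score * penalty
    return adjusted


def greedy_decode(start, steps, penalty=1.0):
    """Pick the highest-scoring token at each step (penalty=1.0 means plain greedy)."""
    sequence = [start]
    for _ in range(steps):
        logits = apply_repetition_penalty(TOY_LOGITS[sequence[-1]], sequence, penalty)
        sequence.append(max(logits, key=logits.get))
    return sequence


if __name__ == "__main__":
    print(greedy_decode("A", steps=5))               # plain greedy falls into a loop
    print(greedy_decode("A", steps=3, penalty=2.0))  # penalized run visits new tokens
```

In this toy table plain greedy decoding alternates between two tokens forever, while the penalized run moves on to tokens it has not produced yet. The reference strings in MODEL_OUTPUTS come from greedy decoding, which is consistent with the repeated sentences observed above.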

vidyasiv (Contributor, Author) commented

@jiminha can you approve / apply the run-test label?

jiminha (Contributor) commented Nov 13, 2024

@regisss Could you check this PR? We have recently been seeing quite a few accuracy issues as many PRs get merged. This PR runs a basic accuracy check on some models in the fast tests (adding 48 min, as measured by Vidya). Please check whether this is OK.

jiminha added the "run-test" label (Run CI for PRs from external contributors) on Nov 13, 2024
regisss (Collaborator) left a comment

Very nice PR!

I would just not add this type of check to the fast tests yet; see my comment below. The reason is that I recently migrated the fast tests from DL1 instances to Gaudi2, and I have only one Gaudi2 server for the whole CI (i.e. all the fast tests of all the PRs plus the nightly tests), versus up to 4 DL1 instances in parallel before. I'm afraid this would lengthen the queue of workflows quite a lot. Happy to reconsider it if/when I have at least one more server for CI. In the meantime, these checks will still be performed in the nightly tests.

Makefile Outdated
fast_tests:
python -m pip install .[tests]
python -m pytest tests/test_gaudi_configuration.py tests/test_trainer_distributed.py tests/test_trainer.py tests/test_trainer_seq2seq.py
python -m pytest test_functional_text_generation_example.py
Collaborator:

I think it's a bit tricky to add this one to the fast tests now, as it's going to be quite slow with the current infrastructure. Maybe we can comment out this line and add a TODO comment to uncomment it when we have more servers dedicated to CI.

Contributor Author:

Updated.

HuggingFaceDocBuilderDev commented

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

regisss merged commit a69c201 into huggingface:main on Nov 21, 2024
Luca-Calabria pushed a commit to Luca-Calabria/optimum-habana that referenced this pull request Nov 25, 2024
HolyFalafel pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Nov 26, 2024
Liangyx2 pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Jan 20, 2025
vidyasiv deleted the functional_test_text_gen branch on January 22, 2025 22:14
xinyu-intel pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Mar 4, 2025
* Add flag to run inference with partial dataset (huggingface#1420)

* Add peft generation example (huggingface#1427)

* Upgrade to SynapseAI 1.18.0 (huggingface#1418)

* Simplify HQT config files (huggingface#1219)

* unify_measurements.py script support to unify PCQ 70B 8x (huggingface#1322)

* Add misc. training args (huggingface#1346)

* Add quantization config for low bs case (huggingface#1377)

* Remove HQT from OHF (huggingface#1257)

Co-authored-by: Adam Stachowicz <astachowicz@habana.ai>
Co-authored-by: Adam Stachowicz <105052242+astachowiczhabana@users.noreply.github.com>
Co-authored-by: Yeonsil Yoon <yyoon@habana.ai>

* Load INC GPTQ checkpoint & rename params (huggingface#1364)

Co-authored-by: Yaser Afshar <yaser.afshar@intel.com>
Co-authored-by: Harish Subramony <81822986+hsubramony@users.noreply.github.com>
Co-authored-by: Yeonsil Yoon <yyoon@habana.ai>

* Enable FusedSDPA fp8 in Llama FT (huggingface#1388)

Co-authored-by: Yaser Afshar <yaser.afshar@intel.com>
Co-authored-by: Harish Subramony <81822986+hsubramony@users.noreply.github.com>

* Valid sequence length for sdpa (huggingface#1183)

Co-authored-by: Harish <hsubramony@habana.ai>
Co-authored-by: Libin Tang <litang@habana.ai>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Multiple fixes (dynamo graph break, qwen-moe, multicard) (huggingface#1410)

* datasets downgrade version to 2.21.0 (huggingface#1413)

* Update ci sentence_transformer.sh (huggingface#1424)

* Fix load INC load weights compile error due to Transformer 4.45 upgrade.  (huggingface#1421)

* Update language-modeling README.md, add trust_remote_code for flan-t5-xl (huggingface#1422)

* Update unify_measurements.py support info (huggingface#1425)

* GPT2 torch.compile fix (huggingface#1434)

* Added missing allocate_kv_cache() call in CausalLM class (huggingface#1431)

* Fix merge error and update text-to-speech readme (huggingface#1436)

* Fix OOM error for code llama (huggingface#1437)

* Fix error on 4bit checkpoint load with run_lm_eval on TF4.45.2 (huggingface#1439)

* Fix scoped linear all-reduce for starcoder model (huggingface#1432)

* Fixed recursion error in SentenceTransformer (huggingface#1428)

* Fix Llama 3.1 generation (huggingface#1444)

* Update text-gen README.md to add auto-gptq fork install steps (huggingface#1442)

* Added gemma specific fp8 quantization file (huggingface#1445)

* Remove cache folder from image data folder (huggingface#1446)

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Bump dev version

* Enable DeepSpeed for image-to-text example (huggingface#1455)

* Fix bug when loading 4bit checkpoint quantized in INC (huggingface#1447)

* Fixes 'Tokenizer does not have padding token' introduced by  huggingface#1444 for Llama3.1 (huggingface#1457)

* Fix facebook/hf-seamless-m4t-medium crash (huggingface#1433)

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* Fix bias update in scoped all reduce (huggingface#1456)

* Added skip for unsuported tests for mistral/mixtral (huggingface#1462)

* Update sentence transformer to v3.2.1 (huggingface#1470)

* Optimized inference of Cohere model on HPU (huggingface#1329)

Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>

* Idefics2 (huggingface#1270)

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* Remove deprecated Mixed precision flags (huggingface#1471)

Change-Id: I1c2e2460dc2072ba7b311f239441b304694918c8

* Optimized inference of XGLM model on HPU (huggingface#1323)

Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>

* Add mllama support (huggingface#1419)

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* Enable flash attention for gemma (huggingface#1454)

* Readme: replace tabs with spaces (huggingface#1485)

* Move fast tests to Gaudi2 (huggingface#1498)

* Support loading 4 bit Qwen2 (huggingface#1476)

Signed-off-by: Mengni Wang <mengni.wang@intel.com>

* Add textual inversion XL for Gaudi (huggingface#868)

Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Co-authored-by: Iman Gohari <s.m.iman.gohari@intel.com>

* Remove torch req from LM example (huggingface#1491)

* Remove keep_input_mutations (huggingface#1492)

* Fix trust_remote_code (huggingface#1493)

* Upgrade ViT README with torch.compile (huggingface#1494)

* Tests for text gen output text (huggingface#1411)

* Corrected Throughput measure for GaudiDDPMPipeline (huggingface#1460)

* Fix text generation test

* Add G3 in T5-L README (huggingface#1523)

* Fix tuple object error (huggingface#1354)

* Add warmup time and compile time log for the eval/prediction.  (huggingface#1489)

* Fix style

* Enable `paligemma` model for image-to-text example (huggingface#1407)

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Add support for MLPERF optimized pipeline from example (huggingface#1465)

Co-authored-by: sushil dubey <sdubey@habana.ai>

* Enable Gemma2 Inference on Gaudi (huggingface#1504)

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Co-authored-by: billishyahao <yahao.he@intel.com>
Co-authored-by: Harish Subramony <81822986+hsubramony@users.noreply.github.com>
Co-authored-by: Yeonsil Yoon <yyoon@habana.ai>
Co-authored-by: Seunghyuk Park (shepark) <separk@habana.ai>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
Co-authored-by: Sun Choi <schoi@habana.ai>
Co-authored-by: xinhe <xin3.he@intel.com>
Co-authored-by: Mohit Deopujari <mdeopujari@habana.ai>
Co-authored-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: Soila Kavulya <soila.p.kavulya@intel.com>
Co-authored-by: Iman Gohari <s.m.iman.gohari@intel.com>
Co-authored-by: ZhengHongming888 <hongming.zheng@intel.com>
Co-authored-by: XinyuYe-Intel <xinyu.ye@intel.com>
Co-authored-by: Vivek Goel <vgoel@habana.ai>
Co-authored-by: Akihiro Takahashi <akihiro.takahashi@intel.com>
Co-authored-by: Miroslav Goncharenko <miroslav.goncharenko@intel.com>
Co-authored-by: Wang, Mengni <mengni.wang@intel.com>
Co-authored-by: Daniel Socek <daniel.socek@intel.com>
Co-authored-by: Adam Stachowicz <105052242+astachowiczhabana@users.noreply.github.com>
Co-authored-by: Vidya Galli <vidya.s.galli@intel.com>
Co-authored-by: deepak-gowda-narayana <140652370+deepak-gowda-narayana@users.noreply.github.com>

* Add check_neural_compressor_min_version for 4 bit behavior (huggingface#1500)

Signed-off-by: Xin <xin3.he@intel.com>
Signed-off-by: xinhe3 <xinhe3@habana.ai>
Co-authored-by: xinhe3 <xinhe3@habana.ai>

* Fixed Gemma FP8 flash_attention lower throughput issue (huggingface#1510)

* Pass "lazy_mode" arg to GaudiLlamaModel GaudiTrainer (huggingface#1515)

Co-authored-by: Marcin Łapiński <mlapinskix@habana.ai>

* Removed workaround for NaN bug causing graph break. (huggingface#1516)

Co-authored-by: Marcin Łapiński <mlapinskix@habana.ai>

* Disable default sdpa in Albert (#22) (huggingface#1517)

Co-authored-by: Urszula Golowicz <urszula.golowicz@intel.com>

* Implement fused sdpa for wav2vec2 (#18) (huggingface#1520)

* Memory optimization for gpt_bitcode (#4) (huggingface#1513)

Co-authored-by: Urszula Golowicz <urszula.golowicz@intel.com>

* text_generation: improve parameters check (huggingface#1527)

* transformers: fixed some typos (huggingface#1528)

* Update DeepSpeed CI baselines

* Update FSDP CI baseline

* Optimum-Habana docs re-org (huggingface#1488)

Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Co-authored-by: Greg Serochi <greg.serochi@intel.com>
Co-authored-by: Kiangpeng Lau <kiangpeng.lau@intel.com>
Co-authored-by: Seethong Vang <seethong.vang@intel.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
Co-authored-by: Anastasia Uvarova <anastasia.uvarova@intel.com>
Co-authored-by: Mohit Deopujari <mohit.deopujari@intel.com>
Co-authored-by: Chen Levkovich <chen.levkovich@intel.com>
Co-authored-by: Libin Tang <libin.tang@intel.com>

* Makes the with_stack of the profiler changeable (huggingface#1497)

* FLUX with diffusers 0.31.0 (huggingface#1450)

Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Co-authored-by: Baochen Yang <baochen.yang@intel.com>
Co-authored-by: Huijuan Zhou <huijuan.zhou@intel.com>
Co-authored-by: Sergey Plotnikov <sergey.plotnikov@intel.com>
Co-authored-by: Deepak Narayana <deepak.narayana@intel.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Fix some CI baselines

* Add split runners to CI (2 devices per runner for fast tests)

* Fix fast CI to work with split runners (huggingface#1534)

* Fix dtype issue with valid sequence length in torch.compile bs=1 (huggingface#1532)

* Support beam search with reuse_cache and bucket_internal (huggingface#1472)

* Add mixtral trl sft (huggingface#1349)

* Enable tiiuae/falcon-11B-vlm in image_to_text example (huggingface#1490)

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* Add Llama 3.1 ft to CI (huggingface#1529)

* Migrate OH CLIP (roberta-clip) training to torch.compile (huggingface#1507)

* test_text_generation: fix non-Gaudi2 case (huggingface#1530)

* text-generation: improve output printing (huggingface#1486)

* Text-generation, model set-up: torch.compile for attributes instead of models' types (huggingface#1452)

* FLUX Fine-Tuning for Gaudi (huggingface#1482)

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

* Enable fusedsdpa kernel for vision part of mllama (huggingface#1531)

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* Minicpm enabling (huggingface#1342)

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>

* Fix bridgetower example (#312) (huggingface#1481)

* Migrate OH Wave2Vec-AC training to torch.compile - README update (huggingface#1537)

Co-authored-by: Chaojun Zhang <chzhang@habana.ai>

* Flux Image-To-Image pipeline (huggingface#1524)

Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Co-authored-by: Iman Gohari <s.m.iman.gohari@intel.com>

* Enable Falcon-mamba (huggingface#1480)

Signed-off-by: yuanwu <yuan.wu@intel.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Enable dynamic compile for mpi(training) (huggingface#1509)

* Migrate OH T5-large training to torch.compile (huggingface#1506)

* Add support for Baichuan2 (huggingface#1479)

Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Co-authored-by: Jianqian Zhou <jianqian.zhou@intel.com>
Co-authored-by: Wei Lin <wei2.lin@intel.com>

* trainer: fixed spelling (huggingface#1538)

* Create CI Eager/Lazy for Language Modeling (huggingface#1448)

* Fixes for llava-next test failures in 1.19 (huggingface#1535)

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Enable DeepSeek-V2 (huggingface#1475)

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* Refactor Qwen2 Family (huggingface#1541)

* Add support for optimized SDXL pipeline (huggingface#1519)

* Make style

* Add the checkout parameters of falcon-mamba pytest (huggingface#1540)

Signed-off-by: yuanwu <yuan.wu@intel.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Avoid negative values in eval metrics (huggingface#1533)

* Fixes in unify_measurements (huggingface#1496)

Co-authored-by: yan tomsinsky <ytomsinsky@habana.ai>
Co-authored-by: Eran Geva <egeva@habana.ai>

* Fix lm_eval script for starcoder and gemma (huggingface#1463)

* Add option to use bf16 in PT sdp (#5) (huggingface#1514)

Co-authored-by: Urszula Golowicz <urszula.golowicz@intel.com>

* Fix tests.test_peft_inference failure (huggingface#1543)

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* [wav2vec2] Remove tensor.item and dynamic slicing operations in the loop that cause graph break (huggingface#1508)

* Update lm_eval version (huggingface#1473)

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Fix lm_eval script for starcoder and gemma (huggingface#1463)

* Add option to use bf16 in PT sdp (#5) (huggingface#1514)

Co-authored-by: Urszula Golowicz <urszula.golowicz@intel.com>

* Fix tests.test_peft_inference failure (huggingface#1543)

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* Update lm_eval version (huggingface#1473)

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Fix bad import in Baichuan code (huggingface#1547)

* Restore performance in generate (huggingface#1546)

Signed-off-by: Urszula Golowicz <urszula.golowicz@intel.com>
Co-authored-by: Marcin Łapiński <mlapinskix@habana.ai>
Co-authored-by: Adam Stachowicz <105052242+astachowiczhabana@users.noreply.github.com>

* Enable pyTorch-IMage-Models (TIMM) with HPUs (huggingface#1459)

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Add HF login for 8x Gaudi2 CI

* Adding support for Context Parallelism using Deepseed's DistributedAttention (huggingface#1501)

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Fix bad import in Baichuan code (huggingface#1547)

* Restore performance in generate (huggingface#1546)

Signed-off-by: Urszula Golowicz <urszula.golowicz@intel.com>
Co-authored-by: Marcin Łapiński <mlapinskix@habana.ai>
Co-authored-by: Adam Stachowicz <105052242+astachowiczhabana@users.noreply.github.com>

* Enable pyTorch-IMage-Models (TIMM) with HPUs (huggingface#1459)

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Add HF login for 8x Gaudi2 CI

* Adding support for Context Parallelism using Deepseed's DistributedAttention (huggingface#1501)

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Fix Llama CI

* Fix Llama CI

* Add DynamicMoE support for Mixtral (huggingface#1511)

Co-authored-by: Adam Stachowicz <105052242+astachowiczhabana@users.noreply.github.com>

* Fix for llava models not generating text with test failures in 1.19 (huggingface#1548)

* Refactor KV cache, Rope  , reduce common code  (huggingface#1148)

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Adjust Qwen2-7B test case (huggingface#1551)

* [run_lm_eval.py] Fixed too many print dump json info (huggingface#1553)

Signed-off-by: Focus Luo <focus.luo@intel.com>

* Fix for single_card llama7b and falcon40b CI errors (huggingface#1549)

* Implemented fusedSDPA for stable diffusion (#36) (huggingface#1545)

Co-authored-by: Yixiu Chen <yixiu.chen@intel.com>
Co-authored-by: Libin Tang <litang@habana.ai>

* Apply --sdp_on_bf16 to image-to-text examples (huggingface#1557)

* Fix accuracy regression in Gemma (huggingface#1556)

* Fix FusedSDPA wrapper from TransformerEngine (huggingface#1562)

* Add DynamicMoE support for Mixtral (huggingface#1511)

Co-authored-by: Adam Stachowicz <105052242+astachowiczhabana@users.noreply.github.com>

* Fix for llava models not generating text with test failures in 1.19 (huggingface#1548)

* Refactor KV cache, Rope  , reduce common code  (huggingface#1148)

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Adjust Qwen2-7B test case (huggingface#1551)

* [run_lm_eval.py] Fixed too many print dump json info (huggingface#1553)

Signed-off-by: Focus Luo <focus.luo@intel.com>

* Fix for single_card llama7b and falcon40b CI errors (huggingface#1549)

* Implemented fusedSDPA for stable diffusion (#36) (huggingface#1545)

Co-authored-by: Yixiu Chen <yixiu.chen@intel.com>
Co-authored-by: Libin Tang <litang@habana.ai>

* Apply --sdp_on_bf16 to image-to-text examples (huggingface#1557)

* Fix accuracy regression in Gemma (huggingface#1556)

* Fix FusedSDPA wrapper from TransformerEngine (huggingface#1562)

* Run albert-xxlarge-v1 CI as torch.compile mode (huggingface#1563)

* Update README commands for the models to use --sdp_on_bf16 (huggingface#1566)

* Minicpm patch (huggingface#1567)

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>

* Updated gemma_2b_it CI (huggingface#1561)

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Fixed Adalora Test for OH 1.15 (huggingface#1564)

* Fixed LORACP Test for OH 1.15 (huggingface#1568)

* Run albert-xxlarge-v1 CI as torch.compile mode (huggingface#1563)

* Update README commands for the models to use --sdp_on_bf16 (huggingface#1566)

* Minicpm patch (huggingface#1567)

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>

* Updated gemma_2b_it CI (huggingface#1561)

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Fixed Adalora Test for OH 1.15 (huggingface#1564)

* Fixed LORACP Test for OH 1.15 (huggingface#1568)

* Add requirements.txt

* Update the baseline for 1.18 to reflect performance in 1.19 (huggingface#1571)

* Fix prefix llama ci failure (huggingface#1570)

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* fusedsdpa for stable diffusion xl (huggingface#1565)

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Fix prefix llama ci failure (huggingface#1570)

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* Add sdp_on_bf16 to tests,text-gen (huggingface#1559)

* Fix mllama test (huggingface#1569)

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* Fix lazy_mode assignment (huggingface#1558)

Co-authored-by: Yaser Afshar <yaser.afshar@intel.com>

* Fix mllama test (huggingface#1569)

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* Fix lazy_mode assignment (huggingface#1558)

Co-authored-by: Yaser Afshar <yaser.afshar@intel.com>

* Fix diffusers import (huggingface#1574)

* Update README commands for more models to use --sdp_on_bf16 (huggingface#1575)

Co-authored-by: Libin Tang <litang@habana.ai>

* Generation utils update (minor) (huggingface#1468)

* style: removed tabs (huggingface#1577)

* Add chatglm (huggingface#1478)

Co-authored-by: Wei Lin <wei2.lin@intel.com>
Co-authored-by: Jianqian Zhou <jianqian.zhou@intel.com>
Co-authored-by: Leo Zhao <leo.zhao@intel.com>

* Enable num_return_sequences in beam search (huggingface#1536)

* gpt_bigcode: added internal bucketing fix (huggingface#1526)

* Update the Gaudi trainer with transformers 4.45.2 (huggingface#1398)

* Revert "add check_neural_compressor_min_version for 4 bit behavior" (huggingface#1578)

* Revert PR huggingface#1473 (huggingface#1582)

* Enable num_return_sequences in beam search (huggingface#1536)

* gpt_bigcode: added internal bucketing fix (huggingface#1526)

* Revert "add check_neural_compressor_min_version for 4 bit behavior" (huggingface#1578)

* Revert PR huggingface#1473 (huggingface#1582)

* Remove deprecated env variables

* Add sdp_on_bf16 argument to CI for run_image2text_lora_finetune and a… (huggingface#1585)

* Remove unnecessary neural compressor fix for 1.19 release (huggingface#1584)

* Make style

* Fixed spelling (huggingface#1576)

* Update docs for baichuan2 training (huggingface#1586)

* Fixed spelling (huggingface#1576)

* Update docs for baichuan2 training (huggingface#1586)

* Adjust bert and roberta targets (huggingface#1588)

* Update text-gen readme for autogptq (huggingface#1589)

* Update README to Include Information on Performance Degradation and Mitigation Options (huggingface#1555)

* Fix Accuracy Calculation Issue in GPT-NeoX (huggingface#1591)

* Readme update for llama-405B (huggingface#1587)

Co-authored-by: Mohit Sinha <msinha@habana.ai>
Co-authored-by: Seunghyuk Park (shepark) <separk@habana.ai>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Fix Accuracy Calculation Issue in GPT-NeoX (huggingface#1591)

* Add WA flag for falcon-180b to resolve text-gen critical reset error during tests (huggingface#1590)

* Add WA flag for falcon-180b to resolve text-gen critical reset error during tests (huggingface#1590)

* Add sdp_on_bf16 option to diffusers and image/audio classicifation tests (huggingface#1592)

* Update transformers tests generation util v4.45.2 (huggingface#1441)

Co-authored-by: Gustavo <gustavo.malkomes>
Co-authored-by: Yaser Afshar <yaser.afshar@intel.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Update README.md (huggingface#1595)

* Limit position embeddings in inference (huggingface#1598)

Co-authored-by: Adam Stachowicz <105052242+astachowiczhabana@users.noreply.github.com>

* Verify model output is provided when check_output is enabled (huggingface#1597)

* Fix scikit-learn to 1.5.2 to fix f1 evaluation crash in 1.6.0 (huggingface#1596)

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* Revert common KVCache not to check token_idx (huggingface#1594)

* Update language-modeling README file (huggingface#1599)

Co-authored-by: Libin Tang <litang@habana.ai>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Update readme for audio-classification example (huggingface#1602)

* SDPA flag update - static code analysis (huggingface#1601)

* Remove unwanted merged changes in SD pipeline

* Revert LlamaKVCache due to memory increase (huggingface#1605)

* Check rope_scaling attr (huggingface#1609)

* skip certain tests for G1 with empty param list (huggingface#1613)

* Revert "Update transformers tests generation util v4.45.2 (huggingface#1441)" (huggingface#1614)

This reverts commit 2ba520a.

* audio classification readme update (huggingface#1604)

* fix readme cmds for clip-roberta (huggingface#1603)

* fix readme cmds for clip-roberta

* comments and cleanup

* Fix run_generation test commands for TRL out usage example (huggingface#1624)

Fix run_generation example

* Add arbitrary scales (#15) (huggingface#1625)

Co-authored-by: Linoy Buchnik <linoybu@gmail.com>

* Modify Qwen2 TRL command to avoid OOM. (huggingface#1630)

Add --use_flash_attention to avoid OOM for Qwen2

* Replace the UNET custom attention processors (huggingface#1608)

Co-authored-by: Iman Gohari <s.m.iman.gohari@intel.com>

* Falcon Model Support (huggingface#1612)

Co-authored-by: leopck <sckphoong@habana.ai>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Update sdp_on_bf16 option for ST example (huggingface#1615)

* Update save lora weights for diffusers with text_encoder_2 layers (huggingface#1626)

* Fix `save_lora_weights` in `pipeline_utils.py` (huggingface#1643)

* Refactor mixtral moe block. (huggingface#1635)

* speech-recognition: downgrade datasets version (huggingface#1646)

* add sdp_on_bf16 to controlnet (huggingface#1631)

* add sdp_on_bf16 to controlnet

* Update pipeline_controlnet.py

pass sdp_on_bf16 to controlnet_pipeline

* Update text_to_image_generation.py

* Update text_to_image_generation.py

* Quick fix for quantization/custom op list loading (huggingface#1657)

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

* Update multi-node test dockerfile (huggingface#1662)

* Fixes on OH 1.15 pre release (huggingface#1661)

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Fix distributed issue for ST Trainer (huggingface#1649)

* Fix distributed issue for timm (huggingface#1653)

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Added missing parameter for llama function call (huggingface#1663)

Co-authored-by: Libin Tang <litang@habana.ai>

* Add reuse_cache for llama3-405b measurement (huggingface#1664)

* Update EFA dockerfile to SynapseAI 1.19.0 (huggingface#1665)

Co-authored-by: Libin Tang <litang@habana.ai>

* Fix bug for GaudiMixtralAttentionLongSequence forward (huggingface#1650)

Signed-off-by: kaixuanliu <kaixuan.liu@intel.com>

* Update to SynapseAI v1.19

* Release: v1.15.0

* Fix style

* save_model - incorrect conflict resolution

* Fix style

---------

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
Signed-off-by: Xin <xin3.he@intel.com>
Signed-off-by: xinhe3 <xinhe3@habana.ai>
Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
Signed-off-by: yuanwu <yuan.wu@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
Signed-off-by: Urszula Golowicz <urszula.golowicz@intel.com>
Signed-off-by: Focus Luo <focus.luo@intel.com>
Signed-off-by: kaixuanliu <kaixuan.liu@intel.com>
Co-authored-by: Pramod Kumar <144990617+pramodkumar-habanalabs@users.noreply.github.com>
Co-authored-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
Co-authored-by: Roi Tiefenbrunn <roi.tief97@gmail.com>
Co-authored-by: Yan Tomsinsky <73292515+Yantom1@users.noreply.github.com>
Co-authored-by: Konrad Drozd <konrad.drozd@intel.com>
Co-authored-by: Uri Livne <ulivne@habana.ai>
Co-authored-by: Yeonsil Yoon <yyoon@habana.ai>
Co-authored-by: Danny Semiat <dsemiat@habana.ai>
Co-authored-by: Yaser Afshar <yaser.afshar@intel.com>
Co-authored-by: Harish Subramony <81822986+hsubramony@users.noreply.github.com>
Co-authored-by: Piotr Bielak <pbielak@users.noreply.github.com>
Co-authored-by: Sayantan Sarkar <supersarkar@gmail.com>
Co-authored-by: Harish <hsubramony@habana.ai>
Co-authored-by: Libin Tang <litang@habana.ai>
Co-authored-by: ZhengHongming888 <hongming.zheng@intel.com>
Co-authored-by: Jimin Ha <jha@habana.ai>
Co-authored-by: Seunghyuk Park (shepark) <separk@habana.ai>
Co-authored-by: Dmitry <dmitry.smertin@intel.com>
Co-authored-by: Soila Kavulya <soila.p.kavulya@intel.com>
Co-authored-by: Sun Choi <schoi@habana.ai>
Co-authored-by: xinhe <xin3.he@intel.com>
Co-authored-by: Mohit Deopujari <mdeopujari@habana.ai>
Co-authored-by: Iman Gohari <s.m.iman.gohari@intel.com>
Co-authored-by: XinyuYe-Intel <xinyu.ye@intel.com>
Co-authored-by: Vivek Goel <vgoel@habana.ai>
Co-authored-by: Akihiro Takahashi <akihiro.takahashi@intel.com>
Co-authored-by: Miroslav Goncharenko <miroslav.goncharenko@intel.com>
Co-authored-by: Wang, Mengni <mengni.wang@intel.com>
Co-authored-by: Daniel Socek <daniel.socek@intel.com>
Co-authored-by: Vidya Galli <vidya.s.galli@intel.com>
Co-authored-by: deepak-gowda-narayana <140652370+deepak-gowda-narayana@users.noreply.github.com>
Co-authored-by: Supreet Singh <100715017+SupreetSinghPalne@users.noreply.github.com>
Co-authored-by: kaixuanliu <kaixuan.liu@intel.com>
Co-authored-by: ANSHUMAN TRIPATHY <a.tripathy87@gmail.com>
Co-authored-by: sushil dubey <sdubey@habana.ai>
Co-authored-by: Luca Calabria <luca.calabria@intel.com>
Co-authored-by: billishyahao <yahao.he@intel.com>
Co-authored-by: xinhe3 <xinhe3@habana.ai>
Co-authored-by: KP (Edwin) Lau <kiangpeng.lau@intel.com>
Co-authored-by: Marcin Łapiński <mlapinskix@habana.ai>
Co-authored-by: Urszula Golowicz <urszula.golowicz@intel.com>
Co-authored-by: Greg Serochi <greg.serochi@intel.com>
Co-authored-by: Seethong Vang <seethong.vang@intel.com>
Co-authored-by: Anastasia Uvarova <anastasia.uvarova@intel.com>
Co-authored-by: Mohit Deopujari <mohit.deopujari@intel.com>
Co-authored-by: Chen Levkovich <chen.levkovich@intel.com>
Co-authored-by: Libin Tang <libin.tang@intel.com>
Co-authored-by: ranzhejiang <zhejiang.ran@intel.com>
Co-authored-by: Baochen Yang <baochen.yang@intel.com>
Co-authored-by: Huijuan Zhou <huijuan.zhou@intel.com>
Co-authored-by: Sergey Plotnikov <sergey.plotnikov@intel.com>
Co-authored-by: Deepak Narayana <deepak.narayana@intel.com>
Co-authored-by: Witold Szczurek <152967125+wszczurekhabana@users.noreply.github.com>
Co-authored-by: Wei Lin <forever871001@163.com>
Co-authored-by: lkk <33276950+lkk12014402@users.noreply.github.com>
Co-authored-by: Chaojun Zhang <chzhang@habana.ai>
Co-authored-by: Daniel Huang <daniel1.huang@intel.com>
Co-authored-by: Yuan Wu <yuan.wu@intel.com>
Co-authored-by: Xiang, Haihao <haihao.xiang@intel.com>
Co-authored-by: Jianqian Zhou <jianqian.zhou@intel.com>
Co-authored-by: Wei Lin <wei2.lin@intel.com>
Co-authored-by: Thanaji Rao Thakkalapelli <tthakkalapelli@habana.ai>
Co-authored-by: Yao Matrix <yaoweifeng0301@126.com>
Co-authored-by: yan tomsinsky <ytomsinsky@habana.ai>
Co-authored-by: Eran Geva <egeva@habana.ai>
Co-authored-by: Alexey Belyakov <alexey.belyakov@intel.com>
Co-authored-by: Bhargav <beede@habana.ai>
Co-authored-by: Krzysztof Wiśniewski <krzysztof2.wisniewski@intel.com>
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
Co-authored-by: FocusLuo <focus.luo@intel.com>
Co-authored-by: Yixiu Chen <yixiu.chen@intel.com>
Co-authored-by: Nariman Piroozan <87953329+npiroozan@users.noreply.github.com>
Co-authored-by: Edward Mascarenhas <edward.mascarenhas@intel.com>
Co-authored-by: Shiv Kaul <skaul@habana.ai>
Co-authored-by: bmengke <mengkejiergeli.ba@intel.com>
Co-authored-by: Leo Zhao <leo.zhao@intel.com>
Co-authored-by: Mohit Sinha <msinha@habana.ai>
Co-authored-by: Harshvardhan Chauhan <hchauhan@habana.ai>
Co-authored-by: Gustavo Malkomes <gustavo.malkomes@intel.com>
Co-authored-by: Linoy Buchnik <linoybu@gmail.com>
Co-authored-by: Alexey Fadeev <alexey.fadeev@intel.com>
Co-authored-by: leopck <sckphoong@habana.ai>

Labels

run-test Run CI for PRs from external contributors

4 participants