
Fix scoped linear all-reduce for starcoder model #1432

Merged
regisss merged 2 commits into huggingface:main from skavulya:starcoder-fix
Oct 20, 2024

Conversation

@skavulya
Contributor

@skavulya skavulya commented Oct 17, 2024

A bug in the scoped linear all-reduce implementation for the starcoder2 model caused incorrect output, as shown below:

python ../gaudi_spawn.py --use_deepspeed --world_size 2 run_generation.py --model_name_or_path bigcode/starcoder2-15b --use_hpu_graphs --trust_remote_code --attn_softmax_bf16 --trim_logits --use_kv_cache --use_flash_attention --flash_attention_recompute --max_new_tokens 128 --batch_size 1 --bf16 --prompt "def is_prime():"

Output:
Input/outputs:
input 1: ('def is_prime():',)
output 1: ('def is_prime(): ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( (',)

Output after the fix:
Input/outputs:
input 1: ('def is_prime():',)
output 1: ('def is_prime():\n for i in range(2, int(math.sqrt(n)) + 1):\n if n % i == 0:\n return False\n return True\n\ndef is_palindrome():\n return str(n) == str(n)[::-1]\n\ndef is_lychrel():\n for i in range(50):\n n = n + int(str(n)[::-1])\n if is_palindrome():\n return False\n return True\n\ncount = 0\nfor i in range(10000):\n if is_lychrel():',)

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@skavulya skavulya requested a review from regisss as a code owner October 17, 2024 03:25
@mandy-li mandy-li requested a review from libinta October 17, 2024 03:45
@mandy-li mandy-li added the run-test Run CI for PRs from external contributors label Oct 17, 2024
Comment on lines 67 to 68
output = F.dropout(x, p=self.residual_dropout, training=self.training)
return output
Contributor

Suggested change
output = F.dropout(x, p=self.residual_dropout, training=self.training)
return output
x = F.dropout(x, p=self.residual_dropout, training=self.training)
return x

@skavulya Why do you allocate new memory here? Is there a reason you do not overwrite x?

Contributor Author

def post_all_reduce(self, input):
    output = input + self.bias if (self.bias is not None) else input
    return output
# inplace addition needed for correct results
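The in-place comment is about tensor aliasing: an out-of-place addition allocates a fresh tensor, so any other reference to the original one (for example, one captured in an HPU graph) never observes the bias, while an in-place `add_` mutates the shared storage. A minimal standalone sketch in plain PyTorch; the function names are hypothetical and this is not the optimum-habana implementation:

```python
import torch

def post_all_reduce_out_of_place(x, bias):
    # `x + bias` allocates a fresh tensor; callers still holding the
    # original `x` (e.g. a captured graph) never observe the bias.
    return x + bias if bias is not None else x

def post_all_reduce_in_place(x, bias):
    # `add_` mutates the tensor's storage, so every alias sees the update.
    if bias is not None:
        x.add_(bias)
    return x

x = torch.ones(4)
alias = x  # stands in for another reference to the same storage
out = post_all_reduce_out_of_place(x, torch.full((4,), 2.0))
print(alias)  # still tensor([1., 1., 1., 1.]): the alias missed the bias

post_all_reduce_in_place(x, torch.full((4,), 2.0))
print(alias)  # tensor([3., 3., 3., 3.]): the in-place add is visible
```

Running this on CPU is enough to see the difference; the same aliasing semantics are what make an in-place update visible through references held elsewhere.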
Contributor

@skavulya very nice catch!

Does this affect any other tests or models (falcon, llama, gemma, qwen2, qwen2_moe)?

Contributor Author

I only looked at llama and qwen2. They were not affected because they don't use a bias. I can test falcon and qwen2_moe.

Contributor

@skavulya If llama and qwen2 did not use a bias, then the output is wrong, isn't it? Based on the original code, output = input + input.

Contributor Author

For llama and qwen2, bias was None so the addition is skipped.
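The skip follows from how Python parses the conditional expression: `input + self.bias if (self.bias is not None) else input` groups as `(input + self.bias) if ... else input`, so when the bias is `None` the addition never executes. A tiny standalone sketch with plain ints (a hypothetical free function, not the module method):

```python
def post_all_reduce(input, bias):
    # Parses as: (input + bias) if (bias is not None) else input
    return input + bias if bias is not None else input

print(post_all_reduce(5, None))  # 5: bias is None, addition skipped
print(post_all_reduce(5, 2))     # 7: bias present, addition applied
```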

@skavulya
Contributor Author

@yafshar The results of the transformers tests on main and this PR are the same:

GAUDI2_CI=1 RUN_SLOW=true python -m pytest tests/transformers/tests/models/
main: commit f98688d ==== 41 failed, 993 passed, 401 skipped, 101 warnings in 1531.33s (0:25:31) ====
this PR: ==== 41 failed, 993 passed, 401 skipped, 101 warnings in 1296.93s (0:21:36) ====

Contributor

@yafshar yafshar left a comment

LGTM!

@regisss this PR is ready. Would you check it?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@regisss regisss left a comment

Good catch!

@regisss regisss merged commit 3d047db into huggingface:main Oct 20, 2024
@skavulya
Contributor Author

Thanks @regisss! Please include this fix in the 1.14 release. We need it for the TGI Gaudi release.

@regisss
Collaborator

regisss commented Oct 21, 2024

Thanks @regisss! Please include this fix in the 1.14 release. We need it for the TGI Gaudi release.

Just added it to the release branch: https://github.com/huggingface/optimum-habana/commits/v1.14-release

xinyu-intel pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Mar 4, 2025
* Add flag to run inference with partial dataset (huggingface#1420)

* Add peft generation example (huggingface#1427)

* Upgrade to SynapseAI 1.18.0 (huggingface#1418)

* Simplify HQT config files (huggingface#1219)

* unify_measurements.py script support to unify PCQ 70B 8x (huggingface#1322)

* Add misc. training args (huggingface#1346)

* Add quantization config for low bs case (huggingface#1377)

* Remove HQT from OHF (huggingface#1257)

Co-authored-by: Adam Stachowicz <[email protected]>
Co-authored-by: Adam Stachowicz <[email protected]>
Co-authored-by: Yeonsil Yoon <[email protected]>

* Load INC GPTQ checkpoint & rename params (huggingface#1364)

Co-authored-by: Yaser Afshar <[email protected]>
Co-authored-by: Harish Subramony <[email protected]>
Co-authored-by: Yeonsil Yoon <[email protected]>

* Enable FusedSDPA fp8 in Llama FT (huggingface#1388)

Co-authored-by: Yaser Afshar <[email protected]>
Co-authored-by: Harish Subramony <[email protected]>

* Valid sequence length for sdpa (huggingface#1183)

Co-authored-by: Harish <[email protected]>
Co-authored-by: Libin Tang <[email protected]>
Co-authored-by: regisss <[email protected]>

* Multiple fixes (dynamo graph break, qwen-moe, multicard) (huggingface#1410)

* datasets downgrade version to 2.21.0 (huggingface#1413)

* Update ci sentence_transformer.sh (huggingface#1424)

* Fix load INC load weights compile error due to Transformer 4.45 upgrade.  (huggingface#1421)

* Update language-modeling README.md, add trust_remote_code for flan-t5-xl (huggingface#1422)

* Update unify_measurements.py support info (huggingface#1425)

* GPT2 torch.compile fix (huggingface#1434)

* Added missing allocate_kv_cache() call in CausalLM class (huggingface#1431)

* Fix merge error and update text-to-speech readme (huggingface#1436)

* Fix OOM error for code llama (huggingface#1437)

* Fix error on 4bit checkpoint load with run_lm_eval on TF4.45.2 (huggingface#1439)

* Fix scoped linear all-reduce for starcoder model (huggingface#1432)

* Fixed recursion error in SentenceTransformer (huggingface#1428)

* Fix Llama 3.1 generation (huggingface#1444)

* Update text-gen README.md to add auto-gptq fork install steps (huggingface#1442)

* Added gemma specific fp8 quantization file (huggingface#1445)

* Remove cache folder from image data folder (huggingface#1446)

Co-authored-by: regisss <[email protected]>

* Bump dev version

* Enable DeepSpeed for image-to-text example (huggingface#1455)

* Fix bug when loading 4bit checkpoint quantized in INC (huggingface#1447)

* Fixes 'Tokenizer does not have padding token' introduced by  huggingface#1444 for Llama3.1 (huggingface#1457)

* Fix facebook/hf-seamless-m4t-medium crash (huggingface#1433)

Signed-off-by: Wang, Yi A <[email protected]>

* Fix bias update in scoped all reduce (huggingface#1456)

* Added skip for unsuported tests for mistral/mixtral (huggingface#1462)

* Update sentence transformer to v3.2.1 (huggingface#1470)

* Optimized inference of Cohere model on HPU (huggingface#1329)

Signed-off-by: Ye, Xinyu <[email protected]>

* Idefics2 (huggingface#1270)

Signed-off-by: Wang, Yi A <[email protected]>

* Remove deprecated Mixed precision flags (huggingface#1471)

Change-Id: I1c2e2460dc2072ba7b311f239441b304694918c8

* Optimized inference of XGLM model on HPU (huggingface#1323)

Signed-off-by: Ye, Xinyu <[email protected]>

* Add mllama support (huggingface#1419)

Signed-off-by: Wang, Yi A <[email protected]>

* Enable flash attention for gemma (huggingface#1454)

* Readme: replace tabs with spaces (huggingface#1485)

* Move fast tests to Gaudi2 (huggingface#1498)

* Support loading 4 bit Qwen2 (huggingface#1476)

Signed-off-by: Mengni Wang <[email protected]>

* Add textual inversion XL for Gaudi (huggingface#868)

Signed-off-by: Daniel Socek <[email protected]>
Co-authored-by: Iman Gohari <[email protected]>

* Remove torch req from LM example (huggingface#1491)

* Remove keep_input_mutations (huggingface#1492)

* Fix trust_remote_code (huggingface#1493)

* Upgrade ViT README with torch.compile (huggingface#1494)

* Tests for text gen output text (huggingface#1411)

* Corrected Throughput measure for GaudiDDPMPipeline (huggingface#1460)

* Fix text generation test

* Add G3 in T5-L README (huggingface#1523)

* Fix tuple object error (huggingface#1354)

* Add warmup time and compile time log for the eval/prediction.  (huggingface#1489)

* Fix style

* Enable `paligemma` model for image-to-text example (huggingface#1407)

Signed-off-by: Liu, Kaixuan <[email protected]>
Co-authored-by: regisss <[email protected]>

* Add support for MLPERF optimized pipeline from example (huggingface#1465)

Co-authored-by: sushil dubey <[email protected]>

* Enable Gemma2 Inference on Gaudi (huggingface#1504)

Signed-off-by: Wang, Yi A <[email protected]>
Signed-off-by: Ye, Xinyu <[email protected]>
Signed-off-by: Mengni Wang <[email protected]>
Signed-off-by: Daniel Socek <[email protected]>
Co-authored-by: billishyahao <[email protected]>
Co-authored-by: Harish Subramony <[email protected]>
Co-authored-by: Yeonsil Yoon <[email protected]>
Co-authored-by: Seunghyuk Park (shepark) <[email protected]>
Co-authored-by: regisss <[email protected]>
Co-authored-by: Sun Choi <[email protected]>
Co-authored-by: xinhe <[email protected]>
Co-authored-by: Mohit Deopujari <[email protected]>
Co-authored-by: Wang, Yi <[email protected]>
Co-authored-by: Soila Kavulya <[email protected]>
Co-authored-by: Iman Gohari <[email protected]>
Co-authored-by: ZhengHongming888 <[email protected]>
Co-authored-by: XinyuYe-Intel <[email protected]>
Co-authored-by: Vivek Goel <[email protected]>
Co-authored-by: Akihiro Takahashi <[email protected]>
Co-authored-by: Miroslav Goncharenko <[email protected]>
Co-authored-by: Wang, Mengni <[email protected]>
Co-authored-by: Daniel Socek <[email protected]>
Co-authored-by: Adam Stachowicz <[email protected]>
Co-authored-by: Vidya Galli <[email protected]>
Co-authored-by: deepak-gowda-narayana <[email protected]>

* Add check_neural_compressor_min_version for 4 bit behavior (huggingface#1500)

Signed-off-by: Xin <[email protected]>
Signed-off-by: xinhe3 <[email protected]>
Co-authored-by: xinhe3 <[email protected]>

* Fixed Gemma FP8 flash_attention lower throughput issue (huggingface#1510)

* Pass "lazy_mode" arg to GaudiLlamaModel GaudiTrainer (huggingface#1515)

Co-authored-by: Marcin Łapiński <[email protected]>

* Removed workaround for NaN bug causing graph break. (huggingface#1516)

Co-authored-by: Marcin Łapiński <[email protected]>

* Disable default sdpa in Albert (#22) (huggingface#1517)

Co-authored-by: Urszula Golowicz <[email protected]>

* Implement fused sdpa for wav2vec2 (#18) (huggingface#1520)

* Memory optimization for gpt_bitcode (#4) (huggingface#1513)

Co-authored-by: Urszula Golowicz <[email protected]>

* text_generation: improve parameters check (huggingface#1527)

* transformers: fixed some typos (huggingface#1528)

* Update DeepSpeed CI baselines

* Update FSDP CI baseline

* Optimum-Habana docs re-org (huggingface#1488)

Signed-off-by: Daniel Socek <[email protected]>
Co-authored-by: Greg Serochi <[email protected]>
Co-authored-by: Kiangpeng Lau <[email protected]>
Co-authored-by: Seethong Vang <[email protected]>
Co-authored-by: regisss <[email protected]>
Co-authored-by: Anastasia Uvarova <[email protected]>
Co-authored-by: Mohit Deopujari <[email protected]>
Co-authored-by: Chen Levkovich <[email protected]>
Co-authored-by: Libin Tang <[email protected]>

* Makes the with_stack of the profiler changeable (huggingface#1497)

* FLUX with diffusers 0.31.0 (huggingface#1450)

Signed-off-by: Daniel Socek <[email protected]>
Co-authored-by: Baochen Yang <[email protected]>
Co-authored-by: Huijuan Zhou <[email protected]>
Co-authored-by: Sergey Plotnikov <[email protected]>
Co-authored-by: Deepak Narayana <[email protected]>
Co-authored-by: regisss <[email protected]>

* Fix some CI baselines

* Add split runners to CI (2 devices per runner for fast tests)

* Fix fast CI to work with split runners (huggingface#1534)

* Fix dtype issue with valid sequence length in torch.compile bs=1 (huggingface#1532)

* Support beam search with reuse_cache and bucket_internal (huggingface#1472)

* Add mixtral trl sft (huggingface#1349)

* Enable tiiuae/falcon-11B-vlm in image_to_text example (huggingface#1490)

Signed-off-by: Wang, Yi A <[email protected]>

* Add Llama 3.1 ft to CI (huggingface#1529)

* Migrate OH CLIP (roberta-clip) training to torch.compile (huggingface#1507)

* test_text_generation: fix non-Gaudi2 case (huggingface#1530)

* text-generation: improve output printing (huggingface#1486)

* Text-generation, model set-up: torch.compile for attributes instead of models' types (huggingface#1452)

* FLUX Fine-Tuning for Gaudi (huggingface#1482)

Signed-off-by: Daniel Socek <[email protected]>

* Enable fusedsdpa kernel for vision part of mllama (huggingface#1531)

Signed-off-by: Wang, Yi A <[email protected]>

* Minicpm enabling (huggingface#1342)

Signed-off-by: Daniel Huang <[email protected]>

* Fix bridgetower example (#312) (huggingface#1481)

* Migrate OH Wave2Vec-AC training to torch.compile - README update (huggingface#1537)

Co-authored-by: Chaojun Zhang <[email protected]>

* Flux Image-To-Image pipeline (huggingface#1524)

Signed-off-by: Daniel Socek <[email protected]>
Co-authored-by: Iman Gohari <[email protected]>

* Enable Falcon-mamba (huggingface#1480)

Signed-off-by: yuanwu <[email protected]>
Co-authored-by: regisss <[email protected]>

* Enable dynamic compile for mpi(training) (huggingface#1509)

* Migrate OH T5-large training to torch.compile (huggingface#1506)

* Add support for Baichuan2 (huggingface#1479)

Signed-off-by: Haihao Xiang <[email protected]>
Co-authored-by: Jianqian Zhou <[email protected]>
Co-authored-by: Wei Lin <[email protected]>

* trainer: fixed spelling (huggingface#1538)

* Create CI Eager/Lazy for Language Modeling (huggingface#1448)

* Fixes for llava-next test failures in 1.19 (huggingface#1535)

Co-authored-by: regisss <[email protected]>

* Enable DeepSeek-V2 (huggingface#1475)

Signed-off-by: Matrix YAO <[email protected]>

* Refactor Qwen2 Family (huggingface#1541)

* Add support for optimized SDXL pipeline (huggingface#1519)

* Make style

* Add the checkout parameters of falcon-mamba pytest (huggingface#1540)

Signed-off-by: yuanwu <[email protected]>
Co-authored-by: regisss <[email protected]>

* Avoid negative values in eval metrics (huggingface#1533)

* Fixes in unify_measurements (huggingface#1496)

Co-authored-by: yan tomsinsky <[email protected]>
Co-authored-by: Eran Geva <[email protected]>

* Fix lm_eval script for starcoder and gemma (huggingface#1463)

* Add option to use bf16 in PT sdp (#5) (huggingface#1514)

Co-authored-by: Urszula Golowicz <[email protected]>

* Fix tests.test_peft_inference failure (huggingface#1543)

Signed-off-by: Wang, Yi A <[email protected]>

* [wav2vec2] Remove tensor.item and dynamic slicing operations in the loop that cause graph break (huggingface#1508)

* Update lm_eval version (huggingface#1473)

Co-authored-by: regisss <[email protected]>

* Fix bad import in Baichuan code (huggingface#1547)

* Restore performance in generate (huggingface#1546)

Signed-off-by: Urszula Golowicz <[email protected]>
Co-authored-by: Marcin Łapiński <[email protected]>
Co-authored-by: Adam Stachowicz <[email protected]>

* Enable pyTorch-IMage-Models (TIMM) with HPUs (huggingface#1459)

Co-authored-by: regisss <[email protected]>

* Add HF login for 8x Gaudi2 CI

* Adding support for Context Parallelism using Deepseed's DistributedAttention (huggingface#1501)

Co-authored-by: regisss <[email protected]>

* Fix Llama CI

* Add DynamicMoE support for Mixtral (huggingface#1511)

Co-authored-by: Adam Stachowicz <[email protected]>

* Fix for llava models not generating text with test failures in 1.19 (huggingface#1548)

* Refactor KV cache, Rope  , reduce common code  (huggingface#1148)

Co-authored-by: regisss <[email protected]>

* Adjust Qwen2-7B test case (huggingface#1551)

* [run_lm_eval.py] Fixed too many print dump json info (huggingface#1553)

Signed-off-by: Focus Luo <[email protected]>

* Fix for single_card llama7b and falcon40b CI errors (huggingface#1549)

* Implemented fusedSDPA for stable diffusion (#36) (huggingface#1545)

Co-authored-by: Yixiu Chen <[email protected]>
Co-authored-by: Libin Tang <[email protected]>

* Apply --sdp_on_bf16 to image-to-text examples (huggingface#1557)

* Fix accuracy regression in Gemma (huggingface#1556)

* Fix FusedSDPA wrapper from TransformerEngine (huggingface#1562)

* Run albert-xxlarge-v1 CI as torch.compile mode (huggingface#1563)

* Update README commands for the models to use --sdp_on_bf16 (huggingface#1566)

* Minicpm patch (huggingface#1567)

Signed-off-by: Daniel Huang <[email protected]>

* Updated gemma_2b_it CI (huggingface#1561)

Co-authored-by: regisss <[email protected]>

* Fixed Adalora Test for OH 1.15 (huggingface#1564)

* Fixed LORACP Test for OH 1.15 (huggingface#1568)

* Add requirements.txt

* Update the baseline for 1.18 to reflect performance in 1.19 (huggingface#1571)

* Fix prefix llama ci failure (huggingface#1570)

Signed-off-by: Wang, Yi A <[email protected]>

* fusedsdpa for stable diffusion xl (huggingface#1565)

Co-authored-by: regisss <[email protected]>

* Add sdp_on_bf16 to tests,text-gen (huggingface#1559)

* Fix mllama test (huggingface#1569)

Signed-off-by: Wang, Yi A <[email protected]>

* Fix lazy_mode assignment (huggingface#1558)

Co-authored-by: Yaser Afshar <[email protected]>

* Fix diffusers import (huggingface#1574)

* Update README commands for more models to use --sdp_on_bf16 (huggingface#1575)

Co-authored-by: Libin Tang <[email protected]>

* Generation utils update (minor) (huggingface#1468)

* style: removed tabs (huggingface#1577)

* Add chatglm (huggingface#1478)

Co-authored-by: Wei Lin <[email protected]>
Co-authored-by: Jianqian Zhou <[email protected]>
Co-authored-by: Leo Zhao <[email protected]>

* Enable num_return_sequences in beam search (huggingface#1536)

* gpt_bigcode: added internal bucketing fix (huggingface#1526)

* Update the Gaudi trainer with transformers 4.45.2 (huggingface#1398)

* Revert "add check_neural_compressor_min_version for 4 bit behavior" (huggingface#1578)

* Revert PR huggingface#1473 (huggingface#1582)

* Remove deprecated env variables

* Add sdp_on_bf16 argument to CI for run_image2text_lora_finetune and a… (huggingface#1585)

* Remove unnecessary neural compressor fix for 1.19 release (huggingface#1584)

* Make style

* Fixed spelling (huggingface#1576)

* Update docs for baichuan2 training (huggingface#1586)

* Adjust bert and roberta targets (huggingface#1588)

* Update text-gen readme for autogptq (huggingface#1589)

* Update README to Include Information on Performance Degradation and Mitigation Options (huggingface#1555)

* Fix Accuracy Calculation Issue in GPT-NeoX (huggingface#1591)

* Readme update for llama-405B (huggingface#1587)

Co-authored-by: Mohit Sinha <[email protected]>
Co-authored-by: Seunghyuk Park (shepark) <[email protected]>
Co-authored-by: regisss <[email protected]>

* Add WA flag for falcon-180b to resolve text-gen critical reset error during tests (huggingface#1590)

* Add sdp_on_bf16 option to diffusers and image/audio classicifation tests (huggingface#1592)

* Update transformers tests generation util v4.45.2 (huggingface#1441)

Co-authored-by: Gustavo <gustavo.malkomes>
Co-authored-by: Yaser Afshar <[email protected]>
Co-authored-by: regisss <[email protected]>

* Update README.md (huggingface#1595)

* Limit position embeddings in inference (huggingface#1598)

Co-authored-by: Adam Stachowicz <[email protected]>

* Verify model output is provided when check_output is enabled (huggingface#1597)

* Fix scikit-learn to 1.5.2 to fix f1 evaluation crash in 1.6.0 (huggingface#1596)

Signed-off-by: Wang, Yi A <[email protected]>

* Revert common KVCache not to check token_idx (huggingface#1594)

* Update language-modeling README file (huggingface#1599)

Co-authored-by: Libin Tang <[email protected]>
Co-authored-by: regisss <[email protected]>

* Update readme for audio-classification example (huggingface#1602)

* SDPA flag update - static code analysis (huggingface#1601)

* Remove unwanted merged changes in SD pipeline

* Revert LlamaKVCache due to memory increase (huggingface#1605)

* Check rope_scaling attr (huggingface#1609)

* skip certain tests for G1 with empty param list (huggingface#1613)

* Revert "Update transformers tests generation util v4.45.2 (huggingface#1441)" (huggingface#1614)

This reverts commit 2ba520a.

* audio classification readme update (huggingface#1604)

* fix readme cmds for clip-roberta (huggingface#1603)

* fix readme cmds for clip-roberta

* comments and cleanup

* Fix run_generation test commands for TRL out usage example (huggingface#1624)

Fix run_generation example

* Add arbitrary scales (#15) (huggingface#1625)

Co-authored-by: Linoy Buchnik <[email protected]>

* Modify Qwen2 TRL command to avoid OOM.  (huggingface#1630)

Add --use_flash_attention to avoid OOM for Qwen2

* Replace the UNET custom attention processors (huggingface#1608)

Co-authored-by: Iman Gohari <[email protected]>

* Falcon Model Support (huggingface#1612)

Co-authored-by: leopck <[email protected]>
Co-authored-by: regisss <[email protected]>

* Update sdp_on_bf16 option for ST example (huggingface#1615)

* Update save lora weights for diffusers with text_encoder_2 layers (huggingface#1626)

* Fix `save_lora_weights` in `pipeline_utils.py` (huggingface#1643)

* Refactor mixtral moe block. (huggingface#1635)

* speech-recognition: downgrade datasets version (huggingface#1646)

* add sdp_on_bf16 to controlnet (huggingface#1631)

* add sdp_on_bf16 to controlnet

* Update pipeline_controlnet.py

pass sdp_on_bf16 to controlnet_pipeline

* Update text_to_image_generation.py

* Update text_to_image_generation.py

* Quick fix for quantization/custom op list loading (huggingface#1657)

Signed-off-by: Daniel Socek <[email protected]>

* Update multi-node test dockerfile (huggingface#1662)

* Fixes on OH 1.15 pre release (huggingface#1661)

Co-authored-by: regisss <[email protected]>

* Fix distributed issue for ST Trainer (huggingface#1649)

* Fix distributed issue for timm (huggingface#1653)

Co-authored-by: regisss <[email protected]>

* Added missing parameter for llama function call (huggingface#1663)

Co-authored-by: Libin Tang <[email protected]>

* Add reuse_cache for llama3-405b measurement (huggingface#1664)

* Update EFA dockerfile to SynapseAI 1.19.0 (huggingface#1665)

Co-authored-by: Libin Tang <[email protected]>

* Fix bug for GaudiMixtralAttentionLongSequence forward (huggingface#1650)

Signed-off-by: kaixuanliu <[email protected]>

* Update to SynapseAI v1.19

* Release: v1.15.0

* Fix style

* save_model - incorrect conflict resolution

* Fix style

---------

Signed-off-by: Wang, Yi A <[email protected]>
Signed-off-by: Ye, Xinyu <[email protected]>
Signed-off-by: Mengni Wang <[email protected]>
Signed-off-by: Daniel Socek <[email protected]>
Signed-off-by: Liu, Kaixuan <[email protected]>
Signed-off-by: Xin <[email protected]>
Signed-off-by: xinhe3 <[email protected]>
Signed-off-by: Daniel Huang <[email protected]>
Signed-off-by: yuanwu <[email protected]>
Signed-off-by: Haihao Xiang <[email protected]>
Signed-off-by: Matrix YAO <[email protected]>
Signed-off-by: Urszula Golowicz <[email protected]>
Signed-off-by: Focus Luo <[email protected]>
Signed-off-by: kaixuanliu <[email protected]>
Co-authored-by: Pramod Kumar <[email protected]>
Co-authored-by: Wang, Yi <[email protected]>
Co-authored-by: regisss <[email protected]>
Co-authored-by: Roi Tiefenbrunn <[email protected]>
Co-authored-by: Yan Tomsinsky <[email protected]>
Co-authored-by: Konrad Drozd <[email protected]>
Co-authored-by: Uri Livne <[email protected]>
Co-authored-by: Yeonsil Yoon <[email protected]>
Co-authored-by: Danny Semiat <[email protected]>
Co-authored-by: Yaser Afshar <[email protected]>
Co-authored-by: Harish Subramony <[email protected]>
Co-authored-by: Piotr Bielak <[email protected]>
Co-authored-by: Sayantan Sarkar <[email protected]>
Co-authored-by: Harish <[email protected]>
Co-authored-by: Libin Tang <[email protected]>
Co-authored-by: ZhengHongming888 <[email protected]>
Co-authored-by: Jimin Ha <[email protected]>
Co-authored-by: Seunghyuk Park (shepark) <[email protected]>
Co-authored-by: Dmitry <[email protected]>
Co-authored-by: Soila Kavulya <[email protected]>
Co-authored-by: Sun Choi <[email protected]>
Co-authored-by: xinhe <[email protected]>
Co-authored-by: Mohit Deopujari <[email protected]>
Co-authored-by: Iman Gohari <[email protected]>
Co-authored-by: XinyuYe-Intel <[email protected]>
Co-authored-by: Vivek Goel <[email protected]>
Co-authored-by: Akihiro Takahashi <[email protected]>
Co-authored-by: Miroslav Goncharenko <[email protected]>
Co-authored-by: Wang, Mengni <[email protected]>
Co-authored-by: Daniel Socek <[email protected]>
Co-authored-by: Vidya Galli <[email protected]>
Co-authored-by: deepak-gowda-narayana <[email protected]>
Co-authored-by: Supreet Singh <[email protected]>
Co-authored-by: kaixuanliu <[email protected]>
Co-authored-by: ANSHUMAN TRIPATHY <[email protected]>
Co-authored-by: sushil dubey <[email protected]>
Co-authored-by: Luca Calabria <[email protected]>
Co-authored-by: billishyahao <[email protected]>
Co-authored-by: xinhe3 <[email protected]>
Co-authored-by: KP (Edwin) Lau <[email protected]>
Co-authored-by: Marcin Łapiński <[email protected]>
Co-authored-by: Urszula Golowicz <[email protected]>
Co-authored-by: Greg Serochi <[email protected]>
Co-authored-by: Seethong Vang <[email protected]>
Co-authored-by: Anastasia Uvarova <[email protected]>
Co-authored-by: Mohit Deopujari <[email protected]>
Co-authored-by: Chen Levkovich <[email protected]>
Co-authored-by: Libin Tang <[email protected]>
Co-authored-by: ranzhejiang <[email protected]>
Co-authored-by: Baochen Yang <[email protected]>
Co-authored-by: Huijuan Zhou <[email protected]>
Co-authored-by: Sergey Plotnikov <[email protected]>
Co-authored-by: Deepak Narayana <[email protected]>
Co-authored-by: Witold Szczurek <[email protected]>
Co-authored-by: Wei Lin <[email protected]>
Co-authored-by: lkk <[email protected]>
Co-authored-by: Chaojun Zhang <[email protected]>
Co-authored-by: Daniel Huang <[email protected]>
Co-authored-by: Yuan Wu <[email protected]>
Co-authored-by: Xiang, Haihao <[email protected]>
Co-authored-by: Jianqian Zhou <[email protected]>
Co-authored-by: Wei Lin <[email protected]>
Co-authored-by: Thanaji Rao Thakkalapelli <[email protected]>
Co-authored-by: Yao Matrix <[email protected]>
Co-authored-by: yan tomsinsky <[email protected]>
Co-authored-by: Eran Geva <[email protected]>
Co-authored-by: Alexey Belyakov <[email protected]>
Co-authored-by: Bhargav <[email protected]>
Co-authored-by: Krzysztof Wiśniewski <[email protected]>
Co-authored-by: Abhilash Majumder <[email protected]>
Co-authored-by: FocusLuo <[email protected]>
Co-authored-by: Yixiu Chen <[email protected]>
Co-authored-by: Nariman Piroozan <[email protected]>
Co-authored-by: Edward Mascarenhas <[email protected]>
Co-authored-by: Shiv Kaul <[email protected]>
Co-authored-by: bmengke <[email protected]>
Co-authored-by: Leo Zhao <[email protected]>
Co-authored-by: Mohit Sinha <[email protected]>
Co-authored-by: Harshvardhan Chauhan <[email protected]>
Co-authored-by: Gustavo Malkomes <[email protected]>
Co-authored-by: Linoy Buchnik <[email protected]>
Co-authored-by: Alexey Fadeev <[email protected]>
Co-authored-by: leopck <[email protected]>

Labels

run-test Run CI for PRs from external contributors


5 participants