Add chatglm by mengker33 · Pull Request #1478 · huggingface/optimum-habana

mengker33 · 2024-11-12T05:10:22Z

What does this PR do?

This PR adds support for the ChatGLM model (a custom model), covering chatglm2-6b and chatglm3-6b.
An inference test and a pretraining example with its test are also included.

xuguangxin · 2024-11-13T07:04:02Z

@libinta @sywangyi, please help review. Thanks.

emascarenhas · 2024-11-24T06:26:37Z

@mengker33 ,
Please run "pip install -U ruff; make style" and check for errors.
Also run tests/ci/fast_tests.sh and all the slow text-generation tests you added for chatglm, e.g. "GAUDI2_CI=1 RUN_SLOW=1 python -m pytest test_text_generation_example.py", and check that there are no new errors.

Don't you also need to add a test for the language modeling part?

Does this PR need to be included in this release or can it wait for the next release?

phoenixdna · 2024-11-24T14:06:48Z

Hi, I am trying to run inference with ChatGLM3-6B on a Gaudi card, but I keep hitting the following problem even though I am using PR 1478.
The following is the script I copied from your instructions:

GLM=3 python3 run_generation.py \
--model_name_or_path /data/ZhipuAI/chatglm3-6b \
--use_hpu_graphs \
--use_kv_cache \
--do_sample \
--bf16 \
--trim_logits \
--batch_size 1 \
--max_input_tokens 1024 \
--max_new_tokens 512 \
--reuse_cache \
--use_flash_attention

However, I still get the following errors:

[WARNING|utils.py:225] 2024-11-24 21:54:40,419 >> optimum-habana v1.15.0.dev0 has been validated for SynapseAI v1.18.0 but the driver version is v1.17.0, this could lead to undefined behavior!
/root/habanalabs-venv/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:366: UserWarning: torch.distributed.reduce_op is deprecated, please use torch.distributed.ReduceOp instead
  warnings.warn(
/root/habanalabs-venv/lib/python3.10/site-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
11/24/2024 21:54:41 - INFO - __main__ - Single-device run.
ChatGLMForConditionalGeneration has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
  - If you're using `trust_remote_code=True`, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
  - If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
  - If you are not the owner of the model architecture class, please contact the model code owner to update it.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:04<00:00,  1.73it/s]
============================= HABANA PT BRIDGE CONFIGURATION =========================== 
 PT_HPU_LAZY_MODE = 1
 PT_RECIPE_CACHE_PATH = 
 PT_CACHE_FOLDER_DELETE = 0
 PT_HPU_RECIPE_CACHE_CONFIG = 
 PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
 PT_HPU_LAZY_ACC_PAR_MODE = 1
 PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
---------------------------: System Configuration :---------------------------
Num CPU Cores : 28
CPU RAM       : 123577836 KB
------------------------------------------------------------------------------
[WARNING|tokenization_chatglm.py:174] 2024-11-24 21:54:50,850 >> Setting eos_token is not supported, use the default one.
[WARNING|tokenization_chatglm.py:170] 2024-11-24 21:54:50,850 >> Setting pad_token is not supported, use the default one.
[WARNING|tokenization_chatglm.py:166] 2024-11-24 21:54:50,850 >> Setting unk_token is not supported, use the default one.
11/24/2024 21:54:51 - INFO - __main__ - Args: Namespace(device='hpu', model_name_or_path='/data/ZhipuAI/chatglm3-6b', bf16=True, max_new_tokens=512, max_input_tokens=1024, batch_size=1, warmup=3, n_iterations=5, local_rank=0, use_kv_cache=True, use_hpu_graphs=True, dataset_name=None, column_name=None, do_sample=True, num_beams=1, top_k=None, penalty_alpha=None, trim_logits=True, seed=27, profiling_warmup_steps=0, profiling_steps=0, profiling_record_shapes=False, prompt=None, bad_words=None, force_words=None, assistant_model=None, peft_model=None, num_return_sequences=1, token=None, model_revision='main', attn_softmax_bf16=False, output_dir=None, bucket_size=-1, bucket_internal=False, dataset_max_samples=-1, limit_hpu_graphs=False, show_graphs_count=False, reuse_cache=True, verbose_workers=False, simulate_dyn_prompt=None, reduce_recompile=False, use_flash_attention=True, flash_attention_recompute=False, flash_attention_causal_mask=False, flash_attention_fast_softmax=True, book_source=False, torch_compile=False, ignore_eos=True, temperature=1.0, top_p=1.0, const_serialization_path=None, trust_remote_code=False, parallel_strategy='none', input_embeds=False, run_partial_dataset=False, load_quantized_model_with_autogptq=False, disk_offload=False, load_quantized_model_with_inc=False, local_quantized_inc_model_path=None, quant_config='', world_size=0, global_rank=0)
11/24/2024 21:54:51 - INFO - __main__ - device: hpu, n_hpu: 0, bf16: True
11/24/2024 21:54:51 - INFO - __main__ - Model initialization took 10.588s
11/24/2024 21:54:51 - INFO - __main__ - Graph compilation...
Warming up iteration 1/3
Traceback (most recent call last):
  File "/root/jupyter/optimum-habana/examples/text-generation/run_generation.py", line 758, in <module>
    main()
  File "/root/jupyter/optimum-habana/examples/text-generation/run_generation.py", line 523, in main
    generate(None, args.reduce_recompile)
  File "/root/jupyter/optimum-habana/examples/text-generation/run_generation.py", line 494, in generate
    outputs = model.generate(
  File "/root/habanalabs-venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/jupyter/optimum-habana/optimum/habana/transformers/generation/utils.py", line 1008, in generate
    self._prepare_special_tokens(generation_config, kwargs_has_attention_mask, device=device)
  File "/root/habanalabs-venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 1676, in _prepare_special_tokens
    eos_token_tensor is not None
RuntimeError: Graph compile failed. synStatus=synStatus 26 [Generic failure].

As I understand it, this PR is exactly for chatglm3-6b, but I don't understand why this happens, even after trying many times. Please give me some suggestions!
With regards.

mengker33 · 2024-11-26T09:21:48Z

tests/ci/fast_tests.sh

Hi, I have tried with fast_tests.sh and test_text_generation_example.py, and no errors occurred.
I also added tests for the language modeling part.

mengker33 · 2024-11-26T09:23:26Z

Hi, I didn't see any inference or pretraining errors in my local tests. Please check whether your test goes through the correct GLM modeling path in optimum-habana.

optimum/habana/transformers/models/chatglm/modeling_chatglm.py

phoenixdna · 2024-11-29T13:27:11Z

Thanks for your reply. I downloaded chatglm3-6b from ModelScope. I don't understand what you mean by "go through the correct glm modeling path in optimum-habana"; could you kindly explain this?

mengker33 · 2024-12-02T05:36:28Z

You need to check that the model is initialized from optimum/habana/transformers/models/chatglm/modeling_chatglm.py rather than from the modeling code in your downloaded files.
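
One way to perform this check is to print where the loaded model's class is defined. The following is only a minimal sketch: `modeling_origin` and `uses_optimum_habana` are hypothetical helper names, not part of optimum-habana, and the module prefix is an assumption derived from the file path above.

```python
import sys


def modeling_origin(model) -> str:
    """Describe the class of `model`: its module, its name, and the module's source file."""
    cls = type(model)
    module = sys.modules.get(cls.__module__)
    # Fall back to a placeholder when the module has no file on disk.
    source = getattr(module, "__file__", None) or "<unknown>"
    return f"{cls.__module__}.{cls.__qualname__} ({source})"


def uses_optimum_habana(model) -> bool:
    """True when the model class is defined under the optimum.habana package.

    The "optimum.habana" prefix is assumed from the path
    optimum/habana/transformers/models/chatglm/modeling_chatglm.py.
    """
    return type(model).__module__.startswith("optimum.habana")
```

Calling `print(modeling_origin(model))` right after the model is loaded should show a path under optimum/habana/ when the Gaudi implementation is in use. A module name that starts with `transformers_modules` would typically mean the modeling code shipped with the downloaded checkpoint was picked up instead.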

examples/language-modeling/README.md

emascarenhas · 2024-12-03T19:29:15Z

@mengker33 ,

I tried to run this test and got an error.
optimum-habana# GAUDI2_CI=1 RUN_SLOW=1 python -m pytest tests/test_text_generation_example.py

__________________________________________________________ ERROR collecting tests/test_text_generation_example.py __________________________________________________________
tests/test_text_generation_example.py::test_text_generation_bf16_1x: in "parametrize" the number of names (5):
['model_name', 'batch_size', 'reuse_cache', 'baseline', 'check_output']
must be equal to the number of values (4):
('THUDM/glm-4-9b-chat', 1, True, 105)
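
The collection error above means that one parametrize row has fewer values than there are parameter names. A small sketch of that check outside pytest, where `find_arity_mismatches` is a hypothetical helper and the second row is taken from the diff in this PR:

```python
def find_arity_mismatches(names, rows):
    """Return the rows whose length differs from the number of parameter names."""
    return [row for row in rows if len(row) != len(names)]


names = ["model_name", "batch_size", "reuse_cache", "baseline", "check_output"]
rows = [
    ("THUDM/glm-4-9b-chat", 1, True, 105),       # 4 values: the row pytest rejected
    ("THUDM/chatglm3-6b", 1, True, 150, False),  # 5 values: matches the names
]

# Prints only the glm-4-9b-chat row, which is missing a check_output value.
print(find_arity_mismatches(names, rows))
```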

mengker33 · 2024-12-04T01:07:26Z

I think you are using an old version of this PR. Please rebase onto the latest and try again. Thanks!

setup.py

emascarenhas · 2024-12-05T22:59:32Z

@mengker33 ,

Yes, this was the case. I am able to run the examples in the README successfully after rebasing.
However, this test command gives an error:
GAUDI2_CI=1 RUN_SLOW=1 python -m pytest tests/test_examples.py -s -v -k chatglm
E FileNotFoundError: [Errno 2] No such file or directory: '/home/optimum-habana/tests/baselines/chatglm3_6b.json'

Is that file required?
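
A missing baseline like this can be spotted before running the slow tests by checking the baselines directory directly. This is a sketch only: `missing_baselines` is a hypothetical helper, and the `<name>.json` layout is assumed from the path in the error message above.

```python
from pathlib import Path


def missing_baselines(baseline_dir, stems):
    """Return the names in `stems` that have no `<name>.json` file in `baseline_dir`."""
    base = Path(baseline_dir)
    return [stem for stem in stems if not (base / f"{stem}.json").is_file()]
```

For example, `missing_baselines("tests/baselines", ["chatglm3_6b"])` returns `["chatglm3_6b"]` as long as the file has not been added to the checkout.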

phoenixdna · 2024-12-06T02:00:10Z

OK, thanks for your reply. I will give it a try.

mengker33 · 2024-12-06T02:11:29Z

GAUDI2_CI=1 RUN_SLOW=1 python -m pytest tests/test_examples.py -s -v -k chatglm

Sorry, my bad... I had this baselines/chatglm3_6b.json file locally but forgot to push it to this PR. Really appreciate your test!

regisss

Please add this model to the table in the README and in the doc:

optimum-habana/README.md

Line 192 in 899b364

| Architecture | Training | Inference | <center>Tasks</center> |

optimum-habana/docs/source/index.mdx

Line 59 in 899b364

| Architecture | Training | Inference | Tasks |

regisss · 2024-12-08T10:46:44Z

tests/test_text_generation_example.py

+            ("THUDM/chatglm2-6b", 1, True, 150, False),
+            ("THUDM/chatglm3-6b", 1, True, 150, False),


What exactly is the difference between ChatGLM-2 and ChatGLM-3? I'm asking to know whether we really need to test both.

I don't think there is any functional difference in the modeling code; the only difference lies in the implementation of some customized tokenizer methods. I removed the test for chatglm2.

mengker33 · 2024-12-09T02:14:20Z

Done, thanks!

Including chatglm2-6b and chatglm3-6b. Co-authored-by: Wei Lin <wei2.lin@intel.com> Co-authored-by: Jianqian Zhou <jianqian.zhou@intel.com> Co-authored-by: Leo Zhao <leo.zhao@intel.com>

HuggingFaceDocBuilderDev · 2024-12-09T08:25:34Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

kaixuanliu · 2024-12-25T10:36:14Z

optimum/habana/transformers/models/chatglm/modeling_chatglm.py

+                        )
+                else:
+                    with ht.sdp_kernel(enable_recompute=flash_attention_recompute):
+                        if (q_len > 8192 or (q_len >= 6144 and bsz >= 2)) and self.training:


@mengker33, hi, just curious: why is the threshold 6144 here?
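
To make the thresholds in that condition concrete, here is a sketch that evaluates the same predicate for a few shapes. `takes_guarded_branch` is a hypothetical name; the snippet only restates the condition from the diff and says nothing about why 6144 was chosen or what the guarded branch does.

```python
def takes_guarded_branch(q_len: int, bsz: int, training: bool) -> bool:
    """Same boolean expression as in the diff, with self.training passed in explicitly."""
    return (q_len > 8192 or (q_len >= 6144 and bsz >= 2)) and training


# A few (q_len, bsz, training) combinations around the two thresholds.
cases = [
    (6144, 1, True),   # at 6144 with a single sequence: not taken
    (6144, 2, True),   # at 6144 with batch size 2: taken
    (8192, 1, True),   # exactly 8192 with a single sequence: not taken
    (8193, 1, True),   # above 8192: taken regardless of batch size
    (8193, 4, False),  # never taken outside training
]
for q_len, bsz, training in cases:
    print(q_len, bsz, training, takes_guarded_branch(q_len, bsz, training))
```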

Wang, Yi A <yi.a.wang@intel.com> * Fix lazy_mode assignment (huggingface#1558) Co-authored-by: Yaser Afshar <yaser.afshar@intel.com> * Fix mllama test (huggingface#1569) Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> * Fix lazy_mode assignment (huggingface#1558) Co-authored-by: Yaser Afshar <yaser.afshar@intel.com> * Fix diffusers import (huggingface#1574) * Update README commands for more models to use --sdp_on_bf16 (huggingface#1575) Co-authored-by: Libin Tang <litang@habana.ai> * Generation utils update (minor) (huggingface#1468) * style: removed tabs (huggingface#1577) * Add chatglm (huggingface#1478) Co-authored-by: Wei Lin <wei2.lin@intel.com> Co-authored-by: Jianqian Zhou <jianqian.zhou@intel.com> Co-authored-by: Leo Zhao <leo.zhao@intel.com> * Enable num_return_sequences in beam search (huggingface#1536) * gpt_bigcode: added internal bucketing fix (huggingface#1526) * Update the Gaudi trainer with transformers 4.45.2 (huggingface#1398) * Revert "add check_neural_compressor_min_version for 4 bit behavior" (huggingface#1578) * Revert PR huggingface#1473 (huggingface#1582) * Enable num_return_sequences in beam search (huggingface#1536) * gpt_bigcode: added internal bucketing fix (huggingface#1526) * Revert "add check_neural_compressor_min_version for 4 bit behavior" (huggingface#1578) * Revert PR huggingface#1473 (huggingface#1582) * Remove deprecated env variables * Add sdp_on_bf16 argument to CI for run_image2text_lora_finetune and a… (huggingface#1585) * Remove unnecessary neural compressor fix for 1.19 release (huggingface#1584) * Make style * Fixed spelling (huggingface#1576) * Update docs for baichuan2 training (huggingface#1586) * Fixed spelling (huggingface#1576) * Update docs for baichuan2 training (huggingface#1586) * Adjust bert and roberta targets (huggingface#1588) * Update text-gen readme for autogptq (huggingface#1589) * Update README to Include Information on Performance Degradation and Mitigation Options (huggingface#1555) * Fix Accuracy 
Calculation Issue in GPT-NeoX (huggingface#1591) * Readme update for llama-405B (huggingface#1587) Co-authored-by: Mohit Sinha <msinha@habana.ai> Co-authored-by: Seunghyuk Park (shepark) <separk@habana.ai> Co-authored-by: regisss <15324346+regisss@users.noreply.github.com> * Fix Accuracy Calculation Issue in GPT-NeoX (huggingface#1591) * Add WA flag for falcon-180b to resolve text-gen critical reset error during tests (huggingface#1590) * Add WA flag for falcon-180b to resolve text-gen critical reset error during tests (huggingface#1590) * Add sdp_on_bf16 option to diffusers and image/audio classicifation tests (huggingface#1592) * Update transformers tests generation util v4.45.2 (huggingface#1441) Co-authored-by: Gustavo <gustavo.malkomes> Co-authored-by: Yaser Afshar <yaser.afshar@intel.com> Co-authored-by: regisss <15324346+regisss@users.noreply.github.com> * Update README.md (huggingface#1595) * Limit position embeddings in inference (huggingface#1598) Co-authored-by: Adam Stachowicz <105052242+astachowiczhabana@users.noreply.github.com> * Verify model output is provided when check_output is enabled (huggingface#1597) * Limit position embeddings in inference (huggingface#1598) Co-authored-by: Adam Stachowicz <105052242+astachowiczhabana@users.noreply.github.com> * Verify model output is provided when check_output is enabled (huggingface#1597) * Update README.md (huggingface#1595) * Fix scikit-learn to 1.5.2 to fix f1 evaluation crash in 1.6.0 (huggingface#1596) Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> * Revert common KVCache not to check token_idx (huggingface#1594) * Update language-modeling README file (huggingface#1599) Co-authored-by: Libin Tang <litang@habana.ai> Co-authored-by: regisss <15324346+regisss@users.noreply.github.com> * Update readme for audio-classification example (huggingface#1602) * SDPA flag update - static code analysis (huggingface#1601) * Revert common KVCache not to check token_idx (huggingface#1594) * Remove unwanted merged 
changes in SD pipeline * Revert LlamaKVCache due to memory increase (huggingface#1605) * Check rope_scaling attr (huggingface#1609) * skip certain tests for G1 with empty param list (huggingface#1613) * Revert "Update transformers tests generation util v4.45.2 (huggingface#1441)" (huggingface#1614) This reverts commit 2ba520a. * audio classification readme update (huggingface#1604) * fix readme cmds for clip-roberta (huggingface#1603) * fix readme cmds for clip-roberta * comments and cleanup * Fix run_generation test commands for TRL out usage example (huggingface#1624) Fix run_generation example * Add arbitrary scales (#15) (huggingface#1625) Co-authored-by: Linoy Buchnik <linoybu@gmail.com> * Modify Qwen2 TRL command to avoid OOM. (huggingface#1630) Add --use_flash_attention to avoid OOM for Qwen2 * Replace the UNET custom attention processors (huggingface#1608) Co-authored-by: Iman Gohari <s.m.iman.gohari@intel.com> * Falcon Model Support (huggingface#1612) Co-authored-by: leopck <sckphoong@habana.ai> Co-authored-by: regisss <15324346+regisss@users.noreply.github.com> * Update sdp_on_bf16 option for ST example (huggingface#1615) * Update save lora weights for diffusers with text_encoder_2 layers (huggingface#1626) * Fix `save_lora_weights` in `pipeline_utils.py` (huggingface#1643) * Refactor mixtral moe block. 
(huggingface#1635) * speech-recognition: downgrade datasets version (huggingface#1646) * add sdp_on_bf16 to controlnet (huggingface#1631) * add sdp_on_bf16 to controlnet * Update pipeline_controlnet.py pass sdp_on_bf16 to controlnet_pipeline * Update text_to_image_generation.py * Update text_to_image_generation.py * Quick fix for quantization/custom op list loading (huggingface#1657) Signed-off-by: Daniel Socek <daniel.socek@intel.com> * Update multi-node test dockerfile (huggingface#1662) * Fixes on OH 1.15 pre release (huggingface#1661) Co-authored-by: regisss <15324346+regisss@users.noreply.github.com> * Fix distributed issue for ST Trainer (huggingface#1649) * Fix distributed issue for timm (huggingface#1653) Co-authored-by: regisss <15324346+regisss@users.noreply.github.com> * Added missing parameter for llama function call (huggingface#1663) Co-authored-by: Libin Tang <litang@habana.ai> * Add reuse_cache for llama3-405b measurement (huggingface#1664) * Update EFA dockerfile to SynapseAI 1.19.0 (huggingface#1665) Co-authored-by: Libin Tang <litang@habana.ai> * Fix bug for GaudiMixtralAttentionLongSequence forward (huggingface#1650) Signed-off-by: kaixuanliu <kaixuan.liu@intel.com> * Update to SynapseAI v1.19 * Release: v1.15.0 * Fix style * save_model - incorrect conflict resolution * Fix style --------- Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com> Signed-off-by: Mengni Wang <mengni.wang@intel.com> Signed-off-by: Daniel Socek <daniel.socek@intel.com> Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com> Signed-off-by: Xin <xin3.he@intel.com> Signed-off-by: xinhe3 <xinhe3@habana.ai> Signed-off-by: Daniel Huang <daniel1.huang@intel.com> Signed-off-by: yuanwu <yuan.wu@intel.com> Signed-off-by: Haihao Xiang <haihao.xiang@intel.com> Signed-off-by: Matrix YAO <matrix.yao@intel.com> Signed-off-by: Urszula Golowicz <urszula.golowicz@intel.com> Signed-off-by: Focus Luo <focus.luo@intel.com> Signed-off-by: kaixuanliu 
<kaixuan.liu@intel.com> Co-authored-by: Pramod Kumar <144990617+pramodkumar-habanalabs@users.noreply.github.com> Co-authored-by: Wang, Yi <yi.a.wang@intel.com> Co-authored-by: regisss <15324346+regisss@users.noreply.github.com> Co-authored-by: Roi Tiefenbrunn <roi.tief97@gmail.com> Co-authored-by: Yan Tomsinsky <73292515+Yantom1@users.noreply.github.com> Co-authored-by: Konrad Drozd <konrad.drozd@intel.com> Co-authored-by: Uri Livne <ulivne@habana.ai> Co-authored-by: Yeonsil Yoon <yyoon@habana.ai> Co-authored-by: Danny Semiat <dsemiat@habana.ai> Co-authored-by: Yaser Afshar <yaser.afshar@intel.com> Co-authored-by: Harish Subramony <81822986+hsubramony@users.noreply.github.com> Co-authored-by: Piotr Bielak <pbielak@users.noreply.github.com> Co-authored-by: Sayantan Sarkar <supersarkar@gmail.com> Co-authored-by: Harish <hsubramony@habana.ai> Co-authored-by: Libin Tang <litang@habana.ai> Co-authored-by: ZhengHongming888 <hongming.zheng@intel.com> Co-authored-by: Jimin Ha <jha@habana.ai> Co-authored-by: Seunghyuk Park (shepark) <separk@habana.ai> Co-authored-by: Dmitry <dmitry.smertin@intel.com> Co-authored-by: Soila Kavulya <soila.p.kavulya@intel.com> Co-authored-by: Sun Choi <schoi@habana.ai> Co-authored-by: xinhe <xin3.he@intel.com> Co-authored-by: Mohit Deopujari <mdeopujari@habana.ai> Co-authored-by: Iman Gohari <s.m.iman.gohari@intel.com> Co-authored-by: XinyuYe-Intel <xinyu.ye@intel.com> Co-authored-by: Vivek Goel <vgoel@habana.ai> Co-authored-by: Akihiro Takahashi <akihiro.takahashi@intel.com> Co-authored-by: Miroslav Goncharenko <miroslav.goncharenko@intel.com> Co-authored-by: Wang, Mengni <mengni.wang@intel.com> Co-authored-by: Daniel Socek <daniel.socek@intel.com> Co-authored-by: Vidya Galli <vidya.s.galli@intel.com> Co-authored-by: deepak-gowda-narayana <140652370+deepak-gowda-narayana@users.noreply.github.com> Co-authored-by: Supreet Singh <100715017+SupreetSinghPalne@users.noreply.github.com> Co-authored-by: kaixuanliu <kaixuan.liu@intel.com> 
Co-authored-by: ANSHUMAN TRIPATHY <a.tripathy87@gmail.com> Co-authored-by: sushil dubey <sdubey@habana.ai> Co-authored-by: Luca Calabria <luca.calabria@intel.com> Co-authored-by: billishyahao <yahao.he@intel.com> Co-authored-by: xinhe3 <xinhe3@habana.ai> Co-authored-by: KP (Edwin) Lau <kiangpeng.lau@intel.com> Co-authored-by: Marcin Łapiński <mlapinskix@habana.ai> Co-authored-by: Urszula Golowicz <urszula.golowicz@intel.com> Co-authored-by: Greg Serochi <greg.serochi@intel.com> Co-authored-by: Seethong Vang <seethong.vang@intel.com> Co-authored-by: Anastasia Uvarova <anastasia.uvarova@intel.com> Co-authored-by: Mohit Deopujari <mohit.deopujari@intel.com> Co-authored-by: Chen Levkovich <chen.levkovich@intel.com> Co-authored-by: Libin Tang <libin.tang@intel.com> Co-authored-by: ranzhejiang <zhejiang.ran@intel.com> Co-authored-by: Baochen Yang <baochen.yang@intel.com> Co-authored-by: Huijuan Zhou <huijuan.zhou@intel.com> Co-authored-by: Sergey Plotnikov <sergey.plotnikov@intel.com> Co-authored-by: Deepak Narayana <deepak.narayana@intel.com> Co-authored-by: Witold Szczurek <152967125+wszczurekhabana@users.noreply.github.com> Co-authored-by: Wei Lin <forever871001@163.com> Co-authored-by: lkk <33276950+lkk12014402@users.noreply.github.com> Co-authored-by: Chaojun Zhang <chzhang@habana.ai> Co-authored-by: Daniel Huang <daniel1.huang@intel.com> Co-authored-by: Yuan Wu <yuan.wu@intel.com> Co-authored-by: Xiang, Haihao <haihao.xiang@intel.com> Co-authored-by: Jianqian Zhou <jianqian.zhou@intel.com> Co-authored-by: Wei Lin <wei2.lin@intel.com> Co-authored-by: Thanaji Rao Thakkalapelli <tthakkalapelli@habana.ai> Co-authored-by: Yao Matrix <yaoweifeng0301@126.com> Co-authored-by: yan tomsinsky <ytomsinsky@habana.ai> Co-authored-by: Eran Geva <egeva@habana.ai> Co-authored-by: Alexey Belyakov <alexey.belyakov@intel.com> Co-authored-by: Bhargav <beede@habana.ai> Co-authored-by: Krzysztof Wiśniewski <krzysztof2.wisniewski@intel.com> Co-authored-by: Abhilash Majumder 
<30946547+abhilash1910@users.noreply.github.com> Co-authored-by: FocusLuo <focus.luo@intel.com> Co-authored-by: Yixiu Chen <yixiu.chen@intel.com> Co-authored-by: Nariman Piroozan <87953329+npiroozan@users.noreply.github.com> Co-authored-by: Edward Mascarenhas <edward.mascarenhas@intel.com> Co-authored-by: Shiv Kaul <skaul@habana.ai> Co-authored-by: bmengke <mengkejiergeli.ba@intel.com> Co-authored-by: Leo Zhao <leo.zhao@intel.com> Co-authored-by: Mohit Sinha <msinha@habana.ai> Co-authored-by: Harshvardhan Chauhan <hchauhan@habana.ai> Co-authored-by: Gustavo Malkomes <gustavo.malkomes@intel.com> Co-authored-by: Linoy Buchnik <linoybu@gmail.com> Co-authored-by: Alexey Fadeev <alexey.fadeev@intel.com> Co-authored-by: leopck <sckphoong@habana.ai>

mengker33 requested review from bhargaveede, regisss, ssarkar2 and vivekgoe as code owners November 12, 2024 05:10

mengker33 force-pushed the chatglm_upstream branch from d81bab4 to 615f749 Compare November 13, 2024 02:46

xuguangxin mentioned this pull request Nov 14, 2024

Error when running chatglm3_6b: NotImplementedError: Unknown device for graph fuser #1477 (Open, 4 tasks)

mengker33 force-pushed the chatglm_upstream branch from 615f749 to 5664f08 Compare November 26, 2024 09:20

emascarenhas suggested changes Nov 26, 2024

optimum/habana/transformers/models/chatglm/modeling_chatglm.py (outdated, resolved)

mengker33 force-pushed the chatglm_upstream branch from 5664f08 to 7e1c410 Compare November 27, 2024 09:20

mengker33 force-pushed the chatglm_upstream branch from 7e1c410 to 0b9f898 Compare December 2, 2024 05:14

mengker33 force-pushed the chatglm_upstream branch 2 times, most recently from 7e06281 to 6374a63 Compare December 2, 2024 07:03

emascarenhas reviewed Dec 3, 2024

examples/language-modeling/README.md (outdated, resolved)

mengker33 force-pushed the chatglm_upstream branch 2 times, most recently from cb23dfe to dede6fe Compare December 4, 2024 02:55

mengker33 commented Dec 4, 2024

setup.py (outdated, resolved)

mengker33 force-pushed the chatglm_upstream branch from dede6fe to 6fe3dfa Compare December 4, 2024 10:11

libinta added the synapse1.20 label Dec 5, 2024

mengker33 force-pushed the chatglm_upstream branch from 6fe3dfa to 61a53dd Compare December 6, 2024 02:09

mengker33 force-pushed the chatglm_upstream branch 2 times, most recently from 9fe9e0c to 90ea98b Compare December 6, 2024 02:28

regisss reviewed Dec 8, 2024

mengker33 force-pushed the chatglm_upstream branch from 90ea98b to 5715a5d Compare December 9, 2024 02:13

mengker33 and others added 3 commits December 9, 2024 06:31

Add chatglm model

2ee9563

Including chatglm2-6b and chatglm3-6b. Co-authored-by: Wei Lin <wei2.lin@intel.com> Co-authored-by: Jianqian Zhou <jianqian.zhou@intel.com> Co-authored-by: Leo Zhao <leo.zhao@intel.com>

chatglm: Add text_generation test

e3d15c9

chatglm: Add pretrain example and test

862bdfb

mengker33 force-pushed the chatglm_upstream branch from 5715a5d to 862bdfb Compare December 9, 2024 06:32

regisss approved these changes Dec 9, 2024

regisss merged commit 1bf9a9a into huggingface:main Dec 9, 2024

zzhang37 pushed a commit to zzhang37/optimum-habana that referenced this pull request Dec 9, 2024

Add chatglm (huggingface#1478)

f92097d

Co-authored-by: Wei Lin <wei2.lin@intel.com> Co-authored-by: Jianqian Zhou <jianqian.zhou@intel.com> Co-authored-by: Leo Zhao <leo.zhao@intel.com>

imangohari1 pushed a commit to imangohari1/optimum-habana that referenced this pull request Dec 10, 2024

Add chatglm (huggingface#1478)

a2afca7

Co-authored-by: Wei Lin <wei2.lin@intel.com> Co-authored-by: Jianqian Zhou <jianqian.zhou@intel.com> Co-authored-by: Leo Zhao <leo.zhao@intel.com>

kaixuanliu reviewed Dec 25, 2024

Liangyx2 pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Jan 20, 2025

Add chatglm (huggingface#1478)

b97359f

Co-authored-by: Wei Lin <wei2.lin@intel.com> Co-authored-by: Jianqian Zhou <jianqian.zhou@intel.com> Co-authored-by: Leo Zhao <leo.zhao@intel.com>

		("THUDM/chatglm2-6b", 1, True, 150, False),
		("THUDM/chatglm3-6b", 1, True, 150, False),
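
The two entries above are positional tuples, so their meaning depends on the test that consumes them. The sketch below shows one way to make such entries self-describing. The field names (model_name, batch_size, reuse_cache, baseline, check_output) are assumptions made for illustration and are not confirmed by this page; only the positional values come from the entries above.

```python
from typing import NamedTuple


class GenerationCase(NamedTuple):
    # Hypothetical field names: the page only shows positional values,
    # so each label below is an assumption, not the repository's schema.
    model_name: str
    batch_size: int
    reuse_cache: bool
    baseline: float
    check_output: bool


# Values copied from the two entries shown above.
CASES = [
    GenerationCase("THUDM/chatglm2-6b", 1, True, 150, False),
    GenerationCase("THUDM/chatglm3-6b", 1, True, 150, False),
]


def describe(case: GenerationCase) -> str:
    """Return a one-line, human-readable summary of a test case."""
    return (
        f"{case.model_name}: batch_size={case.batch_size}, "
        f"reuse_cache={case.reuse_cache}"
    )


if __name__ == "__main__":
    for case in CASES:
        print(describe(case))
```

Named fields keep a reviewer from having to guess what a bare value such as 150 stands for; a list like this can still be passed to a parametrized test, since a NamedTuple unpacks like a plain tuple.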

