
FLUX Fine-Tuning for Gaudi #1482

Merged
regisss merged 1 commit into huggingface:main from dsocek:flux-fine-tuning
Nov 29, 2024

Conversation

@dsocek
Contributor

@dsocek dsocek commented Nov 13, 2024

FLUX Fine-Tuning for Gaudi

Overview

FLUX.1-dev is a high-quality text-to-image model that has been attracting significant attention for its impressive image generation capabilities. In this PR, we introduce support for FLUX fine-tuning on Gaudi using the LoRA DreamBooth adapter. While the model is highly memory-intensive for training, we've developed a training script specifically for Gaudi devices that fits within HPU memory constraints and enables effective fine-tuning. The implementation works on a single Gaudi device as well as a multi-card Gaudi system (supporting both MPI and DeepSpeed training).
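As background (a generic sketch, not code from this PR): LoRA keeps the base weights frozen and trains only a small low-rank update, which is what makes FLUX fine-tuning fit within HPU memory. A minimal NumPy illustration of the idea:

```python
import numpy as np

# Generic LoRA illustration: the frozen base weight W is augmented with a
# trainable low-rank update B @ A (rank r much smaller than d), so only
# 2*d*r parameters are trained instead of d*d.
rng = np.random.default_rng(0)
d, r = 64, 4                            # r=4 matches the --rank=4 used below
W = rng.standard_normal((d, d))         # frozen base weight
A = rng.standard_normal((r, d)) * 0.01  # trainable "down" projection
B = np.zeros((d, r))                    # "up" projection, initialized to zero

x = rng.standard_normal(d)
y = x @ (W + B @ A).T                   # LoRA forward pass

# With B = 0 the adapted layer is exactly the base layer at initialization.
assert np.allclose(y, x @ W.T)
print("trainable params:", A.size + B.size, "vs full:", W.size)
```

Only the `A`/`B` factors (512 parameters here, vs 4096 for the full matrix) receive gradients, which is why a rank-4 adapter keeps optimizer state small enough for a single device.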

List of Contributions in this PR

  • FLUX.1-dev LoRA DreamBooth training script for Gaudi
  • Single-card and multi-card (MPI and DeepSpeed) support
  • Updated documentation with FLUX training samples
  • Added a FLUX training CI test to the diffusers tests

Quality

Ran several quality tests; the FLUX fine-tuning script demonstrated good visual results. The following example was run on Gaudi2.

Training set ("dog" dataset of 5 images from Hugging Face):
image

We fine-tuned the FLUX.1-dev model on Gaudi2 for 1000 training steps, capturing a checkpoint every 250 steps. The following 5 images were generated with the prompt "A photo of sks dog in a bucket", using the original FLUX.1-dev model and then the 4 fine-tuned checkpoints. The images show a clear progression in the dog's resemblance to the one from the training set:
image

Performance

Both single-card and multi-card FLUX training on Gaudi2 were evaluated for performance.

Single card performance

Ran single-card FLUX.1-dev training on Gaudi2 with:

python train_dreambooth_lora_flux.py \
  --pretrained_model_name_or_path="black-forest-labs/FLUX.1-dev" \
  --dataset="dog" \
  --prompt="a photo of sks dog" \
  --output_dir="dog_lora_flux" \
  --mixed_precision="bf16" \
  --weighting_scheme="none" \
  --resolution=1024 \
  --train_batch_size=1 \
  --learning_rate=1e-4 \
  --guidance_scale=1 \
  --report_to="tensorboard" \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --cache_latents \
  --rank=4 \
  --max_train_steps=500 \
  --seed="0" \
  --use_hpu_graphs_for_inference \
  --use_hpu_graphs_for_training \
  --gaudi_config_name="Habana/stable-diffusion"

Output:

11/11/2024 16:07:42 - INFO - __main__ - ***** Running training *****
11/11/2024 16:07:42 - INFO - __main__ -   Num examples = 5
11/11/2024 16:07:42 - INFO - __main__ -   Num batches each epoch = 5
11/11/2024 16:07:42 - INFO - __main__ -   Num Epochs = 250
11/11/2024 16:07:42 - INFO - __main__ -   Instantaneous batch size per device = 1
11/11/2024 16:07:42 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 4
11/11/2024 16:07:42 - INFO - __main__ -   Gradient Accumulation steps = 4
11/11/2024 16:07:42 - INFO - __main__ -   Total optimization steps = 500
Steps: 100%|█████████████████████| 500/500 [1:53:10<00:00, 11.50s/it, loss=0.165, lr=0.0001]
Model weights saved in dog_lora_flux/pytorch_lora_weights.safetensors

Multi-card performance

Ran 8-card FLUX.1-dev training on Gaudi2 using MPI with:

python ../../gaudi_spawn.py --world_size 8 --use_mpi train_dreambooth_lora_flux.py \
  --pretrained_model_name_or_path="black-forest-labs/FLUX.1-dev" \
  --dataset="dog" \
  --prompt="a photo of sks dog" \
  --output_dir="dog_lora_flux" \
  --mixed_precision="bf16" \
  --weighting_scheme="none" \
  --resolution=1024 \
  --train_batch_size=1 \
  --learning_rate=1e-4 \
  --guidance_scale=1 \
  --report_to="tensorboard" \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --cache_latents \
  --rank=4 \
  --max_train_steps=500 \
  --seed="0" \
  --use_hpu_graphs_for_inference \
  --use_hpu_graphs_for_training \
  --gaudi_config_name="Habana/stable-diffusion"

Output:

11/11/2024 17:10:52 - INFO - __main__ - ***** Running training *****
11/11/2024 17:10:52 - INFO - __main__ -   Num examples = 5
11/11/2024 17:10:52 - INFO - __main__ -   Num batches each epoch = 1
11/11/2024 17:10:52 - INFO - __main__ -   Num Epochs = 500
11/11/2024 17:10:52 - INFO - __main__ -   Instantaneous batch size per device = 1
11/11/2024 17:10:52 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 32
11/11/2024 17:10:52 - INFO - __main__ -   Gradient Accumulation steps = 4
11/11/2024 17:10:52 - INFO - __main__ -   Total optimization steps = 500
Steps: 100%|██████████| 500/500 [46:09<00:00,  5.13s/it, loss=0.74, lr=0.0001]
Model weights saved in dog_lora_flux/pytorch_lora_weights.safetensors

Ran 8-card FLUX.1-dev training on Gaudi2 using DeepSpeed with:

python ../../gaudi_spawn.py --world_size 8 --use_deepspeed train_dreambooth_lora_flux.py \
  --pretrained_model_name_or_path="black-forest-labs/FLUX.1-dev" \
  --dataset="dog" \
  --prompt="a photo of sks dog" \
  --output_dir="dog_lora_flux" \
  --mixed_precision="bf16" \
  --weighting_scheme="none" \
  --resolution=1024 \
  --train_batch_size=1 \
  --learning_rate=1e-4 \
  --guidance_scale=1 \
  --report_to="tensorboard" \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --cache_latents \
  --rank=4 \
  --max_train_steps=500 \
  --seed="0" \
  --use_hpu_graphs_for_inference \
  --use_hpu_graphs_for_training \
  --gaudi_config_name="Habana/stable-diffusion"

Output:

11/11/2024 15:27:54 - INFO - __main__ - ***** Running training *****
11/11/2024 15:27:54 - INFO - __main__ -   Num examples = 5
11/11/2024 15:27:54 - INFO - __main__ -   Num batches each epoch = 1
11/11/2024 15:27:54 - INFO - __main__ -   Num Epochs = 500
11/11/2024 15:27:54 - INFO - __main__ -   Instantaneous batch size per device = 1
11/11/2024 15:27:54 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 32
11/11/2024 15:27:54 - INFO - __main__ -   Gradient Accumulation steps = 4
11/11/2024 15:27:54 - INFO - __main__ -   Total optimization steps = 500
Steps: 100%|██████████| 500/500 [45:58<00:00,  5.11s/it, loss=0.74, lr=0.0001]
Model weights saved in dog_lora_flux/pytorch_lora_weights.safetensors
| Device | Configuration | Synapse | Training Throughput [s/it] |
|--------|---------------|---------|----------------------------|
| Gaudi2 | 1-HPU | 1.18.0-524 | 11.50 |
| Gaudi2 | 8-HPU w/ MPI | 1.18.0-524 | 5.13 |
| Gaudi2 | 8-HPU w/ DeepSpeed | 1.18.0-524 | 5.11 |
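For reference, the scaling implied by the numbers above can be checked with a few lines of Python (the step times are taken from the logs above; the arithmetic is mine, not from the PR):

```python
# Per-step times reported above (seconds/iteration); all runs take the same
# 500 optimization steps, so step time is proportional to wall-clock time.
single_card = 11.50
mpi_8card = 5.13
deepspeed_8card = 5.11

# Wall-clock speedup at a fixed number of optimization steps.
print(f"8-card MPI wall-clock speedup: {single_card / mpi_8card:.2f}x")

# The 8-card runs also process an 8x larger global batch (32 vs 4 samples
# per optimization step), so per-sample throughput scales further.
per_sample_1 = 4 / single_card    # samples/sec, single card
per_sample_8 = 32 / mpi_8card     # samples/sec, 8 cards with MPI
print(f"per-sample throughput scaling: {per_sample_8 / per_sample_1:.1f}x")
```

This matches the logs: 500 steps took 1:53:10 on one card vs 46:09 on eight.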

Tests

Added a CI test for FLUX training; it PASSED ✔️

python -m pytest tests/test_diffusers.py -v -s -k "test_dreambooth_lora_flux"
...
============ 1 passed, 162 deselected, 1 warning in 18.61s =============

@dsocek dsocek requested a review from regisss as a code owner November 13, 2024 17:03
@dsocek dsocek force-pushed the flux-fine-tuning branch 4 times, most recently from 9b9feb9 to a457778 Compare November 17, 2024 16:13
Contributor

@imangohari1 imangohari1 left a comment


@dsocek
Hi Daniel,
Thanks for adding these. I am working on a deeper review and testing of this, but have added some suggestions in the meantime.

I am also not sure why we have these empty files. Maybe they need to be deleted?

Empty-diff files with mode changed 100644 → 100755:

  • [examples/stable-diffusion/training/train_dreambooth_lora_sdxl.py](https://github.com/huggingface/optimum-habana/pull/1482/files?diff=unified&w=0#diff-f70e088d9b743c02b0e00a1de796c1397b19cd509d586962a403bddafc19af3d)
  • [examples/stable-diffusion/training/train_text_to_image_sdxl.py](https://github.com/huggingface/optimum-habana/pull/1482/files?diff=unified&w=0#diff-7a22acb0f64524f36bb39e1c9b6b9c192abd3cef9ebc1d4b4a5e0169dcd8a416)
  • [examples/stable-diffusion/unconditional_image_generation.py](https://github.com/huggingface/optimum-habana/pull/1482/files?diff=unified&w=0#diff-4e3a2db778ea0b3c6724c37248c31326e50fd7eb8351a2130a91ca288c78b9fa)

Let's first download this dataset locally:

```bash
python -c "\
Contributor


Should we move this into its own simple few-line script, download_data_dog.py, and then run it here like python XXX?
The mix of bash and python here is confusing.

Contributor Author


I wanted to make it cut-and-paste ready. TBH this is a matter of preference... I am not in favor of adding download scripts as separate files. We also have the cat dataset as an example for the prior training samples. Actually, now there is no Python: all sample code is bash :)

Collaborator


I agree that it's easier to use that way, but on the other hand it makes the code snippet much less clear IMO, see for example this one: https://github.com/dsocek/optimum-habana/tree/flux-fine-tuning/examples/stable-diffusion/training#cat-toy-example
I would rather keep pure Python code snippets.
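As an aside, a standalone script of the kind suggested above might look like the following sketch; the filename download_data_dog.py and the diffusers/dog-example dataset id are my assumptions for illustration, not part of this PR:

```python
# Hypothetical download_data_dog.py: fetch the small "dog" example dataset
# from the Hugging Face Hub into a local directory. The dataset id below is
# an assumption (the id commonly used in diffusers DreamBooth examples).
from huggingface_hub import snapshot_download


def download_dog_dataset(local_dir: str = "dog") -> str:
    """Download the dataset and return the local path it was written to."""
    return snapshot_download(
        "diffusers/dog-example", repo_type="dataset", local_dir=local_dir
    )


if __name__ == "__main__":
    print("dataset downloaded to:", download_dog_dataset())
```

Either form works; the trade-off discussed in this thread is copy-paste convenience versus snippet clarity.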


```py
```bash
python -c "\
Contributor


Same here.
Should we move this into its own simple few-line script, download_data_cat.py, and then run it here like python XXX?
The mix of bash and python here is confusing.

Collaborator


I agree with @imangohari1: why use python -c "..." instead of simply having a Python script?
edit: I saw the reply below, continuing there

@dsocek
Contributor Author

dsocek commented Nov 19, 2024

@dsocek Hi Daniel, Thanks for adding these. I am working on a deeper review and testing on this but added some suggestions.

I also am not sure why we have these empty files? Maybe they need to be deleted?

Empty-diff files with mode changed 100644 → 100755:

  • [examples/stable-diffusion/training/train_dreambooth_lora_sdxl.py](https://github.com/huggingface/optimum-habana/pull/1482/files?diff=unified&w=0#diff-f70e088d9b743c02b0e00a1de796c1397b19cd509d586962a403bddafc19af3d)
  • [examples/stable-diffusion/training/train_text_to_image_sdxl.py](https://github.com/huggingface/optimum-habana/pull/1482/files?diff=unified&w=0#diff-7a22acb0f64524f36bb39e1c9b6b9c192abd3cef9ebc1d4b4a5e0169dcd8a416)
  • [examples/stable-diffusion/unconditional_image_generation.py](https://github.com/huggingface/optimum-habana/pull/1482/files?diff=unified&w=0#diff-4e3a2db778ea0b3c6724c37248c31326e50fd7eb8351a2130a91ca288c78b9fa)

@imangohari1 I changed the file permissions from 644 (r+w) to 755 (added x) so people can execute the scripts directly in bash without having to prefix them with python. This is better. Also, half the files were 644 and half were 755, so this adds consistency and a better UX.
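The permission change can be illustrated with a minimal shell example (a generic sketch using a throwaway demo.py, not files from this PR):

```shell
# Generic illustration of the 644 -> 755 change: adding the execute bit lets
# a script with a shebang line run directly.
printf '#!/usr/bin/env python3\nprint("ok")\n' > demo.py
chmod 644 demo.py        # -rw-r--r--: readable, but not executable
chmod 755 demo.py        # -rwxr-xr-x: can now be invoked directly
./demo.py                # runs via the shebang, no "python" prefix needed
```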

Contributor

@imangohari1 imangohari1 left a comment


Hi Daniel,
I did more testing on this and here are some suggested changes:

  • There are some typos/minor issues in the README. The FLUX pytest does not run on G1 since it gives an OOM. I fixed them all in the attached patch. Please apply it with git am < 0001*

  • When using the flux fine-tuned data with text_to_image_generation.py I am seeing an error ValueError: The current scheduler class <class 'optimum.habana.diffusers.schedulers.scheduling_ddim.GaudiDDIMScheduler'>'s set_timesteps does not support custom sigmas schedules. Please check whether you are using the correct scheduler. (details below). Can you please investigate this?

testing details

fine-tuning using mpi on 8x.

python ../../gaudi_spawn.py --world_size 8 --use_mpi train_dreambooth_lora_flux.py   --pretrained_model_name_or_path="black-forest-labs/FLUX.1-dev"   --dataset="dog"   --prompt="a photo of sks dog"   --output_dir="dog_lora_flux"   --mixed_precision="bf16"   --weighting_scheme="none"   --resolution=1024   --train_batch_size=1   --learning_rate=1e-4   --guidance_scale=1   --report_to="tensorboard"   --gradient_accumulation_steps=4   --gradient_checkpointing   --lr_scheduler="constant"   --lr_warmup_steps=0   --cache_latents   --rank=4   --max_train_steps=500   --seed="0"   --use_hpu_graphs_for_inference   --use_hpu_graphs_for_training   --gaudi_config_name="Habana/stable-diffusion"

runs fine and then

python ../text_to_image_generation.py     --model_name_or_path "black-forest-labs/FLUX.1-dev"     --lora_id dog_lora_flux     --prompts "A picture of a sks dog in a bucket"     --num_images_per_prompt 5     --batch_size 1     --image_save_dir /tmp/flux_images     --use_habana     --use_hpu_graphs     --gaudi_config Habana/stable-diffusion     --bf16

crashes with

[WARNING|pipeline_utils.py:157] 2024-11-19 20:57:41,645 >> `use_torch_autocast` is True in the given Gaudi configuration but `torch_dtype=torch.bfloat16` was given. Disabling mixed precision and continuing in bf16 only.
============================= HABANA PT BRIDGE CONFIGURATION ===========================
 PT_HPU_LAZY_MODE = 1
 PT_RECIPE_CACHE_PATH =
 PT_CACHE_FOLDER_DELETE = 0
 PT_HPU_RECIPE_CACHE_CONFIG =
 PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
 PT_HPU_LAZY_ACC_PAR_MODE = 1
 PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
 PT_HPU_EAGER_PIPELINE_ENABLE = 1
 PT_HPU_EAGER_COLLECTIVE_PIPELINE_ENABLE = 1
---------------------------: System Configuration :---------------------------
Num CPU Cores : 160
CPU RAM       : 1056374404 KB
------------------------------------------------------------------------------
Traceback (most recent call last):
  File "/devops/sgohari/tests/codes/pr-reviews/1482/optimum-habana/examples/stable-diffusion/training/../text_to_image_generation.py", line 644, in <module>
    main()
  File "/devops/sgohari/tests/codes/pr-reviews/1482/optimum-habana/examples/stable-diffusion/training/../text_to_image_generation.py", line 613, in main
    outputs = pipeline(prompt=args.prompts, **kwargs_call)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/optimum/habana/diffusers/pipelines/flux/pipeline_flux.py", line 589, in __call__
    timesteps, num_inference_steps = retrieve_timesteps(
  File "/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/flux/pipeline_flux.py", line 127, in retrieve_timesteps
    raise ValueError(
ValueError: The current scheduler class <class 'optimum.habana.diffusers.schedulers.scheduling_ddim.GaudiDDIMScheduler'>'s `set_timesteps` does not support custom sigmas schedules. Please check whether you are using the correct scheduler.

@imangohari1 imangohari1 mentioned this pull request Nov 20, 2024
2 tasks
@dsocek dsocek force-pushed the flux-fine-tuning branch 2 times, most recently from 571d0ab to ce2fdf4 Compare November 23, 2024 00:02
@dsocek
Contributor Author

dsocek commented Nov 23, 2024

@imangohari1 thanks for catching the minor issue; I fixed it in an amend, it should be good now.

Contributor

@imangohari1 imangohari1 left a comment


LGTM! @regisss please review.

@libinta libinta added the run-test Run CI for PRs from external contributors label Nov 26, 2024
@dsocek
Contributor Author

dsocek commented Nov 27, 2024

rebased

Collaborator

@regisss regisss left a comment


I left a few comments.

Also, I realize that we didn't add FLUX to the table of validated diffusion models in the README and in the docs. Can you do it in this PR please?



Signed-off-by: Daniel Socek <daniel.socek@intel.com>
@dsocek
Contributor Author

dsocek commented Nov 28, 2024

@regisss all fixed and added FLUX.1 to main README and docs/source/index

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@regisss regisss merged commit 0b7c336 into huggingface:main Nov 29, 2024
Liangyx2 pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Jan 20, 2025
Signed-off-by: Daniel Socek <daniel.socek@intel.com>
@the-pikachu

the-pikachu commented Feb 9, 2025

> Single card performance: ran single-card FLUX.1-dev training on Gaudi2 (full command and output quoted from the PR description above).

Hi Team,

Actually I ran the training code (train_dreambooth_lora_flux.py) provided for the FLUX model, exactly as above, only with my own images (13 images) and prompt. It was run on a Gaudi2 machine with a single card. The problem is that it does not capture the image features when I run inference with the trained safetensors. What is the issue here? I am unable to figure it out. Please suggest a way to train on my image features. I did not crop the images; they are all good-quality images.

xinyu-intel pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Mar 4, 2025
* Add flag to run inference with partial dataset (huggingface#1420)

* Add peft generation example (huggingface#1427)

* Upgrade to SynapseAI 1.18.0 (huggingface#1418)

* Simplify HQT config files (huggingface#1219)

* unify_measurements.py script support to unify PCQ 70B 8x (huggingface#1322)

* Add misc. training args (huggingface#1346)

* Add quantization config for low bs case (huggingface#1377)

* Remove HQT from OHF (huggingface#1257)

Co-authored-by: Adam Stachowicz <astachowicz@habana.ai>
Co-authored-by: Adam Stachowicz <105052242+astachowiczhabana@users.noreply.github.com>
Co-authored-by: Yeonsil Yoon <yyoon@habana.ai>

* Load INC GPTQ checkpoint & rename params (huggingface#1364)

Co-authored-by: Yaser Afshar <yaser.afshar@intel.com>
Co-authored-by: Harish Subramony <81822986+hsubramony@users.noreply.github.com>
Co-authored-by: Yeonsil Yoon <yyoon@habana.ai>

* Enable FusedSDPA fp8 in Llama FT (huggingface#1388)

Co-authored-by: Yaser Afshar <yaser.afshar@intel.com>
Co-authored-by: Harish Subramony <81822986+hsubramony@users.noreply.github.com>

* Valid sequence length for sdpa (huggingface#1183)

Co-authored-by: Harish <hsubramony@habana.ai>
Co-authored-by: Libin Tang <litang@habana.ai>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Multiple fixes (dynamo graph break, qwen-moe, multicard) (huggingface#1410)

* datasets downgrade version to 2.21.0 (huggingface#1413)

* Update ci sentence_transformer.sh (huggingface#1424)

* Fix load INC load weights compile error due to Transformer 4.45 upgrade.  (huggingface#1421)

* Update language-modeling README.md, add trust_remote_code for flan-t5-xl (huggingface#1422)

* Update unify_measurements.py support info (huggingface#1425)

* GPT2 torch.compile fix (huggingface#1434)

* Added missing allocate_kv_cache() call in CausalLM class (huggingface#1431)

* Fix merge error and update text-to-speech readme (huggingface#1436)

* Fix OOM error for code llama (huggingface#1437)

* Fix error on 4bit checkpoint load with run_lm_eval on TF4.45.2 (huggingface#1439)

* Fix scoped linear all-reduce for starcoder model (huggingface#1432)

* Fixed recursion error in SentenceTransformer (huggingface#1428)

* Fix Llama 3.1 generation (huggingface#1444)

* Update text-gen README.md to add auto-gptq fork install steps (huggingface#1442)

* Added gemma specific fp8 quantization file (huggingface#1445)

* Remove cache folder from image data folder (huggingface#1446)

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Bump dev version

* Enable DeepSpeed for image-to-text example (huggingface#1455)

* Fix bug when loading 4bit checkpoint quantized in INC (huggingface#1447)

* Fixes 'Tokenizer does not have padding token' introduced by  huggingface#1444 for Llama3.1 (huggingface#1457)

* Fix facebook/hf-seamless-m4t-medium crash (huggingface#1433)

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* Fix bias update in scoped all reduce (huggingface#1456)

* Added skip for unsuported tests for mistral/mixtral (huggingface#1462)

* Update sentence transformer to v3.2.1 (huggingface#1470)

* Optimized inference of Cohere model on HPU (huggingface#1329)

Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>

* Idefics2 (huggingface#1270)

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* Remove deprecated Mixed precision flags (huggingface#1471)

Change-Id: I1c2e2460dc2072ba7b311f239441b304694918c8

* Optimized inference of XGLM model on HPU (huggingface#1323)

Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>

* Add mllama support (huggingface#1419)

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* Enable flash attention for gemma (huggingface#1454)

* Readme: replace tabs with spaces (huggingface#1485)

* Move fast tests to Gaudi2 (huggingface#1498)

* Support loading 4 bit Qwen2 (huggingface#1476)

Signed-off-by: Mengni Wang <mengni.wang@intel.com>

* Add textual inversion XL for Gaudi (huggingface#868)

Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Co-authored-by: Iman Gohari <s.m.iman.gohari@intel.com>

* Remove torch req from LM example (huggingface#1491)

* Remove keep_input_mutations (huggingface#1492)

* Fix trust_remote_code (huggingface#1493)

* Upgrade ViT README with torch.compile (huggingface#1494)

* Tests for text gen output text (huggingface#1411)

* Corrected Throughput measure for GaudiDDPMPipeline (huggingface#1460)

* Fix text generation test

* Add G3 in T5-L README (huggingface#1523)

* Fix tuple object error (huggingface#1354)

* Add warmup time and compile time log for the eval/prediction.  (huggingface#1489)

* Fix style

* Enable `paligemma` model for image-to-text example (huggingface#1407)

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Add support for MLPERF optimized pipeline from example (huggingface#1465)

Co-authored-by: sushil dubey <sdubey@habana.ai>

* Enable Gemma2 Inference on Gaudi (huggingface#1504)

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Co-authored-by: billishyahao <yahao.he@intel.com>
Co-authored-by: Harish Subramony <81822986+hsubramony@users.noreply.github.com>
Co-authored-by: Yeonsil Yoon <yyoon@habana.ai>
Co-authored-by: Seunghyuk Park (shepark) <separk@habana.ai>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
Co-authored-by: Sun Choi <schoi@habana.ai>
Co-authored-by: xinhe <xin3.he@intel.com>
Co-authored-by: Mohit Deopujari <mdeopujari@habana.ai>
Co-authored-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: Soila Kavulya <soila.p.kavulya@intel.com>
Co-authored-by: Iman Gohari <s.m.iman.gohari@intel.com>
Co-authored-by: ZhengHongming888 <hongming.zheng@intel.com>
Co-authored-by: XinyuYe-Intel <xinyu.ye@intel.com>
Co-authored-by: Vivek Goel <vgoel@habana.ai>
Co-authored-by: Akihiro Takahashi <akihiro.takahashi@intel.com>
Co-authored-by: Miroslav Goncharenko <miroslav.goncharenko@intel.com>
Co-authored-by: Wang, Mengni <mengni.wang@intel.com>
Co-authored-by: Daniel Socek <daniel.socek@intel.com>
Co-authored-by: Adam Stachowicz <105052242+astachowiczhabana@users.noreply.github.com>
Co-authored-by: Vidya Galli <vidya.s.galli@intel.com>
Co-authored-by: deepak-gowda-narayana <140652370+deepak-gowda-narayana@users.noreply.github.com>

* Add check_neural_compressor_min_version for 4 bit behavior (huggingface#1500)

Signed-off-by: Xin <xin3.he@intel.com>
Signed-off-by: xinhe3 <xinhe3@habana.ai>
Co-authored-by: xinhe3 <xinhe3@habana.ai>

* Fixed Gemma FP8 flash_attention lower throughput issue (huggingface#1510)

* Pass "lazy_mode" arg to GaudiLlamaModel GaudiTrainer (huggingface#1515)

Co-authored-by: Marcin Łapiński <mlapinskix@habana.ai>

* Removed workaround for NaN bug causing graph break. (huggingface#1516)

Co-authored-by: Marcin Łapiński <mlapinskix@habana.ai>

* Disable default sdpa in Albert (#22) (huggingface#1517)

Co-authored-by: Urszula Golowicz <urszula.golowicz@intel.com>

* Implement fused sdpa for wav2vec2 (#18) (huggingface#1520)

* Memory optimization for gpt_bitcode (#4) (huggingface#1513)

Co-authored-by: Urszula Golowicz <urszula.golowicz@intel.com>

* text_generation: improve parameters check (huggingface#1527)

* transformers: fixed some typos (huggingface#1528)

* Update DeepSpeed CI baselines

* Update FSDP CI baseline

* Optimum-Habana docs re-org (huggingface#1488)

Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Co-authored-by: Greg Serochi <greg.serochi@intel.com>
Co-authored-by: Kiangpeng Lau <kiangpeng.lau@intel.com>
Co-authored-by: Seethong Vang <seethong.vang@intel.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
Co-authored-by: Anastasia Uvarova <anastasia.uvarova@intel.com>
Co-authored-by: Mohit Deopujari <mohit.deopujari@intel.com>
Co-authored-by: Chen Levkovich <chen.levkovich@intel.com>
Co-authored-by: Libin Tang <libin.tang@intel.com>

* Makes the with_stack of the profiler changeable (huggingface#1497)

* FLUX with diffusers 0.31.0 (huggingface#1450)

Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Co-authored-by: Baochen Yang <baochen.yang@intel.com>
Co-authored-by: Huijuan Zhou <huijuan.zhou@intel.com>
Co-authored-by: Sergey Plotnikov <sergey.plotnikov@intel.com>
Co-authored-by: Deepak Narayana <deepak.narayana@intel.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Fix some CI baselines

* Add split runners to CI (2 devices per runner for fast tests)

* Fix fast CI to work with split runners (huggingface#1534)

* Fix dtype issue with valid sequence length in torch.compile bs=1 (huggingface#1532)

* Support beam search with reuse_cache and bucket_internal (huggingface#1472)

* Add mixtral trl sft (huggingface#1349)

* Enable tiiuae/falcon-11B-vlm in image_to_text example (huggingface#1490)

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* Add Llama 3.1 ft to CI (huggingface#1529)

* Migrate OH CLIP (roberta-clip) training to torch.compile (huggingface#1507)

* test_text_generation: fix non-Gaudi2 case (huggingface#1530)

* text-generation: improve output printing (huggingface#1486)

* Text-generation, model set-up: torch.compile for attributes instead of models' types (huggingface#1452)

* FLUX Fine-Tuning for Gaudi (huggingface#1482)

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

* Enable fusedsdpa kernel for vision part of mllama (huggingface#1531)

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* Minicpm enabling (huggingface#1342)

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>

* Fix bridgetower example (#312) (huggingface#1481)

* Migrate OH Wave2Vec-AC training to torch.compile - README update (huggingface#1537)

Co-authored-by: Chaojun Zhang <chzhang@habana.ai>

* Flux Image-To-Image pipeline (huggingface#1524)

Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Co-authored-by: Iman Gohari <s.m.iman.gohari@intel.com>

* Enable Falcon-mamba (huggingface#1480)

Signed-off-by: yuanwu <yuan.wu@intel.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Enable dynamic compile for mpi(training) (huggingface#1509)

* Migrate OH T5-large training to torch.compile (huggingface#1506)

* Add support for Baichuan2 (huggingface#1479)

Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Co-authored-by: Jianqian Zhou <jianqian.zhou@intel.com>
Co-authored-by: Wei Lin <wei2.lin@intel.com>

* trainer: fixed spelling (huggingface#1538)

* Create CI Eager/Lazy for Language Modeling (huggingface#1448)

* Fixes for llava-next test failures in 1.19 (huggingface#1535)

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Enable DeepSeek-V2 (huggingface#1475)

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* Refactor Qwen2 Family (huggingface#1541)

* Add support for optimized SDXL pipeline (huggingface#1519)

* Make style

* Add the checkout parameters of falcon-mamba pytest (huggingface#1540)

Signed-off-by: yuanwu <yuan.wu@intel.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Avoid negative values in eval metrics (huggingface#1533)

* Fixes in unify_measurements (huggingface#1496)

Co-authored-by: yan tomsinsky <ytomsinsky@habana.ai>
Co-authored-by: Eran Geva <egeva@habana.ai>

* Fix lm_eval script for starcoder and gemma (huggingface#1463)

* Add option to use bf16 in PT sdp (#5) (huggingface#1514)

Co-authored-by: Urszula Golowicz <urszula.golowicz@intel.com>

* Fix tests.test_peft_inference failure (huggingface#1543)

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* [wav2vec2] Remove tensor.item and dynamic slicing operations in the loop that cause graph break (huggingface#1508)

* Update lm_eval version (huggingface#1473)

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Fix bad import in Baichuan code (huggingface#1547)

* Restore performance in generate (huggingface#1546)

Signed-off-by: Urszula Golowicz <urszula.golowicz@intel.com>
Co-authored-by: Marcin Łapiński <mlapinskix@habana.ai>
Co-authored-by: Adam Stachowicz <105052242+astachowiczhabana@users.noreply.github.com>

* Enable pyTorch-IMage-Models (TIMM) with HPUs (huggingface#1459)

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Add HF login for 8x Gaudi2 CI

* Adding support for Context Parallelism using DeepSpeed's DistributedAttention (huggingface#1501)

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Fix Llama CI

* Add DynamicMoE support for Mixtral (huggingface#1511)

Co-authored-by: Adam Stachowicz <105052242+astachowiczhabana@users.noreply.github.com>

* Fix for llava models not generating text with test failures in 1.19 (huggingface#1548)

* Refactor KV cache, RoPE, reduce common code (huggingface#1148)

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Adjust Qwen2-7B test case (huggingface#1551)

* [run_lm_eval.py] Fixed too many print dump json info (huggingface#1553)

Signed-off-by: Focus Luo <focus.luo@intel.com>

* Fix for single_card llama7b and falcon40b CI errors (huggingface#1549)

* Implemented fusedSDPA for stable diffusion (#36) (huggingface#1545)

Co-authored-by: Yixiu Chen <yixiu.chen@intel.com>
Co-authored-by: Libin Tang <litang@habana.ai>

* Apply --sdp_on_bf16 to image-to-text examples (huggingface#1557)

* Fix accuracy regression in Gemma (huggingface#1556)

* Fix FusedSDPA wrapper from TransformerEngine (huggingface#1562)

* Run albert-xxlarge-v1 CI as torch.compile mode (huggingface#1563)

* Update README commands for the models to use --sdp_on_bf16 (huggingface#1566)

* Minicpm patch (huggingface#1567)

Signed-off-by: Daniel Huang <daniel1.huang@intel.com>

* Updated gemma_2b_it CI (huggingface#1561)

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Fixed Adalora Test for OH 1.15 (huggingface#1564)

* Fixed LORACP Test for OH 1.15 (huggingface#1568)

* Add requirements.txt

* Update the baseline for 1.18 to reflect performance in 1.19 (huggingface#1571)

* Fix prefix llama ci failure (huggingface#1570)

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* fusedsdpa for stable diffusion xl (huggingface#1565)

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Add sdp_on_bf16 to tests,text-gen (huggingface#1559)

* Fix mllama test (huggingface#1569)

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* Fix lazy_mode assignment (huggingface#1558)

Co-authored-by: Yaser Afshar <yaser.afshar@intel.com>

* Fix diffusers import (huggingface#1574)

* Update README commands for more models to use --sdp_on_bf16 (huggingface#1575)

Co-authored-by: Libin Tang <litang@habana.ai>

* Generation utils update (minor) (huggingface#1468)

* style: removed tabs (huggingface#1577)

* Add chatglm (huggingface#1478)

Co-authored-by: Wei Lin <wei2.lin@intel.com>
Co-authored-by: Jianqian Zhou <jianqian.zhou@intel.com>
Co-authored-by: Leo Zhao <leo.zhao@intel.com>

* Enable num_return_sequences in beam search (huggingface#1536)

* gpt_bigcode: added internal bucketing fix (huggingface#1526)

* Update the Gaudi trainer with transformers 4.45.2 (huggingface#1398)

* Revert "add check_neural_compressor_min_version for 4 bit behavior" (huggingface#1578)

* Revert PR huggingface#1473 (huggingface#1582)

* Remove deprecated env variables

* Add sdp_on_bf16 argument to CI for run_image2text_lora_finetune and a… (huggingface#1585)

* Remove unnecessary neural compressor fix for 1.19 release (huggingface#1584)

* Make style

* Fixed spelling (huggingface#1576)

* Update docs for baichuan2 training (huggingface#1586)

* Adjust bert and roberta targets (huggingface#1588)

* Update text-gen readme for autogptq (huggingface#1589)

* Update README to Include Information on Performance Degradation and Mitigation Options (huggingface#1555)

* Fix Accuracy Calculation Issue in GPT-NeoX (huggingface#1591)

* Readme update for llama-405B (huggingface#1587)

Co-authored-by: Mohit Sinha <msinha@habana.ai>
Co-authored-by: Seunghyuk Park (shepark) <separk@habana.ai>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Add WA flag for falcon-180b to resolve text-gen critical reset error during tests (huggingface#1590)

* Add sdp_on_bf16 option to diffusers and image/audio classification tests (huggingface#1592)

* Update transformers tests generation util v4.45.2 (huggingface#1441)

Co-authored-by: Gustavo <gustavo.malkomes>
Co-authored-by: Yaser Afshar <yaser.afshar@intel.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Update README.md (huggingface#1595)

* Limit position embeddings in inference (huggingface#1598)

Co-authored-by: Adam Stachowicz <105052242+astachowiczhabana@users.noreply.github.com>

* Verify model output is provided when check_output is enabled (huggingface#1597)

* Fix scikit-learn to 1.5.2 to fix f1 evaluation crash in 1.6.0 (huggingface#1596)

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* Revert common KVCache not to check token_idx (huggingface#1594)

* Update language-modeling README file (huggingface#1599)

Co-authored-by: Libin Tang <litang@habana.ai>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Update readme for audio-classification example (huggingface#1602)

* SDPA flag update - static code analysis (huggingface#1601)

* Remove unwanted merged changes in SD pipeline

* Revert LlamaKVCache due to memory increase (huggingface#1605)

* Check rope_scaling attr (huggingface#1609)

* skip certain tests for G1 with empty param list (huggingface#1613)

* Revert "Update transformers tests generation util v4.45.2 (huggingface#1441)" (huggingface#1614)

This reverts commit 2ba520a.

* audio classification readme update (huggingface#1604)

* fix readme cmds for clip-roberta (huggingface#1603)

* fix readme cmds for clip-roberta

* comments and cleanup

* Fix run_generation test commands for TRL out usage example (huggingface#1624)

Fix run_generation example

* Add arbitrary scales (#15) (huggingface#1625)

Co-authored-by: Linoy Buchnik <linoybu@gmail.com>

* Modify Qwen2 TRL command to avoid OOM.  (huggingface#1630)

Add --use_flash_attention to avoid OOM for Qwen2

* Replace the UNET custom attention processors (huggingface#1608)

Co-authored-by: Iman Gohari <s.m.iman.gohari@intel.com>

* Falcon Model Support (huggingface#1612)

Co-authored-by: leopck <sckphoong@habana.ai>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Update sdp_on_bf16 option for ST example (huggingface#1615)

* Update save lora weights for diffusers with text_encoder_2 layers (huggingface#1626)

* Fix `save_lora_weights` in `pipeline_utils.py` (huggingface#1643)

* Refactor mixtral moe block. (huggingface#1635)

* speech-recognition: downgrade datasets version (huggingface#1646)

* add sdp_on_bf16 to controlnet (huggingface#1631)

* add sdp_on_bf16 to controlnet

* Update pipeline_controlnet.py

pass sdp_on_bf16 to controlnet_pipeline

* Update text_to_image_generation.py

* Update text_to_image_generation.py

* Quick fix for quantization/custom op list loading (huggingface#1657)

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

* Update multi-node test dockerfile (huggingface#1662)

* Fixes on OH 1.15 pre release (huggingface#1661)

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Fix distributed issue for ST Trainer (huggingface#1649)

* Fix distributed issue for timm (huggingface#1653)

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

* Added missing parameter for llama function call (huggingface#1663)

Co-authored-by: Libin Tang <litang@habana.ai>

* Add reuse_cache for llama3-405b measurement (huggingface#1664)

* Update EFA dockerfile to SynapseAI 1.19.0 (huggingface#1665)

Co-authored-by: Libin Tang <litang@habana.ai>

* Fix bug for GaudiMixtralAttentionLongSequence forward (huggingface#1650)

Signed-off-by: kaixuanliu <kaixuan.liu@intel.com>

* Update to SynapseAI v1.19

* Release: v1.15.0

* Fix style

* save_model - incorrect conflict resolution

* Fix style

---------

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
Signed-off-by: Xin <xin3.he@intel.com>
Signed-off-by: xinhe3 <xinhe3@habana.ai>
Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
Signed-off-by: yuanwu <yuan.wu@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
Signed-off-by: Urszula Golowicz <urszula.golowicz@intel.com>
Signed-off-by: Focus Luo <focus.luo@intel.com>
Signed-off-by: kaixuanliu <kaixuan.liu@intel.com>
Co-authored-by: Pramod Kumar <144990617+pramodkumar-habanalabs@users.noreply.github.com>
Co-authored-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
Co-authored-by: Roi Tiefenbrunn <roi.tief97@gmail.com>
Co-authored-by: Yan Tomsinsky <73292515+Yantom1@users.noreply.github.com>
Co-authored-by: Konrad Drozd <konrad.drozd@intel.com>
Co-authored-by: Uri Livne <ulivne@habana.ai>
Co-authored-by: Yeonsil Yoon <yyoon@habana.ai>
Co-authored-by: Danny Semiat <dsemiat@habana.ai>
Co-authored-by: Yaser Afshar <yaser.afshar@intel.com>
Co-authored-by: Harish Subramony <81822986+hsubramony@users.noreply.github.com>
Co-authored-by: Piotr Bielak <pbielak@users.noreply.github.com>
Co-authored-by: Sayantan Sarkar <supersarkar@gmail.com>
Co-authored-by: Harish <hsubramony@habana.ai>
Co-authored-by: Libin Tang <litang@habana.ai>
Co-authored-by: ZhengHongming888 <hongming.zheng@intel.com>
Co-authored-by: Jimin Ha <jha@habana.ai>
Co-authored-by: Seunghyuk Park (shepark) <separk@habana.ai>
Co-authored-by: Dmitry <dmitry.smertin@intel.com>
Co-authored-by: Soila Kavulya <soila.p.kavulya@intel.com>
Co-authored-by: Sun Choi <schoi@habana.ai>
Co-authored-by: xinhe <xin3.he@intel.com>
Co-authored-by: Mohit Deopujari <mdeopujari@habana.ai>
Co-authored-by: Iman Gohari <s.m.iman.gohari@intel.com>
Co-authored-by: XinyuYe-Intel <xinyu.ye@intel.com>
Co-authored-by: Vivek Goel <vgoel@habana.ai>
Co-authored-by: Akihiro Takahashi <akihiro.takahashi@intel.com>
Co-authored-by: Miroslav Goncharenko <miroslav.goncharenko@intel.com>
Co-authored-by: Wang, Mengni <mengni.wang@intel.com>
Co-authored-by: Daniel Socek <daniel.socek@intel.com>
Co-authored-by: Vidya Galli <vidya.s.galli@intel.com>
Co-authored-by: deepak-gowda-narayana <140652370+deepak-gowda-narayana@users.noreply.github.com>
Co-authored-by: Supreet Singh <100715017+SupreetSinghPalne@users.noreply.github.com>
Co-authored-by: kaixuanliu <kaixuan.liu@intel.com>
Co-authored-by: ANSHUMAN TRIPATHY <a.tripathy87@gmail.com>
Co-authored-by: sushil dubey <sdubey@habana.ai>
Co-authored-by: Luca Calabria <luca.calabria@intel.com>
Co-authored-by: billishyahao <yahao.he@intel.com>
Co-authored-by: xinhe3 <xinhe3@habana.ai>
Co-authored-by: KP (Edwin) Lau <kiangpeng.lau@intel.com>
Co-authored-by: Marcin Łapiński <mlapinskix@habana.ai>
Co-authored-by: Urszula Golowicz <urszula.golowicz@intel.com>
Co-authored-by: Greg Serochi <greg.serochi@intel.com>
Co-authored-by: Seethong Vang <seethong.vang@intel.com>
Co-authored-by: Anastasia Uvarova <anastasia.uvarova@intel.com>
Co-authored-by: Mohit Deopujari <mohit.deopujari@intel.com>
Co-authored-by: Chen Levkovich <chen.levkovich@intel.com>
Co-authored-by: Libin Tang <libin.tang@intel.com>
Co-authored-by: ranzhejiang <zhejiang.ran@intel.com>
Co-authored-by: Baochen Yang <baochen.yang@intel.com>
Co-authored-by: Huijuan Zhou <huijuan.zhou@intel.com>
Co-authored-by: Sergey Plotnikov <sergey.plotnikov@intel.com>
Co-authored-by: Deepak Narayana <deepak.narayana@intel.com>
Co-authored-by: Witold Szczurek <152967125+wszczurekhabana@users.noreply.github.com>
Co-authored-by: Wei Lin <forever871001@163.com>
Co-authored-by: lkk <33276950+lkk12014402@users.noreply.github.com>
Co-authored-by: Chaojun Zhang <chzhang@habana.ai>
Co-authored-by: Daniel Huang <daniel1.huang@intel.com>
Co-authored-by: Yuan Wu <yuan.wu@intel.com>
Co-authored-by: Xiang, Haihao <haihao.xiang@intel.com>
Co-authored-by: Jianqian Zhou <jianqian.zhou@intel.com>
Co-authored-by: Wei Lin <wei2.lin@intel.com>
Co-authored-by: Thanaji Rao Thakkalapelli <tthakkalapelli@habana.ai>
Co-authored-by: Yao Matrix <yaoweifeng0301@126.com>
Co-authored-by: yan tomsinsky <ytomsinsky@habana.ai>
Co-authored-by: Eran Geva <egeva@habana.ai>
Co-authored-by: Alexey Belyakov <alexey.belyakov@intel.com>
Co-authored-by: Bhargav <beede@habana.ai>
Co-authored-by: Krzysztof Wiśniewski <krzysztof2.wisniewski@intel.com>
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
Co-authored-by: FocusLuo <focus.luo@intel.com>
Co-authored-by: Yixiu Chen <yixiu.chen@intel.com>
Co-authored-by: Nariman Piroozan <87953329+npiroozan@users.noreply.github.com>
Co-authored-by: Edward Mascarenhas <edward.mascarenhas@intel.com>
Co-authored-by: Shiv Kaul <skaul@habana.ai>
Co-authored-by: bmengke <mengkejiergeli.ba@intel.com>
Co-authored-by: Leo Zhao <leo.zhao@intel.com>
Co-authored-by: Mohit Sinha <msinha@habana.ai>
Co-authored-by: Harshvardhan Chauhan <hchauhan@habana.ai>
Co-authored-by: Gustavo Malkomes <gustavo.malkomes@intel.com>
Co-authored-by: Linoy Buchnik <linoybu@gmail.com>
Co-authored-by: Alexey Fadeev <alexey.fadeev@intel.com>
Co-authored-by: leopck <sckphoong@habana.ai>