
Flux.1 #1331

Closed

KimBioInfoStudio wants to merge 33 commits into huggingface:main from KimBioInfoStudio:kim/flux

Conversation

Contributor

@KimBioInfoStudio KimBioInfoStudio commented Sep 14, 2024

What does this PR do?

Adaptation of diffusers.pipelines.FluxPipeline for Gaudi.

Env:

IMG="vault.habana.ai/gaudi-docker/1.17.0/ubuntu22.04/habanalabs/pytorch-installer-2.3.1:latest"
docker run -dit --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host --name flux  ${IMG} /bin/bash
docker exec -it flux python -m pip install git+https://github.com/kimbioinfostudio/optimum-habana.git@kim/flux
docker exec -it flux python -m pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.17.0
docker exec -w /root -it flux bash -c "git clone -b kim/flux https://github.com/kimbioinfostudio/optimum-habana.git"
docker exec -w /root/optimum-habana/examples/stable-diffusion -it flux bash 

Performance:

| Device | Mode  | Steps | FPS   |
|--------|-------|-------|-------|
| G2H    | Eager | 28    | 0.399 |
| G2H    | Eager | 4     | 2.121 |
| G2H    | Lazy  | 28    | 0.002 |
| G2H    | Graph | 28    | 0.086 |
| G2H    | Graph | 4     | 0.587 |
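Read as images per second, each FPS figure implies a per-image latency of 1/FPS. A quick sketch to make the table concrete (the numbers are taken directly from the table above; nothing else is assumed):

```python
# Per-image latency implied by the FPS column above (FPS = images/second).
table = [
    ("G2H", "Eager", 28, 0.399),
    ("G2H", "Eager", 4, 2.121),
    ("G2H", "Lazy", 28, 0.002),
    ("G2H", "Graph", 28, 0.086),
    ("G2H", "Graph", 4, 0.587),
]
for device, mode, steps, fps in table:
    latency_s = 1.0 / fps  # seconds to generate one image
    print(f"{device} {mode:5s} {steps:2d} steps: {latency_s:8.2f} s/image")
# e.g. Eager/28 steps -> ~2.51 s/image; Lazy/28 steps -> ~500 s/image
```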

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@KimBioInfoStudio KimBioInfoStudio marked this pull request as draft September 14, 2024 03:20
Contributor Author

KimBioInfoStudio commented Sep 18, 2024

Lazy mode (without HPU graphs):

python text_to_image_generation.py \
    --model_name_or_path black-forest-labs/FLUX.1-schnell \
    --prompts "A cat holding a sign that says hello world" \
    --num_images_per_prompt 1 \
    --batch_size 1 \
    --num_inference_steps 28 \
    --image_save_dir /tmp/flux_1_images \
    --scheduler flow_match_euler_discrete \
    --use_habana \
    --gaudi_config Habana/stable-diffusion \
    --bf16

Output:

[INFO|pipeline_flux.py:339] 2024-09-27 07:12:01,106 >> 1 prompt(s) received, 1 generation(s) per prompt, 1 sample(s) per batch, 1 total batch(es).
[WARNING|pipeline_flux.py:344] 2024-09-27 07:12:01,106 >> The first two iterations are slower so it is recommended to feed more batches.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28/28 [06:50<00:00, 14.85s/it][INFO|pipeline_flux.py:416] 2024-09-27 07:19:06,461 >> Speed metrics: {'generation_runtime': 425.355, 'generation_samples_per_second': 0.002, 'generation_steps_per_second': 0.067}
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28/28 [07:05<00:00, 15.19s/it]
09/27/2024 07:19:19 - INFO - __main__ - Saving images in /tmp/flux_1_images...

output image:
flux_image_1

@KimBioInfoStudio KimBioInfoStudio marked this pull request as ready for review September 18, 2024 08:59
@KimBioInfoStudio KimBioInfoStudio changed the title Flux Flux。1 Sep 18, 2024
@KimBioInfoStudio KimBioInfoStudio changed the title Flux。1 Flux.1 Sep 18, 2024
baocheny and others added 9 commits September 23, 2024 14:03
Contributor Author

KimBioInfoStudio commented Sep 27, 2024

Graph mode, 28 steps:

python text_to_image_generation.py \
    --model_name_or_path black-forest-labs/FLUX.1-schnell \
    --prompts "A cat holding a sign that says hello world" \
    --num_images_per_prompt 1 \
    --batch_size 1 \
    --num_inference_steps 28 \
    --image_save_dir /tmp/flux_1_images \
    --scheduler flow_match_euler_discrete \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion \
    --bf16

Output:

[INFO|pipeline_flux.py:339] 2024-09-27 06:18:43,177 >> 1 prompt(s) received, 1 generation(s) per prompt, 1 sample(s) per batch, 1 total batch(es).
[WARNING|pipeline_flux.py:344] 2024-09-27 06:18:43,177 >> The first two iterations are slower so it is recommended to feed more batches.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28/28 [00:35<00:00,  9.50it/s][INFO|pipeline_flux.py:416] 2024-09-27 06:19:27,857 >> Speed metrics: {'generation_runtime': 44.6799, 'generation_samples_per_second': 0.086, 'generation_steps_per_second': 2.413}
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28/28 [00:44<00:00,  1.60s/it]
09/27/2024 06:19:40 - INFO - __main__ - Saving images in /tmp/flux_1_images...
Graph mode, 4 steps:

python text_to_image_generation.py \
    --model_name_or_path black-forest-labs/FLUX.1-schnell \
    --prompts "A cat holding a sign that says hello world" \
    --num_images_per_prompt 1 \
    --batch_size 1 \
    --num_inference_steps 4 \
    --image_save_dir /tmp/flux_1_images \
    --scheduler flow_match_euler_discrete \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion \
    --bf16

Output:

[INFO|pipeline_flux.py:339] 2024-09-27 06:14:42,741 >> 1 prompt(s) received, 1 generation(s) per prompt, 1 sample(s) per batch, 1 total batch(es).
[WARNING|pipeline_flux.py:344] 2024-09-27 06:14:42,741 >> The first two iterations are slower so it is recommended to feed more batches.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:33<00:00,  6.27s/it][INFO|pipeline_flux.py:416] 2024-09-27 06:15:16,976 >> Speed metrics: {'generation_runtime': 34.2343, 'generation_samples_per_second': 0.587, 'generation_steps_per_second': 2.35}
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:34<00:00,  8.56s/it]
09/27/2024 06:15:29 - INFO - __main__ - Saving images in /tmp/flux_1_images...

@KimBioInfoStudio
Contributor Author

Eager mode, 28 steps:

PT_HPU_LAZY_MODE=0 \
python text_to_image_generation.py \
    --model_name_or_path black-forest-labs/FLUX.1-schnell \
    --prompts "A cat holding a sign that says hello world" \
    --num_images_per_prompt 1 \
    --batch_size 1 \
    --num_inference_steps 28 \
    --image_save_dir /tmp/flux_1_images \
    --scheduler flow_match_euler_discrete \
    --use_habana \
    --gaudi_config Habana/stable-diffusion \
    --bf16

Output:

[INFO|pipeline_flux.py:339] 2024-09-27 07:27:16,601 >> 1 prompt(s) received, 1 generation(s) per prompt, 1 sample(s) per batch, 1 total batch(es).
[WARNING|pipeline_flux.py:344] 2024-09-27 07:27:16,601 >> The first two iterations are slower so it is recommended to feed more batches.
  4%|██████▏                                                                                                                                                                      | 1/28 [00:01<00:41,  1.53s/it]09/27/2024 07:27:18 - WARNING - habana_frameworks.torch.utils.internal - Calling mark_step function does not have any effect. It's lazy mode only functionality. (warning logged once)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28/28 [00:03<00:00, 11.58it/s][INFO|pipeline_flux.py:416] 2024-09-27 07:27:20,589 >> Speed metrics: {'generation_runtime': 3.9884, 'generation_samples_per_second': 0.399, 'generation_steps_per_second': 11.162}
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28/28 [00:03<00:00,  7.02it/s]
09/27/2024 07:27:59 - INFO - __main__ - Saving images in /tmp/flux_1_images...
Eager mode, 4 steps:

PT_HPU_LAZY_MODE=0 \
python text_to_image_generation.py \
    --model_name_or_path black-forest-labs/FLUX.1-schnell \
    --prompts "A cat holding a sign that says hello world" \
    --num_images_per_prompt 1 \
    --batch_size 1 \
    --num_inference_steps 4 \
    --image_save_dir /tmp/flux_1_images \
    --scheduler flow_match_euler_discrete \
    --use_habana \
    --gaudi_config Habana/stable-diffusion \
    --bf16

Output:
[INFO|pipeline_flux.py:339] 2024-09-27 07:29:50,265 >> 1 prompt(s) received, 1 generation(s) per prompt, 1 sample(s) per batch, 1 total batch(es).
[WARNING|pipeline_flux.py:344] 2024-09-27 07:29:50,265 >> The first two iterations are slower so it is recommended to feed more batches.
 25%|███████████████████████████████████████████▌                                                                                                                                  | 1/4 [00:01<00:04,  1.53s/it]09/27/2024 07:29:51 - WARNING - habana_frameworks.torch.utils.internal - Calling mark_step function does not have any effect. It's lazy mode only functionality. (warning logged once)
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:01<00:00,  2.94it/s][INFO|pipeline_flux.py:416] 2024-09-27 07:29:52,107 >> Speed metrics: {'generation_runtime': 1.8415, 'generation_samples_per_second': 2.121, 'generation_steps_per_second': 8.482}
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:01<00:00,  2.17it/s]
09/27/2024 07:29:56 - INFO - __main__ - Saving images in /tmp/flux_1_images...

dsocek and others added 4 commits September 27, 2024 18:22
Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Signed-off-by: Daniel Socek <daniel.socek@intel.com>
@huijuanzh
Contributor

@regisss please help review this PR.

Tested under diffusers 0.31.0.dev0.

4 inference steps:
Nvidia A800 throughput (BF16): 1.24 it/s
Eager Gaudi2 throughput (BF16): 8.484 it/s
Graph Gaudi2 throughput (BF16): 2.348 it/s

28 inference steps:
Nvidia A800 throughput (BF16): 1.71 it/s
Eager Gaudi2 throughput (BF16): 11.172 it/s
Graph Gaudi2 throughput (BF16): 2.408 it/s

Contributor

@ssarkar2 ssarkar2 left a comment


Please delete measure_all_500, measure_all, etc. Binary files such as .npz need not be uploaded.

@KimBioInfoStudio
Contributor Author

Performance with batching enabled:

| Device | Mode  | Prompts | Images per Prompt | BS | Steps | FPS   |
|--------|-------|---------|-------------------|----|-------|-------|
| G2H    | Graph | 1       | 4                 | 4  | 28    | 0.113 |
| G2H    | Graph | 5       | 1                 | 5  | 28    | 0.113 |
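Compared with the single-image graph-mode figure reported earlier (0.086 FPS at 28 steps), batching improves per-image throughput. A quick check, using only the numbers from the two tables:

```python
# Per-image throughput gain from batching (graph mode, 28 steps).
single_fps = 0.086   # batch size 1 (earlier table)
batched_fps = 0.113  # batch size 4 or 5 (table above)
speedup = batched_fps / single_fps
print(f"Batching speedup: {speedup:.2f}x")  # 1.31x
```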

@KimBioInfoStudio
Contributor Author

> please delete measure_all_500, measure_all etc. binary files like npz needn't be uploaded

@ssarkar2 Removed, please review again.

keep text_ids latent_image_ids split for diffuser 0.30.x
@imangohari1
Contributor

This work is included in #1450. We should close this PR and merge the necessary changes via #1450.

@hsubramony @libinta @regisss
could any of you please close this PR? Thanks.

@hsubramony hsubramony closed this Nov 4, 2024


7 participants