Merged
7 changes: 2 additions & 5 deletions ppdiffusers/README.md
@@ -20,6 +20,8 @@
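**PPDiffusers** is a domestically developed toolbox for training and inference of diffusion models across multiple modalities (such as text-image cross-modal, image, and speech), built on the [**PaddlePaddle**](https://www.paddlepaddle.org.cn/) framework and the [**PaddleNLP**](https://github.com/PaddlePaddle/PaddleNLP) natural language processing library.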

## News 📢
+* 🔥 **2024.10.18: Released version 0.29.0. Added the image generation model [Stable Diffusion 3 (SD3)](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/examples/text_to_image/README_sd3.md), with support for DreamBooth training and high-performance inference; SD3 and SDXL are adapted to the Ascend 910B, providing training and inference on domestic compute chips; DIT supports [high-performance inference](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/examples/class_conditional_image_generation/DiT/README.md#23-paddle-inference-%E9%AB%98%E6%80%A7%E8%83%BD%E6%8E%A8%E7%90%86); PaddleNLP 3.0 beta is supported.**
+
* 🔥 **2024.07.15: Released version 0.24.1. Added [Open-Sora](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/Open-Sora), with support for model training and inference; full support for Paddle 3.0.**

* 🔥 **2024.04.17: Released version 0.24.0. Supports [Sora-related techniques](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/sora); supports training and inference for [DiT](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/class_conditional_image_generation/DiT), [SiT](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/class_conditional_image_generation/DiT#exploring-flow-and-diffusion-based-generative-models-with-scalable-interpolant-transformers-sit), and [UViT](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/text_to_image_mscoco_uvit); added the [NaViT](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/navit) and [MAGVIT-v2](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/video_tokenizer/magvit2) models;
@@ -38,11 +40,6 @@ Stable Diffusion supports [BF16 O2 training](https://github.com/PaddlePaddle/PaddleMIX/
[LoRA loading upgrade](#加载HF-LoRA权重), with support for loading SDXL LoRA weights;
[Controlnet](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/ppdiffusers/pipelines/controlnet) upgrade, with support for ControlNetImg2Img, ControlNetInpaint, StableDiffusionXLControlNet, and more.**

-* 🔥 **2023.06.20: Released version 0.16.1. Added [T2I-Adapter](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/t2i-adapter), with support for training and inference; ControlNet upgraded to support [reference-only inference](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/community#controlnet-reference-only); added [WebUIStableDiffusionPipeline](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/community#automatic1111-webui-stable-diffusion),
-which supports dynamically loading lora and textual_inversion weights through the prompt;
-added [StableDiffusionHiresFixPipeline](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/community#stable-diffusion-with-high-resolution-fixing), which supports high-resolution fixing;
-added [COCOeval](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/scripts/cocoeval_keypoints_score), an evaluation metric for keypoint-controlled generation tasks;
-added diffusion model pipelines for multiple modalities, including video generation ([Text-to-Video-Synth](#文本视频多模), [Text-to-Video-Zero](#文本视频多模)) and audio generation ([AudioLDM](#文本音频多模), [Spectrogram Diffusion](#音频)); added the text-to-image model [IF](#文本图像多模).**



2 changes: 1 addition & 1 deletion ppdiffusers/VERSION
@@ -1 +1 @@
-0.24.1
+0.29.0
7 changes: 4 additions & 3 deletions ppdiffusers/deploy/sd3/README.md
@@ -52,6 +52,7 @@ python -m paddle.distributed.launch --gpus 0,1 text_to_image_generation-stable_d
```
## Performance measured on an NVIDIA A800-SXM4-80GB:

-| Paddle batch parallel | Paddle Single Card | PyTorch   | Paddle dynamic graph |
-| --------------------- | ------------------ | --------- | -------------------- |
-| 0.86 s                | 1.2 s              | 1.78 s    | 4.202 s              |
+
+| Paddle batch parallel | Paddle Single Card | PyTorch   | TensorRT | Paddle dynamic graph |
+| --------------------- | ------------------ | --------- | -------- | -------------------- |
+| 0.86 s                | 1.2 s              | 1.78 s    | 1.16 s   | 4.202 s              |
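
To make the table above easier to compare, here is a minimal sketch that turns the reported latencies into speedups relative to a chosen baseline. The dictionary keys and the helper name `speedup_over` are illustrative assumptions; only the latency values come from the table.

```python
# Latencies in seconds, copied from the updated table above.
latencies_s = {
    "Paddle batch parallel": 0.86,
    "Paddle Single Card": 1.2,
    "PyTorch": 1.78,
    "TensorRT": 1.16,
    "Paddle dynamic graph": 4.202,
}


def speedup_over(baseline, latencies):
    """Return baseline_latency / latency for every entry (hypothetical helper).

    Values above 1.0 mean the entry is faster than the baseline.
    """
    base = latencies[baseline]
    return {name: base / seconds for name, seconds in latencies.items()}


speedups = speedup_over("PyTorch", latencies_s)
for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x vs PyTorch")
```

A ratio below 1.0, as for the dynamic-graph column, means that configuration is slower than the baseline.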
145 changes: 145 additions & 0 deletions ppdiffusers/deploy/sd3/text_to_image_generation-stable_diffusion_3.py
@@ -0,0 +1,145 @@
# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import argparse
import paddle
def parse_args():
    parser = argparse.ArgumentParser(
        description=" Use PaddleMIX to accelerate the Stable Diffusion3 image generation model."
    )
    parser.add_argument(
        "--benchmark",
        type=(lambda x: str(x).lower() in ["true", "1", "yes"]),
        default=False,
        help="if set to True, measure inference performance",
    )
    parser.add_argument(
        "--inference_optimize",
        type=(lambda x: str(x).lower() in ["true", "1", "yes"]),
        default=False,
        help="If set to True, all optimizations except Triton are enabled.",
    )
    parser.add_argument(
        "--inference_optimize_bp",
        type=(lambda x: str(x).lower() in ["true", "1", "yes"]),
        default=False,
        help="If set to True, batch parallel is enabled in DIT and dual-GPU acceleration is used.",
    )
    parser.add_argument("--height", type=int, default=512, help="Height of the generated image.")
    parser.add_argument("--width", type=int, default=512, help="Width of the generated image.")
    parser.add_argument("--num-inference-steps", type=int, default=50, help="Number of inference steps.")
    parser.add_argument("--dtype", type=str, default="float32", help="Inference data types.")

    return parser.parse_args()


args = parse_args()

if args.inference_optimize:
    os.environ["INFERENCE_OPTIMIZE"] = "True"
    os.environ["INFERENCE_OPTIMIZE_TRITON"] = "True"
if args.inference_optimize_bp:
    os.environ["INFERENCE_OPTIMIZE_BP"] = "True"
if args.dtype == "float32":
    inference_dtype = paddle.float32
elif args.dtype == "float16":
    inference_dtype = paddle.float16


if args.inference_optimize_bp:
    from paddle.distributed import fleet
    from paddle.distributed.fleet.utils import recompute
    import numpy as np
    import random
    import paddle.distributed as dist
    import paddle.distributed.fleet as fleet
    strategy = fleet.DistributedStrategy()
    model_parallel_size = 2
    data_parallel_size = 1
    strategy.hybrid_configs = {
        "dp_degree": data_parallel_size,
        "mp_degree": model_parallel_size,
        "pp_degree": 1
    }
    fleet.init(is_collective=True, strategy=strategy)
    hcg = fleet.get_hybrid_communicate_group()
    mp_id = hcg.get_model_parallel_rank()
    rank_id = dist.get_rank()

import datetime
from ppdiffusers import StableDiffusion3Pipeline


pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    paddle_dtype=inference_dtype,
)

pipe.transformer = paddle.incubate.jit.inference(
    pipe.transformer,
    save_model_dir="./tmp/sd3",
    enable_new_ir=True,
    cache_static_model=True,
    # On V100, set exp_enable_use_cutlass=False.
    exp_enable_use_cutlass=True,
    delete_pass_lists=["add_norm_fuse_pass"],
)

generator = paddle.Generator().manual_seed(42)
prompt = "A cat holding a sign that says hello world"


image = pipe(
    prompt, num_inference_steps=args.num_inference_steps, width=args.width, height=args.height, generator=generator
).images[0]

if args.benchmark:
    # warmup
    for i in range(3):
        image = pipe(
            prompt,
            num_inference_steps=args.num_inference_steps,
            width=args.width,
            height=args.height,
            generator=generator,
        ).images[0]

    repeat_times = 10
    sumtime = 0.0
    for i in range(repeat_times):
        paddle.device.synchronize()
        starttime = datetime.datetime.now()
        image = pipe(
            prompt,
            num_inference_steps=args.num_inference_steps,
            width=args.width,
            height=args.height,
            generator=generator,
        ).images[0]
        paddle.device.synchronize()
        endtime = datetime.datetime.now()
        duringtime = endtime - starttime
        duringtime = duringtime.seconds * 1000 + duringtime.microseconds / 1000.0
        sumtime += duringtime
        print("SD3 end to end time : ", duringtime, "ms")

    print("SD3 ave end to end time : ", sumtime / repeat_times, "ms")
    cuda_mem_after_used = paddle.device.cuda.max_memory_allocated() / (1024**3)
    print(f"Max used CUDA memory : {cuda_mem_after_used:.3f} GiB")

if args.inference_optimize_bp:
    if rank_id == 0:
        image.save("text_to_image_generation-stable_diffusion_3-result.png")
else:
    image.save("text_to_image_generation-stable_diffusion_3-result.png")
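
The benchmark section of the script above follows a common pattern: a few warmup runs, then a timed loop with a device synchronization before and after each call. Below is a minimal, framework-free sketch of that pattern. The helper name `benchmark` and its `sync` parameter are assumptions for illustration and are not part of this PR; in the script's setting one would presumably pass `paddle.device.synchronize` as `sync` and a closure around `pipe(...)` as `fn`. The sketch uses `time.perf_counter` rather than `datetime` arithmetic, which is a substitution, not what the script does.

```python
import time
from statistics import mean


def benchmark(fn, warmup=3, repeat=10, sync=None):
    """Time `fn()` and return a list of per-run latencies in milliseconds.

    `sync` is an optional zero-argument callable (hypothetical hook) that
    blocks until asynchronous device work has finished, so that the timer
    measures completed work rather than just kernel launches.
    """
    # Warmup runs are executed but not timed.
    for _ in range(warmup):
        fn()

    timings_ms = []
    for _ in range(repeat):
        if sync is not None:
            sync()
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return timings_ms


# Stand-in workload; a real run would call the pipeline instead.
timings = benchmark(lambda: sum(range(100_000)))
print(f"ave end to end time : {mean(timings):.3f} ms")
```

Returning the individual timings rather than only the sum makes it easy to report a median or spread as well as the mean.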
2 changes: 2 additions & 0 deletions ppdiffusers/examples/controlnet/requirements.txt
@@ -4,3 +4,5 @@ paddlenlp>=2.7.2
opencv-python
ppdiffusers>=0.24.0
cchardet
+gradio==3.16.2
+basicsr==1.4.2
2 changes: 1 addition & 1 deletion ppdiffusers/examples/dreambooth/README_sd3.md
@@ -131,7 +131,7 @@ import paddle
pipe = StableDiffusion3Pipeline.from_pretrained(
"stabilityai/stable-diffusion-3-medium-diffusers", paddle_dtype=paddle.float16
)
-pipeline.load_lora_weights('your-lora-checkpoint')
+pipe.load_lora_weights('your-lora-checkpoint')

image = pipe("A picture of a sks dog in a bucket", num_inference_steps=25).images[0]
image.save("sks_dog_dreambooth_lora.png")
@@ -20,6 +20,6 @@
url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/sketch-mountains-input.png"
init_image = load_image(url).resize((512, 512))
prompt = "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k"
-images = pipe(prompt=prompt, image=init_image, strength=0.95, guidance_scale=7.5).images[0]
+image = pipe(prompt=prompt, image=init_image, strength=0.95, guidance_scale=7.5).images[0]

image.save("image_to_image_text_guided_generation-stable_diffusion_3-result.png")
@@ -1,4 +1,4 @@
-# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -11,134 +11,13 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import argparse
import paddle
def parse_args():
parser = argparse.ArgumentParser(
description=" Use PaddleMIX to accelerate the Stable Diffusion3 image generation model."
)
parser.add_argument(
"--benchmark",
type=(lambda x: str(x).lower() in ["true", "1", "yes"]),
default=False,
help="if set to True, measure inference performance",
)
parser.add_argument(
"--inference_optimize",
type=(lambda x: str(x).lower() in ["true", "1", "yes"]),
default=False,
help="If set to True, all optimizations except Triton are enabled.",
)
parser.add_argument(
"--inference_optimize_bp",
type=(lambda x: str(x).lower() in ["true", "1", "yes"]),
default=False,
help="If set to True, batch parallel is enabled in DIT and dual-GPU acceleration is used.",
)
parser.add_argument("--height", type=int, default=512, help="Height of the generated image.")
parser.add_argument("--width", type=int, default=512, help="Width of the generated image.")
parser.add_argument("--num-inference-steps", type=int, default=50, help="Number of inference steps.")
parser.add_argument("--dtype", type=str, default="float32", help="Inference data types.")

return parser.parse_args()


args = parse_args()

if args.inference_optimize:
os.environ["INFERENCE_OPTIMIZE"] = "True"
os.environ["INFERENCE_OPTIMIZE_TRITON"] = "True"
if args.inference_optimize_bp:
os.environ["INFERENCE_OPTIMIZE_BP"] = "True"
if args.dtype == "float32":
inference_dtype = paddle.float32
elif args.dtype == "float16":
inference_dtype = paddle.float16


if args.inference_optimize_bp:
from paddle.distributed import fleet
from paddle.distributed.fleet.utils import recompute
import numpy as np
import random
import paddle.distributed as dist
import paddle.distributed.fleet as fleet
strategy = fleet.DistributedStrategy()
model_parallel_size = 2
data_parallel_size = 1
strategy.hybrid_configs = {
"dp_degree": data_parallel_size,
"mp_degree": model_parallel_size,
"pp_degree": 1
}
fleet.init(is_collective=True, strategy=strategy)
hcg = fleet.get_hybrid_communicate_group()
mp_id = hcg.get_model_parallel_rank()
rank_id = dist.get_rank()

import datetime
import paddle
from ppdiffusers import StableDiffusion3Pipeline


 pipe = StableDiffusion3Pipeline.from_pretrained(
-    "stabilityai/stable-diffusion-3-medium-diffusers",
-    paddle_dtype=inference_dtype,
+    "stabilityai/stable-diffusion-3-medium-diffusers", paddle_dtype=paddle.float16
 )

pipe.transformer = paddle.incubate.jit.inference(
pipe.transformer,
save_model_dir="./tmp/sd3",
enable_new_ir=True,
cache_static_model=True,
exp_enable_use_cutlass=True,
delete_pass_lists=["add_norm_fuse_pass"],
)

generator = paddle.Generator().manual_seed(42)
prompt = "A cat holding a sign that says hello world"


image = pipe(
prompt, num_inference_steps=args.num_inference_steps, width=args.width, height=args.height, generator=generator
).images[0]

if args.benchmark:
# warmup
for i in range(3):
image = pipe(
prompt,
num_inference_steps=args.num_inference_steps,
width=args.width,
height=args.height,
generator=generator,
).images[0]

repeat_times = 10
sumtime = 0.0
for i in range(repeat_times):
paddle.device.synchronize()
starttime = datetime.datetime.now()
image = pipe(
prompt,
num_inference_steps=args.num_inference_steps,
width=args.width,
height=args.height,
generator=generator,
).images[0]
paddle.device.synchronize()
endtime = datetime.datetime.now()
duringtime = endtime - starttime
duringtime = duringtime.seconds * 1000 + duringtime.microseconds / 1000.0
sumtime += duringtime
print("SD3 end to end time : ", duringtime, "ms")

print("SD3 ave end to end time : ", sumtime / repeat_times, "ms")
cuda_mem_after_used = paddle.device.cuda.max_memory_allocated() / (1024**3)
print(f"Max used CUDA memory : {cuda_mem_after_used:.3f} GiB")

-if args.inference_optimize_bp:
-    if rank_id == 0:
-        image.save("text_to_image_generation-stable_diffusion_3-result.png")
-else:
-    image.save("text_to_image_generation-stable_diffusion_3-result.png")
+image = pipe(prompt, generator=generator).images[0]
+image.save("text_to_image_generation-stable_diffusion_3-result.png")
2 changes: 1 addition & 1 deletion ppdiffusers/examples/kandinsky2_2/text_to_image/README.md
@@ -124,7 +124,7 @@ prior_components = {"prior_" + k: v for k,v in pipe_prior.components.items()}
pipe = KandinskyV22CombinedPipeline.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", **prior_components)

prompt='A robot pokemon, 4k photo'
-images = pipe(prompt=prompt, negative_prompt=negative_prompt).images
+images = pipe(prompt=prompt).images
images[0]
```
