The features implemented will eventually be shared between SD 2.1 and SDXL as much as possible.
* Legacy trainer does not implement precomputed embeds/latents
* Currently, the legacy trainer is somewhat neglected. For SD 2.1, use the last release before SDXL support was introduced.
## Tutorial
Please fully explore this README before embarking on [the tutorial](/TUTORIAL.md), as it contains vital information that you might need to know first.
## General design philosophy
* Just throw captioned images into a dir and the script does the rest.
* VAE (latents) outputs are precomputed before training and saved to storage, so that we do not need to invoke the VAE during the forward pass.
* Since SDXL has two text encoders, we precompute all of the captions into embeds and then store those as well.
* **Train on a 40G GPU** when using lower base resolutions. Sorry, but it's just not doable to train SDXL's full U-net on 24G, even with Adafactor.
* EMA (Exponential moving average) weight network as an optional way to reduce model over-cooking.
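The precompute-then-cache flow described in the bullets above can be sketched roughly as follows. This is a minimal illustration rather than this repository's actual code: `encode_caption` stands in for whatever pipeline runs the two text encoders, and plain `pickle` files stand in for the trainer's real on-disk format.

```python
import hashlib
import os
import pickle


def cache_path(cache_dir: str, caption: str) -> str:
    # Key each caption by a stable hash so repeated captions share one cache file.
    digest = hashlib.sha256(caption.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, f"{digest}.pkl")


def get_embed(cache_dir: str, caption: str, encode_caption):
    """Return the cached embed for a caption, computing and storing it on a miss."""
    path = cache_path(cache_dir, caption)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    embed = encode_caption(caption)  # the expensive step: running the text encoders
    with open(path, "wb") as f:
        pickle.dump(embed, f)
    return embed
```

Once everything is cached, the forward pass only ever touches storage, so the text encoders (and, with the same pattern applied to images, the VAE) never need to be resident in VRAM during training.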
## Stable Diffusion 2.0 / 2.1
Stable Diffusion 2.1 is notoriously difficult to fine-tune. Many of the default scripts do not make the smartest choices, resulting in poor-quality outputs:
* Not enforcing zero SNR on the terminal timestep and using offset noise instead, which results in a noisier image.
* Training on only square 768x768 images, which results in the model losing the ability to generalise across aspect ratios (or, at the very least, not improving it).
## Hardware Requirements
All testing of this script has been done using:
* A100-80G
* A6000 48G
* 4090 24G

Despite optimisations, SDXL training **will not work on a 24G GPU**, though SD 2.1 training works fantastically well there.
### SDXL 1.0
At 1024x1024 batch size 10, we can nearly saturate a single 80G A100's entire VRAM pool!

At 1024x1024 batch size 4, we can begin to make use of a 48G A6000 GPU, which substantially reduces the cost of multi-GPU training!

With a resolution reduction down to 768 pixels, you can shift requirements down to an A100-40G.

For further reductions, training at a resolution of `256x256` still lets the model generalise the training data quite well, and supports a much higher batch size of around 15 if the VRAM is available.
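These resolution figures track the quadratic growth of the latents: Stable Diffusion VAEs downsample each spatial dimension by 8 into 4 latent channels, so lowering the base resolution shrinks every per-image tensor quadratically. A back-of-the-envelope helper (the 8x/4-channel figures are the standard SD/SDXL VAE layout; the helper itself is just illustrative, not part of this repo):

```python
def latent_shape(height: int, width: int, channels: int = 4, downsample: int = 8):
    """Shape of one precomputed VAE latent for an image of the given size."""
    return (channels, height // downsample, width // downsample)


def latent_values(height: int, width: int) -> int:
    """Number of values in one latent; activation memory scales with this."""
    c, h, w = latent_shape(height, width)
    return c * h * w


for res in (1024, 768, 256):
    print(res, latent_shape(res, res), latent_values(res, res))
# 1024px latents carry (1024/768)**2 ≈ 1.78x the values of 768px ones,
# which is roughly why dropping to 768px fits a smaller card.
```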
### Stable Diffusion 2.x
Generally, a batch size of 4-8 for aspect-bucketed data at a 768px base resolution was achievable within 24G of VRAM.

On an A100-80G, a batch size of 15 could be reached with nearly all of the VRAM in use.

For 1024px training, the VRAM requirement goes up substantially, but it is still doable in roughly an equivalent footprint to an _optimised_ SDXL setup.

Optimisations from the SDXL trainer (the text embed cache and precomputed latents) could be ported to the legacy trainer to bring this down substantially and make 1024px training more viable on consumer kit.
## Scripts
* `ubuntu.sh` - This is a basic "installer" that makes it quick to deploy on a Vast.ai instance.
* `train_sdxl.sh` - This is where the magic happens.
* `training.sh` - This is the legacy Stable Diffusion 1.x / 2.x trainer. The last stable version was before SDXL support was introduced. 😞
* `sdxl-env.sh.example` - These are the SDXL training parameters; copy this file to `sdxl-env.sh`.
* `sd21-env.sh.example` - These are the SD 2.1 training parameters; copy this file to `env.sh`.