
Commit 39aac5c

Author: bghira
Documentation pt2
1 parent 578c49c commit 39aac5c

File tree

1 file changed: +37 -6 lines changed

README.md

Lines changed: 37 additions & 6 deletions
@@ -13,6 +13,10 @@ The features implemented will eventually be shared between SD 2.1 and SDXL as mu
 * Legacy trainer does not implement precomputed embeds/latents
 * Currently, the legacy trainer is somewhat neglected. The last release prior to SDXL support should be used for SD 2.1.
 
+## Tutorial
+
+Please fully explore this README before embarking on [the tutorial](/TUTORIAL.md), as it contains vital information that you might need to know first.
+
 ## General design philosophy
 
 * Just throw captioned images into a dir and the script does the rest.
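To make the "just throw captioned images into a dir" contract concrete, here is a minimal sketch of a loader in that spirit. The sidecar-`.txt`-or-filename caption convention is an assumption for illustration, not necessarily how this trainer actually discovers captions:

```python
# Hypothetical dataset discovery: pair each image with a caption taken from a
# .txt sidecar file if present, otherwise derived from the filename itself.
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}

def discover_dataset(data_dir: str):
    """Yield (image_path, caption) pairs from a flat directory of images."""
    for path in sorted(Path(data_dir).iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        sidecar = path.with_suffix(".txt")
        if sidecar.exists():
            caption = sidecar.read_text().strip()
        else:
            caption = path.stem.replace("_", " ")
        yield path, caption

for image_path, caption in discover_dataset("datasets/my-subject"):
    print(image_path.name, "->", caption)
```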
@@ -23,13 +27,9 @@ The features implemented will eventually be shared between SD 2.1 and SDXL as mu
 
 * VAE (latents) outputs are precomputed before training and saved to storage, so that we do not need to invoke the VAE during the forward pass.
 * Since SDXL has two text encoders, we precompute all of the captions into embeds and then store those as well.
-* Train on a 40G GPU when using lower base resolutions.
+* **Train on a 40G GPU** when using lower base resolutions. Sorry, but it's just not doable to train SDXL's full U-net on 24G, even with Adafactor.
 * EMA (Exponential moving average) weight network as an optional way to reduce model over-cooking.
 
-With this script, at 1024x1024 batch size 10, we can nearly saturate a single 80G A100!
-
-At 1024x1024 batch size 4, we can use a 48G A6000 GPU, which reduces the cost of multi-GPU training!
-
 ## Stable Diffusion 2.0 / 2.1
 
 Stable Diffusion 2.1 is notoriously difficult to fine-tune. Many of the default scripts do not make the smartest choices, and this results in poor-quality outputs:
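As a rough sketch of the precompute bullets in the hunk above, caching VAE latents and SDXL's dual text-encoder embeds with `diffusers`/`transformers` might look something like this; the cache format and the `cache_example` helper are illustrative assumptions, not this repo's actual code:

```python
# Illustrative precompute pass: run the VAE and both SDXL text encoders once,
# save the results, and never invoke them again during training forward passes.
import torch
from diffusers import AutoencoderKL
from transformers import CLIPTextModel, CLIPTextModelWithProjection, CLIPTokenizer

MODEL = "stabilityai/stable-diffusion-xl-base-1.0"
device = "cuda" if torch.cuda.is_available() else "cpu"

vae = AutoencoderKL.from_pretrained(MODEL, subfolder="vae").to(device).eval()
tokenizers = [
    CLIPTokenizer.from_pretrained(MODEL, subfolder="tokenizer"),
    CLIPTokenizer.from_pretrained(MODEL, subfolder="tokenizer_2"),
]
encoders = [
    CLIPTextModel.from_pretrained(MODEL, subfolder="text_encoder").to(device).eval(),
    CLIPTextModelWithProjection.from_pretrained(MODEL, subfolder="text_encoder_2").to(device).eval(),
]

@torch.no_grad()
def cache_example(pixels: torch.Tensor, caption: str, out_path: str) -> None:
    """pixels: a (1, 3, H, W) image tensor scaled to [-1, 1]."""
    latents = vae.encode(pixels.to(device)).latent_dist.sample()
    latents = latents * vae.config.scaling_factor

    embeds = []
    for tokenizer, encoder in zip(tokenizers, encoders):
        ids = tokenizer(
            caption, padding="max_length", max_length=tokenizer.model_max_length,
            truncation=True, return_tensors="pt",
        ).input_ids.to(device)
        out = encoder(ids, output_hidden_states=True)
        embeds.append(out.hidden_states[-2])  # SDXL uses the penultimate layer

    torch.save(
        {
            "latents": latents.cpu(),
            "prompt_embeds": torch.cat(embeds, dim=-1).cpu(),  # 768 + 1280 = 2048 dims
            "pooled_embeds": out.text_embeds.cpu(),  # pooled output of the second encoder
        },
        out_path,
    )
```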
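Likewise, a minimal sketch of the EMA idea from that bullet list: keep a frozen shadow copy of the weights and nudge it toward the live model after each optimiser step. `diffusers` ships an `EMAModel` helper for this, and the real trainer's implementation may differ:

```python
# Minimal EMA ("exponential moving average") shadow network: the shadow decays
# toward the live weights, smoothing out late-training "over-cooking".
import copy
import torch

class Ema:
    def __init__(self, model: torch.nn.Module, decay: float = 0.9999):
        self.decay = decay
        self.shadow = copy.deepcopy(model).eval()
        for p in self.shadow.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model: torch.nn.Module):
        # ema = decay * ema + (1 - decay) * live, parameter by parameter
        for ema_p, p in zip(self.shadow.parameters(), model.parameters()):
            ema_p.lerp_(p, 1.0 - self.decay)
```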
@@ -38,10 +38,41 @@ Stable Diffusion 2.1 is notoriously difficult to fine-tune. Many of the default
 * Not using enforced zero SNR on the terminal timestep, using offset noise instead. This results in a noisier image.
 * Training on only square, 768x768 images, which results in the model losing the ability to generalise across aspect ratios (or, at the very least, not improving there).
 
+## Hardware Requirements
+
+All testing of this script has been done using:
+
+* A100-80G
+* A6000 48G
+* 4090 24G
+
+Despite optimisations, SDXL training **will not work on a 24G GPU**, though SD 2.1 training works fantastically well there.
+
+### SDXL 1.0
+
+At 1024x1024 with batch size 10, we can nearly saturate a single 80G A100's entire VRAM pool!
+
+At 1024x1024 with batch size 4, we can begin to make use of a 48G A6000 GPU, which substantially reduces the cost of multi-GPU training!
+
+With a resolution reduction down to 768 pixels, the requirements shift down to an A100-40G.
+
+For further reductions, training at a resolution of `256x256` still lets the model generalise the training data quite well, while also supporting a much higher batch size of around 15 if the VRAM is available.
+
+### Stable Diffusion 2.x
+
+Generally, a batch size of 4-8 for aspect-bucketed data at a 768px base was achievable within 24G of VRAM.
+
+On an A100-80G, a batch size of 15 could be reached with nearly all of the VRAM in use.
+
+For 1024px training, the VRAM requirement goes up substantially, but it is still doable in roughly an equivalent footprint to an _optimised_ SDXL setup.
+
+Optimisations from the SDXL trainer (text embed cache, precomputed latents) could be ported to the legacy trainer to bring this down substantially and make 1024px training more viable on consumer kit.
+
 ## Scripts
 
+* `ubuntu.sh` - This is a basic "installer" that makes it quick to deploy on a Vast.ai instance.
 * `train_sdxl.sh` - This is where the magic happens.
-* `training.sh` - some variables are here, but if they are, they're not meant to be tuned.
+* `training.sh` - This is the legacy Stable Diffusion 1.x / 2.x trainer. The last stable version is from before SDXL support was introduced. 😞
 * `sdxl-env.sh.example` - These are the SDXL training parameters; copy this file to `sdxl-env.sh`.
 * `sd21-env.sh.example` - These are the SD 2.x training parameters; copy this file to `env.sh`.
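On the enforced-zero-terminal-SNR point criticised in the SD 2.1 bullets above, the standard fix is to rescale the noise schedule's betas so the final timestep carries no signal at all, following Lin et al. (2023), "Common Diffusion Noise Schedules and Sample Steps are Flawed". Here is a sketch of that rescaling; recent `diffusers` schedulers expose a comparable `rescale_betas_zero_snr` option:

```python
# Rescale a beta schedule so the terminal timestep has exactly zero SNR,
# i.e. sqrt(alpha_bar_T) == 0, per Lin et al. (2023).
import torch

def rescale_zero_terminal_snr(betas: torch.Tensor) -> torch.Tensor:
    alphas_bar_sqrt = (1.0 - betas).cumprod(dim=0).sqrt()
    first, last = alphas_bar_sqrt[0].clone(), alphas_bar_sqrt[-1].clone()
    alphas_bar_sqrt -= last                    # shift so the final step hits zero SNR
    alphas_bar_sqrt *= first / (first - last)  # rescale so the first step is unchanged
    alphas_bar = alphas_bar_sqrt ** 2
    # Recover per-step alphas from the cumulative product, then betas.
    alphas = torch.cat([alphas_bar[:1], alphas_bar[1:] / alphas_bar[:-1]])
    return 1.0 - alphas
```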
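And since the SD 2.x figures mention "aspect bucketed data", a toy illustration of the bucketing idea: assign each image to the nearest fixed-resolution bucket so every batch shares one shape. The bucket list and nearest-ratio rule here are made-up examples, not this trainer's actual buckets:

```python
# Toy aspect bucketing: choose the bucket whose width/height ratio is closest
# to the image's, so batches can be formed from same-shaped examples.
BUCKETS = [(768, 768), (640, 896), (896, 640), (576, 1024), (1024, 576)]

def nearest_bucket(width: int, height: int) -> tuple[int, int]:
    ratio = width / height
    return min(BUCKETS, key=lambda b: abs(b[0] / b[1] - ratio))

print(nearest_bucket(1200, 800))  # -> (896, 640)
```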
4778
