
Commit 03e82fc

Author: bghira
Update env example to fix terminal SNR parameters plus reorganise it to make the top more relevant to users
1 parent 39aac5c commit 03e82fc

File tree: 2 files changed, +72 additions, −55 deletions

TUTORIAL.md

Lines changed: 19 additions & 4 deletions
@@ -39,19 +39,34 @@ A publicly-available dataset is available [on huggingface hub](https://huggingfa
 
 Approximately 162GB of images are available in the `split_train` directory, although this format is not required by SimpleTuner.
 
-### Batch size impacts aspect bucketing
+You can simply create a single folder full of jumbled-up images, or they can be neatly organised into subdirectories.
 
-Your maximum batch size is a function of your available VRAM and image resolution.
+**Here are some important guidelines:**
+
+### Training batch size
+
+Your maximum batch size is a function of your available VRAM and image resolution:
+
+```
+vram use = batch size * resolution + base_requirements
+```
+
+To reduce VRAM use, you can reduce the batch size or the resolution, but the base requirements will always bite us in the ass. SDXL is a **huge** model.
+
+To summarise:
 
 - You want as high of a batch size as you can tolerate.
 - The larger you set `RESOLUTION`, the more VRAM is used, and the lower your batch size can be.
 - A larger batch size requires more training data in each bucket, since each one **must** contain a minimum of that many images.
+- If you can't get a single iteration done with a batch size of 1 and a resolution of 128x128 on Adafactor or AdamW8Bit, your hardware just won't work.
 
-Consequently, this means you should use as much high quality training data as you can acquire.
+Which brings up the next point: **you should use as much high-quality training data as you can acquire.**
 
 ### Selecting images
 
-- JPEG artifacts and blurry images are a no-go. If you're trying to extract frames from a movie to train from, you're going to have a bad time as the compression ruins most of it - only the excessively large releases in the 40+ GB range are really going to be useful for improving image clarity.
+- JPEG artifacts and blurry images are a no-go. The model **will** pick these up.
+- The same goes for watermarks, "badges", and artist signatures. Those will all be picked up effortlessly.
+- If you're trying to extract frames from a movie to train from, you're going to have a bad time. Compression ruins most films - only the large 40+ GB releases are really going to be useful for improving image clarity.
 - Image resolutions optimally should be divisible by 64.
 - This isn't **required**, but is beneficial to follow.
 - Square images are not required, though they will work.
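To make the batch size arithmetic above concrete, here is a minimal shell sketch using the variable names from `sdxl-env.sh.example` below. Multiplying by the GPU count is an assumption about how the global batch scales across processes, not something the tutorial states:

```
#!/usr/bin/env bash
# Sketch: effective (global) batch size from the env example's defaults.
# TRAIN_BATCH_SIZE and GRADIENT_ACCUMULATION_STEPS come from sdxl-env.sh.example;
# the TRAINING_NUM_PROCESSES factor is an assumption for multi-GPU runs.
TRAIN_BATCH_SIZE=10
GRADIENT_ACCUMULATION_STEPS=4
TRAINING_NUM_PROCESSES=1

EFFECTIVE_BATCH_SIZE=$((TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * TRAINING_NUM_PROCESSES))
echo "Effective batch size: ${EFFECTIVE_BATCH_SIZE}"  # 10 * 4 * 1 = 40
```

Note that the per-bucket minimum from the guidelines tracks the per-step `TRAIN_BATCH_SIZE` (here, 10 images).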

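For the divisible-by-64 guideline, a quick way to audit a dataset before training. This is a sketch that assumes ImageMagick's `identify` is installed and that your images live under the `INSTANCE_DIR` path used in the env example; it is not part of SimpleTuner:

```
#!/usr/bin/env bash
# Flag images whose width or height is not divisible by 64.
# Assumes ImageMagick's `identify` is on PATH; not part of SimpleTuner itself.
for img in /notebooks/datasets/training_data/*.{jpg,jpeg,png}; do
  [ -e "$img" ] || continue
  read -r w h < <(identify -format '%w %h' "$img" 2>/dev/null) || continue
  if (( w % 64 != 0 || h % 64 != 0 )); then
    echo "${img}: ${w}x${h} is not divisible by 64"
  fi
done
```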
sdxl-env.sh.example

Lines changed: 53 additions & 51 deletions
@@ -1,110 +1,112 @@
-# Reproducible training.
-export TRAINING_SEED=420420420
+# Configure these values.
 
 # Restart where we left off. Change this to "checkpoint-1234" to start from a specific checkpoint.
 export RESUME_CHECKPOINT="latest"
 
 # How often to checkpoint. Depending on your learning rate, you may wish to change this.
-
 # For the default settings with 10 gradient accumulations, more frequent checkpoints might be preferable at first.
 export CHECKPOINTING_STEPS=150
 # This is how many checkpoints we will keep. Two is safe, but three is safer.
 export CHECKPOINTING_LIMIT=2
 
+# This is decided as a relatively conservative 'constant' learning rate.
+# Adjust higher or lower depending on how burnt your model becomes.
 export LEARNING_RATE=8e-7 #@param {type:"number"}
 
-# Configure these values.
 # Using a Huggingface Hub model:
 export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
 # Using a local path to a huggingface hub model or saved checkpoint:
 #export MODEL_NAME="/datasets/models/pipeline"
 
+# Make DEBUG_EXTRA_ARGS empty to disable wandb.
+export DEBUG_EXTRA_ARGS="--report_to=wandb"
 export TRACKER_PROJECT_NAME="sdxl-training"
 export TRACKER_RUN_NAME="simpletuner-sdxl"
 
 # Use this to append an instance prompt to each caption, used for adding trigger words.
 # This has not been tested in SDXL.
 #export INSTANCE_PROMPT="lotr style "
-# This will be used for WandB uploads.
+# If you also supply a user prompt library or `--use_prompt_library`, this will be added to those lists.
 export VALIDATION_PROMPT="ethnographic photography of teddy bear at a picnic"
+export VALIDATION_GUIDANCE=7.5
+# You'll want to set this to 0.7 if you are training a terminal SNR model.
+export VALIDATION_GUIDANCE_RESCALE=0.0
+
 # How frequently we will save and run a pipeline for validations.
 export VALIDATION_STEPS=100
+# Max number of steps OR epochs can be used. But we default to Epochs.
+export MAX_NUM_STEPS=30000
+# Will likely overtrain, but that's fine.
+export NUM_EPOCHS=25
 
 # Location of training data.
 export BASE_DIR="/notebooks/datasets"
 export INSTANCE_DIR="${BASE_DIR}/training_data"
 export OUTPUT_DIR="${BASE_DIR}/models"
+# By default, images will be resized so their SMALLER EDGE is 1024 pixels, maintaining aspect ratio.
+# Setting this value to 768px might result in more reasonable training data sizes for SDXL.
+export RESOLUTION=1024
+# Adjust this for your GPU memory size. This, and resolution, are the biggest VRAM killers.
+export TRAIN_BATCH_SIZE=10
+# Accumulate your update gradient over many steps, to save VRAM while still having a higher effective batch size:
+# effective batch size = ($TRAIN_BATCH_SIZE * $GRADIENT_ACCUMULATION_STEPS).
+export GRADIENT_ACCUMULATION_STEPS=4
 
-# Some data that we generate will be cached here.
+# Some data that we generate will be cached here. Training state is baked into the checkpoints themselves.
 export STATE_PATH="${BASE_DIR}/training_state.json"
 # Store whether we've seen an image or not, to prevent repeats.
 export SEEN_STATE_PATH="${BASE_DIR}/training_images_seen.json"
 
-# Max number of steps OR epochs can be used. But we default to Epochs.
-export MAX_NUM_STEPS=30000
-# Will likely overtrain, but that's fine.
-export NUM_EPOCHS=25
-
-# Use any standard scheduler type.
+# Use any standard scheduler type: constant, polynomial, constant_with_warmup.
 export LR_SCHEDULE="constant"
-# Whether this is used, depends on whether you have epochs or num_steps in use.
+# A warmup period allows the model, and more importantly the EMA weights, to familiarise themselves with the current quanta.
 export LR_WARMUP_STEPS=$((MAX_NUM_STEPS / 10))
-# Adjust this for your GPU memory size.
-export TRAIN_BATCH_SIZE=10
-
-# Validation image settings.
-VALIDATION_GUIDANCE=7.5
-VALIDATION_GUIDANCE_RESCALE=0.0
-
-
-# Leave these alone unless you know what you are doing.
-export RESOLUTION=1024
-export GRADIENT_ACCUMULATION_STEPS=4 # Yes, it slows training down. No, you don't want to change this.
-
-# SDXL text encoder training is not currently tested.
-#export TEXT_ENCODER_LIMIT=101 # Train the text encoder for % of the process. Buggy.
-#export TEXT_ENCODER_FREEZE_STRATEGY='before' # before, after, between.
-#export TEXT_ENCODER_FREEZE_BEFORE=22 # Ignored when using 'after' strategy.
-#export TEXT_ENCODER_FREEZE_AFTER=24 # Ignored when using 'before' strategy.
 
 # Caption dropout probability. Set to 0.1 for 10% of captions dropped out. Set to 0 to disable.
+# You may wish to disable dropout if you want to limit your changes strictly to the prompts you show the model.
+# You may wish to increase the rate of dropout if you want to more broadly adopt your changes across the model.
 export CAPTION_DROPOUT_PROBABILITY=0.1
 
-# Mixed precision is the best. You honestly might need to YOLO it in fp16 mode for Google Colab type setups.
-export MIXED_PRECISION="bf16" # Might not be supported on all GPUs. fp32 will be needed for others.
-
-# With Pytorch 2.1, you might have pretty good luck here.
-# If you're using aspect bucketing however, each resolution change will recompile.
-export TRAINING_DYNAMO_BACKEND='no' # or 'inductor' if you want to brave PyTorch 2 compile issues
-
-# This has to be changed if you're training with multiple GPUs.
-export TRAINING_NUM_PROCESSES=10
-export TRAINING_NUM_MACHINES=1
-
-# These should remain empty if you remove their options.
-export ACCELERATE_EXTRA_ARGS="--multi_gpu" # --multi_gpu or other similar flags for huggingface accelerate
-export DEBUG_EXTRA_ARGS="--print_filenames --report_to=wandb" # Removing print_filenames can ease on spam.
-export TRAINER_EXTRA_ARGS="--allow_tf32 --use_8bit_adam --use_ema" # anything you want to pass along extra to the actual train_sdxl.py script.
-
-# These are pretty sketchy to change. --use_original_images can be removed to enable image cropping. Not tested for SDXL.
-export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --enable_xformers_memory_efficient_attention --use_original_images=true"
-export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --gradient_checkpointing --gradient_accumulation_steps=${GRADIENT_ACCUMULATION_STEPS}"
+# TF32 is great on Ampere or Ada, not sure about earlier generations.
+export TRAINER_EXTRA_ARGS="--allow_tf32 --use_8bit_adam --use_ema"
 
 ## For offset noise training:
+# Not recommended for terminal SNR models.
 #export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --offset_noise --noise_offset=0.02"
 
-## For noise input pertubation - adds extra noise, randomly. This is separate from offset noise:
+## For noise input perturbation - adds extra noise, randomly. This is separate from offset noise, but can help stabilise it and reduce overfitting.
+# Not recommended for terminal SNR models.
 #export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --input_pertubation=0.01"
 
 ## For terminal SNR training:
 #export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --prediction_type=v_prediction --rescale_betas_zero_snr"
-#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --training_scheduler_timestep_spacing=leading --inference_scheduler_timestep_spacing=trailing"
+#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --training_scheduler_timestep_spacing=trailing --inference_scheduler_timestep_spacing=trailing"
 
 ## For experimental min-SNR weighted loss training (5 is the value suggested by the original researchers):
+# Not recommended for terminal SNR models.
 #export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --snr_gamma=5.0"
 
 # For Wasabi S3 filesystem backend (experimental)
 #export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --data_backend=aws --aws_bucket_name=test123"
 #export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --aws_endpoint_url=https://s3.wasabisys.com"
 #export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --aws_access_key=1234567890"
 #export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --aws_secret_access_key=0987654321"
+
+
+# Reproducible training. Set to -1 to disable.
+export TRAINING_SEED=420420420
+
+# Below here, these are pretty sketchy to change. --use_original_images can be removed to enable image cropping. Not tested for SDXL.
+# Mixed precision is the best. You honestly might need to YOLO it in fp16 mode for Google Colab type setups.
+export MIXED_PRECISION="bf16" # Might not be supported on all GPUs. fp32 will be needed for others.
+
+# This has to be changed if you're training with multiple GPUs.
+export TRAINING_NUM_PROCESSES=1
+export TRAINING_NUM_MACHINES=1
+export ACCELERATE_EXTRA_ARGS="" # --multi_gpu or other similar flags for huggingface accelerate
+export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --enable_xformers_memory_efficient_attention --use_original_images=true"
+export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --gradient_checkpointing --gradient_accumulation_steps=${GRADIENT_ACCUMULATION_STEPS}"
+
+# With PyTorch 2.1, you might have pretty good luck here.
+# If you're using aspect bucketing however, each resolution change will recompile. Seriously, just don't do it.
+export TRAINING_DYNAMO_BACKEND='no' # or 'inductor' if you want to brave PyTorch 2 compile issues
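As a worked example of the multi-GPU note above, on a single machine with eight GPUs (a hypothetical count, purely for illustration) the bottom section would change along these lines:

```
# Hypothetical example: one machine, eight GPUs.
# TRAINING_NUM_PROCESSES matches the GPU count, and accelerate needs
# --multi_gpu, per the comment in the env example above.
export TRAINING_NUM_PROCESSES=8
export TRAINING_NUM_MACHINES=1
export ACCELERATE_EXTRA_ARGS="--multi_gpu"
```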

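Before enabling the experimental S3 backend, it may be worth confirming that the credentials and endpoint work at all. A sketch using the AWS CLI, which is a separate tool and not part of SimpleTuner, with the placeholder values from the example:

```
# Sanity-check the Wasabi credentials outside of SimpleTuner.
# The bucket name and keys are the env example's placeholders; substitute your own.
export AWS_ACCESS_KEY_ID=1234567890
export AWS_SECRET_ACCESS_KEY=0987654321
aws s3 ls s3://test123 --endpoint-url https://s3.wasabisys.com
```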