
Commit 03e82fc

Author: bghira
Update env example to fix terminal SNR parameters plus reorganise it to make the top more relevant to users
1 parent 39aac5c commit 03e82fc

File tree: 2 files changed, +72 additions, −55 deletions

TUTORIAL.md

Lines changed: 19 additions & 4 deletions
@@ -39,19 +39,34 @@ A publicly-available dataset is available [on huggingface hub](https://huggingfa
 
 Approximately 162GB of images are available in the `split_train` directory, although this format is not required by SimpleTuner.
 
-### Batch size impacts aspect bucketing
+You can simply create a single folder full of jumbled-up images, or they can be neatly organised into subdirectories.
 
-Your maximum batch size is a function of your available VRAM and image resolution.
+**Here are some important guidelines:**
+
+### Training batch size
+
+Your maximum batch size is a function of your available VRAM and image resolution:
+
+```
+vram use = batch size * resolution + base_requirements
+```
+
+To reduce VRAM use, you can reduce the batch size or the resolution, but the base requirements will always bite us in the ass. SDXL is a **huge** model.
+
+To summarise:
 
 - You want as high of a batch size as you can tolerate.
 - The larger you set `RESOLUTION`, the more VRAM is used, and the lower your batch size can be.
 - A larger batch size requires more training data in each bucket, since each one **must** contain a minimum of that many images.
+- If you can't get a single iteration done with a batch size of 1 and a resolution of 128x128 on Adafactor or AdamW8Bit, your hardware just won't work.
 
-Consequently, this means you should use as much high quality training data as you can acquire.
+Which brings up the next point: **you should use as much high-quality training data as you can acquire.**
 
 ### Selecting images
 
-- JPEG artifacts and blurry images are a no-go. If you're trying to extract frames from a movie to train from, you're going to have a bad time as the compression ruins most of it - only the excessively large releases in the 40+ GB range are really going to be useful for improving image clarity.
+- JPEG artifacts and blurry images are a no-go. The model **will** pick these up.
+- The same goes for watermarks, "badges", and artist signatures. Those will all be picked up effortlessly.
+- If you're trying to extract frames from a movie to train from, you're going to have a bad time. Compression ruins most films - only the large 40+ GB releases are really going to be useful for improving image clarity.
 - Image resolutions optimally should be divisible by 64.
 - This isn't **required**, but is beneficial to follow.
 - Square images are not required, though they will work.
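To make the batch size arithmetic above concrete, here is a minimal shell sketch using the variable names from `sdxl-env.sh.example` below. Multiplying by the GPU count is an assumption about how the global batch scales across processes, not something the tutorial states:

```
#!/usr/bin/env bash
# Sketch: effective (global) batch size from the env example's defaults.
# TRAIN_BATCH_SIZE and GRADIENT_ACCUMULATION_STEPS come from sdxl-env.sh.example;
# the TRAINING_NUM_PROCESSES factor is an assumption for multi-GPU runs.
TRAIN_BATCH_SIZE=10
GRADIENT_ACCUMULATION_STEPS=4
TRAINING_NUM_PROCESSES=1

EFFECTIVE_BATCH_SIZE=$((TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * TRAINING_NUM_PROCESSES))
echo "Effective batch size: ${EFFECTIVE_BATCH_SIZE}"  # 10 * 4 * 1 = 40
```

Note that the per-bucket minimum from the guidelines tracks the per-step `TRAIN_BATCH_SIZE` (here, 10 images).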

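For the divisible-by-64 guideline, a quick way to audit a dataset before training. This is a sketch that assumes ImageMagick's `identify` is installed and that your images live under the `INSTANCE_DIR` path used in the env example; it is not part of SimpleTuner:

```
#!/usr/bin/env bash
# Flag images whose width or height is not divisible by 64.
# Assumes ImageMagick's `identify` is on PATH; not part of SimpleTuner itself.
for img in /notebooks/datasets/training_data/*.{jpg,jpeg,png}; do
  [ -e "$img" ] || continue
  read -r w h < <(identify -format '%w %h' "$img" 2>/dev/null) || continue
  if (( w % 64 != 0 || h % 64 != 0 )); then
    echo "${img}: ${w}x${h} is not divisible by 64"
  fi
done
```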
sdxl-env.sh.example

Lines changed: 53 additions & 51 deletions
@@ -1,110 +1,112 @@
-# Reproducible training.
-export TRAINING_SEED=420420420
+# Configure these values.
 
 # Restart where we left off. Change this to "checkpoint-1234" to start from a specific checkpoint.
 export RESUME_CHECKPOINT="latest"
 
 # How often to checkpoint. Depending on your learning rate, you may wish to change this.
-
 # For the default settings with 10 gradient accumulations, more frequent checkpoints might be preferable at first.
 export CHECKPOINTING_STEPS=150
 # This is how many checkpoints we will keep. Two is safe, but three is safer.
 export CHECKPOINTING_LIMIT=2
 
+# This is decided as a relatively conservative 'constant' learning rate.
+# Adjust higher or lower depending on how burnt your model becomes.
 export LEARNING_RATE=8e-7 #@param {type:"number"}
 
-# Configure these values.
 # Using a Huggingface Hub model:
 export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
 # Using a local path to a huggingface hub model or saved checkpoint:
 #export MODEL_NAME="/datasets/models/pipeline"
 
+# Make DEBUG_EXTRA_ARGS empty to disable wandb.
+export DEBUG_EXTRA_ARGS="--report_to=wandb"
 export TRACKER_PROJECT_NAME="sdxl-training"
 export TRACKER_RUN_NAME="simpletuner-sdxl"
 
 # Use this to append an instance prompt to each caption, used for adding trigger words.
 # This has not been tested in SDXL.
 #export INSTANCE_PROMPT="lotr style "
-# This will be used for WandB uploads.
+# If you also supply a user prompt library or `--use_prompt_library`, this will be added to those lists.
 export VALIDATION_PROMPT="ethnographic photography of teddy bear at a picnic"
+export VALIDATION_GUIDANCE=7.5
+# You'll want to set this to 0.7 if you are training a terminal SNR model.
+export VALIDATION_GUIDANCE_RESCALE=0.0
+
 # How frequently we will save and run a pipeline for validations.
 export VALIDATION_STEPS=100
+# Max number of steps OR epochs can be used. But we default to Epochs.
+export MAX_NUM_STEPS=30000
+# Will likely overtrain, but that's fine.
+export NUM_EPOCHS=25
 
 # Location of training data.
 export BASE_DIR="/notebooks/datasets"
 export INSTANCE_DIR="${BASE_DIR}/training_data"
 export OUTPUT_DIR="${BASE_DIR}/models"
+# By default, images will be resized so their SMALLER EDGE is 1024 pixels, maintaining aspect ratio.
+# Setting this value to 768px might result in more reasonable training data sizes for SDXL.
+export RESOLUTION=1024
+# Adjust this for your GPU memory size. This, and resolution, are the biggest VRAM killers.
+export TRAIN_BATCH_SIZE=10
+# Accumulate your update gradient over many steps, to save VRAM while still having a higher effective batch size:
+# effective batch size = ($TRAIN_BATCH_SIZE * $GRADIENT_ACCUMULATION_STEPS).
+export GRADIENT_ACCUMULATION_STEPS=4
 
-# Some data that we generate will be cached here.
+# Some data that we generate will be cached here. Training state is baked into the checkpoints themselves.
 export STATE_PATH="${BASE_DIR}/training_state.json"
 # Store whether we've seen an image or not, to prevent repeats.
 export SEEN_STATE_PATH="${BASE_DIR}/training_images_seen.json"
 
-# Max number of steps OR epochs can be used. But we default to Epochs.
-export MAX_NUM_STEPS=30000
-# Will likely overtrain, but that's fine.
-export NUM_EPOCHS=25
-
-# Use any standard scheduler type.
+# Use any standard scheduler type: constant, polynomial, constant_with_warmup.
 export LR_SCHEDULE="constant"
-# Whether this is used, depends on whether you have epochs or num_steps in use.
+# A warmup period allows the model, and more importantly the EMA weights, to familiarise themselves with the current quanta.
 export LR_WARMUP_STEPS=$((MAX_NUM_STEPS / 10))
-# Adjust this for your GPU memory size.
-export TRAIN_BATCH_SIZE=10
-
-# Validation image settings.
-VALIDATION_GUIDANCE=7.5
-VALIDATION_GUIDANCE_RESCALE=0.0
-
-
-# Leave these alone unless you know what you are doing.
-export RESOLUTION=1024
-export GRADIENT_ACCUMULATION_STEPS=4 # Yes, it slows training down. No, you don't want to change this.
-
-# SDXL text encoder training is not currently tested.
-#export TEXT_ENCODER_LIMIT=101 # Train the text encoder for % of the process. Buggy.
-#export TEXT_ENCODER_FREEZE_STRATEGY='before' # before, after, between.
-#export TEXT_ENCODER_FREEZE_BEFORE=22 # Ignored when using 'after' strategy.
-#export TEXT_ENCODER_FREEZE_AFTER=24 # Ignored when using 'before' strategy.
 
 # Caption dropout probability. Set to 0.1 for 10% of captions dropped out. Set to 0 to disable.
+# You may wish to disable dropout if you want to limit your changes strictly to the prompts you show the model.
+# You may wish to increase the rate of dropout if you want to more broadly adopt your changes across the model.
 export CAPTION_DROPOUT_PROBABILITY=0.1
 
-# Mixed precision is the best. You honestly might need to YOLO it in fp16 mode for Google Colab type setups.
-export MIXED_PRECISION="bf16" # Might not be supported on all GPUs. fp32 will be needed for others.
-
-# With Pytorch 2.1, you might have pretty good luck here.
-# If you're using aspect bucketing however, each resolution change will recompile.
-export TRAINING_DYNAMO_BACKEND='no' # or 'inductor' if you want to brave PyTorch 2 compile issues
-
-# This has to be changed if you're training with multiple GPUs.
-export TRAINING_NUM_PROCESSES=10
-export TRAINING_NUM_MACHINES=1
-
-# These should remain empty if you remove their options.
-export ACCELERATE_EXTRA_ARGS="--multi_gpu" # --multi_gpu or other similar flags for huggingface accelerate
-export DEBUG_EXTRA_ARGS="--print_filenames --report_to=wandb" # Removing print_filenames can ease on spam.
-export TRAINER_EXTRA_ARGS="--allow_tf32 --use_8bit_adam --use_ema" # anything you want to pass along extra to the actual train_sdxl.py script.
-
-# These are pretty sketchy to change. --use_original_images can be removed to enable image cropping. Not tested for SDXL.
-export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --enable_xformers_memory_efficient_attention --use_original_images=true"
-export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --gradient_checkpointing --gradient_accumulation_steps=${GRADIENT_ACCUMULATION_STEPS}"
+# TF32 is great on Ampere or Ada, not sure about earlier generations.
+export TRAINER_EXTRA_ARGS="--allow_tf32 --use_8bit_adam --use_ema"
 
 ## For offset noise training:
+# Not recommended for terminal SNR models.
 #export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --offset_noise --noise_offset=0.02"
 
-## For noise input pertubation - adds extra noise, randomly. This is separate from offset noise:
+## For noise input perturbation - adds extra noise, randomly. This is separate from offset noise, but can help stabilise it and reduce overfitting.
+# Not recommended for terminal SNR models.
 #export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --input_pertubation=0.01"
 
 ## For terminal SNR training:
 #export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --prediction_type=v_prediction --rescale_betas_zero_snr"
-#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --training_scheduler_timestep_spacing=leading --inference_scheduler_timestep_spacing=trailing"
+#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --training_scheduler_timestep_spacing=trailing --inference_scheduler_timestep_spacing=trailing"
 
 ## For experimental min-SNR weighted loss training (5 is the value suggested by the original researchers):
+# Not recommended for terminal SNR models.
 #export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --snr_gamma=5.0"
 
 # For Wasabi S3 filesystem backend (experimental)
 #export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --data_backend=aws --aws_bucket_name=test123"
 #export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --aws_endpoint_url=https://s3.wasabisys.com"
 #export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --aws_access_key=1234567890"
 #export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --aws_secret_access_key=0987654321"
+
+
+# Reproducible training. Set to -1 to disable.
+export TRAINING_SEED=420420420
+
+# Below here, these are pretty sketchy to change. --use_original_images can be removed to enable image cropping. Not tested for SDXL.
+# Mixed precision is the best. You honestly might need to YOLO it in fp16 mode for Google Colab type setups.
+export MIXED_PRECISION="bf16" # Might not be supported on all GPUs. fp32 will be needed for others.
+
+# This has to be changed if you're training with multiple GPUs.
+export TRAINING_NUM_PROCESSES=1
+export TRAINING_NUM_MACHINES=1
+export ACCELERATE_EXTRA_ARGS="" # --multi_gpu or other similar flags for huggingface accelerate
+export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --enable_xformers_memory_efficient_attention --use_original_images=true"
+export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --gradient_checkpointing --gradient_accumulation_steps=${GRADIENT_ACCUMULATION_STEPS}"
+
+# With PyTorch 2.1, you might have pretty good luck here.
+# If you're using aspect bucketing however, each resolution change will recompile. Seriously, just don't do it.
+export TRAINING_DYNAMO_BACKEND='no' # or 'inductor' if you want to brave PyTorch 2 compile issues
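As a worked example of the multi-GPU note above, on a single machine with eight GPUs (a hypothetical count, purely for illustration) the bottom section would change along these lines:

```
# Hypothetical example: one machine, eight GPUs.
# TRAINING_NUM_PROCESSES matches the GPU count, and accelerate needs
# --multi_gpu, per the comment in the env example above.
export TRAINING_NUM_PROCESSES=8
export TRAINING_NUM_MACHINES=1
export ACCELERATE_EXTRA_ARGS="--multi_gpu"
```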

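Before enabling the experimental S3 backend, it may be worth confirming that the credentials and endpoint work at all. A sketch using the AWS CLI, which is a separate tool and not part of SimpleTuner, with the placeholder values from the example:

```
# Sanity-check the Wasabi credentials outside of SimpleTuner.
# The bucket name and keys are the env example's placeholders; substitute your own.
export AWS_ACCESS_KEY_ID=1234567890
export AWS_SECRET_ACCESS_KEY=0987654321
aws s3 ls s3://test123 --endpoint-url https://s3.wasabisys.com
```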