210 changes: 210 additions & 0 deletions documentation/distillation/FASTWAN_DMD.md
@@ -0,0 +1,210 @@
# DMD Distillation Quickstart (SimpleTuner)

In this example, we'll be training a **3-step student** using **DMD (Distribution Matching Distillation)** from a large flow-matching teacher model like [Wan 2.1 T2V](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B).

DMD features:

* **Generator (Student)**: Learns to match teacher in fewer steps
* **Fake Score Transformer**: Discriminates between teacher and student outputs
* **Multi-step simulation**: Optional train-inference consistency mode

---

## ✅ Hardware Requirements


⚠️ DMD is memory-intensive: the fake score transformer requires that a complete second copy of the base model be kept in memory.

If you do not have the VRAM that DMD requires, consider the LCM or DCM distillation methods for the 14B Wan model instead.

An NVIDIA B200 may be required when distilling the 14B model without sparse attention support.

Training the student as a LoRA reduces the requirements substantially, but they remain quite hefty.
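For a rough sense of scale, here is a weight-only back-of-the-envelope estimate for keeping two transformer copies resident (an illustration, not SimpleTuner's actual allocator behavior; it ignores activations, optimizer state, and the text encoder/VAE):

```python
# Estimate resident weight memory for the two transformer copies DMD keeps
# in memory (student + fake score). Parameter counts are approximate.

def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    """Weight memory in GiB for a model of the given size and precision."""
    return params_billions * 1e9 * bytes_per_param / 2**30

for name, params in [("Wan 1.3B", 1.3), ("Wan 14B", 14.0)]:
    bf16 = 2 * weights_gb(params, 2)  # two full copies in bf16
    int8 = 2 * weights_gb(params, 1)  # two full copies in int8-quanto
    print(f"{name}: ~{bf16:.0f} GiB bf16, ~{int8:.0f} GiB int8 (weights only)")
```

The int8 column is why `base_model_precision: "int8-quanto"` appears in the example config below.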

---

## 📦 Installation

```bash
git clone --branch=release https://github.com/bghira/SimpleTuner.git
cd SimpleTuner
python3.12 -m venv .venv && source .venv/bin/activate
pip install -U poetry pip
poetry config virtualenvs.create false
poetry install
```

> If you're on ROCm or Apple: you must instead use `poetry install -C install/variant` where `variant` is `rocm` or `apple`.

---

## 📁 Configuration

Edit your `config/config.json`:

```json
{
"aspect_bucket_rounding": 2,
"attention_mechanism": "diffusers",
"base_model_precision": "int8-quanto",
"caption_dropout_probability": 0.1,
"checkpointing_steps": 200,
"checkpoints_total_limit": 3,
"compress_disk_cache": true,
"data_backend_config": "config/wan/multidatabackend.json",
"delete_problematic_images": false,
"disable_benchmark": false,
"disable_bucket_pruning": true,
"distillation_method": "dmd",
"distillation_config": {
"dmd_denoising_steps": "1000,757,522",
"min_timestep_ratio": 0.02,
"max_timestep_ratio": 0.98,
"generator_update_interval": 5,
"real_score_guidance_scale": 3.0,
"simulate_generator_forward": false,
"fake_score_lr": 1e-5,
"fake_score_lr_scheduler": "cosine_with_min_lr",
"min_lr_ratio": 0.5
},
"ema_update_interval": 5,
"ema_validation": "ema_only",
"flow_schedule_shift": 5,
"grad_clip_method": "value",
"gradient_accumulation_steps": 1,
"gradient_checkpointing": true,
"hub_model_id": "wan-disney-DMD-3step",
"ignore_final_epochs": true,
"learning_rate": 2e-5,
"lora_alpha": 128,
"lora_rank": 128,
"lora_type": "standard",
"lr_scheduler": "cosine_with_min_lr",
"lr_warmup_steps": 100,
"max_grad_norm": 1.0,
"max_train_steps": 4000,
"minimum_image_size": 0,
"mixed_precision": "bf16",
"model_family": "wan",
"model_type": "lora",
"num_train_epochs": 0,
"optimizer": "adamw_bf16",
"output_dir": "output/wan-dmd",
"pretrained_model_name_or_path": "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
"pretrained_t5_model_name_or_path": "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
"push_checkpoints_to_hub": true,
"push_to_hub": true,
"quantize_via": "cpu",
"report_to": "wandb",
"resolution": 480,
"resolution_type": "pixel_area",
"resume_from_checkpoint": "latest",
"seed": 1000,
"text_encoder_1_precision": "int8-quanto",
"tracker_project_name": "dmd-training",
"tracker_run_name": "wan-DMD-3step",
"train_batch_size": 1,
"use_ema": true,
"vae_batch_size": 1,
"validation_guidance": 1.0,
"validation_negative_prompt": "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走",
"validation_num_inference_steps": 3,
"validation_num_video_frames": 121,
"validation_prompt": "A black and white animated scene unfolds featuring a distressed upright cow with prominent horns and expressive eyes, suspended by its legs from a hook on a static background wall. A smaller Mickey Mouse-like character enters, standing near a wooden bench, initiating interaction between the two. The cow's posture changes as it leans, stretches, and falls, while the mouse watches with a concerned expression, its face a mixture of curiosity and worry, in a world devoid of color.",
"validation_prompt_library": "config/wan/validation_prompts_dmd.json",
"validation_resolution": "1280x704",
"validation_seed": 42,
"validation_steps": 200,
"webhook_config": "config/wan/webhook.json"
}
```
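Before launching a long run, it can help to sanity-check a few DMD invariants in the config. This is a hypothetical pre-flight script, not a SimpleTuner utility; the checked relationships are assumptions based on the settings documented in this guide:

```python
# Hypothetical pre-flight check for the DMD config fragment above.
def check_dmd_config(cfg: dict) -> list[str]:
    """Return a list of problems found in the distillation settings."""
    problems = []
    d = cfg.get("distillation_config", {})
    steps = [int(s) for s in d.get("dmd_denoising_steps", "").split(",") if s]
    if len(steps) != cfg.get("validation_num_inference_steps"):
        problems.append("validation_num_inference_steps should match the "
                        "number of dmd_denoising_steps")
    if steps != sorted(steps, reverse=True):
        problems.append("dmd_denoising_steps should be decreasing")
    if not 0 < d.get("min_timestep_ratio", 0) < d.get("max_timestep_ratio", 1) < 1:
        problems.append("timestep ratios should satisfy 0 < min < max < 1")
    return problems

cfg = {
    "validation_num_inference_steps": 3,
    "distillation_config": {
        "dmd_denoising_steps": "1000,757,522",
        "min_timestep_ratio": 0.02,
        "max_timestep_ratio": 0.98,
    },
}
print(check_dmd_config(cfg))  # []
```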

### Key DMD Settings:

* **`dmd_denoising_steps`**: Steps to distill to (default: "1000,757,522" for 3-step)
* **`generator_update_interval`**: Update generator every N steps (balances training)
* **`simulate_generator_forward`**: Enable multi-step simulation (increases memory)
* **`fake_score_lr`**: Separate learning rate for fake score transformer
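The step list and the timestep-ratio bounds can be sketched as follows (assumed semantics for illustration, not SimpleTuner's actual code):

```python
# How "dmd_denoising_steps" and the min/max timestep ratios behave.

def parse_denoising_steps(spec: str) -> list[int]:
    """Parse a comma-separated spec like "1000,757,522"."""
    return [int(s) for s in spec.split(",")]

def clamp_timestep(t: int, num_train_timesteps: int = 1000,
                   min_ratio: float = 0.02, max_ratio: float = 0.98) -> int:
    """Keep a sampled timestep inside [min_ratio, max_ratio] of the schedule."""
    lo = int(num_train_timesteps * min_ratio)
    hi = int(num_train_timesteps * max_ratio)
    return max(lo, min(hi, t))

print(parse_denoising_steps("1000,757,522"))  # [1000, 757, 522]
print(clamp_timestep(5))    # 20 (raised to the 2% floor)
print(clamp_timestep(999))  # 980 (lowered to the 98% ceiling)
```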

---

## 🎬 Dataset & Dataloader

For DMD to work well, you need **diverse, high-quality data**:

```json
{
"dataset_type": "video",
"cache_dir": "cache/wan-dmd",
"resolution_type": "pixel_area",
"crop": false,
"num_frames": 121,
"frame_interval": 1,
"resolution": 480,
"minimum_image_size": 0,
"repeats": 0
}
```

> **Note**: The Disney dataset is **inadequate** for DMD. **DON'T use it!** It's provided merely for illustrative purposes.

You need:

- High volume (10k+ videos minimum)
- Diverse content (different styles, motions, subjects)
- High quality (no compression artifacts)

These may be generated with the parent (teacher) model.
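A quick, hypothetical pre-run sanity check on dataset volume (the paths, extensions, and the 10k floor follow the guidance above; this is not a SimpleTuner utility):

```python
# Count clips under a dataset directory and check against the volume floor.
from pathlib import Path

VIDEO_EXTS = {".mp4", ".mov", ".webm", ".mkv"}

def count_videos(root: str) -> int:
    """Recursively count video files under a dataset directory."""
    return sum(1 for p in Path(root).rglob("*") if p.suffix.lower() in VIDEO_EXTS)

def is_large_enough(n_videos: int, minimum: int = 10_000) -> bool:
    """DMD wants volume; treat 10k clips as a hard floor."""
    return n_videos >= minimum

print(is_large_enough(500))     # False: a Disney-sized dataset is far too small
print(is_large_enough(25_000))  # True
```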

---

## 🚀 Training Tips

1. **Start without simulation**: Set `"simulate_generator_forward": false` initially
2. **Monitor both losses**: Watch `dmd_loss` and `fake_score_loss` in wandb
3. **Validation frequency**: DMD converges quickly, validate often
4. **Memory management**:
- Use `gradient_checkpointing`
- Lower `train_batch_size` to 1
- Consider `base_model_precision: "int8-quanto"`
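With the settings in this guide, the schedule works out as follows. The exact alternation is an assumption about how `generator_update_interval` is applied (the fake score transformer is assumed to train on the steps where the generator does not), so treat this as illustrative arithmetic only:

```python
# Rough schedule bookkeeping for the example config above.
max_train_steps = 4000
generator_update_interval = 5   # student (generator) trains every 5th step
validation_steps = 200

generator_updates = max_train_steps // generator_update_interval
fake_score_updates = max_train_steps - generator_updates  # the other steps
validations = max_train_steps // validation_steps

print(generator_updates)   # 800
print(fake_score_updates)  # 3200
print(validations)         # 20
```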

---

## 📌 DMD vs DCM

| Feature | DCM | DMD |
|---------|-----|-----|
| Memory usage | Lower | Higher (fake score model) |
| Training time | Longer | Shorter (4k steps typical) |
| Quality | Good | Excellent |
| Inference steps | 4-8+ | 3-8 |
| Stability | Stable | Requires tuning |

---

## 🧩 Troubleshooting

| Problem | Fix |
|---------|-----|
| **OOM errors** | Disable `simulate_generator_forward`, reduce batch size |
| **Fake score not learning** | Increase `fake_score_lr` or use a different scheduler |
| **Generator overfitting** | Increase `generator_update_interval` to 10 |
| **Poor 3-step quality** | Try "1000,500" for 2-step first |
| **Training unstable** | Lower learning rates, check data quality |

---

## 🔬 Advanced Options

For brave souls wanting to experiment:

```json
"distillation_config": {
"dmd_denoising_steps": "1000,666,333",
"simulate_generator_forward": true,
"fake_score_use_ema": true,
"adversarial_weight": 0.1,
"shift": 7.0
}
```

> ⚠️ It's recommended to use the original FastVideo implementation of DMD for resource-constrained projects as it supports sequence-parallel and video-sparse attention (VSA) for far more efficient runtime usage.
4 changes: 2 additions & 2 deletions documentation/distillation/WAN_DCM.md
@@ -1,6 +1,6 @@
# DCM Distillation Quickstart (SimpleTuner)

- In this example, we'll be training a **50-step student** using **DCM distillation** from a large flow-matching teacher model like [Wan 2.1 T2V](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B).
+ In this example, we'll be training a **4-step student** using **DCM distillation** from a large flow-matching teacher model like [Wan 2.1 T2V](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B).

DCM supports:

@@ -109,7 +109,7 @@ Edit your `config/config.json`:
"validation_prompt_library": false,
"validation_resolution": "832x480",
"validation_seed": 42,
"validation_steps": 50,
"validation_steps": 4,
"webhook_config": "config/wan/webhook.json"
}
```