Add FastWan (DMD) distillation method #1695
Pull Request Overview
This PR adds FastWan DMD (Distribution Matching Distillation) as a new distillation method in SimpleTuner. DMD is a memory-intensive distillation technique that trains a generator (student) alongside a fake score transformer (discriminator) so that the distilled model produces high-quality output in few sampling steps (typically 3).
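For readers unfamiliar with DMD, the generator update can be sketched roughly as below. This is a minimal pure-Python illustration of the distribution-matching target as commonly described for DMD, not the code in this PR: `dmd_target`, `generator_loss`, `pred_real`, and `pred_fake` are hypothetical names, and a real implementation would operate on tensors with autograd and detach the target.

```python
def dmd_target(x, pred_real, pred_fake, eps=1e-8):
    """Build the regression target for a generator sample (illustrative only).

    x         : generator output, as a flat list of floats
    pred_real : clean estimate from the frozen teacher (real score model)
    pred_fake : clean estimate from the fake score model
    """
    # Normalizer: mean absolute distance between the sample and the teacher estimate.
    scale = sum(abs(a - r) for a, r in zip(x, pred_real)) / len(x)
    # Difference of the two estimates; subtracting it from x moves x toward
    # the teacher's estimate and away from the fake score model's estimate.
    grad = [(f - r) / (scale + eps) for f, r in zip(pred_fake, pred_real)]
    # The target is treated as a constant when computing the generator loss.
    return [a - g for a, g in zip(x, grad)]


def generator_loss(x, target):
    # 0.5 * mean squared error between the sample and its fixed target.
    return 0.5 * sum((a - t) ** 2 for a, t in zip(x, target)) / len(x)
```

When the fake score model agrees with the teacher, the difference vanishes, the target equals the sample, and the generator loss is zero.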
Key changes:
- Implements DMD distillation infrastructure with generator and fake score transformer components
- Adds comprehensive DMD configuration options and training logic
- Updates documentation to include DMD usage examples and comparison with DCM
 
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description | 
|---|---|
| helpers/distillation/factory.py | Adds DMD enum value and factory method for creating DMD distillers | 
| helpers/distillation/dmd/distiller.py | Complete DMD implementation with generator/discriminator training logic | 
| documentation/distillation/WAN_DCM.md | Minor update to validation steps from 50 to 4 | 
| documentation/distillation/FASTWAN_DMD.md | Comprehensive DMD documentation with configuration examples | 
Comments suppressed due to low confidence (1)
helpers/distillation/dmd/distiller.py:322
- [nitpick] The function name '_pred_noise_to_pred_video' is misleading as it suggests video-specific processing, but the function performs generic noise-to-clean conversion that works for any latent type. Consider renaming to '_pred_noise_to_clean_latents' or '_convert_noise_prediction_to_clean'.
 
    def _pred_noise_to_pred_video(self, pred_noise, noise_input, timestep):
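To illustrate why the conversion is latent-agnostic, here is a sketch of what such a helper typically computes under a flow-matching parameterization. It assumes `x_t = (1 - sigma) * x0 + sigma * noise` and a model that predicts the velocity `noise - x0`; the PR's method takes a `timestep` and presumably derives the noise level from the scheduler, which this sketch skips. `pred_to_clean` and `sigma` are hypothetical names.

```python
def pred_to_clean(pred, noisy, sigma):
    """Recover the clean latent estimate from a velocity prediction.

    Assumes noisy = (1 - sigma) * x0 + sigma * noise and pred = noise - x0,
    so x0 = noisy - sigma * pred. Nothing here depends on the latent being video.
    """
    return [x - sigma * v for x, v in zip(noisy, pred)]


# Usage: build a noisy latent from known x0 and noise, then invert it.
x0 = [1.0, -2.0]
noise = [0.5, 0.25]
sigma = 0.3
noisy = [(1 - sigma) * a + sigma * n for a, n in zip(x0, noise)]
velocity = [n - a for a, n in zip(x0, noise)]
recovered = pred_to_clean(velocity, noisy, sigma)
```

With an exact velocity prediction, `recovered` matches `x0` up to floating-point error, regardless of what the latent encodes.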
Co-authored-by: Copilot <[email protected]>
Currently very VRAM-heavy due to lack of VSA and USP in SimpleTuner.