Welcome to the repository for the research project titled "How good is Generative Diffusion Model for Enhanced Sampling of Protein Conformations Across Scales and in All-atom Resolution?"
Molecular dynamics (MD) simulations are fundamental for probing the structural dynamics of biomolecules, yet their efficiency is limited by the high computational cost of exploring long-timescale events. Generative machine learning (ML) models, particularly the Denoising Diffusion Probabilistic Model (DDPM), offer an emerging strategy to enhance conformational sampling. In this study, we evaluate the capabilities and limitations of DDPM in generating atomistically accurate conformational ensembles across proteins of varying size and structural order, ranging from the 20-residue folded Trp-cage and 58-residue BPTI to the 83-residue intrinsically disordered region Ash1 and the 140-residue intrinsically disordered protein α-Synuclein.
The code has been adapted from the following GitHub repositories and websites, with necessary modifications.
We sincerely thank the authors of these resources for their valuable contributions.
To run the code, ensure you have the following Python packages installed:
We illustrate the noising-denoising processes using an image of a paperflower photographed on the TIFRH campus during summer. The upper panel depicts the forward (noising) process, represented by the blue arrow, where the original input image
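In a DDPM the forward (noising) process has a closed form, so a noised sample at any step can be drawn directly from clean data as $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$, with $\epsilon \sim \mathcal{N}(0, I)$ and $\bar\alpha_t = \prod_{s \le t}(1-\beta_s)$. The following is a minimal NumPy sketch that assumes the linear $\beta$ schedule (1e-4 to 0.02 over T = 1000 steps) from the original DDPM paper. The schedule used by this repository's code may differ, and `linear_beta_schedule` and `q_sample` are illustrative names, not functions from the repository.

```python
import numpy as np


def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly increasing noise variances beta_1..beta_T (assumed schedule)."""
    return np.linspace(beta_start, beta_end, T)


def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ q(x_t | x_0) in one shot.

    x0: (n_frames, n_features) array of normalized training frames.
    t: 0-based step index into alpha_bar.
    Returns (x_t, eps); eps is the target a noise-prediction network learns.
    """
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps


betas = linear_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)  # cumulative product of alpha_t = 1 - beta_t

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(500, 2))  # placeholder data, not the moon set
for t in (0, 250, 999):
    xt, eps = q_sample(x0, t, alpha_bar, rng)
    # As t grows, the signal weight sqrt(alpha_bar) shrinks and x_t approaches N(0, I).
    print(t, float(np.sqrt(alpha_bar[t])), float(xt.std()))
```

Sampling $x_t$ directly, instead of adding noise step by step, is what makes training cheap. Each minibatch can use a random $t$ per frame.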
- To train the model, run the script:
python model_train.py
- After the model is trained, a folder named `results` will be created. Inside this folder, you will find subfolders containing the trained model.
- Once the model is trained, use the following script to generate samples:
python sample_generate.py
- The generated samples will be saved in a folder named `generate_sample`.
- We have provided a trained model for the "moon" dataset. The folder `moon` contains the training data in `.npy` format, with two features per point: the x and y coordinates.
- We have also included the backbone torsion data for the Trp-cage mini-protein. The folder `Trpcage` contains the training data in `.npy` format.
- This code is versatile and can be applied to various types of data, for example:
- Torsion angles
- Raw coordinates of all atoms
- Protein-ligand distances
- Note: For noise prediction, the code uses a 1D U-Net model. Ensure that the data for each frame is represented as a 1D array, so that the full training set is a 2D array of shape (number of frames, number of features).
- The folder `denoising_diffusion_1D` contains the main code for the 1D neural network architecture used for noising and denoising the data.
- Here is a nice animation demonstrating how moon data can be generated from pure random noise using DDPM.
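Generating samples from pure noise, as in the animation, means running the reverse (denoising) chain. It starts from $x_T \sim \mathcal{N}(0, I)$, and each step removes the predicted noise and then adds a small amount of fresh noise. The following is a minimal sketch of the ancestral DDPM sampler (Ho et al., 2020) with $\sigma_t^2 = \beta_t$, assuming a linear schedule. So that it runs on its own, the trained 1D U-Net is replaced by a hypothetical `exact_eps` function. For toy data drawn from an isotropic Gaussian with known mean `MU` and standard deviation `SIGMA`, the optimal noise predictor $E[\epsilon \mid x_t]$ has a closed form. All names here are assumptions. The code is not the repository's `sample_generate.py`.

```python
import numpy as np

# Assumed linear schedule (Ho et al., 2020); the repository may use another one.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

# Hypothetical toy target: isotropic Gaussian N(MU, SIGMA^2 I) in two dimensions.
MU = np.array([2.0, -1.0])
SIGMA = 0.5


def exact_eps(xt, t):
    """Stand-in for the trained network: the exact E[eps | x_t] for the toy Gaussian."""
    ab = alpha_bar[t]
    var_t = ab * SIGMA**2 + (1.0 - ab)  # variance of the noised marginal q(x_t)
    return np.sqrt(1.0 - ab) * (xt - np.sqrt(ab) * MU) / var_t


def ddpm_sample(eps_model, n_samples, dim, rng):
    """Ancestral DDPM sampling with sigma_t^2 = beta_t."""
    x = rng.standard_normal((n_samples, dim))  # x_T: pure Gaussian noise
    for t in range(T - 1, -1, -1):
        eps_hat = eps_model(x, t)
        # Mean of p(x_{t-1} | x_t): remove the predicted noise, rescale by 1/sqrt(alpha_t).
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean  # the final step returns the mean without added noise
    return x


rng = np.random.default_rng(0)
samples = ddpm_sample(exact_eps, n_samples=4000, dim=2, rng=rng)
print("sample mean:", samples.mean(axis=0))
print("sample std: ", samples.std(axis=0))
```

With the exact noise predictor, the sample mean should land close to `MU`. In the real workflow the network's prediction takes the place of `exact_eps`, so sample quality depends on how well the network was trained.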
Publication Alert 🚀
This work is now published in The Journal of Chemical Physics! 🎉
🔍 Key Findings:
- DDPM for Molecular Dynamics
  - Benchmarked the Denoising Diffusion Probabilistic Model (DDPM) as a generative tool for protein conformational sampling.
  - Trained on relatively short MD trajectories using torsion angles and all-atom coordinates.
- Systems Studied
  - Folded proteins: Trp-cage (20 residues), BPTI (58 residues).
  - Intrinsically disordered systems: Ash1 IDR (83 residues), α-Synuclein (140 residues).
- Performance Highlights
  - Reproduces key structural features:
    - Secondary structure
    - Radius of gyration (Rg)
    - Contact maps
  - Samples sparsely populated regions of the conformational landscape.
  - Generates novel conformations not seen in the training data.
- Strengths
  - Captures thermodynamically consistent ensembles.
  - Provides computationally efficient enhanced sampling compared to long MD runs.
  - Generalizes across both folded and disordered proteins.
- Limitations
  - Occasionally misses low-probability conformers.
  - Generates some conformations with unclear physical relevance, especially in IDPs.
  - Requires independent validation of generated conformations.
- Overall Impact
  - DDPM is a viable and promising tool for augmenting MD simulations.
  - Offers enhanced conformational sampling with significant computational savings.
  - Underscores the importance of rigorous validation and careful interpretation when applying generative AI in computational biophysics.
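One way to carry out the validation mentioned above is to compare distributions of structural observables, such as the radius of gyration, between the MD training data and DDPM-generated frames. The following is a minimal NumPy sketch of a per-frame, optionally mass-weighted $R_g = \sqrt{\sum_i m_i \lVert r_i - r_{\mathrm{com}} \rVert^2 / \sum_i m_i}$. It assumes the generated coordinates were flattened atom-major, meaning each 1D frame is ordered $(x_1, y_1, z_1, x_2, \dots)$, so the frames can be reshaped back to `(n_frames, n_atoms, 3)`. The function name is illustrative, and the random arrays are placeholders, not data from the study.

```python
import numpy as np


def radius_of_gyration(coords, masses=None):
    """Per-frame radius of gyration.

    coords: (n_frames, n_atoms, 3) Cartesian coordinates.
    masses: optional (n_atoms,) weights; uniform (geometric Rg) if None.
    Returns an (n_frames,) array in the same length unit as coords.
    """
    coords = np.asarray(coords, dtype=float)
    n_atoms = coords.shape[1]
    w = np.ones(n_atoms) if masses is None else np.asarray(masses, dtype=float)
    w = w / w.sum()                                      # normalized weights
    com = np.einsum("a,fax->fx", w, coords)              # weighted center, (n_frames, 3)
    sq_dist = ((coords - com[:, None, :]) ** 2).sum(axis=-1)  # (n_frames, n_atoms)
    return np.sqrt(sq_dist @ w)


# Placeholder arrays standing in for MD and DDPM-generated frames, each frame a
# flattened 1D vector ordered (x1, y1, z1, x2, y2, z2, ...).
rng = np.random.default_rng(1)
n_atoms = 20
md_flat = rng.normal(size=(1000, 3 * n_atoms))
gen_flat = rng.normal(size=(1000, 3 * n_atoms))

rg_md = radius_of_gyration(md_flat.reshape(-1, n_atoms, 3))
rg_gen = radius_of_gyration(gen_flat.reshape(-1, n_atoms, 3))

# Histogram both ensembles on shared bins so the distributions can be compared.
lo = min(rg_md.min(), rg_gen.min())
hi = max(rg_md.max(), rg_gen.max())
bins = np.linspace(lo, hi, 41)
p_md, _ = np.histogram(rg_md, bins=bins, density=True)
p_gen, _ = np.histogram(rg_gen, bins=bins, density=True)
```

Pass `masses=None` for the geometric (unweighted) value. For trajectories loaded from files, MDTraj's `md.compute_rg` computes the same quantity.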
📖 Check out the full study here for more insights!
Stay tuned for more exciting updates!


