
palash892/DDPM_conformational_ensemble


Generating conformational ensemble using a Denoising Diffusion Probabilistic Model (DDPM)

Welcome to the repository for the research project titled "How good is Generative Diffusion Model for Enhanced Sampling of Protein Conformations Across Scales and in All-atom Resolution?"

Abstract

Molecular dynamics (MD) simulations are fundamental for probing the structural dynamics of biomolecules, yet their efficiency is limited by the high computational cost of exploring long-timescale events. Generative machine learning (ML) models, particularly the Denoising Diffusion Probabilistic Model (DDPM), offer an emerging strategy to enhance conformational sampling. In this study, we evaluate the capabilities and limitations of DDPM in generating atomistically accurate conformational ensembles across proteins of varying size and structural order, ranging from the 20-residue folded Trp-cage and 58-residue BPTI to the 83-residue intrinsically disordered region Ash1 and the 140-residue intrinsically disordered protein $\alpha$-Synuclein. Training DDPM on relatively short MD trajectories using both torsion angle and all-atom coordinate data, we demonstrate that it can reproduce key structural features such as secondary structure, radius of gyration, and contact maps, while effectively sampling sparsely populated regions of the conformational landscape. Notably, DDPM can also generate novel conformations, including transitions not explicitly observed in the training data. However, the model occasionally overlooks low-probability regions and may produce conformers with unclear physical relevance, warranting independent validation. These limitations are particularly evident in flexible systems such as IDPs. Overall, this work benchmarks DDPM as a viable tool for augmenting MD simulations, offering enhanced sampling with significant computational savings, while noting its limitations in capturing low-populated conformers. At the same time, it highlights the importance of rigorous validation and thoughtful interpretation when deploying generative models in computational biophysics.

Graphical Abstract

The code has been adapted from the following GitHub repositories and websites, with necessary modifications.

We sincerely thank the authors of these resources for their valuable contributions.

Code Requirements

To run the code, ensure you have the following Python packages installed:

A schematic overview of the noising and denoising processes.

We illustrate the noising and denoising processes using a photograph of a paperflower taken on the TIFRH campus in summer. The upper panel depicts the forward (noising) process, represented by the blue arrow, in which the original input image $x_0$ is progressively corrupted by adding Gaussian noise, and the reverse (denoising) process, represented by the black arrow, in which a neural network predicts the added noise at each step so that it can be removed. The lower panel illustrates the U-Net model architecture.
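The forward (noising) process of a DDPM has a closed form: $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$, with $\bar\alpha_t = \prod_{s\le t}(1-\beta_s)$ and $\epsilon \sim \mathcal{N}(0, I)$, so any noise level can be reached in one step. The NumPy sketch below implements it, assuming the linear $\beta$ schedule ($10^{-4}$ to $0.02$, $T = 1000$) of the original DDPM paper; the schedule actually used in model_train.py may differ, and the names linear_beta_schedule and q_sample are illustrative only.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule beta_1..beta_T (defaults as in Ho et al., 2020; an assumption here)."""
    return np.linspace(beta_start, beta_end, num_steps)

def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) in closed form (t is a 0-based index here).

    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,  eps ~ N(0, I).
    Returns x_t and eps (eps is the regression target of the noise predictor).
    """
    eps = rng.standard_normal(x0.shape)
    a_bar = alpha_bars[t]
    xt = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    return xt, eps

T = 1000
betas = linear_beta_schedule(T)
alpha_bars = np.cumprod(1.0 - betas)   # alpha_bar_t = prod_{s<=t} (1 - beta_s)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1, 1, size=(4, 2))                # toy batch: 4 frames, 2 features each
x_early, _ = q_sample(x0, 10, alpha_bars, rng)      # still close to x0
x_late, _ = q_sample(x0, T - 1, alpha_bars, rng)    # essentially pure Gaussian noise
```

Because $\bar\alpha_t$ shrinks monotonically towards zero, late timesteps erase almost all information about $x_0$, which is why sampling can start from pure noise.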

Directory Structure & Usage

1. Training the Model

  • To train the model, run the script:
    python model_train.py
  • After training, a folder named results is created; its subfolders contain the trained model.
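Training fits the noise-prediction network by minimizing the simplified DDPM objective: the mean squared error between the true noise $\epsilon$ and the predicted noise $\epsilon_\theta(x_t, t)$ at a randomly chosen timestep. A minimal NumPy sketch of the loss computation follows, assuming the linear schedule of the original DDPM paper; predict_noise is a hypothetical stand-in for the 1D U-Net, and the actual training loop in model_train.py (optimizer, batching, schedule) is not reproduced here.

```python
import numpy as np

# Linear beta schedule from the original DDPM paper (an assumption for this repo).
T = 1000
alpha_bars = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))

def ddpm_loss(x0, predict_noise, rng):
    """Simplified DDPM objective: E_{t, eps} || eps - eps_theta(x_t, t) ||^2.

    x0            : (batch, n_features) array of training frames
    predict_noise : hypothetical callable (x_t, t) -> predicted eps,
                    standing in for the 1D U-Net
    """
    batch = x0.shape[0]
    t = rng.integers(0, T, size=batch)          # one random timestep per frame
    eps = rng.standard_normal(x0.shape)
    a_bar = alpha_bars[t][:, None]              # broadcast over the feature axis
    xt = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    eps_hat = predict_noise(xt, t)
    return np.mean((eps - eps_hat) ** 2)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1, 1, size=(256, 2))
# A "model" that always predicts zero noise scores roughly Var[eps] = 1.
zero_model = lambda xt, t: np.zeros_like(xt)
loss = ddpm_loss(x0, zero_model, rng)
```

A trained network should push this loss well below 1; in a real run the gradient of the loss with respect to the network parameters would be computed by a deep-learning framework such as PyTorch.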

2. Generating Samples

  • Once the model is trained, use the following script to generate samples:
    python sample_generate.py
  • The generated samples will be saved in a folder named generate_sample.
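Sample generation runs the reverse process: start from $x_T \sim \mathcal{N}(0, I)$ and repeatedly apply $x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon_\theta(x_t, t)\right) + \sigma_t z$ with $z \sim \mathcal{N}(0, I)$. The sketch below follows Algorithm 2 of the original DDPM paper with $\sigma_t^2 = \beta_t$ and a linear schedule; predict_noise is a hypothetical stand-in for the trained network, and sample_generate.py may use different settings.

```python
import numpy as np

# Linear schedule from the original DDPM paper (an assumption for this repo).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def p_sample_loop(predict_noise, shape, rng):
    """DDPM ancestral sampling with sigma_t^2 = beta_t (0-based timestep indices).

    predict_noise : hypothetical callable (x_t, t) -> predicted eps
    shape         : (n_samples, n_features)
    """
    x = rng.standard_normal(shape)                       # x_T ~ N(0, I)
    for t in range(T - 1, -1, -1):
        t_batch = np.full(shape[0], t)
        eps_hat = predict_noise(x, t_batch)
        # Mean of p(x_{t-1} | x_t): remove the predicted noise, then rescale.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)
        else:
            x = mean                                     # no noise on the final step
    return x

rng = np.random.default_rng(0)
c = np.array([0.5, -0.3])
# Optimal noise predictor if every training frame were the single point c.
oracle = lambda x, t: (x - np.sqrt(alpha_bars[t])[:, None] * c) / np.sqrt(1.0 - alpha_bars[t])[:, None]
samples = p_sample_loop(oracle, (8, 2), rng)
```

The oracle is a sanity check rather than a real model: with a perfect noise predictor for a one-point dataset, the final step collapses every trajectory exactly onto $c$.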

3. Provided Example

  • We provide a trained model for the "moon" dataset. The folder moon contains the training data in .npy format with two columns: the x and y coordinates of each point.
  • We also include the backbone torsion-angle data for the Trp-cage mini-protein. The folder Trpcage contains the training data in .npy format.
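To train on your own two-dimensional toy data, it only has to match the layout described for the moon folder: a NumPy array of shape (n_points, 2) saved with np.save. The sketch below builds a "two moons" point cloud with plain NumPy (the same geometry as scikit-learn's make_moons) and round-trips it through a .npy file. The file name moon_custom.npy and the helper two_moons are hypothetical, since the name of the file in the moon folder is not given here.

```python
import os
import tempfile
import numpy as np

def two_moons(n, noise=0.05, rng=None):
    """Generate an (n, 2) 'two moons' point cloud using only NumPy."""
    rng = np.random.default_rng() if rng is None else rng
    n_upper = n // 2
    n_lower = n - n_upper
    theta_u = rng.uniform(0, np.pi, n_upper)
    theta_l = rng.uniform(0, np.pi, n_lower)
    upper = np.column_stack([np.cos(theta_u), np.sin(theta_u)])              # upper arc
    lower = np.column_stack([1 - np.cos(theta_l), 0.5 - np.sin(theta_l)])    # shifted lower arc
    pts = np.vstack([upper, lower])
    return pts + noise * rng.standard_normal(pts.shape)                      # add jitter

data = two_moons(2000, rng=np.random.default_rng(0)).astype(np.float32)

# Save and reload in the (n_points, 2) .npy layout (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "moon_custom.npy")
np.save(path, data)
loaded = np.load(path)
print(loaded.shape, loaded.dtype)   # (2000, 2) float32
```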

4. Data Types Supported

  • The code is general and can be applied to various types of data, for example:
    • Torsion angles
    • Raw coordinates of all atoms
    • Protein-ligand distances
  • Note: For noise prediction, the code uses a 1D U-Net model, so the data for each frame must be represented as a 1D array.
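Because the noise predictor is a 1D U-Net, each frame has to be flattened into a single feature vector. The sketch below shows two common preprocessing steps under stated assumptions: Cartesian coordinates of shape (n_frames, n_atoms, 3) are reshaped to (n_frames, 3 n_atoms), and torsion angles are encoded as (sin, cos) pairs to avoid the artificial jump at ±180°. The sin/cos encoding is a widely used choice, not necessarily what this repository does; the function names and array sizes are hypothetical.

```python
import numpy as np

def torsions_to_features(angles_deg):
    """(n_frames, n_torsions) angles in degrees -> (n_frames, 2 * n_torsions).

    Encoding each angle as (sin, cos) keeps -179 deg and +179 deg close together.
    """
    rad = np.deg2rad(angles_deg)
    return np.concatenate([np.sin(rad), np.cos(rad)], axis=1)

def features_to_torsions(features):
    """Inverse map: recover angles in degrees from the (sin, cos) blocks."""
    n = features.shape[1] // 2
    return np.rad2deg(np.arctan2(features[:, :n], features[:, n:]))

def coords_to_features(xyz):
    """(n_frames, n_atoms, 3) Cartesian coordinates -> (n_frames, 3 * n_atoms)."""
    return xyz.reshape(xyz.shape[0], -1)

rng = np.random.default_rng(0)
torsions = rng.uniform(-180, 180, size=(100, 40))   # toy torsion data (hypothetical size)
xyz = rng.standard_normal((100, 300, 3))            # toy all-atom trajectory (hypothetical size)

torsion_feats = torsions_to_features(torsions)      # (100, 80)
coord_feats = coords_to_features(xyz)               # (100, 900)

# PyTorch Conv1d layers expect (batch, channels, length); add a single channel axis.
unet_input = coord_feats[:, None, :]                # (100, 1, 900)
```

Generated samples would be mapped back with features_to_torsions (or the inverse reshape) before any structural analysis; whether the 1D U-Net here uses a single input channel is an assumption.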

5. Core Neural Network Code

  • The folder denoising_diffusion_1D contains the main code for the 1D neural network architecture used for noising and denoising the data.
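A noise-prediction network must know which diffusion step $t$ it is denoising. The original DDPM paper conditions its U-Net on $t$ through the Transformer sinusoidal position embedding; the sketch below computes that embedding in NumPy. Whether the code in denoising_diffusion_1D uses exactly this form (frequency base, ordering of sine and cosine) is an assumption, and timestep_embedding is an illustrative name.

```python
import numpy as np

def timestep_embedding(t, dim, max_period=10000.0):
    """Sinusoidal embedding of integer timesteps t (shape (batch,)) -> (batch, dim).

    Frequencies are max_period ** (-k / half) for k = 0..half-1; the first half
    of each row holds sines and the second half cosines.
    """
    half = dim // 2
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    args = np.asarray(t, dtype=np.float64)[:, None] * freqs[None, :]
    emb = np.concatenate([np.sin(args), np.cos(args)], axis=1)
    if dim % 2 == 1:                                   # pad odd sizes with a zero column
        emb = np.concatenate([emb, np.zeros((emb.shape[0], 1))], axis=1)
    return emb

emb = timestep_embedding(np.array([0, 10, 999]), dim=64)
print(emb.shape)   # (3, 64)
```

In a U-Net this vector is usually passed through a small MLP and added to the feature maps of each residual block, so the same weights can denoise at every noise level.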

Animation

  • Here is an animation demonstrating how DDPM generates the moon data starting from pure random noise.


News!

Publication Alert 🚀

This work is now published in The Journal of Chemical Physics! 🎉

🔍 Key Findings:

  • DDPM for Molecular Dynamics

    • Benchmarked the Denoising Diffusion Probabilistic Model (DDPM) as a generative tool for protein conformational sampling.
    • Trained on relatively short MD trajectories using torsion angles and all-atom coordinates.
  • Systems Studied

    • Folded proteins: Trp-cage (20 residues), BPTI (58 residues).
    • Intrinsically disordered systems: Ash1 IDR (83 residues), $\alpha$-Synuclein (140 residues).
  • Performance Highlights

    • Reproduces key structural features:
      • Secondary structure
      • Radius of gyration (Rg)
      • Contact maps
    • Samples sparsely populated regions of the conformational landscape.
    • Generates novel conformations not seen in the training data.
  • Strengths

    • Captures thermodynamically consistent ensembles.
    • Provides computationally efficient enhanced sampling compared to long MD runs.
    • Generalizes across both folded and disordered proteins.
  • Limitations

    • Occasionally misses low-probability conformers.
    • Generates some conformations with unclear physical relevance, especially in IDPs.
    • Requires independent validation of generated conformations.
  • Overall Impact

    • DDPM is a viable and promising tool for augmenting MD simulations.
    • Offers enhanced conformational sampling with significant computational savings.
    • Underscores the importance of rigorous validation and careful interpretation when applying generative AI in computational biophysics.
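Independent validation typically starts with structural observables such as the radius of gyration, $R_g = \sqrt{\sum_i m_i \lVert r_i - r_{\mathrm{com}}\rVert^2 / \sum_i m_i}$, compared between the MD and DDPM-generated ensembles. Below is a minimal NumPy sketch, assuming coordinates of shape (n_frames, n_atoms, 3) and uniform weights unless masses are supplied; it is not the analysis code used in the paper.

```python
import numpy as np

def radius_of_gyration(xyz, masses=None):
    """Radius of gyration per frame.

    xyz    : (n_frames, n_atoms, 3) coordinates
    masses : optional (n_atoms,) weights; uniform weights if None
    R_g = sqrt( sum_i m_i |r_i - r_com|^2 / sum_i m_i )
    """
    xyz = np.asarray(xyz, dtype=float)
    m = np.ones(xyz.shape[1]) if masses is None else np.asarray(masses, dtype=float)
    com = np.einsum("i,fij->fj", m, xyz) / m.sum()           # weighted centre, (n_frames, 3)
    sq_dist = np.sum((xyz - com[:, None, :]) ** 2, axis=2)    # (n_frames, n_atoms)
    return np.sqrt(sq_dist @ m / m.sum())

# Toy check: 4 atoms at the corners of a square of side 2, so R_g = sqrt(2).
square = np.array([[[1, 1, 0], [1, -1, 0], [-1, 1, 0], [-1, -1, 0]]], dtype=float)
rg = radius_of_gyration(square)
```

Histogramming the per-frame $R_g$ of the MD trajectory and of the generated samples gives a quick check of whether the generated ensemble covers the same range of compactness.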

📖 Check out the full study here for more insights!

Stay tuned for more exciting updates!
