ODesign

🎉Here we present ODesign, an all-atom generative world model for all-to-all biomolecular interaction design. ODesign allows scientists to specify epitopes on arbitrary targets and generate diverse classes of binding partners with fine-grained control.

ODesign is also available at https://odesign.lglab.ac.cn, allowing users to generate binding partners without coding expertise.

Please feel free to contact us via email if you have any questions. You can also join our discussion group on WhatsApp or WeChat.

This work is supported by Lingang Laboratory, Zhejiang University, The Chinese University of Hong Kong, and Shanghai Artificial Intelligence Laboratory. For the full list of funding sources, please refer to our technical report ODesign: A World Model for Biomolecular Interaction Design.

Installation

Step 1 — Clone the Repository

git clone https://github.com/The-Institute-for-AI-Molecular-Design/ODesign.git
cd ODesign

Step 2 — Prepare the Environment (CUDA 12.1)

  • pip
conda create -n odesign python=3.10
conda activate odesign
pip install -r requirements.txt -f https://data.pyg.org/whl/torch-2.3.1+cu121.html
  • Docker
docker build -t odesign -f Dockerfile .

docker run --gpus all -it --rm --shm-size=8g \
  -v /path/to/ckpt_root_dir:/app/ODesign/ckpt \
  -v /path/to/data_root_dir:/app/ODesign/data \
  -v $(pwd)/outputs:/app/ODesign/outputs \
  -v $(pwd)/inference_demo.sh:/app/ODesign/inference_demo.sh \
  -v $(pwd)/train_demo.sh:/app/ODesign/train_demo.sh \
  odesign bash
  • Apptainer
apptainer build odesign.sif odesign.def

apptainer run --nv \
  --writable-tmpfs \
  -B ckpt:/app/ODesign/ckpt \
  -B data:/app/ODesign/data \
  -B outputs:/app/ODesign/outputs \
  -B $(pwd)/inference_demo.sh:/app/ODesign/inference_demo.sh \
  -B $(pwd)/train_demo.sh:/app/ODesign/train_demo.sh \
  odesign.sif bash

Available Models

ODesign currently provides the following pre-trained model variants. Each model supports a specific modality and design mode:

| Model Name | Design Modality | Design Mode | Hugging Face |
|---|---|---|---|
| odesign_base_prot_flex | protein | flexible-receptor | odesign_base_prot_flex.pt |
| odesign_base_prot_rigid | protein | rigid-receptor | odesign_base_prot_rigid.pt |
| odesign_base_ligand_rigid | ligand | rigid-receptor | odesign_base_ligand_rigid.pt |
| odesign_base_na_rigid | nucleic acid | rigid-receptor | odesign_base_na_rigid.pt |

[Figure: ODesign Design Mode]

Checkpoints of the OInvFold module for the different design modalities are also hosted on Hugging Face.

You can download all available checkpoints using the following command. Alternatively, you may manually download specific checkpoints from the Hugging Face links listed above.

cd ODesign
bash ./ckpt/get_odesign_ckpt.sh [ckpt_root_dir]
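After the download finishes, it can be useful to confirm which checkpoints are actually present. The following is a minimal Python sketch, not part of the repository. It assumes each checkpoint is stored as `<model_name>.pt` directly under `ckpt_root_dir`; the file names come from the table above, but the directory layout produced by `get_odesign_ckpt.sh` is an assumption, and `missing_checkpoints` is a hypothetical helper name.

```python
from pathlib import Path

# Model names from the table above; each file name is "<model name>.pt".
MODEL_NAMES = [
    "odesign_base_prot_flex",
    "odesign_base_prot_rigid",
    "odesign_base_ligand_rigid",
    "odesign_base_na_rigid",
]


def missing_checkpoints(ckpt_root_dir, model_names=MODEL_NAMES):
    """Return the model names whose <name>.pt file is not found.

    Assumption: checkpoints sit directly in ckpt_root_dir. Adjust the
    path if get_odesign_ckpt.sh uses a different layout.
    """
    root = Path(ckpt_root_dir)
    return [name for name in model_names if not (root / f"{name}.pt").is_file()]


if __name__ == "__main__":
    for name in missing_checkpoints("./ckpt"):
        print(f"missing checkpoint: {name}.pt")
```

An empty result means all four model checkpoints were found at the assumed location.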

Inference

Input Format

Please refer to Sections B.1 and B.2 of our Supplementary Information for details about the input JSON format. Example input JSON files for each task can be found in the examples directory.

Please note that a ligand chain can also be specified by a SMILES string in the smiles field, in which case you do not need to provide the path to ref_file. An example input JSON file is provided in the repository. To enable this feature, update the environment with the following command:

conda install -c conda-forge biotite=1.2.0

Run Inference

Step 1 — Download Required Inference Data

Before running inference for the first time, download components.v20240608.cif and components.v20240608.cif.rdkit_mol.pkl from Google Drive and place both files under your data_root_dir.
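A quick pre-flight check can catch a missing file before a run starts. The sketch below is illustrative Python, not repository code. It assumes the two files sit directly inside `data_root_dir` rather than in a subfolder, and `missing_inference_data` is a hypothetical helper name.

```python
from pathlib import Path

# File names as given in Step 1.
REQUIRED_FILES = (
    "components.v20240608.cif",
    "components.v20240608.cif.rdkit_mol.pkl",
)


def missing_inference_data(data_root_dir):
    """Return the required file names that are absent from data_root_dir.

    Assumption: the files are placed directly in data_root_dir.
    """
    root = Path(data_root_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_inference_data("./data")
    print("all inference data present" if not missing else f"missing: {missing}")
```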

Step 2 — Run the Inference Demo

After data preparation, launch the inference process using:

bash inference_demo.sh

This script generates molecular designs based on the selected model and input JSON file. You can configure inference behavior by editing the following arguments in inference_demo.sh. An example of multi-GPU inference using torchrun is also provided in inference_demo.sh.

| Argument | Description | Example |
|---|---|---|
| infer_model_name | Model used for inference. Available options: odesign_base_prot_flex, odesign_base_prot_rigid, odesign_base_ligand_rigid, odesign_base_na_rigid. | odesign_base_prot_flex |
| design_modality | Must be specified as dna or rna if using nucleic acid design model. Available options: protein, ligand, dna, rna | rna |
| data_root_dir | Directory where downloaded data is stored. | ./data |
| ckpt_root_dir | Directory where model checkpoints are stored. | ./ckpt |
| input_json_path | Path to the input design specification JSON file. | ./examples/.../odesign_input.json |
| exp_name | Custom label for inference output directory. If left empty, a default name is auto-generated. | protein_binding_protein_design |
| seeds | Random seeds used during generation. Supports multiple seeds. | [42] or [42, 123] |
| N_sample | Number of generated samples per seed. | 5 |
| use_msa | Utilize MSA information during inference (only set to true if the input JSON includes MSA). | false |
| num_workers | Number of dataloader workers. | 4 |
| CUDA_VISIBLE_DEVICES | GPU device for inference. | 0 |
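
The constraints stated in the argument list can be checked before launching a run. The following Python sketch encodes only the rules given above: the model name must be one of the four listed options, the modality must be one of the four listed options, and the nucleic acid model requires dna or rna. `check_args` is a hypothetical helper, and any further validation that ODesign performs internally is not covered.

```python
# Options copied from the argument descriptions above.
MODELS = {
    "odesign_base_prot_flex",
    "odesign_base_prot_rigid",
    "odesign_base_ligand_rigid",
    "odesign_base_na_rigid",
}
MODALITIES = {"protein", "ligand", "dna", "rna"}
NA_MODEL = "odesign_base_na_rigid"


def check_args(infer_model_name, design_modality):
    """Return a list of problems; the list is empty if none were found."""
    problems = []
    if infer_model_name not in MODELS:
        problems.append(f"unknown infer_model_name: {infer_model_name}")
    if design_modality not in MODALITIES:
        problems.append(f"unknown design_modality: {design_modality}")
    elif infer_model_name == NA_MODEL and design_modality not in {"dna", "rna"}:
        problems.append("design_modality must be dna or rna for the nucleic acid model")
    return problems


if __name__ == "__main__":
    print(check_args("odesign_base_na_rigid", "protein"))
```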

Output Format

When inference completes, results are saved in the outputs/ directory, which has the following structure:

outputs
└── <exp_name>
    └── <timestamp>
        ├── .hydra
        ├── errors
        ├── <sample_name_1>
        │   ├── seed_XXX
        │   │   ├── predictions
        │   │   │   ├── <sample_name_1>_seed_XXX_bb_0_seq_0.cif
        │   │   │   ├── <sample_name_1>_seed_XXX_bb_0_seq_1.cif        
        │   │   │   └── ...
        │   │   └── traceback.pkl
        │   ├── seed_YYY
        │   │   ├── predictions
        │   │   │   ├── <sample_name_1>_seed_YYY_bb_0_seq_0.cif
        │   │   │   └── ...
        │   │   └── traceback.pkl
        ├── <sample_name_2>
        │   ├── ...
        └── run.log

| Folder / File | Description |
|---|---|
| <exp_name>/ | Folder corresponding to the user-defined exp_name in inference_demo.sh. |
| <timestamp>/ | Automatically generated run folder to separate multiple runs. |
| .hydra/ | Stores Hydra-generated configuration files. |
| errors/ | Stores error logs if failures occur during inference (empty if the run completes successfully). |
| <sample_name>/ | Named after the sample name field in the input JSON. A JSON may define multiple sample cases. |
| seed_<value>/ | Contains outputs generated using a specific random seed. |
| predictions/ | Contains inverse-folded molecular design results. |
| *.cif | The designed molecules after inverse folding in CIF format. |
| traceback.pkl | Serialized traceback information (the constructed input atom_array). |
| run.log | Full inference execution log. |
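
To post-process a run, the design files can be collected by walking this layout. The Python sketch below is an illustration under stated assumptions: the seed in the file name is an integer, and `bb` and `seq` are read as backbone and sequence indices, which is inferred from the naming pattern rather than documented. `parse_design_name` and `collect_designs` are hypothetical helper names.

```python
import re
from pathlib import Path

# Pattern taken from the tree above: <sample>_seed_<seed>_bb_<i>_seq_<j>.cif
CIF_NAME = re.compile(
    r"^(?P<sample>.+)_seed_(?P<seed>\d+)_bb_(?P<bb>\d+)_seq_(?P<seq>\d+)\.cif$"
)


def parse_design_name(file_name):
    """Split a design file name into its fields, or return None if it does not match."""
    match = CIF_NAME.match(file_name)
    if match is None:
        return None
    return {
        "sample": match["sample"],
        "seed": int(match["seed"]),
        "backbone": int(match["bb"]),   # assumed meaning of "bb"
        "sequence": int(match["seq"]),  # assumed meaning of "seq"
    }


def collect_designs(run_dir):
    """Yield (path, fields) for each design CIF below one <timestamp> run folder."""
    for path in sorted(Path(run_dir).glob("*/seed_*/predictions/*.cif")):
        fields = parse_design_name(path.name)
        if fields is not None:
            yield path, fields
```

For example, `parse_design_name("demo_seed_42_bb_0_seq_1.cif")` yields sample `demo`, seed 42, backbone 0, and sequence 1.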

Please note that the number of designed sequences per backbone structure (default: 1) can be specified by the argument exp.invfold_topk.
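
As a rough sanity check on a finished run, the number of CIF files per sample can be estimated from the settings. The Python sketch below assumes that every seed produces `N_sample` backbones and that every backbone receives `exp.invfold_topk` sequences, with no failed samples. This is an inference from the argument descriptions and file naming, so treat the result as an upper bound. `expected_cif_count` is a hypothetical helper name.

```python
def expected_cif_count(seeds, n_sample, invfold_topk=1):
    """Estimate CIF files per sample: seeds x backbones per seed x sequences per backbone.

    Assumption: no sample fails, so this is an upper bound.
    """
    return len(seeds) * n_sample * invfold_topk


if __name__ == "__main__":
    # Two seeds, five samples per seed, default of one sequence per backbone.
    print(expected_cif_count([42, 123], 5))
```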

Usage

Protein Generation

Protein-binding Protein

ODesign can generate proteins that bind to specific protein targets. You need to provide a reference structure containing the target protein and specify the hotspot residues that define the binding interface. The model will generate a new protein chain that interacts with the target at the specified hotspot.

To run this example, use:

bash inference_demo.sh

with infer_model_name=odesign_base_prot_flex and input_json_path=./examples/protein_design/prot_binding_prot/odesign_input.json.

Ligand-binding Protein

ODesign can generate proteins that bind to specific small molecule ligands. You need to provide a reference structure containing the ligand and specify the hotspot atoms on the ligand. The model will generate a new protein chain that forms interactions with the ligand at the specified hotspot.

To run this example, use:

bash inference_demo.sh

with infer_model_name=odesign_base_prot_flex and input_json_path=./examples/protein_design/lig_binding_prot/odesign_input.json.

Atom Scaffold

ODesign can scaffold proteins around specific atoms or functional groups. This is useful for designing proteins that interact with specific chemical moieties. You need to specify the condition atoms that define the scaffold constraints, and the model will generate a protein structure that positions these atoms correctly.

To run this example, use:

bash inference_demo.sh

with infer_model_name=odesign_base_prot_flex and input_json_path=./examples/protein_design/atom_scaffold/odesign_input.json.

Motif Scaffold

ODesign can scaffold functional protein motifs by generating surrounding protein structure. This is useful for stabilizing functional motifs or creating new protein folds around known functional elements. You need to specify the motif regions from the reference structure and the desired scaffold length.

To run this example, use:

bash inference_demo.sh

with infer_model_name=odesign_base_prot_flex and input_json_path=./examples/protein_design/motif_scaffold/odesign_input.json.

Ligand Generation

Protein-binding Ligand

ODesign can generate small molecule ligands that bind to specific protein targets. You need to provide a reference structure containing the target protein and specify the hotspot residues that define the binding pocket. The model will generate a new ligand molecule that interacts with the target at the specified hotspot.

To run this example, use:

bash inference_demo.sh

with infer_model_name=odesign_base_ligand_rigid and input_json_path=./examples/ligand_design/prot_binding_lig/odesign_input.json.

Nucleic Acid Generation

Backbone Generation

RNA Backbone

ODesign can generate nucleic acid backbone structures of specified length. This is useful for designing NA molecules from scratch without requiring a reference structure. You only need to specify the desired RNA chain length. Note: This example demonstrates RNA generation. To generate DNA instead, modify the `chain_type` field in the JSON input file to `"dnaChain"` and set `design_modality=dna`.

To run this example, use:

bash inference_demo.sh

with infer_model_name=odesign_base_na_rigid, design_modality=rna, and input_json_path=./examples/na_design/rna_bb/odesign_input.json.

Protein-binding Nucleic Acid

Protein-binding RNA

ODesign can generate nucleic acid molecules that bind to specific protein targets. You need to provide a reference structure containing the target protein and specify the hotspot residues that define the binding interface. The model will generate a new NA chain that interacts with the target at the specified hotspot. Note: This example demonstrates RNA generation. To generate DNA instead, modify the `chain_type` field in the JSON input file (e.g., change `"rnaChain"` to `"dnaChain"`) and set `design_modality=dna`.
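
Switching an example from RNA to DNA can be scripted rather than edited by hand. The Python sketch below walks a loaded JSON document and rewrites every `chain_type` value of `"rnaChain"` to `"dnaChain"`. It makes no assumption about where `chain_type` sits in the input schema (see the Supplementary Information for the actual format). `switch_chain_type` and `convert_file` are hypothetical helper names, and `design_modality=dna` still has to be set separately.

```python
import json


def switch_chain_type(node, old="rnaChain", new="dnaChain"):
    """Recursively replace chain_type values equal to `old` with `new`, in place."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "chain_type" and value == old:
                node[key] = new
            else:
                switch_chain_type(value, old, new)
    elif isinstance(node, list):
        for item in node:
            switch_chain_type(item, old, new)
    return node


def convert_file(src_path, dst_path):
    """Read an input JSON, switch RNA chains to DNA, and write a new file."""
    with open(src_path) as handle:
        data = json.load(handle)
    with open(dst_path, "w") as handle:
        json.dump(switch_chain_type(data), handle, indent=2)
```

Writing to a separate output path keeps the original example file unchanged.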

To run this example, use:

bash inference_demo.sh

with infer_model_name=odesign_base_na_rigid, design_modality=rna, and input_json_path=./examples/na_design/prot_binding_rna/odesign_input.json.

Cyclic Peptide Generation

Protein-binding Cyclic Peptide

ODesign can generate cyclic peptides that bind to specific protein targets. You need to provide a reference structure containing the target protein and specify the hotspot residues that define the binding interface. The model will generate a new cyclic peptide chain that interacts with the target at the specified hotspot.

To run this example, use:

bash inference_demo.sh

with infer_model_name=odesign_base_prot_flex and input_json_path=./examples/cyclic_peptide_design/odesign_input.json.

Partial Diffusion

ODesign can partially modify existing binding molecules to potentially enhance stability, modulate specificity, or improve expressibility. You need to provide a reference structure containing the target molecule, and specify the partial_diff field in the input JSON file to indicate the regions that require modifications. Please refer to Section B.3 Partial Diffusion in our Supplementary Information for details.

To run this example, use:

bash inference_demo.sh

with infer_model_name=odesign_base_prot_rigid, input_json_path=./examples/protein_design/prot_binding_prot_partial_diff/odesign_input.json and enable_partial_diff=true.
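
The settings for all examples in this Usage section can be gathered in one lookup for reference. The Python sketch below only restates the values given above. How these values are assigned inside inference_demo.sh is not shown here, so the `key=value` lines it prints are meant as a reminder of what to edit, not as a command-line interface. `EXAMPLES` and `as_assignments` are hypothetical names.

```python
PROT_FLEX = "odesign_base_prot_flex"
NA_RIGID = "odesign_base_na_rigid"


def _example(model, json_dir, **extra):
    """Build one settings dict; json_dir is relative to ./examples."""
    settings = {"infer_model_name": model}
    settings.update(extra)
    settings["input_json_path"] = f"./examples/{json_dir}/odesign_input.json"
    return settings


# Values copied from the Usage examples above.
EXAMPLES = {
    "protein-binding protein": _example(PROT_FLEX, "protein_design/prot_binding_prot"),
    "ligand-binding protein": _example(PROT_FLEX, "protein_design/lig_binding_prot"),
    "atom scaffold": _example(PROT_FLEX, "protein_design/atom_scaffold"),
    "motif scaffold": _example(PROT_FLEX, "protein_design/motif_scaffold"),
    "protein-binding ligand": _example(
        "odesign_base_ligand_rigid", "ligand_design/prot_binding_lig"
    ),
    "rna backbone": _example(NA_RIGID, "na_design/rna_bb", design_modality="rna"),
    "protein-binding rna": _example(
        NA_RIGID, "na_design/prot_binding_rna", design_modality="rna"
    ),
    "cyclic peptide": _example(PROT_FLEX, "cyclic_peptide_design"),
    "partial diffusion": _example(
        "odesign_base_prot_rigid",
        "protein_design/prot_binding_prot_partial_diff",
        enable_partial_diff="true",
    ),
}


def as_assignments(task):
    """Format one example's settings as key=value lines."""
    return [f"{key}={value}" for key, value in EXAMPLES[task].items()]


if __name__ == "__main__":
    print("\n".join(as_assignments("protein-binding rna")))
```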

Training

Step 1 — Download Required Training Data

Before training ODesign, download odesign_full_data.tar.gz from Google Drive and extract it using the following command. The extracted files require about 850 GB of disk space.

tar -xzvf [data_root_dir]/odesign_train_data.tar.gz -C [data_root_dir]
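
Because extraction needs about 850 GB, checking free space first avoids a failed run partway through. The Python sketch below uses the standard-library `shutil.disk_usage`. It assumes 1 GB means 10**9 bytes and does not account for the space taken by the archive itself. `has_enough_space` is a hypothetical helper name.

```python
import shutil


def has_enough_space(path, required_gb=850):
    """Return True if the filesystem holding `path` has at least required_gb free.

    Assumption: 1 GB is counted as 10**9 bytes.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 10**9


if __name__ == "__main__":
    # "." stands in for your data_root_dir.
    if not has_enough_space("."):
        print("less than 850 GB free; extraction is likely to fail")
```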

Step 2 — Run the Training Demo

After data preparation, launch the training process using:

bash train_demo.sh

Please note that the ckpt_root_dir in train_demo.sh should contain the pre-trained folding model checkpoint for ODesign initialization if you are not training from scratch. Our default training setting employs protenix_base_default_v0.5.0.

Cite

If you use ODesign in your work, please cite the following:

@misc{zhang2025odesign,
      title={ODesign: A World Model for Biomolecular Interaction Design}, 
      author={Odin Zhang and Xujun Zhang and Haitao Lin and Cheng Tan and Qinghan Wang and Yuanle Mo and Qiantai Feng and Gang Du and Yuntao Yu and Zichang Jin and Ziyi You and Peicong Lin and Yijie Zhang and Yuyang Tao and Shicheng Chen and Jack Xiaoyu Chen and Chenqing Hua and Weibo Zhao and Runze Ma and Yunpeng Xia and Kejun Ying and Jun Li and Yundian Zeng and Lijun Lang and Peichen Pan and Hanqun Cao and Zihao Song and Bo Qiang and Jiaqi Wang and Pengfei Ji and Lei Bai and Jian Zhang and Chang-yu Hsieh and Pheng Ann Heng and Siqi Sun and Tingjun Hou and Shuangjia Zheng},
      year={2025},
      eprint={2510.22304},
      archivePrefix={arXiv},
      primaryClass={q-bio.BM},
      url={https://arxiv.org/abs/2510.22304}, 
}

Acknowledgements

This project's code draws in part on Protenix and OpenFold and is released under the Apache 2.0 License. We thank the authors of both projects for their work and code.

This project is supported by Lingang Laboratory, Zhejiang University, and The Chinese University of Hong Kong. We are actively seeking highly motivated and talented PhD students to join our team. We offer PhD training opportunities in Pharmacy / AI at Zhejiang University, or Computer Science at The Chinese University of Hong Kong.

If you are interested, please contact odinz@link.cuhk.edu.hk

License

Both source code and model parameters of ODesign are released under the Apache 2.0 License.

About

Official repository of "ODesign: A World Model for Biomolecular Interaction Design"
