🎉Here we present ODesign, an all-atom generative world model for all-to-all biomolecular interaction design. ODesign allows scientists to specify epitopes on arbitrary targets and generate diverse classes of binding partners with fine-grained control.

ODesign is also available at https://odesign.lglab.ac.cn, allowing users to generate binding partners without coding expertise.

Please feel free to contact us via email if you have any questions. You can also join our discussion group on WhatsApp or WeChat.

This work is supported by Lingang Laboratory, Zhejiang University, The Chinese University of Hong Kong, and Shanghai Artificial Intelligence Laboratory. For the full list of funding sources, please refer to our technical report, "ODesign: A World Model for Biomolecular Interaction Design".

Installation

Step 1 — Clone the Repository

git clone https://github.com/The-Institute-for-AI-Molecular-Design/ODesign.git
cd ODesign

Step 2 — Prepare the Environment (cuda 12.1)

pip

conda create -n odesign python=3.10
conda activate odesign
pip install -r requirements.txt -f https://data.pyg.org/whl/torch-2.3.1+cu121.html

Docker

docker build -t odesign -f Dockerfile .

docker run --gpus all -it --rm --shm-size=8g \
  -v /path/to/ckpt_root_dir:/app/ODesign/ckpt \
  -v /path/to/data_root_dir:/app/ODesign/data \
  -v $(pwd)/outputs:/app/ODesign/outputs \
  -v $(pwd)/inference_demo.sh:/app/ODesign/inference_demo.sh \
  -v $(pwd)/train_demo.sh:/app/ODesign/train_demo.sh \
  odesign bash

Apptainer

apptainer build odesign.sif odesign.def

apptainer run --nv \
  --writable-tmpfs \
  -B ckpt:/app/ODesign/ckpt \
  -B data:/app/ODesign/data \
  -B outputs:/app/ODesign/outputs \
  -B $(pwd)/inference_demo.sh:/app/ODesign/inference_demo.sh \
  -B $(pwd)/train_demo.sh:/app/ODesign/train_demo.sh \
  odesign.sif bash

Available Models

ODesign currently provides the following pre-trained model variants. Each model supports a specific modality and design mode:

Model Name	Design Modality	Design Mode	Checkpoint (Hugging Face)
`odesign_base_prot_flex`	protein	flexible-receptor	odesign_base_prot_flex.pt
`odesign_base_prot_rigid`	protein	rigid-receptor	odesign_base_prot_rigid.pt
`odesign_base_ligand_rigid`	ligand	rigid-receptor	odesign_base_ligand_rigid.pt
`odesign_base_na_rigid`	nucleic acid	rigid-receptor	odesign_base_na_rigid.pt

Checkpoints of the OInvFold module for each design modality are also hosted on Hugging Face.

You can download all available checkpoints using the following command. Alternatively, you may manually download specific checkpoints from the Hugging Face links listed above.

cd ODesign
bash ./ckpt/get_odesign_ckpt.sh [ckpt_root_dir]
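
To confirm which checkpoints ended up on disk after the download, a small check along the following lines may help. This is only a sketch: it assumes each checkpoint is stored as `<model_name>.pt` directly under `ckpt_root_dir`, which matches the file names in the table above but is not guaranteed by the download script. The helper names `checkpoint_path` and `available_models` are hypothetical and not part of ODesign.

```python
from pathlib import Path

# Model names copied from the Available Models table.
MODEL_NAMES = (
    "odesign_base_prot_flex",
    "odesign_base_prot_rigid",
    "odesign_base_ligand_rigid",
    "odesign_base_na_rigid",
)


def checkpoint_path(ckpt_root_dir, model_name):
    """Return the assumed location of a model checkpoint.

    Assumption: checkpoints sit directly under ckpt_root_dir as <model_name>.pt.
    """
    if model_name not in MODEL_NAMES:
        raise ValueError(f"unknown model name: {model_name!r}")
    return Path(ckpt_root_dir) / f"{model_name}.pt"


def available_models(ckpt_root_dir):
    """List the models whose checkpoint file exists at the assumed location."""
    return [
        name
        for name in MODEL_NAMES
        if checkpoint_path(ckpt_root_dir, name).is_file()
    ]
```

For example, `available_models("./ckpt")` returns the names of the models that can be selected with `infer_model_name`, provided the layout assumption holds.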

Inference

Input Format

Please refer to Sections B.1 and B.2 of our Supplementary Information for details about the input JSON format. Example input JSON files for each task can be found in the examples directory.

Please note that a ligand chain can also be specified by a SMILES string in the smiles field, in which case you do not need to provide the path to ref_file. An example input JSON file is provided here. To enable this function, please update the running environment using the following command:

conda install -c conda-forge biotite=1.2.0

Run Inference

Step 1 — Download Required Inference Data

Before running inference for the first time, please download components.v20240608.cif and components.v20240608.cif.rdkit_mol.pkl from Google Drive and place both files under your specified data_root_dir.
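
A quick pre-flight check can catch a missing download before a run starts. The sketch below uses the two file names from the step above; that they must sit directly under `data_root_dir` rather than in a subfolder is an assumption, and `missing_inference_files` is a hypothetical helper, not an ODesign function.

```python
from pathlib import Path

# File names taken from the download step; their exact location inside
# data_root_dir (top level, no subfolder) is an assumption of this sketch.
REQUIRED_INFERENCE_FILES = (
    "components.v20240608.cif",
    "components.v20240608.cif.rdkit_mol.pkl",
)


def missing_inference_files(data_root_dir):
    """Return the required file names that are not present in data_root_dir."""
    root = Path(data_root_dir)
    return [
        name for name in REQUIRED_INFERENCE_FILES if not (root / name).is_file()
    ]
```

An empty list from `missing_inference_files("./data")` means both files were found where this sketch expects them.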

Step 2 — Run the Inference Demo

After data preparation, launch the inference process using:

bash inference_demo.sh

This script generates molecular designs based on the selected model and input JSON file. You can configure inference behavior by editing the following arguments in inference_demo.sh, which also includes an example of multi-GPU inference using torchrun.

Argument	Description	Example
`infer_model_name`	Model used for inference. Available options: `odesign_base_prot_flex`, `odesign_base_prot_rigid`, `odesign_base_ligand_rigid`, `odesign_base_na_rigid`.	`odesign_base_prot_flex`
`design_modality`	Modality to design. Available options: `protein`, `ligand`, `dna`, `rna`. Must be set to `dna` or `rna` when using the nucleic acid design model.	`rna`
`data_root_dir`	Directory where downloaded data is stored.	`./data`
`ckpt_root_dir`	Directory where model checkpoints are stored.	`./ckpt`
`input_json_path`	Path to the input design specification JSON file.	`./examples/.../odesign_input.json`
`exp_name`	Custom label for inference output directory. If left empty, a default name is auto-generated.	`protein_binding_protein_design`
`seeds`	Random seeds used during generation. Supports multiple seeds.	`[42]` or `[42, 123]`
`N_sample`	Number of generated samples per seed.	`5`
`use_msa`	Utilize MSA information during inference (only set to `true` if the input JSON includes MSA).	`false`
`num_workers`	Number of dataloader workers.	`4`
`CUDA_VISIBLE_DEVICES`	GPU device for inference.	`0`
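
The pairing between `infer_model_name` and `design_modality` implied by the tables can be checked before launching a run. The following is a sketch under stated assumptions: the mapping is read off this README rather than taken from the ODesign code, and treating an unset `design_modality` as acceptable for the single-modality models is our interpretation of the note that it must be specified for the nucleic acid model. `is_valid_pair` is a hypothetical helper.

```python
# Mapping read off the Available Models and argument tables in this README;
# the checks actually performed by inference_demo.sh may differ.
ALLOWED_MODALITIES = {
    "odesign_base_prot_flex": {"protein"},
    "odesign_base_prot_rigid": {"protein"},
    "odesign_base_ligand_rigid": {"ligand"},
    "odesign_base_na_rigid": {"dna", "rna"},
}


def is_valid_pair(infer_model_name, design_modality=None):
    """Return True if the model/modality pair is consistent with the tables.

    Assumption: design_modality may be left unset (None) only when the model
    supports exactly one modality, so the nucleic acid model always needs it.
    """
    allowed = ALLOWED_MODALITIES.get(infer_model_name)
    if allowed is None:
        return False  # model name not listed in the table
    if design_modality is None:
        return len(allowed) == 1
    return design_modality in allowed
```

For instance, `is_valid_pair("odesign_base_na_rigid", "rna")` holds, while the same model with no modality or with `protein` does not.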

Output Format

When inference completes, results are saved in the outputs/ directory, which has the following structure:

outputs
└── <exp_name>
    └── <timestamp>
        ├── .hydra
        ├── errors
        ├── <sample_name_1>
        │   ├── seed_XXX
        │   │   ├── predictions
        │   │   │   ├── <sample_name_1>_seed_XXX_bb_0_seq_0.cif
        │   │   │   ├── <sample_name_1>_seed_XXX_bb_0_seq_1.cif
        │   │   │   └── ...
        │   │   └── traceback.pkl
        │   └── seed_YYY
        │       ├── predictions
        │       │   ├── <sample_name_1>_seed_YYY_bb_0_seq_0.cif
        │       │   └── ...
        │       └── traceback.pkl
        ├── <sample_name_2>
        │   └── ...
        └── run.log

Folder / File	Description
`<exp_name>/`	Folder corresponding to the user-defined `exp_name` in `inference_demo.sh`.
`<timestamp>/`	Automatically generated run folder to separate multiple runs.
`.hydra/`	Stores Hydra-generated configuration files.
`errors/`	Stores error logs if failures occur during inference (empty if the run completes successfully).
`<sample_name>/`	Named after the sample `name` field in the input JSON. A JSON may define multiple sample cases.
`seed_<value>/`	Contains outputs generated using a specific random seed.
`predictions/`	Contains inverse-folded molecular design results.
`*.cif`	The designed molecules after inverse folding in CIF format.
`traceback.pkl`	Serialized traceback information (the constructed input `atom_array`).
`run.log`	Full inference execution log.
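
To gather the designed structures from one run for downstream analysis, the directory layout above can be walked with a few lines of Python. This is a minimal sketch that relies only on the folder structure shown in this section; `collect_designs` is a hypothetical helper name.

```python
from collections import defaultdict
from pathlib import Path


def collect_designs(run_dir):
    """Map (sample_name, seed) to the sorted CIF paths under predictions/.

    run_dir is a single outputs/<exp_name>/<timestamp> folder.
    """
    designs = defaultdict(list)
    # Layout assumed from the tree above:
    # <sample_name>/seed_<value>/predictions/*.cif
    for cif in Path(run_dir).glob("*/seed_*/predictions/*.cif"):
        seed_dir = cif.parent.parent
        sample_name = seed_dir.parent.name
        seed = seed_dir.name[len("seed_"):]
        designs[(sample_name, seed)].append(cif)
    return {key: sorted(paths) for key, paths in designs.items()}
```

Folders such as `.hydra/` and `errors/` are skipped automatically because they contain no `seed_*/predictions` subfolders.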

Please note that the number of designed sequences per backbone structure (default: 1) can be specified by the argument exp.invfold_topk.
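
One way to verify that a run produced the expected number of sequences per backbone is to parse the prediction file names. The sketch below infers the naming pattern from the example names in the output tree; reading `bb` as the backbone index and `seq` as the sequence index is an assumption, as is the expectation that the per-backbone count equals `exp.invfold_topk`. Both helper names are hypothetical.

```python
import re

# Pattern inferred from names like <sample_name>_seed_XXX_bb_0_seq_0.cif.
# Assumption: "bb" is the backbone index and "seq" the sequence index.
DESIGN_NAME = re.compile(
    r"^(?P<sample>.+)_seed_(?P<seed>[^_]+)_bb_(?P<bb>\d+)_seq_(?P<seq>\d+)\.cif$"
)


def parse_design_name(filename):
    """Split a prediction file name into its parts, or return None."""
    match = DESIGN_NAME.match(filename)
    if match is None:
        return None
    return {
        "sample": match["sample"],
        "seed": match["seed"],
        "bb": int(match["bb"]),
        "seq": int(match["seq"]),
    }


def sequences_per_backbone(filenames):
    """Count designed sequences for each (sample, seed, backbone index)."""
    counts = {}
    for name in filenames:
        parts = parse_design_name(name)
        if parts is None:
            continue  # ignore files that do not follow the pattern
        key = (parts["sample"], parts["seed"], parts["bb"])
        counts[key] = counts.get(key, 0) + 1
    return counts
```

If the assumption holds, every value returned by `sequences_per_backbone` should match the `exp.invfold_topk` setting used for the run.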

Usage

Protein Generation

Protein-binding Protein

ODesign can generate proteins that bind to specific protein targets. You need to provide a reference structure containing the target protein and specify the hotspot residues that define the binding interface. The model will generate a new protein chain that interacts with the target at the specified hotspot.

To run this example, use:

bash inference_demo.sh

with infer_model_name=odesign_base_prot_flex and input_json_path=./examples/protein_design/prot_binding_prot/odesign_input.json.

Ligand-binding Protein

ODesign can generate proteins that bind to specific small molecule ligands. You need to provide a reference structure containing the ligand and specify the hotspot atoms on the ligand. The model will generate a new protein chain that forms interactions with the ligand at the specified hotspot.

To run this example, use:

bash inference_demo.sh

with infer_model_name=odesign_base_prot_flex and input_json_path=./examples/protein_design/lig_binding_prot/odesign_input.json.

Atom Scaffold

ODesign can scaffold proteins around specific atoms or functional groups. This is useful for designing proteins that interact with specific chemical moieties. You need to specify the condition atoms that define the scaffold constraints, and the model will generate a protein structure that positions these atoms correctly.

To run this example, use:

bash inference_demo.sh

with infer_model_name=odesign_base_prot_flex and input_json_path=./examples/protein_design/atom_scaffold/odesign_input.json.

Motif Scaffold

ODesign can scaffold functional protein motifs by generating surrounding protein structure. This is useful for stabilizing functional motifs or creating new protein folds around known functional elements. You need to specify the motif regions from the reference structure and the desired scaffold length.

To run this example, use:

bash inference_demo.sh

with infer_model_name=odesign_base_prot_flex and input_json_path=./examples/protein_design/motif_scaffold/odesign_input.json.

Ligand Generation

Protein-binding Ligand

ODesign can generate small molecule ligands that bind to specific protein targets. You need to provide a reference structure containing the target protein and specify the hotspot residues that define the binding pocket. The model will generate a new ligand molecule that interacts with the target at the specified hotspot.

To run this example, use:

bash inference_demo.sh

with infer_model_name=odesign_base_ligand_rigid and input_json_path=./examples/ligand_design/prot_binding_lig/odesign_input.json.

Nucleic Acid Generation

Backbone Generation

ODesign can generate nucleic acid backbone structures of specified length. This is useful for designing NA molecules from scratch without requiring a reference structure. You only need to specify the desired RNA chain length. Note: This example demonstrates RNA generation. To generate DNA instead, modify the `chain_type` field in the JSON input file to `"dnaChain"` and set `design_modality=dna`.

To run this example, use:

bash inference_demo.sh

with infer_model_name=odesign_base_na_rigid, design_modality=rna, and input_json_path=./examples/na_design/rna_bb/odesign_input.json.

Protein-binding Nucleic Acid

ODesign can generate nucleic acid molecules that bind to specific protein targets. You need to provide a reference structure containing the target protein and specify the hotspot residues that define the binding interface. The model will generate a new NA chain that interacts with the target at the specified hotspot. Note: This example demonstrates RNA generation. To generate DNA instead, modify the `chain_type` field in the JSON input file (e.g., change `"rnaChain"` to `"dnaChain"`) and set `design_modality=dna`.

To run this example, use:

bash inference_demo.sh

with infer_model_name=odesign_base_na_rigid, design_modality=rna, and input_json_path=./examples/na_design/prot_binding_rna/odesign_input.json.

Cyclic Peptide Generation

Protein-binding Cyclic Peptide

ODesign can generate cyclic peptides that bind to specific protein targets. You need to provide a reference structure containing the target protein and specify the hotspot residues that define the binding interface. The model will generate a new cyclic peptide chain that interacts with the target at the specified hotspot.

To run this example, use:

bash inference_demo.sh

with infer_model_name=odesign_base_prot_flex and input_json_path=./examples/cyclic_peptide_design/odesign_input.json.

Partial Diffusion

ODesign can partially modify existing binding molecules to potentially enhance stability, modulate specificity, or improve expressibility. You need to provide a reference structure containing the target molecule, and specify the partial_diff field in the input JSON file to indicate the regions that require modifications. Please refer to Section B.3 Partial Diffusion in our Supplementary Information for details.

To run this example, use:

bash inference_demo.sh

with infer_model_name=odesign_base_prot_rigid, input_json_path=./examples/protein_design/prot_binding_prot_partial_diff/odesign_input.json and enable_partial_diff=true.
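
The settings for all usage examples above can be collected in one lookup table, which is convenient when scripting several runs. The values below are copied from this section; the dictionary keys and the `settings_for` helper are labels invented for this sketch, and how the values are passed on (for example by editing inference_demo.sh) is left to the reader.

```python
# (infer_model_name, examples subfolder, extra settings) per usage example.
# Keys are labels chosen for this sketch, not names used by ODesign.
EXAMPLES = {
    "prot_binding_prot": ("odesign_base_prot_flex", "protein_design/prot_binding_prot", {}),
    "lig_binding_prot": ("odesign_base_prot_flex", "protein_design/lig_binding_prot", {}),
    "atom_scaffold": ("odesign_base_prot_flex", "protein_design/atom_scaffold", {}),
    "motif_scaffold": ("odesign_base_prot_flex", "protein_design/motif_scaffold", {}),
    "prot_binding_lig": ("odesign_base_ligand_rigid", "ligand_design/prot_binding_lig", {}),
    "rna_bb": ("odesign_base_na_rigid", "na_design/rna_bb", {"design_modality": "rna"}),
    "prot_binding_rna": ("odesign_base_na_rigid", "na_design/prot_binding_rna", {"design_modality": "rna"}),
    "cyclic_peptide": ("odesign_base_prot_flex", "cyclic_peptide_design", {}),
    "partial_diffusion": (
        "odesign_base_prot_rigid",
        "protein_design/prot_binding_prot_partial_diff",
        {"enable_partial_diff": "true"},
    ),
}


def settings_for(example):
    """Return the argument values documented for one usage example."""
    model, subdir, extra = EXAMPLES[example]
    settings = {
        "infer_model_name": model,
        "input_json_path": f"./examples/{subdir}/odesign_input.json",
    }
    settings.update(extra)
    return settings
```

Calling `settings_for("rna_bb")`, for example, yields the model name, the input JSON path, and `design_modality` set to `rna`, as listed under Backbone Generation.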

Training

Step 1 — Download Required Training Data

Before training ODesign, please download the training data archive (odesign_full_data.tar.gz) from Google Drive and extract it using the following command, making sure the archive name in the command matches the file you downloaded. The extracted files require about 850 GB of disk space.

tar -xzvf [data_root_dir]/odesign_train_data.tar.gz -C [data_root_dir]
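
Because extraction needs a large amount of space, it can be worth checking the target filesystem first. The sketch below uses the 850 GB figure quoted above, interpreted as 850 × 10^9 bytes (an assumption), and does not account for the space taken by the archive itself. `has_room_for_training_data` is a hypothetical helper.

```python
import shutil

# 850 GB is the figure from this README; treating it as 850 * 10**9 bytes
# is an assumption of this sketch.
REQUIRED_BYTES = 850 * 10**9


def has_room_for_training_data(data_root_dir, required_bytes=REQUIRED_BYTES):
    """Return True if the filesystem holding data_root_dir has enough free space."""
    return shutil.disk_usage(data_root_dir).free >= required_bytes
```

Running `has_room_for_training_data("/path/to/data_root_dir")` before extraction gives an early warning if the disk is too small.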

Step 2 — Run the Training Demo

After data preparation, launch the training process using:

bash train_demo.sh

Please note that, unless you are training from scratch, the ckpt_root_dir in train_demo.sh must contain the pre-trained folding model checkpoint used to initialize ODesign. Our default training setting uses protenix_base_default_v0.5.0.

Cite

If you use ODesign in your work, please cite the following:

@misc{zhang2025odesign,
      title={ODesign: A World Model for Biomolecular Interaction Design}, 
      author={Odin Zhang and Xujun Zhang and Haitao Lin and Cheng Tan and Qinghan Wang and Yuanle Mo and Qiantai Feng and Gang Du and Yuntao Yu and Zichang Jin and Ziyi You and Peicong Lin and Yijie Zhang and Yuyang Tao and Shicheng Chen and Jack Xiaoyu Chen and Chenqing Hua and Weibo Zhao and Runze Ma and Yunpeng Xia and Kejun Ying and Jun Li and Yundian Zeng and Lijun Lang and Peichen Pan and Hanqun Cao and Zihao Song and Bo Qiang and Jiaqi Wang and Pengfei Ji and Lei Bai and Jian Zhang and Chang-yu Hsieh and Pheng Ann Heng and Siqi Sun and Tingjun Hou and Shuangjia Zheng},
      year={2025},
      eprint={2510.22304},
      archivePrefix={arXiv},
      primaryClass={q-bio.BM},
      url={https://arxiv.org/abs/2510.22304}, 
}

Acknowledgements

This project's code draws in part on Protenix and OpenFold and is released under the Apache 2.0 License. We thank their authors for their great work and code.

This project is supported by Lingang Laboratory, Zhejiang University, and The Chinese University of Hong Kong. We are actively seeking highly motivated and talented PhD students to join our team. We offer PhD training opportunities in Pharmacy / AI at Zhejiang University, or Computer Science at The Chinese University of Hong Kong.

If you are interested, please contact odinz@link.cuhk.edu.hk

License

Both source code and model parameters of ODesign are released under the Apache 2.0 License.
