🎉Here we present ODesign, an all-atom generative world model for all-to-all biomolecular interaction design. ODesign allows scientists to specify epitopes on arbitrary targets and generate diverse classes of binding partners with fine-grained control.
ODesign is also available at https://odesign.lglab.ac.cn, allowing users to generate binding partners without coding expertise.
Please feel free to contact us via email if you have any questions. You can also join our discussion group on WhatsApp or WeChat.
This work is supported by Lingang Laboratory, Zhejiang University, The Chinese University of Hong Kong, and Shanghai Artificial Intelligence Laboratory. For the full list of funding sources, please refer to our technical report ODesign: A World Model for Biomolecular Interaction Design.
Step 1 — Clone the Repository
git clone https://github.com/The-Institute-for-AI-Molecular-Design/ODesign.git
cd ODesign
Step 2 — Prepare the Environment (CUDA 12.1)
- pip
conda create -n odesign python=3.10
conda activate odesign
pip install -r requirements.txt -f https://data.pyg.org/whl/torch-2.3.1+cu121.html
- Docker
docker build -t odesign -f Dockerfile .
docker run --gpus all -it --rm --shm-size=8g \
-v /path/to/ckpt_root_dir:/app/ODesign/ckpt \
-v /path/to/data_root_dir:/app/ODesign/data \
-v $(pwd)/outputs:/app/ODesign/outputs \
-v $(pwd)/inference_demo.sh:/app/ODesign/inference_demo.sh \
-v $(pwd)/train_demo.sh:/app/ODesign/train_demo.sh \
odesign bash
- Apptainer
apptainer build odesign.sif odesign.def
apptainer run --nv \
--writable-tmpfs \
-B ckpt:/app/ODesign/ckpt \
-B data:/app/ODesign/data \
-B outputs:/app/ODesign/outputs \
-B $(pwd)/inference_demo.sh:/app/ODesign/inference_demo.sh \
-B $(pwd)/train_demo.sh:/app/ODesign/train_demo.sh \
odesign.sif bash
ODesign currently provides the following pre-trained model variants. Each model supports a specific modality and design mode:
| Model Name | Design Modality | Design Mode | Hugging Face |
|---|---|---|---|
| odesign_base_prot_flex | protein | flexible-receptor | odesign_base_prot_flex.pt |
| odesign_base_prot_rigid | protein | rigid-receptor | odesign_base_prot_rigid.pt |
| odesign_base_ligand_rigid | ligand | rigid-receptor | odesign_base_ligand_rigid.pt |
| odesign_base_na_rigid | nucleic acid | rigid-receptor | odesign_base_na_rigid.pt |
Checkpoints of the OInvFold module for the different design modalities are also hosted on Hugging Face.
You can download all available checkpoints using the following command. Alternatively, you may manually download specific checkpoints from the Hugging Face links listed above.
cd ODesign
bash ./ckpt/get_odesign_ckpt.sh [ckpt_root_dir]
Please refer to Sections B.1 and B.2 of our Supplementary Information for details about the input JSON format. Example input JSON files for each task can be found in the examples directory.
Please note that a ligand chain can also be specified by a SMILES string in the smiles field. In this case, you do not need to provide the path to ref_file. An example input JSON file is provided here. To enable this function, please update the running environment using the following command:
conda install -c conda-forge biotite=1.2.0
Step 1 — Download Required Inference Data
Before running inference for the first time, please download the components.v20240608.cif and components.v20240608.cif.rdkit_mol.pkl from Google Drive, and place these files under your specified data_root_dir.
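A quick pre-flight check can catch a missing file before the model loads. The sketch below assumes both files sit directly under data_root_dir, as this step describes; `check_inference_data` is a hypothetical helper name, not part of the ODesign repository.

```shell
# check_inference_data is a hypothetical helper, not part of ODesign.
# It returns 0 only if both required files exist directly under the given directory.
check_inference_data() {
    local data_root_dir="$1"
    local missing=0
    local f
    for f in components.v20240608.cif components.v20240608.cif.rdkit_mol.pkl; do
        if [ ! -f "${data_root_dir}/${f}" ]; then
            # Report each missing file on stderr so the caller sees what to download.
            echo "missing: ${data_root_dir}/${f}" >&2
            missing=1
        fi
    done
    return "${missing}"
}
```

For example, `check_inference_data ./data && bash inference_demo.sh` would launch inference only when both files are present.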
Step 2 — Run the Inference Demo
After data preparation, launch the inference process using:
bash inference_demo.sh
This script generates molecular designs based on the selected model and input JSON file. You can configure inference behavior by editing the following arguments in inference_demo.sh. An example of multi-GPU inference using torchrun is also provided in inference_demo.sh.
| Argument | Description | Example |
|---|---|---|
| infer_model_name | Model used for inference. Available options: odesign_base_prot_flex, odesign_base_prot_rigid, odesign_base_ligand_rigid, odesign_base_na_rigid. | odesign_base_prot_flex |
| design_modality | Must be specified as dna or rna if using the nucleic acid design model. Available options: protein, ligand, dna, rna. | rna |
| data_root_dir | Directory where downloaded data is stored. | ./data |
| ckpt_root_dir | Directory where model checkpoints are stored. | ./ckpt |
| input_json_path | Path to the input design specification JSON file. | ./examples/.../odesign_input.json |
| exp_name | Custom label for inference output directory. If left empty, a default name is auto-generated. | protein_binding_protein_design |
| seeds | Random seeds used during generation. Supports multiple seeds. | [42] or [42, 123] |
| N_sample | Number of generated samples per seed. | 5 |
| use_msa | Utilize MSA information during inference (only set to true if the input JSON includes MSA). | false |
| num_workers | Number of dataloader workers. | 4 |
| CUDA_VISIBLE_DEVICES | GPU device for inference. | 0 |
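Since infer_model_name and design_modality must agree, it can help to validate the pair before launching a run. The sketch below encodes the pairing implied by the model table: protein models with protein, the ligand model with ligand, and the nucleic acid model with dna or rna. The pairing for the protein and ligand models is an assumption drawn from that table, and `validate_design_args` is a hypothetical helper, not part of ODesign.

```shell
# validate_design_args is a hypothetical helper, not part of ODesign.
# Usage: validate_design_args <infer_model_name> <design_modality>
# Returns 0 if the pair is consistent, 1 if it is not, 2 for an unknown model.
validate_design_args() {
    local model="$1"
    local modality="$2"
    case "${model}" in
        odesign_base_prot_flex|odesign_base_prot_rigid)
            # Assumption: protein models are used with design_modality=protein.
            [ "${modality}" = "protein" ] ;;
        odesign_base_ligand_rigid)
            # Assumption: the ligand model is used with design_modality=ligand.
            [ "${modality}" = "ligand" ] ;;
        odesign_base_na_rigid)
            # Stated in the argument table: must be dna or rna.
            [ "${modality}" = "dna" ] || [ "${modality}" = "rna" ] ;;
        *)
            echo "unknown model: ${model}" >&2
            return 2 ;;
    esac
}
```

A call such as `validate_design_args odesign_base_na_rigid rna` succeeds, while pairing the same model with `protein` fails.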
When inference completes, results will be saved in the outputs/ directory which has the following structure:
outputs
└── <exp_name>
    └── <timestamp>
        ├── .hydra
        ├── errors
        ├── <sample_name_1>
        │   ├── seed_XXX
        │   │   ├── predictions
        │   │   │   ├── <sample_name_1>_seed_XXX_bb_0_seq_0.cif
        │   │   │   ├── <sample_name_1>_seed_XXX_bb_0_seq_1.cif
        │   │   │   └── ...
        │   │   └── traceback.pkl
        │   ├── seed_YYY
        │   │   ├── predictions
        │   │   │   ├── <sample_name_1>_seed_YYY_bb_0_seq_0.cif
        │   │   │   └── ...
        │   │   └── traceback.pkl
        ├── <sample_name_2>
        │   ├── ...
        └── run.log
| Folder / File | Description |
|---|---|
| <exp_name>/ | Folder corresponding to the user-defined exp_name in inference_demo.sh. |
| <timestamp>/ | Automatically generated run folder to separate multiple runs. |
| .hydra/ | Stores Hydra-generated configuration files. |
| errors/ | Stores error logs if failures occur during inference (empty if the run completes successfully). |
| <sample_name>/ | Named after the sample name field in the input JSON. A JSON may define multiple sample cases. |
| seed_<value>/ | Contains outputs generated using a specific random seed. |
| predictions/ | Contains inverse-folded molecular design results. |
| *.cif | The designed molecules after inverse folding in CIF format. |
| traceback.pkl | Serialized traceback information (the constructed input atom_array). |
| run.log | Full inference execution log. |
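Given this layout, a short script can summarize how many designs each seed produced. The following is a minimal sketch that relies only on the directory structure described above; `count_predictions` is a hypothetical helper, not part of ODesign, and its argument is the timestamped run folder.

```shell
# count_predictions is a hypothetical helper, not part of ODesign.
# It prints one line per seed directory: "<sample_name>/<seed_dir> <number_of_cif_files>".
count_predictions() {
    local run_dir="$1"   # e.g. outputs/<exp_name>/<timestamp>
    local pred_dir seed_dir sample n
    for pred_dir in "${run_dir}"/*/seed_*/predictions; do
        # Skip the literal pattern when nothing matches the glob.
        [ -d "${pred_dir}" ] || continue
        seed_dir="$(dirname "${pred_dir}")"
        sample="$(basename "$(dirname "${seed_dir}")")"
        # Count only CIF files directly inside predictions/.
        n="$(find "${pred_dir}" -maxdepth 1 -type f -name '*.cif' | wc -l | tr -d ' ')"
        echo "${sample}/$(basename "${seed_dir}") ${n}"
    done
}
```

A seed directory that reports 0 files may be worth cross-checking against the errors/ folder and run.log.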
Please note that the number of designed sequences per backbone structure (default: 1) can be specified by the argument exp.invfold_topk.
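The output file names appear to encode a backbone index (bb_<i>) and a sequence index (seq_<j>), so raising exp.invfold_topk should yield more seq_<j> files per backbone. That reading of the naming scheme is an assumption based on the output tree, not something this document states. Under that assumption, the sketch below counts sequences per backbone in one predictions/ folder; `seqs_per_backbone` is a hypothetical helper, not part of ODesign.

```shell
# seqs_per_backbone is a hypothetical helper, not part of ODesign.
# Prints "<backbone_index> <number_of_sequences>" for each backbone found in a
# predictions/ folder, assuming file names end in _bb_<i>_seq_<j>.cif.
seqs_per_backbone() {
    local pred_dir="$1"
    find "${pred_dir}" -maxdepth 1 -type f -name '*_bb_*_seq_*.cif' \
        | sed -n 's/.*_bb_\([0-9][0-9]*\)_seq_[0-9][0-9]*\.cif$/\1/p' \
        | sort -n | uniq -c | awk '{print $2, $1}'
}
```

If every backbone reports the same count, that count can be compared against the value passed to exp.invfold_topk.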
ODesign can generate proteins that bind to specific protein targets. You need to provide a reference structure containing the target protein and specify the hotspot residues that define the binding interface. The model will generate a new protein chain that interacts with the target at the specified hotspot.
To run this example, use:
bash inference_demo.sh
with infer_model_name=odesign_base_prot_flex and input_json_path=./examples/protein_design/prot_binding_prot/odesign_input.json.
ODesign can generate proteins that bind to specific small molecule ligands. You need to provide a reference structure containing the ligand and specify the hotspot atoms on the ligand. The model will generate a new protein chain that forms interactions with the ligand at the specified hotspot.
To run this example, use:
bash inference_demo.sh
with infer_model_name=odesign_base_prot_flex and input_json_path=./examples/protein_design/lig_binding_prot/odesign_input.json.
ODesign can scaffold proteins around specific atoms or functional groups. This is useful for designing proteins that interact with specific chemical moieties. You need to specify the condition atoms that define the scaffold constraints, and the model will generate a protein structure that positions these atoms correctly.
To run this example, use:
bash inference_demo.sh
with infer_model_name=odesign_base_prot_flex and input_json_path=./examples/protein_design/atom_scaffold/odesign_input.json.
To run this example, use:
bash inference_demo.sh
with infer_model_name=odesign_base_prot_flex and input_json_path=./examples/protein_design/motif_scaffold/odesign_input.json.
ODesign can generate small molecule ligands that bind to specific protein targets. You need to provide a reference structure containing the target protein and specify the hotspot residues that define the binding pocket. The model will generate a new ligand molecule that interacts with the target at the specified hotspot.
To run this example, use:
bash inference_demo.sh
with infer_model_name=odesign_base_ligand_rigid and input_json_path=./examples/ligand_design/prot_binding_lig/odesign_input.json.
To run this example, use:
bash inference_demo.sh
with infer_model_name=odesign_base_na_rigid, design_modality=rna, and input_json_path=./examples/na_design/rna_bb/odesign_input.json.
To run this example, use:
bash inference_demo.sh
with infer_model_name=odesign_base_na_rigid, design_modality=rna, and input_json_path=./examples/na_design/prot_binding_rna/odesign_input.json.
ODesign can generate cyclic peptides that bind to specific protein targets. You need to provide a reference structure containing the target protein and specify the hotspot residues that define the binding interface. The model will generate a new cyclic peptide chain that interacts with the target at the specified hotspot.
To run this example, use:
bash inference_demo.sh
with infer_model_name=odesign_base_prot_flex and input_json_path=./examples/cyclic_peptide_design/odesign_input.json.
ODesign can partially modify existing binding molecules to potentially enhance stability, modulate specificity, or improve expressibility. You need to provide a reference structure containing the target molecule, and specify the partial_diff field in the input JSON file to indicate the regions that require modifications. Please refer to Section B.3 Partial Diffusion in our Supplementary Information for details.
To run this example, use:
bash inference_demo.sh
with infer_model_name=odesign_base_prot_rigid, input_json_path=./examples/protein_design/prot_binding_prot_partial_diff/odesign_input.json, and enable_partial_diff=true.
Step 1 — Download Required Training Data
Before training ODesign, please download the odesign_full_data.tar.gz from Google Drive, and unzip the file using the following command. About 850 GB of disk space is required to keep the unzipped files.
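Given the roughly 850 GB requirement, checking free space before extraction may save a failed run. The following is a minimal sketch using POSIX `df`; `has_free_gb` is a hypothetical helper, not part of ODesign, and the threshold covers only the extracted files, not the archive itself.

```shell
# has_free_gb is a hypothetical helper, not part of ODesign.
# Usage: has_free_gb <dir> <required_gb>
# Succeeds if the filesystem holding <dir> has at least <required_gb> GiB available.
has_free_gb() {
    local dir="$1"
    local required_gb="$2"
    local avail_kb
    # df -P gives one line per filesystem; with -k, column 4 is available space in KiB.
    avail_kb="$(df -Pk "${dir}" | awk 'NR==2 {print $4}')"
    [ "${avail_kb}" -ge $((required_gb * 1024 * 1024)) ]
}
```

For example, `has_free_gb [data_root_dir] 850 || echo "not enough free space"` prints a warning when the target filesystem is too small.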
tar -xzvf [data_root_dir]/odesign_train_data.tar.gz -C [data_root_dir]
Step 2 — Run the Training Demo
After data preparation, launch the training process using:
bash train_demo.sh
Please note that the ckpt_root_dir in train_demo.sh should contain the pre-trained folding model checkpoint for ODesign initialization if you are not training from scratch. Our default training setting employs protenix_base_default_v0.5.0.
If you use ODesign in your work, please cite the following:
@misc{zhang2025odesign,
title={ODesign: A World Model for Biomolecular Interaction Design},
author={Odin Zhang and Xujun Zhang and Haitao Lin and Cheng Tan and Qinghan Wang and Yuanle Mo and Qiantai Feng and Gang Du and Yuntao Yu and Zichang Jin and Ziyi You and Peicong Lin and Yijie Zhang and Yuyang Tao and Shicheng Chen and Jack Xiaoyu Chen and Chenqing Hua and Weibo Zhao and Runze Ma and Yunpeng Xia and Kejun Ying and Jun Li and Yundian Zeng and Lijun Lang and Peichen Pan and Hanqun Cao and Zihao Song and Bo Qiang and Jiaqi Wang and Pengfei Ji and Lei Bai and Jian Zhang and Chang-yu Hsieh and Pheng Ann Heng and Siqi Sun and Tingjun Hou and Shuangjia Zheng},
year={2025},
eprint={2510.22304},
archivePrefix={arXiv},
primaryClass={q-bio.BM},
url={https://arxiv.org/abs/2510.22304},
}
This project's code draws in part on Protenix and OpenFold and is released under the Apache 2.0 License. We thank their authors for their great work and code.
This project is supported by Lingang Laboratory, Zhejiang University, and The Chinese University of Hong Kong. We are actively seeking highly motivated and talented PhD students to join our team. We offer PhD training opportunities in Pharmacy / AI at Zhejiang University, or Computer Science at The Chinese University of Hong Kong.
If you are interested, please contact odinz@link.cuhk.edu.hk
Both source code and model parameters of ODesign are released under the Apache 2.0 License.








