1 change: 1 addition & 0 deletions docs/source/Installation.md
@@ -72,6 +72,7 @@ This script will:
- Setup a directory for OpenFold3 model parameters [default: `~/.openfold3`]
- Writes the path to `$OPENFOLD_CACHE/ckpt_path`
- Download the model parameters, if the parameter file does not already exist
- Download and setup the [Chemical Component Dictionary (CCD) with Biotite](https://www.biotite-python.org/latest/apidoc/biotite.structure.info.get_ccd.html)
Collaborator

I'm not sure the target of this link is helpful: it points to a function for working with Biotite's CCD representation in code. Why not just add `python -m biotite.setup_ccd` if the intent is to show users how to download and set it up?

Collaborator Author

We could surface this as a workaround, but that would really be our abstraction leaking. One day we might remove Biotite, and users should be none the wiser.

Contributor

My intention was to give users who might be unfamiliar with the CCD some context about what it is, since it is now included in our top-level Installation documentation, and to show how Biotite is used in relation to it.

We can certainly replace this link with a more general pointer to the CCD (perhaps this one: https://www.wwpdb.org/data/ccd) if that would be more helpful. That might make more sense if we now plan to provide the CCD file ourselves.

- Optionally runs an inference integration test on two samples, without MSA alignments (~5 min on A100)
- N.B. To run the integration tests, `pytest` must be installed.

33 changes: 32 additions & 1 deletion openfold3/setup_openfold.py
@@ -18,13 +18,16 @@
Downloads model parameters and runs verification tests.
"""

import hashlib
import importlib.util
import logging
import os
import subprocess
import sys
from pathlib import Path

import biotite.setup_ccd

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger(__name__)

@@ -172,6 +175,31 @@ def download_parameters(param_dir) -> None:
    logger.info("Download completed successfully.")


def setup_biotite_ccd(*, force_download: bool) -> None:
    # FIXME: This is only needed because we're locked into biotite 1.2.0 for now,
    # and this version pulls a stale CCD file by default.
    # Once we can upgrade biotite, we can remove this function entirely.
    STALE_CCD_CHECKSUMS = {
        "f106327c29bc6a247f8eab541648a501"  # biotite==1.2.0
    }

    def ccd_is_stale(*, ccd_path: Path) -> bool:
        # A missing file is treated as stale so that it gets downloaded.
        if not ccd_path.exists():
            return True
        md5 = hashlib.md5(ccd_path.read_bytes()).hexdigest()
        return md5 in STALE_CCD_CHECKSUMS

    logger.info("Starting Biotite CCD setup...")
    if force_download or ccd_is_stale(ccd_path=biotite.setup_ccd.OUTPUT_CCD):
        logger.info(f"Downloading biotite CCD to {biotite.setup_ccd.OUTPUT_CCD}...")
        biotite.setup_ccd.main()
    else:
        logger.info(
            "Biotite CCD already configured at "
            f"{biotite.setup_ccd.OUTPUT_CCD}, skipping."
        )


def run_integration_tests() -> None:
    """Run integration tests."""
    confirm = input("Run integration tests? (yes/no)")
@@ -226,7 +254,10 @@ def main():
    if should_download:
        download_parameters(param_dir)

    # Step 5: Run tests (always run regardless of download status)
    # Step 5: Setup CCD with biotite
    setup_biotite_ccd(force_download=False)

    # Step 6: Run tests (always run regardless of download status)
    run_integration_tests()
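
The staleness check in `setup_biotite_ccd` can be tried out in isolation. The following is a minimal sketch, not the code from this PR: `file_md5`, `needs_refresh`, and `KNOWN_STALE_MD5` are hypothetical names, the digest is computed from placeholder bytes rather than from a real CCD file, and the file is hashed in chunks (an assumption made here to avoid loading a large file into memory at once).

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical placeholder: digest of some "old" bytes, standing in for the
# checksum of the CCD file bundled with a pinned library version.
KNOWN_STALE_MD5 = {hashlib.md5(b"old contents").hexdigest()}


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_refresh(path: Path, stale_digests: set[str]) -> bool:
    """A file needs a refresh if it is missing or matches a known-stale digest."""
    if not path.exists():
        return True
    return file_md5(path) in stale_digests


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "example.bin"
    print("missing:", needs_refresh(target, KNOWN_STALE_MD5))
    target.write_bytes(b"old contents")
    print("stale:  ", needs_refresh(target, KNOWN_STALE_MD5))
    target.write_bytes(b"new contents")
    print("fresh:  ", needs_refresh(target, KNOWN_STALE_MD5))
```

Matching against a list of known-stale digests, rather than a single known-good digest, means any newer file is accepted without the list having to be updated each time the upstream data changes.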


8 changes: 8 additions & 0 deletions openfold3/tests/conftest.py
@@ -2,6 +2,8 @@
import pytest
from biotite.structure import AtomArray

from openfold3.setup_openfold import setup_biotite_ccd


@pytest.fixture
def dummy_atom_array():
@@ -49,3 +51,9 @@ def mse_ala_atom_array():
    atom_array.hetero[8:] = False

    return atom_array


@pytest.fixture(scope="session", autouse=True)
def ensure_biotite_ccd():
    """Download CCD file before any tests run (once per test session)."""
    setup_biotite_ccd(force_download=True)
40 changes: 40 additions & 0 deletions openfold3/tests/core/data/primitives/structure/test_query.py
@@ -0,0 +1,40 @@
import pytest

from openfold3.core.data.primitives.structure.query import (
    structure_with_ref_mol_from_ccd_code,
    structure_with_ref_mol_from_smiles,
)


@pytest.mark.parametrize(
    "smiles, ccd_code",
    [
        # Simple test cases
        ("CCO", "EOH"),  # Ethanol
        # Pat Walter's CYP substrates
        ("CC(C)(C)C(=O)Nc1cc(ccc1n2ccnc2)C(F)(F)F", "A1ASV"),  # cyp3a4_9bv5
        ("Cc1nc(cs1)c2ccc(cc2)n3c(cnn3)c4ccc(cc4)OC", "A1ASU"),  # cyp3a4_9bv6
        ("c1ccc(c(c1)CCC(=O)NC[C@@H]2Cc3cccc(c3O2)c4cccnc4)Cl", "A1AST"),  # cyp3a4_9bv7
        ("c1cc(c(cc1C(F)(F)F)NC(=O)C2CCC2)n3ccnc3", "A1ASS"),  # cyp3a4_9bv8
        ("c1ccc(c(c1)NC(=O)Nc2cc(ccc2n3ccnc3)C(F)(F)F)Cl", "A1ASR"),  # cyp3a4_9bv9
        ("c1ccc(c(c1)CCCl)NC(=O)Nc2cc(ccc2n3ccnc3)C(F)(F)F", "A1ASQ"),  # cyp3a4_9bva
        ("c1ccc(c(c1)CC(=O)Nc2cc(ccc2n3ccnc3)C(F)(F)F)Cl", "A1ASP"),  # cyp3a4_9bvb
        (
            "c1ccc(c(c1)N(Cc2cccc(c2)O)C(=O)Nc3cc(ccc3n4ccnc4)C(F)(F)F)Cl",
            "A1ASO",
        ),  # cyp3a4_9bvc
        (
            "c1ccc(c(c1)NC(=O)N(Cc2cccc(c2)O)c3cc(ccc3n4ccnc4)C(F)(F)F)Cl",
            "A1BNX",
        ),  # cyp3a4_9ms1
        (
            "c1ccc(cc1)C(c2ccccc2)([C@@H]3CCN(C3)CCc4ccc5c(c4)CCO5)C(=O)N",
            "A1CIW",
        ),  # cyp3a4_9plk
    ],
)
def test_consistent_structure_from_smiles_and_ccd_code(smiles, ccd_code):
    struct_from_smiles = structure_with_ref_mol_from_smiles(smiles, chain_id="X")
    struct_from_ccd = structure_with_ref_mol_from_ccd_code(ccd_code, chain_id="X")

    assert len(struct_from_smiles.atom_array) == len(struct_from_ccd.atom_array)
Contributor

Some quick suggestions to make this test more comprehensive:

  • Would it make sense to use the `custom_assert_utils.assert_atom_array_equal` utility here? Or would we expect differences in atom or bond ordering that would make this assert fail?

  • If `assert_atom_array_equal` is not a good fit here, perhaps we could add a check on the number of carbons or some other aggregate property.

Collaborator Author

Ah, it would be really nice if we could use that, but it currently doesn't work. I just tried it, and at the moment the atom names differ between SMILES and CCD (e.g. C01 vs C1). Maybe we can just match on element count?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that figures. I think the atom counter is good enough here for now. Thanks for adding it.
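
For reference, the element-count comparison discussed in this thread could look roughly like the sketch below. This is not the code that was added to the PR: `element_counts` and `same_composition` are hypothetical helpers, and plain lists of element symbols stand in for the per-atom element annotation of the two structures.

```python
from collections import Counter
from typing import Iterable


def element_counts(elements: Iterable[str]) -> Counter:
    """Count atoms per element, ignoring case and surrounding whitespace."""
    return Counter(symbol.strip().upper() for symbol in elements)


def same_composition(elements_a: Iterable[str], elements_b: Iterable[str]) -> bool:
    """True if both atom lists contain the same number of atoms of each element."""
    return element_counts(elements_a) == element_counts(elements_b)


# Ethanol heavy atoms, listed in two different orders (illustrative data only).
from_smiles = ["C", "C", "O"]
from_ccd = ["O", "C", "C"]

print(same_composition(from_smiles, from_ccd))
print(same_composition(from_smiles, ["C", "O", "O"]))
```

Comparing counts per element is insensitive to atom naming and ordering, which is why it sidesteps the C01 vs C1 mismatch mentioned above, at the cost of not checking connectivity.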
