Merged
Changes from 6 commits
126 changes: 17 additions & 109 deletions .dockerignore
@@ -1,109 +1,17 @@
# Git
.git
.gitignore
.gitattributes

# Documentation
README.md
*.md
docs/
*.rst

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# Testing
.pytest_cache/
.coverage
htmlcov/
.tox/
.nox/
coverage.xml
*.cover
.hypothesis/

# Jupyter Notebook
.ipynb_checkpoints

# Environment
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# IDE
.vscode/
.idea/
*.swp
*.swo
*~

# OS
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db

# Temporary files
*.tmp
*.temp
*.log

# Docker
Dockerfile*
docker-compose*
.dockerignore

# CI/CD
.github/
.gitlab-ci.yml
.travis.yml
.circleci/

# Large data files (if any)
*.h5
*.hdf5
*.pkl
*.pickle
*.npz
*.npy
data/
datasets/

# Model files (if any)
models/
checkpoints/
*.ckpt
*.pth
*.pt

# Results and outputs
results/
outputs/
logs/
# Exclude everything
*

# Allow specific files
!CITATION.cff
!LICENSE
!pyproject.toml
!README.md
!setup.py

# Allow specific directories
!deepspeed_configs/**
!devtools/**
!environments/**
!examples/**
!openfold3/**
!scripts/**
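The rewritten `.dockerignore` flips from a long deny-list to an allowlist: exclude everything with `*`, then re-include specific files and directories with `!` patterns. The last matching pattern wins. A minimal sketch of that precedence rule (a simplification of Docker's full matching semantics; `is_ignored` is a hypothetical helper, not part of this repo):

```python
from fnmatch import fnmatch

def is_ignored(path, patterns):
    """Decide whether `path` is excluded from the build context.

    Patterns are evaluated top to bottom; the last one that matches
    wins, and a leading "!" re-includes a previously excluded path.
    """
    ignored = False
    for pat in patterns:
        negate = pat.startswith("!")
        if fnmatch(path, pat.lstrip("!")):
            ignored = not negate
    return ignored

patterns = ["*", "!README.md", "!openfold3/**"]
print(is_ignored("data/foo.h5", patterns))       # excluded by "*"
print(is_ignored("README.md", patterns))         # re-included
print(is_ignored("openfold3/model.py", patterns))  # re-included
```

The allowlist style keeps the build context small by default: any new file added to the repo is excluded unless it is deliberately re-included.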
5 changes: 5 additions & 0 deletions .github/workflows/ci-test-reusable.yml
@@ -4,6 +4,10 @@ on:
# Can only be called by another workflow, not directly by the user
workflow_call:
inputs:
build_mode:
description: 'Build mode: "lock" for reproducible builds, "yaml" for flexible dev builds'
required: true
type: string
cuda_base_image_tag:
description: 'CUDA base image tag (e.g., 12.2.2-cudnn8-devel-ubuntu22.04)'
required: true
@@ -68,6 +72,7 @@ jobs:
push: true
build-args: |
CUDA_BASE_IMAGE_TAG=${{ inputs.cuda_base_image_tag }}
BUILD_MODE=${{ inputs.build_mode }}
tags: |
${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:test-${{ inputs.cuda_base_image_tag }}-${{ github.sha }}
cache-from: type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:cache-${{ inputs.cuda_base_image_tag }}
2 changes: 2 additions & 0 deletions .github/workflows/ci-test.yml
@@ -23,10 +23,12 @@ jobs:
matrix:
include:
- cuda_base_image_tag: "12.1.1-cudnn8-devel-ubuntu22.04"
build_mode: "yaml"
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.ref }}-${{ matrix.cuda_base_image_tag }}
cancel-in-progress: true
uses: ./.github/workflows/ci-test-reusable.yml
with:
cuda_base_image_tag: ${{ matrix.cuda_base_image_tag }}
build_mode: ${{ matrix.build_mode }}
secrets: inherit
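With `build_mode` now part of the matrix, a pinned-build CI variant could later be exercised as a second matrix entry (a hypothetical sketch, not part of this PR):

```yaml
matrix:
  include:
    - cuda_base_image_tag: "12.1.1-cudnn8-devel-ubuntu22.04"
      build_mode: "yaml"
    # Hypothetical: also exercise the reproducible lock-file path
    - cuda_base_image_tag: "12.1.1-cudnn8-devel-ubuntu22.04"
      build_mode: "lock"
```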
56 changes: 48 additions & 8 deletions docker/DOCKER.md
@@ -1,35 +1,75 @@
## Production images
## Updating the production.lock file

Contributor:

Should we add documentation about where the current production.lock is generated? Specifically, what kind of instance / system, and any other variables that are relevant to environment resolution.

For my understanding: do we expect the production.lock to change if the system is a GPU / CPU? Or should it be the same because we specify the same docker base image with CUDA?

Collaborator (author):

Yeah, that's a good point – currently updating and generation are kind of synonymous (the doc doesn't say that). And it absolutely is platform-specific (linux-64, arm64, etc.), as you mentioned in your later comments. The GPU/CPU question depends on what's in the environment.yaml – if that pulls a CPU version of torch, that's what will be installed in the env.

TODO
When you modify `environments/production.yml`, you need to regenerate the lock file to pin exact versions. This ensures reproducible builds and prevents conda from re-resolving the environment. `environments/production.lock` is then used for 'stable' builds.

For Blackwell image build, see [Build_instructions_blackwell.md](Build_instructions_blackwell.md)
```bash
# Build the lock file generator image
docker build -f docker/Dockerfile.update-reqs -t openfold3-update-reqs .

# Generate the lock file (linux-64 only for now)
docker run --rm openfold3-update-reqs > environments/production.lock

# Commit the updated lock file
git add environments/production.lock
git commit -m "Update production.lock"
```

## Development images

These images are the largest, but they include all the build tooling needed to compile extensions at runtime (e.g. DeepSpeed).

```bash
docker build \
-f docker/Dockerfile \
--target devel \
-t openfold-docker:devel-yaml .
```

Or more explicitly

```bash
docker build \
-f docker/Dockerfile \
--build-arg BUILD_MODE=yaml \
--build-arg CUDA_BASE_IMAGE_TAG=12.1.1-cudnn8-devel-ubuntu22.04 \
--target devel \
-t openfold-docker:devel .
-t openfold-docker:devel-yaml .
```

## Test images

Build the test image
```
Build the test image with additional test-only dependencies

```bash
docker build \
-f docker/development/Dockerfile \
-f docker/Dockerfile \
--target test \
-t openfold-docker:test .
```

Run the unit tests
```

```bash
docker run \
--rm \
-v $(pwd -P):/opt/openfold3 \
-t openfold-docker:test \
pytest openfold3/tests -vvv
```

## Production images

Build a 'stable' image with all dependencies exactly pinned (`production.lock`)

```bash
docker build \
-f docker/Dockerfile \
--build-arg BUILD_MODE=lock \
--build-arg CUDA_BASE_IMAGE_TAG=12.1.1-cudnn8-devel-ubuntu22.04 \
--target devel \
-t openfold-docker:devel-locked .
```
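To sanity-check that a lock-mode image really carries the pinned environment, the installed packages can be dumped in the same explicit format the lock file uses and compared against `environments/production.lock` (a suggested check, assuming the image tag from the command above):

```bash
docker run --rm openfold-docker:devel-locked conda list --explicit
```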

For Blackwell image build, see [Build_instructions_blackwell.md](Build_instructions_blackwell.md)


49 changes: 39 additions & 10 deletions docker/Dockerfile
@@ -1,7 +1,10 @@
# Full performance multi-stage build with complete CUDA toolchain
ARG CUDA_BASE_IMAGE_TAG=12.2.2-cudnn8-devel-ubuntu22.04
ARG CUDA_BASE_IMAGE_TAG=12.1.1-cudnn8-devel-ubuntu22.04
Collaborator (author):

This is unrelated and snuck in via #70, reverting to the old version we used

FROM nvidia/cuda:${CUDA_BASE_IMAGE_TAG} AS builder

# Environment mode: "lock" for reproducible builds, "yaml" for flexible dev builds
ARG BUILD_MODE=lock
Contributor:

Is there a way to provide a default option for this argument? There are some users who may prefer to use the Dockerfile directly to build their own image, rather than use the published Dockerfile image, so it would be nice to have a default argument here.

Collaborator (author):

Hey, yeah I think this already contains the default, but it probably should be 'yaml' instead of 'lock'. The non-default version of this is just

ARG BUILD_MODE

Collaborator (author):

This is fixed now


# Install complete build dependencies including CUDA compiler tools
RUN apt-get update && apt-get install -y \
wget \
@@ -16,19 +19,37 @@ RUN apt-get update && apt-get install -y \
# Install miniforge
# FIXME this needs to be pinned, with more recent versions (25.11.0-1) the package resolution is stuck
RUN wget -P /tmp \
"https://github.com/conda-forge/miniforge/releases/download/25.3.1-0/Miniforge3-Linux-x86_64.sh" \
"https://github.com/conda-forge/miniforge/releases/download/25.11.0-1/Miniforge3-Linux-x86_64.sh" \
&& bash /tmp/Miniforge3-Linux-x86_64.sh -b -p /opt/conda \
&& rm /tmp/Miniforge3-Linux-x86_64.sh

ENV PATH=/opt/conda/bin:$PATH
ENV CONDA_PREFIX=/opt/conda

# Copy and install dependencies with aggressive cleanup
COPY environments/production.yml /opt/openfold3/environment.yml
RUN mamba env update -n base --file /opt/openfold3/environment.yml \
# Copy environment files for both modes (small files, good for caching)
# To regenerate the lock file, see docker/DOCKER.md
# Use BUILD_MODE=lock (default) for reproducible builds, BUILD_MODE=yaml for flexible dev builds
COPY environments/production.lock /opt/openfold3/production.lock
COPY environments/production.yml /opt/openfold3/production.yml

# Install environment based on BUILD_MODE
# - lock: uses conda-lock for exact reproducible builds (training/production)
# - yaml: uses mamba env create for flexible version resolution (development/testing)
RUN mamba install -n base -c conda-forge conda-lock --yes \
&& if [ "$BUILD_MODE" = "lock" ]; then \
conda-lock install --name openfold3 /opt/openfold3/production.lock; \
elif [ "$BUILD_MODE" = "yaml" ]; then \
mamba env create -f /opt/openfold3/production.yml --name openfold3; \
else \
echo "Invalid BUILD_MODE: $BUILD_MODE. Use 'lock' or 'yaml'." && exit 1; \
fi \
&& mamba clean --all --yes \
&& conda clean --all --yes

# Activate the openfold3 environment by default
ENV PATH=/opt/conda/envs/openfold3/bin:$PATH
ENV CONDA_PREFIX=/opt/conda/envs/openfold3
ENV CONDA_DEFAULT_ENV=openfold3

# Copy the minimal set of files needed to install the package
COPY setup.py /opt/openfold3/
COPY pyproject.toml /opt/openfold3/
Expand All @@ -52,7 +73,7 @@ ENV TORCH_CUDA_ARCH_LIST="8.0;8.6;9.0"
# python3 -c "import deepspeed; print('DeepSpeed ops loaded successfully')"

# Devel stage - use devel image for full CUDA support
ARG CUDA_BASE_IMAGE_TAG=12.2.2-cudnn8-devel-ubuntu22.04
ARG CUDA_BASE_IMAGE_TAG=12.1.1-cudnn8-devel-ubuntu22.04
FROM nvidia/cuda:${CUDA_BASE_IMAGE_TAG} AS devel

# Install devel dependencies
@@ -82,16 +103,24 @@ RUN rm -rf /usr/local/cuda/doc \

# Copy the entire conda environment
COPY --from=builder /opt/conda /opt/conda
ENV PATH=/opt/conda/bin:$PATH

# Copy CUTLASS
COPY --from=builder /opt/cutlass /opt/cutlass

# Activate the openfold3 environment by default
ENV PATH=/opt/conda/envs/openfold3/bin:/opt/conda/bin:$PATH
ENV CONDA_PREFIX=/opt/conda/envs/openfold3
ENV CONDA_DEFAULT_ENV=openfold3

# Ensure interactive shells also activate openfold3
RUN /opt/conda/bin/conda init bash \
&& echo "conda activate openfold3" >> /root/.bashrc

# Set environment variables
ENV CUTLASS_PATH=/opt/cutlass
ENV KMP_AFFINITY=none
ENV LIBRARY_PATH=/opt/conda/lib:$LIBRARY_PATH
ENV LD_LIBRARY_PATH=/opt/conda/lib:$LD_LIBRARY_PATH
ENV LIBRARY_PATH=/opt/conda/envs/openfold3/lib:$LIBRARY_PATH
ENV LD_LIBRARY_PATH=/opt/conda/envs/openfold3/lib:$LD_LIBRARY_PATH

# Copy the entire source tree directly (at the very end for optimal caching)
COPY . /opt/openfold3
30 changes: 30 additions & 0 deletions docker/Dockerfile.update-reqs
@@ -0,0 +1,30 @@
# Dockerfile for generating conda environment lock files
# This produces a fully-pinned lock file for reproducible builds
#
# Usage:
# docker build -f docker/Dockerfile.update-reqs -t openfold3-update-reqs .
# docker run --rm openfold3-update-reqs > environments/production.lock

FROM mambaorg/micromamba:1.5.10

USER root

# Install conda-lock
RUN micromamba install -y -n base -c conda-forge conda-lock \
&& micromamba clean --all --yes

USER $MAMBA_USER

COPY --chown=$MAMBA_USER:$MAMBA_USER environments/production.yml /tmp/environment.yml

# Generate explicit lock file for linux-64
# The explicit format is directly consumable by mamba/conda
RUN micromamba run -n base conda-lock lock \
--mamba \
--platform linux-64 \
--file /tmp/environment.yml \
--kind explicit \
--filename-template '/tmp/production-{platform}.lock'

# Output the lock file to stdout when container runs
CMD ["cat", "/tmp/production-linux-64.lock"]
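The `--kind explicit` output is a flat list of package URLs with embedded hashes, preceded by an `@EXPLICIT` marker, which mamba/conda can consume directly. A small sketch of how such a file could be checked before committing (`check_explicit_lock` and the sample URL are hypothetical, for illustration only):

```python
def check_explicit_lock(text):
    """Validate that `text` looks like a conda explicit lock file."""
    lines = [l.strip() for l in text.splitlines()
             if l.strip() and not l.startswith("#")]
    if "@EXPLICIT" not in lines:
        return False
    pkgs = [l for l in lines if l != "@EXPLICIT"]
    # Every package line should be a URL carrying a hash fragment.
    return all(l.startswith("https://") and "#" in l for l in pkgs)

sample = """# Generated by conda-lock
@EXPLICIT
https://conda.anaconda.org/conda-forge/linux-64/python-3.11.9-hb806964_0.conda#sha-placeholder
"""
print(check_explicit_lock(sample))  # → True
```

A check like this could be wired into CI to catch a truncated or accidentally hand-edited lock file before it reaches a production build.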