build: added environment.lock for 'stable' builds #75
Changes from 6 commits
247d052
93b4dee
3e4c78b
13b0707
39bdb82
f303e56
a8f6f5d
6e790ce
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,109 +1,17 @@ | ||
| # Git | ||
| .git | ||
| .gitignore | ||
| .gitattributes | ||
|
|
||
| # Documentation | ||
| README.md | ||
| *.md | ||
| docs/ | ||
| *.rst | ||
|
|
||
| # Python | ||
| __pycache__/ | ||
| *.py[cod] | ||
| *$py.class | ||
| *.so | ||
| .Python | ||
| build/ | ||
| develop-eggs/ | ||
| dist/ | ||
| downloads/ | ||
| eggs/ | ||
| .eggs/ | ||
| lib/ | ||
| lib64/ | ||
| parts/ | ||
| sdist/ | ||
| var/ | ||
| wheels/ | ||
| *.egg-info/ | ||
| .installed.cfg | ||
| *.egg | ||
| MANIFEST | ||
|
|
||
| # Testing | ||
| .pytest_cache/ | ||
| .coverage | ||
| htmlcov/ | ||
| .tox/ | ||
| .nox/ | ||
| coverage.xml | ||
| *.cover | ||
| .hypothesis/ | ||
|
|
||
| # Jupyter Notebook | ||
| .ipynb_checkpoints | ||
|
|
||
| # Environment | ||
| .env | ||
| .venv | ||
| env/ | ||
| venv/ | ||
| ENV/ | ||
| env.bak/ | ||
| venv.bak/ | ||
|
|
||
| # IDE | ||
| .vscode/ | ||
| .idea/ | ||
| *.swp | ||
| *.swo | ||
| *~ | ||
|
|
||
| # OS | ||
| .DS_Store | ||
| .DS_Store? | ||
| ._* | ||
| .Spotlight-V100 | ||
| .Trashes | ||
| ehthumbs.db | ||
| Thumbs.db | ||
|
|
||
| # Temporary files | ||
| *.tmp | ||
| *.temp | ||
| *.log | ||
|
|
||
| # Docker | ||
| Dockerfile* | ||
| docker-compose* | ||
| .dockerignore | ||
|
|
||
| # CI/CD | ||
| .github/ | ||
| .gitlab-ci.yml | ||
| .travis.yml | ||
| .circleci/ | ||
|
|
||
| # Large data files (if any) | ||
| *.h5 | ||
| *.hdf5 | ||
| *.pkl | ||
| *.pickle | ||
| *.npz | ||
| *.npy | ||
| data/ | ||
| datasets/ | ||
|
|
||
| # Model files (if any) | ||
| models/ | ||
| checkpoints/ | ||
| *.ckpt | ||
| *.pth | ||
| *.pt | ||
|
|
||
| # Results and outputs | ||
| results/ | ||
| outputs/ | ||
| logs/ | ||
| # Exclude everything | ||
| * | ||
|
|
||
| # Allow specific files | ||
| !CITATION.cff | ||
| !LICENSE | ||
| !pyproject.toml | ||
| !README.md | ||
| !setup.py | ||
|
|
||
| # Allow specific directories | ||
| !deepspeed_configs/** | ||
| !devtools/** | ||
| !environments/** | ||
| !examples/** | ||
| !openfold3/** | ||
| !scripts/** |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,35 +1,75 @@ | ||
| ## Production images | ||
| ## Updating the production.lock file | ||
|
|
||
| TODO | ||
| When you modify `environments/production.yml`, you need to regenerate the lock file to pin exact versions. This ensures reproducible builds and prevents conda from re-resolving the environment at build time. `environments/production.lock` is then used for 'stable' builds. | ||
|
|
||
| For Blackwell image build, see [Build_instructions_blackwell.md](Build_instructions_blackwell.md) | ||
| ```bash | ||
| # Build the lock file generator image | ||
| docker build -f docker/Dockerfile.update-reqs -t openfold3-update-reqs . | ||
|
|
||
| # Generate the lock file (linux-64 only for now) | ||
| docker run --rm openfold3-update-reqs > environments/production.lock | ||
|
|
||
| # Commit the updated lock file | ||
| git add environments/production.lock | ||
| git commit -m "Update production.lock" | ||
| ``` | ||
|
|
||
| ## Development images | ||
|
|
||
| These images are the largest, but they include all the build tooling needed to compile code at runtime (e.g. DeepSpeed ops). | ||
|
|
||
| ```bash | ||
| docker build \ | ||
| -f docker/Dockerfile \ | ||
| --target devel \ | ||
| -t openfold-docker:devel-yaml . | ||
| ``` | ||
|
|
||
| Or, more explicitly: | ||
|
|
||
| ```bash | ||
| docker build \ | ||
| -f docker/Dockerfile \ | ||
| --build-arg BUILD_MODE=yaml \ | ||
| --build-arg CUDA_BASE_IMAGE_TAG=12.1.1-cudnn8-devel-ubuntu22.04 \ | ||
| --target devel \ | ||
| -t openfold-docker:devel . | ||
| -t openfold-docker:devel-yaml . | ||
| ``` | ||
|
|
||
| ## Test images | ||
|
|
||
| Build the test image | ||
| ``` | ||
| Build the test image, which includes additional test-only dependencies: | ||
|
|
||
| ```bash | ||
| docker build \ | ||
| -f docker/development/Dockerfile \ | ||
| -f docker/Dockerfile \ | ||
| --target test \ | ||
| -t openfold-docker:test . | ||
| ``` | ||
|
|
||
| Run the unit tests | ||
| ``` | ||
|
|
||
| ```bash | ||
| docker run \ | ||
| --rm \ | ||
| -v $(pwd -P):/opt/openfold3 \ | ||
| -t openfold-docker:test \ | ||
| pytest openfold3/tests -vvv | ||
| ``` | ||
|
|
||
| ## Production images | ||
|
|
||
| Build a 'stable' image with all dependencies pinned exactly (`environments/production.lock`): | ||
|
|
||
| ```bash | ||
| docker build \ | ||
| -f docker/Dockerfile \ | ||
| --build-arg BUILD_MODE=lock \ | ||
| --build-arg CUDA_BASE_IMAGE_TAG=12.1.1-cudnn8-devel-ubuntu22.04 \ | ||
| --target devel \ | ||
| -t openfold-docker:devel-locked . | ||
| ``` | ||
|
|
||
| For Blackwell image build, see [Build_instructions_blackwell.md](Build_instructions_blackwell.md) | ||
|
|
||
|
|
||
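The regeneration steps in the updated DOCKER.md redirect `docker run` straight into `environments/production.lock`, so a failed run leaves the committed lock file truncated. A minimal sketch of a safer wrapper, assuming bash; the function name `regen_lock` is hypothetical and not part of the repository. It captures the generator's output in a temporary file next to the target and only replaces the lock file if the command succeeded and its output contains the `@EXPLICIT` marker line that conda explicit spec files use.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): run a lock-file generator command and
# only replace the target file if the command succeeds and its output looks
# like an explicit conda lock file (contains an "@EXPLICIT" line).
regen_lock() {
    local target="$1"
    shift
    local tmp
    # Temp file in the same directory, so the final mv is a simple rename
    tmp="$(mktemp "${target}.XXXXXX")"

    # Capture the generator's stdout; keep the existing lock file on failure
    if ! "$@" > "$tmp"; then
        echo "lock generation failed; keeping existing $target" >&2
        rm -f "$tmp"
        return 1
    fi

    # Reject output that is not an explicit spec file (e.g. an error message)
    if ! grep -qx '@EXPLICIT' "$tmp"; then
        echo "output has no @EXPLICIT marker; keeping existing $target" >&2
        rm -f "$tmp"
        return 1
    fi

    mv "$tmp" "$target"
}
```

With this helper, the redirect step would become `regen_lock environments/production.lock docker run --rm openfold3-update-reqs`; the build and commit steps stay the same.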
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,7 +1,10 @@ | ||
| # Full performance multi-stage build with complete CUDA toolchain | ||
| ARG CUDA_BASE_IMAGE_TAG=12.2.2-cudnn8-devel-ubuntu22.04 | ||
| ARG CUDA_BASE_IMAGE_TAG=12.1.1-cudnn8-devel-ubuntu22.04 | ||
|
Collaborator (author) comment: This change is unrelated and snuck in via #70; reverting to the old version we used.
||
| FROM nvidia/cuda:${CUDA_BASE_IMAGE_TAG} AS builder | ||
|
|
||
| # Environment mode: "lock" for reproducible builds, "yaml" for flexible dev builds | ||
| ARG BUILD_MODE=lock | ||
|
||
|
|
||
| # Install complete build dependencies including CUDA compiler tools | ||
| RUN apt-get update && apt-get install -y \ | ||
| wget \ | ||
|
|
@@ -16,19 +19,37 @@ RUN apt-get update && apt-get install -y \ | |
| # Install miniforge | ||
| # FIXME: this needs to be pinned; with more recent versions (25.11.0-1), package resolution gets stuck | ||
| RUN wget -P /tmp \ | ||
| "https://github.com/conda-forge/miniforge/releases/download/25.3.1-0/Miniforge3-Linux-x86_64.sh" \ | ||
| "https://github.com/conda-forge/miniforge/releases/download/25.11.0-1/Miniforge3-Linux-x86_64.sh" \ | ||
| && bash /tmp/Miniforge3-Linux-x86_64.sh -b -p /opt/conda \ | ||
| && rm /tmp/Miniforge3-Linux-x86_64.sh | ||
|
|
||
| ENV PATH=/opt/conda/bin:$PATH | ||
| ENV CONDA_PREFIX=/opt/conda | ||
|
|
||
| # Copy and install dependencies with aggressive cleanup | ||
| COPY environments/production.yml /opt/openfold3/environment.yml | ||
| RUN mamba env update -n base --file /opt/openfold3/environment.yml \ | ||
| # Copy environment files for both modes (small files, good for caching) | ||
| # To regenerate the lock file, see docker/DOCKER.md | ||
| # Use BUILD_MODE=lock (default) for reproducible builds, BUILD_MODE=yaml for flexible dev builds | ||
| COPY environments/production.lock /opt/openfold3/production.lock | ||
| COPY environments/production.yml /opt/openfold3/production.yml | ||
|
|
||
| # Install environment based on BUILD_MODE | ||
| # - lock: uses conda-lock for exact reproducible builds (training/production) | ||
| # - yaml: uses mamba env create for flexible version resolution (development/testing) | ||
| RUN mamba install -n base -c conda-forge conda-lock --yes \ | ||
| && if [ "$BUILD_MODE" = "lock" ]; then \ | ||
| conda-lock install --name openfold3 /opt/openfold3/production.lock; \ | ||
| elif [ "$BUILD_MODE" = "yaml" ]; then \ | ||
| mamba env create -f /opt/openfold3/production.yml --name openfold3; \ | ||
| else \ | ||
| echo "Invalid BUILD_MODE: $BUILD_MODE. Use 'lock' or 'yaml'." && exit 1; \ | ||
| fi \ | ||
| && mamba clean --all --yes \ | ||
| && conda clean --all --yes | ||
|
|
||
| # Activate the openfold3 environment by default | ||
| ENV PATH=/opt/conda/envs/openfold3/bin:$PATH | ||
| ENV CONDA_PREFIX=/opt/conda/envs/openfold3 | ||
| ENV CONDA_DEFAULT_ENV=openfold3 | ||
|
|
||
| # Copy the minimal set of files needed to install the package | ||
| COPY setup.py /opt/openfold3/ | ||
| COPY pyproject.toml /opt/openfold3/ | ||
|
|
@@ -52,7 +73,7 @@ ENV TORCH_CUDA_ARCH_LIST="8.0;8.6;9.0" | |
| # python3 -c "import deepspeed; print('DeepSpeed ops loaded successfully')" | ||
|
|
||
| # Devel stage - use devel image for full CUDA support | ||
| ARG CUDA_BASE_IMAGE_TAG=12.2.2-cudnn8-devel-ubuntu22.04 | ||
| ARG CUDA_BASE_IMAGE_TAG=12.1.1-cudnn8-devel-ubuntu22.04 | ||
| FROM nvidia/cuda:${CUDA_BASE_IMAGE_TAG} AS devel | ||
|
|
||
| # Install devel dependencies | ||
|
|
@@ -82,16 +103,24 @@ RUN rm -rf /usr/local/cuda/doc \ | |
|
|
||
| # Copy the entire conda environment | ||
| COPY --from=builder /opt/conda /opt/conda | ||
| ENV PATH=/opt/conda/bin:$PATH | ||
|
|
||
| # Copy CUTLASS | ||
| COPY --from=builder /opt/cutlass /opt/cutlass | ||
|
|
||
| # Activate the openfold3 environment by default | ||
| ENV PATH=/opt/conda/envs/openfold3/bin:/opt/conda/bin:$PATH | ||
| ENV CONDA_PREFIX=/opt/conda/envs/openfold3 | ||
| ENV CONDA_DEFAULT_ENV=openfold3 | ||
|
|
||
| # Ensure interactive shells also activate openfold3 | ||
| RUN /opt/conda/bin/conda init bash \ | ||
| && echo "conda activate openfold3" >> /root/.bashrc | ||
|
|
||
| # Set environment variables | ||
| ENV CUTLASS_PATH=/opt/cutlass | ||
| ENV KMP_AFFINITY=none | ||
| ENV LIBRARY_PATH=/opt/conda/lib:$LIBRARY_PATH | ||
| ENV LD_LIBRARY_PATH=/opt/conda/lib:$LD_LIBRARY_PATH | ||
| ENV LIBRARY_PATH=/opt/conda/envs/openfold3/lib:$LIBRARY_PATH | ||
| ENV LD_LIBRARY_PATH=/opt/conda/envs/openfold3/lib:$LD_LIBRARY_PATH | ||
|
|
||
| # Copy the entire source tree directly (at the very end for optimal caching) | ||
| COPY . /opt/openfold3 | ||
|
|
||
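The `BUILD_MODE` switch in the builder stage is plain shell, so its dispatch can be checked outside Docker. A minimal sketch, assuming bash: the hypothetical function `install_cmd_for_mode` mirrors the `if`/`elif` in the `RUN` instruction but prints the command instead of running it. It makes explicit that only the exact strings `lock` and `yaml` are accepted, so a value such as `yml` is rejected.

```bash
#!/usr/bin/env bash
# Hypothetical mirror of the BUILD_MODE branch in the builder stage of
# docker/Dockerfile: prints the install command instead of executing it.
install_cmd_for_mode() {
    local mode="$1"
    if [ "$mode" = "lock" ]; then
        # Exact, pre-resolved package set from the explicit lock file
        echo "conda-lock install --name openfold3 /opt/openfold3/production.lock"
    elif [ "$mode" = "yaml" ]; then
        # Fresh dependency solve from the environment YAML
        echo "mamba env create -f /opt/openfold3/production.yml --name openfold3"
    else
        echo "Invalid BUILD_MODE: $mode. Use 'lock' or 'yaml'." >&2
        return 1
    fi
}

# Show how each candidate value is handled
for mode in lock yaml yml; do
    printf '%s -> ' "$mode"
    install_cmd_for_mode "$mode" || echo "(rejected)"
done
```

In the real Dockerfile the `else` branch runs `exit 1`, so any value other than `lock` or `yaml` fails the `RUN` step and stops the build rather than silently falling back to one mode.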
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,30 @@ | ||
| # Dockerfile for generating conda environment lock files | ||
| # This produces a fully-pinned lock file for reproducible builds | ||
| # | ||
| # Usage: | ||
| # docker build -f docker/Dockerfile.update-reqs -t openfold3-update-reqs . | ||
| # docker run --rm openfold3-update-reqs > environments/production.lock | ||
|
|
||
| FROM mambaorg/micromamba:1.5.10 | ||
|
|
||
| USER root | ||
|
|
||
| # Install conda-lock | ||
| RUN micromamba install -y -n base -c conda-forge conda-lock \ | ||
| && micromamba clean --all --yes | ||
|
|
||
| USER $MAMBA_USER | ||
|
|
||
| COPY --chown=$MAMBA_USER:$MAMBA_USER environments/production.yml /tmp/environment.yml | ||
|
|
||
| # Generate explicit lock file for linux-64 | ||
| # The explicit format is directly consumable by mamba/conda | ||
| RUN micromamba run -n base conda-lock lock \ | ||
| --mamba \ | ||
| --platform linux-64 \ | ||
| --file /tmp/environment.yml \ | ||
| --kind explicit \ | ||
| --filename-template '/tmp/production-{platform}.lock' | ||
|
|
||
| # Output the lock file to stdout when container runs | ||
| CMD ["cat", "/tmp/production-linux-64.lock"] |
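Because the generated lock file is platform-specific, a quick check before committing can confirm which platform it targets. A minimal sketch, assuming bash and assuming the explicit output records its platform in a `# platform: <name>` header line ahead of the `@EXPLICIT` marker (as conda's explicit spec files do); the function name `check_lock_platform` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sanity check for an explicit conda lock file.
# Usage: check_lock_platform <lock-file> <expected-platform>
check_lock_platform() {
    local lock="$1" expected="$2" platform

    # Explicit spec files must contain an "@EXPLICIT" line before the package URLs
    if ! grep -qx '@EXPLICIT' "$lock"; then
        echo "$lock: missing @EXPLICIT marker" >&2
        return 1
    fi

    # Extract the platform from the first "# platform: <name>" header line
    platform="$(sed -n 's/^# platform: *//p' "$lock" | head -n 1)"
    if [ "$platform" != "$expected" ]; then
        echo "$lock: platform is '${platform:-unknown}', expected '$expected'" >&2
        return 1
    fi
}
```

For the current setup this would be run as `check_lock_platform environments/production.lock linux-64`, matching the `--platform linux-64` argument passed to `conda-lock lock`.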
Should we add documentation about where the current `production.lock` is generated? Specifically, what kind of instance or system, and any other variables that are relevant to environment resolution.

For my understanding: do we expect `production.lock` to change depending on whether the system has a GPU or only a CPU? Or should it be the same because we specify the same Docker base image with CUDA?
Yeah, that's a good point: currently, updating the lock file and generating it are effectively the same step (the docs don't say that). And yes, it is platform-specific (linux-64, arm64, etc.), as you mentioned in your later comments. The GPU/CPU question depends on what's in the environment YAML: if it pulls a CPU build of torch, that's what will be installed in the environment.