build: added environment.lock for 'stable' builds #75
Merged
8 commits, all by jandom:

- `247d052` Update production.lock
- `93b4dee` feat: use .lock files instead of environment.yml which may drift
- `3e4c78b` use a dedicated conda env and auto-activate it
- `13b0707` move to "deny-all, allow-list" in dockerignore
- `39bdb82` test: with the environment.lock we can take a fresher version of mini…
- `f303e56` support building both from .lock and .yaml
- `a8f6f5d` change default to 'yaml'
- `6e790ce` review comments: make the platform explicit

**`.dockerignore`**

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,109 +1,17 @@ | ||
| # Git | ||
| .git | ||
| .gitignore | ||
| .gitattributes | ||
| | ||
| # Documentation | ||
| README.md | ||
| *.md | ||
| docs/ | ||
| *.rst | ||
| | ||
| # Python | ||
| __pycache__/ | ||
| *.py[cod] | ||
| *$py.class | ||
| *.so | ||
| .Python | ||
| build/ | ||
| develop-eggs/ | ||
| dist/ | ||
| downloads/ | ||
| eggs/ | ||
| .eggs/ | ||
| lib/ | ||
| lib64/ | ||
| parts/ | ||
| sdist/ | ||
| var/ | ||
| wheels/ | ||
| *.egg-info/ | ||
| .installed.cfg | ||
| *.egg | ||
| MANIFEST | ||
| | ||
| # Testing | ||
| .pytest_cache/ | ||
| .coverage | ||
| htmlcov/ | ||
| .tox/ | ||
| .nox/ | ||
| coverage.xml | ||
| *.cover | ||
| .hypothesis/ | ||
| | ||
| # Jupyter Notebook | ||
| .ipynb_checkpoints | ||
| | ||
| # Environment | ||
| .env | ||
| .venv | ||
| env/ | ||
| venv/ | ||
| ENV/ | ||
| env.bak/ | ||
| venv.bak/ | ||
| | ||
| # IDE | ||
| .vscode/ | ||
| .idea/ | ||
| *.swp | ||
| *.swo | ||
| *~ | ||
| | ||
| # OS | ||
| .DS_Store | ||
| .DS_Store? | ||
| ._* | ||
| .Spotlight-V100 | ||
| .Trashes | ||
| ehthumbs.db | ||
| Thumbs.db | ||
| | ||
| # Temporary files | ||
| *.tmp | ||
| *.temp | ||
| *.log | ||
| | ||
| # Docker | ||
| Dockerfile* | ||
| docker-compose* | ||
| .dockerignore | ||
| | ||
| # CI/CD | ||
| .github/ | ||
| .gitlab-ci.yml | ||
| .travis.yml | ||
| .circleci/ | ||
| | ||
| # Large data files (if any) | ||
| *.h5 | ||
| *.hdf5 | ||
| *.pkl | ||
| *.pickle | ||
| *.npz | ||
| *.npy | ||
| data/ | ||
| datasets/ | ||
| | ||
| # Model files (if any) | ||
| models/ | ||
| checkpoints/ | ||
| *.ckpt | ||
| *.pth | ||
| *.pt | ||
| | ||
| # Results and outputs | ||
| results/ | ||
| outputs/ | ||
| logs/ | ||
| # Exclude everything | ||
| * | ||
| | ||
| # Allow specific files | ||
| !CITATION.cff | ||
| !LICENSE | ||
| !pyproject.toml | ||
| !README.md | ||
| !setup.py | ||
| | ||
| # Allow specific directories | ||
| !deepspeed_configs/** | ||
| !devtools/** | ||
| !environments/** | ||
| !examples/** | ||
| !openfold3/** | ||
| !scripts/** |
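
The new `.dockerignore` switches to a "deny-all, allow-list" scheme: `*` excludes everything from the build context, and each `!` rule re-includes a file or directory. As in Docker's own matching, the last rule that matches a path decides whether it is included. The sketch below approximates that evaluation in bash to make the ordering concrete. It is a hypothetical helper (`in_build_context` is not part of the repo), and bash `case` globbing only approximates Docker's matcher (for example, here `*` also matches across `/`), so treat it as an illustration rather than a substitute for inspecting a real build context.

```bash
#!/usr/bin/env bash
# Sketch: approximate how a "deny-all, allow-list" .dockerignore is evaluated.
# ASSUMPTION: `in_build_context` is a hypothetical helper; bash `case` globbing
# only approximates Docker's matcher (here `*` also matches across `/`).

# Returns 0 if PATH would be sent to the build context, 1 otherwise.
# Rules are applied top to bottom and the last matching rule wins.
in_build_context() {
  local path=$1 ignorefile=$2
  local included=1 line pattern negate
  while IFS= read -r line || [[ -n $line ]]; do
    # Skip blank lines and comments.
    [[ -z $line || $line == '#'* ]] && continue
    negate=0
    pattern=$line
    if [[ $pattern == '!'* ]]; then
      # A leading "!" turns an exclusion into a re-inclusion.
      negate=1
      pattern=${pattern:1}
    fi
    # Unquoted $pattern is matched as a glob by `case`.
    case $path in
      $pattern) included=$negate ;;
    esac
  done < "$ignorefile"
  (( included ))
}

# The rules from the new .dockerignore in this PR.
rules=$(mktemp)
trap 'rm -f "$rules"' EXIT
cat > "$rules" <<'EOF'
# Exclude everything
*

# Allow specific files
!CITATION.cff
!LICENSE
!pyproject.toml
!README.md
!setup.py

# Allow specific directories
!deepspeed_configs/**
!devtools/**
!environments/**
!examples/**
!openfold3/**
!scripts/**
EOF

for p in openfold3/model.py environments/production.yml .git/config data/train.npz; do
  if in_build_context "$p" "$rules"; then echo "include $p"; else echo "exclude $p"; fi
done
```

Running it prints `include` for `openfold3/model.py` and `environments/production.yml`, and `exclude` for `.git/config` and `data/train.npz`. The trade-off of this scheme is that anything new the image needs at build time has to be added to the allow-list explicitly.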

**`docker/DOCKER.md`**

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,35 +1,78 @@ | ||
| ## Production images | ||
| ## Generating and updating the production lock file | ||
| | ||
| TODO | ||
| While a conda env can be created from `environments/production.yml`, this causes the environment to be resolved from scratch every time. | ||
| For reproducible builds, one needs to generate a .lock file that exactly re-creates the environment. | ||
| | ||
| For Blackwell image build, see [Build_instructions_blackwell.md](Build_instructions_blackwell.md) | ||
| When you modify `environments/production.yml`, you need to regenerate the lock file to pin exact versions. This ensures reproducible builds and prevents conda from re-resolving the environment on every build. `environments/production-linux-64.lock` is then used for 'stable' builds. | ||
| | ||
| ```bash | ||
| # Build the lock file generator image | ||
| docker build -f docker/Dockerfile.update-reqs -t openfold3-update-reqs . | ||
| | ||
| # Generate the lock file (linux-64 only for now) | ||
| docker run --rm openfold3-update-reqs > environments/production-linux-64.lock | ||
| | ||
| # Commit the updated lock file | ||
| git add environments/production-linux-64.lock | ||
| git commit -m "Update production-linux-64.lock" | ||
| ``` | ||
| | ||
| ## Development images | ||
| | ||
| These images are the largest, but they include all the build tooling needed to compile code at runtime (e.g. DeepSpeed ops). | ||
| | ||
| ```bash | ||
| docker build \ | ||
| -f docker/Dockerfile \ | ||
| --target devel \ | ||
| -t openfold-docker:devel-yaml . | ||
| ``` | ||
| | ||
| Or, more explicitly: | ||
| | ||
| ```bash | ||
| docker build \ | ||
| -f docker/Dockerfile \ | ||
| --build-arg BUILD_MODE=yaml \ | ||
| --build-arg CUDA_BASE_IMAGE_TAG=12.1.1-cudnn8-devel-ubuntu22.04 \ | ||
| --target devel \ | ||
| -t openfold-docker:devel . | ||
| -t openfold-docker:devel-yaml . | ||
| ``` | ||
| | ||
| ## Test images | ||
| | ||
| Build the test image | ||
| ``` | ||
| Build the test image, with additional test-only dependencies | ||
| | ||
| ```bash | ||
| docker build \ | ||
| -f docker/development/Dockerfile \ | ||
| -f docker/Dockerfile \ | ||
| --target test \ | ||
| -t openfold-docker:test . | ||
| ``` | ||
| | ||
| Run the unit tests | ||
| ``` | ||
| | ||
| ```bash | ||
| docker run \ | ||
| --rm \ | ||
| -v $(pwd -P):/opt/openfold3 \ | ||
| -t openfold-docker:test \ | ||
| pytest openfold3/tests -vvv | ||
| ``` | ||
| | ||
| ## Production images | ||
| | ||
| Build a 'stable' image with all the dependencies exactly pinned (`production-linux-64.lock`): | ||
| | ||
| ```bash | ||
| docker build \ | ||
| -f docker/Dockerfile \ | ||
| --build-arg BUILD_MODE=lock \ | ||
| --build-arg CUDA_BASE_IMAGE_TAG=12.1.1-cudnn8-devel-ubuntu22.04 \ | ||
| --target devel \ | ||
| -t openfold-docker:devel-locked . | ||
| ``` | ||
| | ||
| For Blackwell image build, see [Build_instructions_blackwell.md](Build_instructions_blackwell.md) | ||
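
The lock-file section above asks contributors to regenerate `environments/production-linux-64.lock` whenever `environments/production.yml` changes. That step is easy to forget, and one option is a CI check over the list of changed files. The following is a minimal sketch under stated assumptions: `lock_is_stale` is a hypothetical helper, not part of the repo, and the base branch `origin/main` in the usage comment is an assumption about the repository's setup.

```bash
#!/usr/bin/env bash
# Sketch of a CI guard: flag changes to production.yml without a lock update.
# ASSUMPTION: `lock_is_stale` is a hypothetical helper, not part of the repo.

ENV_YAML=environments/production.yml
ENV_LOCK=environments/production-linux-64.lock

# Reads changed paths (one per line) on stdin.
# Returns 0 ("stale") if the YAML changed but the lock did not, 1 otherwise.
lock_is_stale() {
  local yaml_changed=0 lock_changed=0 path
  while IFS= read -r path; do
    [[ $path == "$ENV_YAML" ]] && yaml_changed=1
    [[ $path == "$ENV_LOCK" ]] && lock_changed=1
  done
  (( yaml_changed && !lock_changed ))
}

# Possible CI usage (the base branch name is an assumption):
#   if git diff --name-only origin/main...HEAD | lock_is_stale; then
#     echo "Regenerate $ENV_LOCK (see docker/DOCKER.md)" >&2
#     exit 1
#   fi

# Local demonstration with a hand-written list of changed files.
printf '%s\n' "$ENV_YAML" README.md | lock_is_stale && echo "stale: regenerate $ENV_LOCK"
```

The check only compares file names, so it cannot tell whether the committed lock actually corresponds to the YAML; regenerating with `Dockerfile.update-reqs` remains the source of truth.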

**`docker/Dockerfile`**

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,7 +1,10 @@ | ||
| # Full performance multi-stage build with complete CUDA toolchain | ||
| ARG CUDA_BASE_IMAGE_TAG=12.2.2-cudnn8-devel-ubuntu22.04 | ||
| ARG CUDA_BASE_IMAGE_TAG=12.1.1-cudnn8-devel-ubuntu22.04 | ||
| *Review comment (collaborator, PR author): This is unrelated and snuck in via #70; reverting to the old version we used.* | ||
| FROM nvidia/cuda:${CUDA_BASE_IMAGE_TAG} AS builder | ||
| | ||
| # Environment mode: "lock" for reproducible builds, "yaml" for flexible dev builds | ||
| ARG BUILD_MODE=yaml | ||
| | ||
| # Install complete build dependencies including CUDA compiler tools | ||
| RUN apt-get update && apt-get install -y \ | ||
| wget \ | ||
| @@ -16,19 +19,37 @@ RUN apt-get update && apt-get install -y \ | ||
| # Install miniforge | ||
| # FIXME this needs to be pinned, with more recent versions (25.11.0-1) the package resolution is stuck | ||
| RUN wget -P /tmp \ | ||
| "https://github.com/conda-forge/miniforge/releases/download/25.3.1-0/Miniforge3-Linux-x86_64.sh" \ | ||
| "https://github.com/conda-forge/miniforge/releases/download/25.11.0-1/Miniforge3-Linux-x86_64.sh" \ | ||
| && bash /tmp/Miniforge3-Linux-x86_64.sh -b -p /opt/conda \ | ||
| && rm /tmp/Miniforge3-Linux-x86_64.sh | ||
| | ||
| ENV PATH=/opt/conda/bin:$PATH | ||
| ENV CONDA_PREFIX=/opt/conda | ||
| | ||
| # Copy and install dependencies with aggressive cleanup | ||
| COPY environments/production.yml /opt/openfold3/environment.yml | ||
| RUN mamba env update -n base --file /opt/openfold3/environment.yml \ | ||
| # Copy environment files for both modes (small files, good for caching) | ||
| # To regenerate the lock file, see docker/DOCKER.md | ||
| # Use BUILD_MODE=lock for reproducible builds, BUILD_MODE=yaml (default) for flexible dev builds | ||
| COPY environments/production-linux-64.lock /opt/openfold3/production-linux-64.lock | ||
| COPY environments/production.yml /opt/openfold3/production.yml | ||
| | ||
| # Install environment based on BUILD_MODE | ||
| # - lock: uses conda-lock for exact reproducible builds (training/production) | ||
| # - yaml: uses mamba env create for flexible version resolution (development/testing) | ||
| RUN mamba install -n base -c conda-forge conda-lock --yes \ | ||
| && if [ "$BUILD_MODE" = "lock" ]; then \ | ||
| conda-lock install --name openfold3 /opt/openfold3/production-linux-64.lock; \ | ||
| elif [ "$BUILD_MODE" = "yaml" ]; then \ | ||
| mamba env create -f /opt/openfold3/production.yml --name openfold3; \ | ||
| else \ | ||
| echo "Invalid BUILD_MODE: $BUILD_MODE. Use 'lock' or 'yaml'." && exit 1; \ | ||
| fi \ | ||
| && mamba clean --all --yes \ | ||
| && conda clean --all --yes | ||
| | ||
| # Activate the openfold3 environment by default | ||
| ENV PATH=/opt/conda/envs/openfold3/bin:$PATH | ||
| ENV CONDA_PREFIX=/opt/conda/envs/openfold3 | ||
| ENV CONDA_DEFAULT_ENV=openfold3 | ||
| | ||
| # Copy the minimal set of files needed to install the package | ||
| COPY setup.py /opt/openfold3/ | ||
| COPY pyproject.toml /opt/openfold3/ | ||
| @@ -52,7 +73,7 @@ ENV TORCH_CUDA_ARCH_LIST="8.0;8.6;9.0" | ||
| # python3 -c "import deepspeed; print('DeepSpeed ops loaded successfully')" | ||
| | ||
| # Devel stage - use devel image for full CUDA support | ||
| ARG CUDA_BASE_IMAGE_TAG=12.2.2-cudnn8-devel-ubuntu22.04 | ||
| ARG CUDA_BASE_IMAGE_TAG=12.1.1-cudnn8-devel-ubuntu22.04 | ||
| FROM nvidia/cuda:${CUDA_BASE_IMAGE_TAG} AS devel | ||
| | ||
| # Install devel dependencies | ||
| @@ -82,16 +103,24 @@ RUN rm -rf /usr/local/cuda/doc \ | ||
| | ||
| # Copy the entire conda environment | ||
| COPY --from=builder /opt/conda /opt/conda | ||
| ENV PATH=/opt/conda/bin:$PATH | ||
| | ||
| # Copy CUTLASS | ||
| COPY --from=builder /opt/cutlass /opt/cutlass | ||
| | ||
| # Activate the openfold3 environment by default | ||
| ENV PATH=/opt/conda/envs/openfold3/bin:/opt/conda/bin:$PATH | ||
| ENV CONDA_PREFIX=/opt/conda/envs/openfold3 | ||
| ENV CONDA_DEFAULT_ENV=openfold3 | ||
| | ||
| # Ensure interactive shells also activate openfold3 | ||
| RUN /opt/conda/bin/conda init bash \ | ||
| && echo "conda activate openfold3" >> /root/.bashrc | ||
| | ||
| # Set environment variables | ||
| ENV CUTLASS_PATH=/opt/cutlass | ||
| ENV KMP_AFFINITY=none | ||
| ENV LIBRARY_PATH=/opt/conda/lib:$LIBRARY_PATH | ||
| ENV LD_LIBRARY_PATH=/opt/conda/lib:$LD_LIBRARY_PATH | ||
| ENV LIBRARY_PATH=/opt/conda/envs/openfold3/lib:$LIBRARY_PATH | ||
| ENV LD_LIBRARY_PATH=/opt/conda/envs/openfold3/lib:$LD_LIBRARY_PATH | ||
| | ||
| # Copy the entire source tree directly (at the very end for optimal caching) | ||
| COPY . /opt/openfold3 | ||
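
The Dockerfile now installs everything into a dedicated `openfold3` environment and "activates" it for non-interactive commands purely through `ENV`: `/opt/conda/envs/openfold3/bin` is placed ahead of `/opt/conda/bin` on `PATH`, with `CONDA_PREFIX` and `CONDA_DEFAULT_ENV` set to match, while interactive shells additionally run `conda activate openfold3` from `.bashrc`. The sketch below shows why the `PATH` ordering is sufficient on its own: the shell runs the first matching executable it finds. Temporary directories stand in for the two conda `bin` directories, and the stub `python` scripts are hypothetical.

```bash
#!/usr/bin/env bash
# Demonstrates why PATH=/opt/conda/envs/openfold3/bin:/opt/conda/bin:$PATH
# "activates" an env for non-interactive commands: the shell resolves an
# executable from the first PATH entry that contains it.
# ASSUMPTION: temporary directories stand in for the real conda prefixes.

root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
base_bin=$root/conda/bin
env_bin=$root/conda/envs/openfold3/bin
mkdir -p "$base_bin" "$env_bin"

# Stub "python" executables that report which prefix they came from.
printf '#!/bin/sh\necho base\n' > "$base_bin/python"
printf '#!/bin/sh\necho openfold3\n' > "$env_bin/python"
chmod +x "$base_bin/python" "$env_bin/python"

# Same ordering as the devel stage: env bin first, then the base bin.
PATH="$env_bin:$base_bin:$PATH"
hash -r   # drop any cached command locations after changing PATH

command -v python   # resolves to the stub under .../envs/openfold3/bin
python              # prints: openfold3
```

In a built image, the analogous check would be running `command -v python` inside the container, which should resolve under `/opt/conda/envs/openfold3/bin`.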

**`docker/Dockerfile.update-reqs`**

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,30 @@ | ||
| # Dockerfile for generating conda environment lock files | ||
| # This produces a fully-pinned lock file for reproducible builds | ||
| # | ||
| # Usage: | ||
| # docker build -f docker/Dockerfile.update-reqs -t openfold3-update-reqs . | ||
| # docker run --rm openfold3-update-reqs > environments/production-linux-64.lock | ||
| | ||
| FROM mambaorg/micromamba:1.5.10 | ||
| | ||
| USER root | ||
| | ||
| # Install conda-lock | ||
| RUN micromamba install -y -n base -c conda-forge conda-lock \ | ||
| && micromamba clean --all --yes | ||
| | ||
| USER $MAMBA_USER | ||
| | ||
| COPY --chown=$MAMBA_USER:$MAMBA_USER environments/production.yml /tmp/environment.yml | ||
| | ||
| # Generate explicit lock file for linux-64 | ||
| # The explicit format is directly consumable by mamba/conda | ||
| RUN micromamba run -n base conda-lock lock \ | ||
| --mamba \ | ||
| --platform linux-64 \ | ||
| --file /tmp/environment.yml \ | ||
| --kind explicit \ | ||
| --filename-template '/tmp/production-{platform}.lock' | ||
| | ||
| # Output the lock file to stdout when container runs | ||
| CMD ["cat", "/tmp/production-linux-64.lock"] |
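
`Dockerfile.update-reqs` asks conda-lock for an explicit lock file (`--kind explicit`), which is a plain-text spec: comment lines, an `@EXPLICIT` marker, then one package URL per line. Since the file is captured by redirecting the container's stdout, a quick check before committing can catch an empty or truncated result. The sketch below is hypothetical (`check_explicit_lock` is not part of the repo). It checks only that general layout, compares a `# platform:` comment against the expected platform only if such a comment is present, and does not verify package hashes.

```bash
#!/usr/bin/env bash
# Sketch: sanity-check an explicit lock file before committing it.
# ASSUMPTION: `check_explicit_lock` is hypothetical; it only checks the layout
# of an explicit spec (comments, an @EXPLICIT marker, then package URLs).

# Usage: check_explicit_lock FILE EXPECTED_PLATFORM
check_explicit_lock() {
  local file=$1 expected=$2
  local seen_marker=0 urls=0 line
  while IFS= read -r line || [[ -n $line ]]; do
    case $line in
      '# platform: '*)
        # If the platform is recorded, it must match the one requested.
        if [[ ${line#'# platform: '} != "$expected" ]]; then
          echo "platform mismatch: $line" >&2
          return 1
        fi
        ;;
      # Other comments and blank lines are ignored.
      '#'* | '')
        ;;
      @EXPLICIT)
        seen_marker=1
        ;;
      *://*)
        if (( ! seen_marker )); then
          echo "package URL before @EXPLICIT: $line" >&2
          return 1
        fi
        urls=$((urls + 1))
        ;;
      *)
        echo "unexpected line: $line" >&2
        return 1
        ;;
    esac
  done < "$file"
  if (( ! seen_marker || urls == 0 )); then
    echo "no packages after @EXPLICIT in $file" >&2
    return 1
  fi
  echo "$file: $urls packages"
}

# Example, after regenerating the lock file:
#   check_explicit_lock environments/production-linux-64.lock linux-64
```

A failed check usually means something other than the lock file ended up on stdout, so the container output is worth inspecting before retrying.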
**Review comment:** Should we add documentation about where the current `production.lock` is generated? Specifically, what kind of instance / system, and any other variables that are relevant to environment resolution.

For my understanding: do we expect the `production.lock` to change if the system is a GPU / CPU? Or should it be the same because we specify the same Docker base image with CUDA?

**Reply:** Yeah, that's a good point: currently the updating and generation are kind of synonymous (it doesn't say that). And it absolutely is platform-specific, as you mentioned in your comments later; it's specific to the platform (linux64, arm64, etc.). The GPU/CPU point is conditional on what's in the environment.yaml: if that pulls a CPU version of torch, that's what will be installed in the env.
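
As discussed above, an explicit lock file is tied to one platform, which is why the file name carries the conda platform string (`production-linux-64.lock`). If lock files for other platforms were added later, a script would need to pick the one matching the host. The sketch below maps `uname -s` / `uname -m` output to conda platform names. `conda_platform` and `lock_file_for` are hypothetical helpers, and only the `linux-64` lock file exists in this PR.

```bash
#!/usr/bin/env bash
# Sketch: choose the lock file that matches a host platform.
# ASSUMPTIONS: `conda_platform` and `lock_file_for` are hypothetical, and only
# environments/production-linux-64.lock is produced by this PR.

# Map `uname -s` / `uname -m` values to conda platform (subdir) names.
conda_platform() {
  local os=$1 arch=$2
  case "$os/$arch" in
    Linux/x86_64)  echo linux-64 ;;
    Linux/aarch64) echo linux-aarch64 ;;
    Darwin/x86_64) echo osx-64 ;;
    Darwin/arm64)  echo osx-arm64 ;;
    *) echo "unsupported platform: $os/$arch" >&2; return 1 ;;
  esac
}

lock_file_for() {
  local platform
  platform=$(conda_platform "$1" "$2") || return 1
  echo "environments/production-${platform}.lock"
}

# Example: the lock file that would match the current host.
lock_file_for "$(uname -s)" "$(uname -m)"
```

As the reply points out, the platform string does not capture CPU versus GPU builds: that is decided by the packages requested in `environments/production.yml`.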