build: added environment.lock for 'stable' builds #75
Conversation
```diff
@@ -1,5 +1,5 @@
 # Full performance multi-stage build with complete CUDA toolchain
-ARG CUDA_BASE_IMAGE_TAG=12.2.2-cudnn8-devel-ubuntu22.04
+ARG CUDA_BASE_IMAGE_TAG=12.1.1-cudnn8-devel-ubuntu22.04
```
This is unrelated and snuck in via #70, turning back to the old version we used
jnwei left a comment
LGTM! Thanks for adding the option to pin environments while leaving the CI testing environment flexible.
I just have a few minor comments and questions.
docker/Dockerfile
Outdated
```dockerfile
FROM nvidia/cuda:${CUDA_BASE_IMAGE_TAG} AS builder

# Environment mode: "lock" for reproducible builds, "yaml" for flexible dev builds
ARG BUILD_MODE=lock
```
Is there a way to provide a default option for this argument? Some users may prefer to use the Dockerfile directly to build their own image rather than use the published Docker image, so it would be nice to have a default argument here.
Hey, yeah I think this already contains the default, but it probably should be 'yaml' instead of 'lock'. The non-default version of this would just be `ARG BUILD_MODE` with no value.
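For illustration, a minimal sketch of how a defaulted `ARG BUILD_MODE` could drive the rest of the build (the `COPY` line and file names are assumptions for illustration, not necessarily what this PR does):

```dockerfile
# Default to the flexible dev build; consumers who want reproducible
# builds can override it:
#   docker build --build-arg BUILD_MODE=lock .
ARG BUILD_MODE=yaml

# Hypothetical usage: pick the environment spec matching the mode,
# i.e. environment.yaml or environment.lock.
COPY environment.${BUILD_MODE} /tmp/environment.${BUILD_MODE}
```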
```diff
@@ -1,35 +1,75 @@
 ## Production images
 ## Updating the production.lock file
```
Should we add documentation about where the current production.lock is generated? Specifically, what kind of instance / system, and any other variables that are relevant to environment resolution.
For my understanding: do we expect the production.lock to change depending on whether the system is GPU or CPU-only? Or should it be the same because we specify the same Docker base image with CUDA?
Yeah, that's a good point – currently updating and generating are kind of synonymous (the docs don't say that). And it absolutely is platform-specific (linux-64, arm64, etc.), as you mentioned in your later comments. The GPU/CPU point depends on what's in the environment.yaml – if that pulls a CPU version of torch, that's what will be installed in the env.
```
https://conda.anaconda.org/conda-forge/linux-64/zstd-1.5.7-hb78ec9c_6.conda#4a13eeac0b5c8e5b8ab496e6c4ddd829
https://conda.anaconda.org/conda-forge/linux-64/aws-c-io-0.23.3-had5c4f5_4.conda#a53c9f532e5c4a2b85d4b4c439ea5a5d
https://conda.anaconda.org/conda-forge/linux-64/brotli-bin-1.2.0-hb03c661_1.conda#af39b9a8711d4a8d437b52c1d78eb6a1
https://conda.anaconda.org/conda-forge/linux-64/cuda-cudart-12.4.127-he02047a_2.conda#a748faa52331983fc3adcc3b116fe0e4
```
just curious - why do we have two versions of cudart in the lockfile? Do they come from different dependencies?
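Worth noting the two entries in this lockfile are actually distinct package names (`cuda-cudart` vs. `cuda-cudart_linux-64`), not two builds of the same package. A small checker makes genuine duplicates easy to spot in a conda "explicit" lock file like this one; the helper below is a hypothetical sketch, not part of this PR:

```python
# Hypothetical sketch: report package names that appear with more than one
# version/build in a conda "explicit" lock file (one package URL per line).
from collections import Counter
from urllib.parse import urlparse

def duplicate_packages(lock_lines):
    """Return sorted package names that occur more than once."""
    counts = Counter()
    for line in lock_lines:
        line = line.strip()
        # Skip blanks, comments, and the "@EXPLICIT" header line.
        if not line or line.startswith(("#", "@")):
            continue
        # The filename is the last path segment; the "#<md5>" suffix is a
        # URL fragment, so urlparse drops it from the path.
        filename = urlparse(line).path.rsplit("/", 1)[-1]
        # Conda filenames look like <name>-<version>-<build>.(conda|tar.bz2).
        name = filename.rsplit("-", 2)[0]
        counts[name] += 1
    return sorted(n for n, c in counts.items() if c > 1)
```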
```
https://conda.anaconda.org/pytorch/noarch/pytorch-mutex-1.0-cuda.tar.bz2#a948316e36fb5b11223b3fcfa93f8358
https://conda.anaconda.org/conda-forge/noarch/tzdata-2025c-h8577fbf_0.conda#338201218b54cadff2e774ac27733990
https://conda.anaconda.org/conda-forge/noarch/ca-certificates-2025.11.12-hbd8a1cb_0.conda#f0991f0f84902f6b6009b4d2350a83aa
https://conda.anaconda.org/conda-forge/noarch/cuda-cudart_linux-64-12.4.127-h85509e4_2.conda#329163110a96514802e9e64d971edf43
```
* Update production.lock
* feat: use .lock files instead of environment.yml which may drift
* use a dedicated conda env and auto-activate it
* move to "deny-all, allow-list" in dockerignore
* test: with the environment.lock we can take a fresher version of miniforge
* support building both from .lock and .yaml
* change default to 'yaml'
* review comments: make the platform explicit

Summary
When using environment.yaml, conda may resolve to different package builds at different times. This PR adds a new Docker image whose sole purpose is to create a hermetic conda environment: it takes in environment.yml and produces environment.lock (Linux only for now).
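As a rough illustration, the lock-generating stage could look something like this (stage name, base image, and env name are assumptions; `conda list --explicit --md5` emits URL-plus-hash lines of the kind seen in environment.lock):

```dockerfile
# Hypothetical sketch of a lock-generating stage: resolve environment.yml
# once, then emit a fully pinned explicit spec.
FROM condaforge/miniforge3 AS lockgen
COPY environment.yml /tmp/environment.yml
RUN conda env create -n openfold3 -f /tmp/environment.yml && \
    conda list -n openfold3 --explicit --md5 > /tmp/environment.lock
```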
Big change: a new conda environment `openfold3` (instead of `base`), which gets auto-activated.
Update: @jnwei and I discussed this and we want to do pinning for the 'stable' image but not for the tests or the development image. I need to re-think a bit how that's put together. This approach gives us the best of both worlds: people who consume the Docker image for inference get exactly the correct deps, while developers have a more library-like dependency experience.
Changes
Related Issues
Testing
Other Notes
None