
Modernize conda environment #34

Open

sdvillal wants to merge 133 commits into aqlaboratory:main from sdvillal:modernize-conda-environment
Conversation

sdvillal commented Nov 12, 2025

Summary

Adds a modern conda environment following best practices to improve the quality of life of conda users.

The environment is self-contained, including a sane toolchain to build extensions fully compatible with the rest of the dependencies, and with batteries included (inference, bioinformatics, fast kernels, dev dependencies).

We maintain a pixi workspace and an automatically generated conda environment for non-pixi users.
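For conda users who have not seen pixi before, the gist is a single declarative manifest plus a lock file. A purely illustrative sketch of such a workspace (not the actual pixi.toml in this PR; all names and pins here are made up):

```toml
# Illustrative sketch only - not the pixi.toml of this PR.
[workspace]
channels = ["conda-forge"]
platforms = ["linux-64", "osx-arm64"]

[dependencies]
python = ">=3.11"
pytorch = "*"

# The project itself, installed in editable mode via pip
[pypi-dependencies]
openfold3 = { path = ".", editable = true }

[feature.dev.dependencies]
pytest = "*"

[environments]
dev = ["dev"]
```

The conda environment file for non-pixi users is then exported from this single source of truth, so both groups resolve the same dependency set.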

We still need to iron out four known problems (see the comments in pixi.toml and upcoming issues) and add documentation.

From here, creating a conda-forge openfold3 package and a bioconda openfold3-extra package should be simple enough.

Changes

Related Issues

TBC

Testing

The current environment passes all tests and produces sensible predictions.

Other Notes

This is exploratory at the moment. We will clean up the commit history or open a clean PR when we are done.

@Emrys-Merlin

Thank you for the draft!

@Emrys-Merlin

DeepSpeed accepted our first upstream fix regarding the ninja detection (deepspeedai/DeepSpeed#7687). Once a new version is released, this should allow us to get rid of the PyPI ninja dependency. Of course, this fix will only come into play if we decide against the vendoring approach.

sdvillal (Author) commented Dec 1, 2025

As of 2025/12/01, these packages are still installed from PyPI after installing openfold3 in devel/editable mode:

To investigate

  • aria2, both from CF (v 1.37.0) and pypi (v 0.0.1b0)

Proposed solution: remove aria2 from the PyPI dependencies, as it is currently unused in the OF3 codebase. The PyPI package is an old convenience and should in general not be used to install aria2; it is not even that much of a convenience.

Because of cuequivariance_ops_torch_cu12

  • cuequivariance_ops_torch_cu12
  • cuequivariance_ops_cu12
  • nvidia_cublas_cu12 both from CF (libcublas 12.9.1.4) and pypi (v12.9.1.4)

These should at the very least be aligned with the CF version, but it is likely best to just install them all from PyPI until we understand how to deal with the license. The key question is what to do with libcublas; maybe we should add synonyms to parselmouth in pixi, although I am not 100% sure these two packages are fully binary compatible.

Currently the biggest blocker to having a conda package with these is their LICENSE.

See also: NVIDIA/cuEquivariance#218

It could be interesting to see if openequivariance could be a viable alternative:
https://github.com/PASSIONLab/OpenEquivariance

Because of mkl

  • mkl both from CF (2025.3.0) and pypi (2025.3.0)
  • intel_openmp
  • onemkl_license
  • tbb both from CF (2022.3.0) and pypi (2022.3.0)
  • tcmlib
  • umf
  • intel_cmplr_lib_url

Proposed solution: remove mkl from the PyPI dependencies, as it is actually unused (pytorch links it statically; numpy and scipy are not built against it and do not dispatch to it dynamically).
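The audit above is essentially intersecting the conda and PyPI package sets. A hedged sketch of that check (`find_duplicates` is a hypothetical helper; in practice the two dicts would come from parsing `conda list --json` and `pip list --format=json`):

```python
def find_duplicates(
    conda_pkgs: dict[str, str], pip_pkgs: dict[str, str]
) -> dict[str, tuple[str, str]]:
    """Map packages installed from both sources to (conda_version, pip_version)."""

    # conda and PyPI often disagree on '-' vs '_' in package names
    def norm(name: str) -> str:
        return name.lower().replace("-", "_")

    by_norm = {norm(n): (n, v) for n, v in conda_pkgs.items()}
    dups = {}
    for name, pip_version in pip_pkgs.items():
        hit = by_norm.get(norm(name))
        if hit is not None:
            dups[hit[0]] = (hit[1], pip_version)
    return dups


conda_env = {"aria2": "1.37.0", "mkl": "2025.3.0", "numpy": "2.1.0"}
pip_env = {"aria2": "0.0.1b0", "mkl": "2025.3.0"}
print(find_duplicates(conda_env, pip_env))
# {'aria2': ('1.37.0', '0.0.1b0'), 'mkl': ('2025.3.0', '2025.3.0')}
```

Running something like this after each environment change makes regressions in the PyPI/conda split easy to catch.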

jandom (Collaborator) commented Dec 17, 2025

hi there @sdvillal thanks for this – we're currently working on a bunch of related issues
#70
#75

Hopefully we can combine it all together with your PR after the holidays?

sdvillal (Author) commented Dec 23, 2025

> hi there @sdvillal thanks for this – we're currently working on a bunch of related issues #70 #75
>
> Hopefully we can combine it all together with your PR after the holidays?

Also:
#79
sdvillal#1 (now merged)

sdvillal (Author) commented:

Coming back to this after the end-of-year "hiatus". Current state and TODOs:

  • Need to catch up with changes and PRs upstream.
  • Fully isolating/vendoring the evoformer extension, to get rid of the deepspeed dependency completely, is proving very hairy, so we will likely need to open an upstream PR to also fix CUTLASS detection.
  • Take care of all the open issues above.

all tests pass, predictions seem to be correct
corresponds to a modernized conda environment following best practices
Comments

Overcommenting issues
incomplete, we might not need the native sources
from upstream commit df59f203f40c8a292dd019ae68c9e6c88f107026
Use vendored deepspeed in the attention primitives
sdvillal (Author) commented:

Note: figuring out how to package kalign-python for conda here.

jandom (Collaborator) commented Mar 14, 2026

hi there @sdvillal and @jnwei - I think now is the time to push on getting this merged; the release is out. We'll probably have 1-3 big fixes that will be merged (and probably a 0.4.1 released), but shortly afterwards this should go in (imo). Never mind the technical complexity, this will be quite a change for the lab!

Update I've attempted to resolve the conflicts here https://github.com/sdvillal/openfold-3/pull/3/changes

sdvillal (Author) commented Mar 15, 2026

Congrats on the new release! 🥇

As we discussed, happy to restart pushing for this when we get more bandwidth.

I have merged main independently (@Emrys-Merlin, as you guys asked, please test). I saw your PR late, @jandom! Thanks a lot anyway :-)

TODO

  • Write docs (will do next)
  • Review the changes once again (especially after merging the new release)
  • Complete test matrix (see below)
  • Accept and merge! (history is rich, but I would maybe squash it or at the very least clean it again)

Current test matrix

  • CPU tests pass in linux and osx-arm64, but in osx it requires a workaround (see below about kalign-python)
  • Tests pass for both CUDA12 and CUDA13 in A100
  • I currently cannot test on Blackwell + linux-64. This is temporary and I hope to do so in the next few days.
  • I am not yet testing the "-pypi" environments and I am tempted to remove them.

About kalign-python

Currently we depend on getting kalign-python from pypi - only dependency not in conda-forge.

A kalign-python conda package would ensure it plays well with the ecosystem and would allow creating an openfold-3 conda package for a one-command install. I have already opened a PR to conda-forge to create it. Its fate will depend on deprecating the kalign3 package on conda-forge.

In the meantime, this is not a big deal, as kalign-python has no dependencies and works out of the box on linux. Unfortunately, it does not work so well on osx: as soon as something else in your environment (conda or plain python) links to an alternative OpenMP runtime, things fail - see this bug report. For now, the fragile and unsupported workaround is export KMP_DUPLICATE_LIB_OK=TRUE. Anyway, who runs predictions on osx? (I do ;-))

An openfold-3 conda package?

I have just started working on an openfold conda recipe, so that people should be able to run "conda install openfold3" and be happy.

As you asked for it, the package might include precompiled kernels dispatched at runtime for the actual compute capability. I have already done some groundwork (e.g., allowing hardware-independent cross-compilation of the Evoformer in deepspeed). In any case, I feel that as soon as we merge this PR, the next step to improve the user experience will be to improve kernel compilation (maybe ditch deepspeed or specialise it to newer hardware, get the AMD triton kernel, run benchmarks to broadly advise on what works best, etc.). Sounds like a lot of fun!

Changes since we last reviewed

  • Removed example docker files and workaround for rdkit ABI incompatibilities when mixing conda and pypi deps (as we discussed)
  • Fixed conda env export task (in any case this will be documented as "untested")
  • Added a task to safely update the lock file (will be documented)
  • Unpinned biotite
  • Reverted the "picklable dataloader" change, as you guys also fixed it upstream (indirectly, while fixing other things)
  • Removed kalign 2
  • Fixed a test

Notes

  • My bandwidth will be very limited until April 1st, but hopefully there is not much left to do for an initial merge - this should not be too disruptive, and I think we will figure out what needs fixing once more people start using it.

  • Should we later take care of "plugins", like the affinity one?

sdvillal marked this pull request as ready for review on March 15, 2026 13:39
jandom (Collaborator) commented Mar 16, 2026

Got it to run on a DGX box, great stuff

run_openfold predict \
    --query_json examples/example_inference_inputs/query_ubiquitin.json \
    --runner_yaml examples/example_runner_yamls/low_mem.yml \
    --num_diffusion_samples=1 \
    --num_model_seeds=1 \
    --output_dir output/
/home/jandom/workspace/openfold-3/.pixi/envs/openfold3-cuda13-pypi/lib/python3.13/site-packages/torch/cuda/__init__.py:435: UserWarning: 
    Found GPU0 NVIDIA GB10 which is of cuda capability 12.1.
    Minimum and Maximum cuda capability supported by this version of PyTorch is
    (8.0) - (12.0)
    
  queued_call()
WARNING:openfold3.entry_points.experiment_runner:No version_tensor is found for this checkpoint.Assuming the user knows checkpoints are parameters are compatible, continuing...
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
💡 Tip: For seamless cloud logging and experiment tracking, try installing [litlogger](https://pypi.org/project/litlogger/) to enable LitLogger, which logs metrics and artifacts automatically to the Lightning Experiments platform.
💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
WARNING:openfold3.core.data.tools.colabfold_msa_server:Using output directory: /tmp/of3_colabfold_msas for ColabFold MSAs.
WARNING:openfold3.core.data.tools.colabfold_msa_server:Mapping file /tmp/of3_colabfold_msas/mappings/seq_to_rep_id.json already exists. Appending new sequences.
WARNING:openfold3.core.data.tools.colabfold_msa_server:Mapping file /tmp/of3_colabfold_msas/mappings/rep_id_to_seq.json already exists. Appending new sequences.
Submitting 1 sequences to the Colabfold MSA server for main MSAs...
No complexes found for paired MSA generation. Skipping...
/home/jandom/workspace/openfold-3/.pixi/envs/openfold3-cuda13-pypi/lib/python3.13/multiprocessing/popen_fork.py:67: DeprecationWarning: This process (pid=1974826) is multi-threaded, use of fork() may lead to deadlocks in the child.
  self.pid = os.fork()
Preprocessing templates: 100%|█████████████████████████████████████████████████████| 1/1 [00:00<00:00, 142.58it/s]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/home/jandom/workspace/openfold-3/.pixi/envs/openfold3-cuda13-pypi/lib/python3.13/site-packages/pytorch_lightning/utilities/_pytree.py:21: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
/home/jandom/workspace/openfold-3/.pixi/envs/openfold3-cuda13-pypi/lib/python3.13/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:429: Consider setting `persistent_workers=True` in 'predict_dataloader' to speed up the dataloader worker initialization.
Predicting: |                                                                               | 0/? [00:00<?, ?it/s]/home/jandom/workspace/openfold-3/.pixi/envs/openfold3-cuda13-pypi/lib/python3.13/site-packages/torch/cuda/__init__.py:435: UserWarning: 
    Found GPU0 NVIDIA GB10 which is of cuda capability 12.1.
    Minimum and Maximum cuda capability supported by this version of PyTorch is
    (8.0) - (12.0)
    
  queued_call()
[... the same GB10 cuda-capability UserWarning repeated nine more times ...]
Predicting DataLoader 0:   0%|                                                              | 0/1 [00:00<?, ?it/s]Seed set to 2746317213
/home/jandom/workspace/openfold-3/.pixi/envs/openfold3-cuda13-pypi/lib/python3.13/site-packages/biotite/structure/io/pdbx/convert.py:912: DeprecationWarning: `include_bonds` parameter is deprecated, intra-residue are always written, if available
  warnings.warn(
Predicting DataLoader 0: 100%|██████████████████████████████████████████████████████| 1/1 [04:47<00:00,  0.00it/s]
==================================================
    PREDICTION SUMMARY (COMPLETE)    
==================================================
Total Queries Processed: 1
  - Successful Queries:  1
  - Failed Queries:      0
==================================================

Predicting DataLoader 0: 100%|██████████████████████████████████████████████████████| 1/1 [04:47<00:00,  0.00it/s]
Removing empty log directory...
Cleaning up MSA directories...

however, I had to make one change

diff --git a/openfold3/core/data/framework/data_module.py b/openfold3/core/data/framework/data_module.py
index 229b799f..b710a78e 100644
--- a/openfold3/core/data/framework/data_module.py
+++ b/openfold3/core/data/framework/data_module.py
@@ -446,9 +446,9 @@ class DataModule(pl.LightningDataModule):
             generator=self.generators[mode],
             worker_init_fn=worker_init_fn,
             # https://github.com/pytorch/pytorch/issues/87688
-            multiprocessing_context="fork"
-            if torch.backends.mps.is_available() and num_workers
-            else None,
+            # Use "spawn" when workers are needed: fork is unsafe in multi-threaded
+            # processes (e.g. after CUDA init), which causes segfaults on Python 3.12+.
+            multiprocessing_context="spawn" if num_workers else None,
         )
 
     def train_dataloader(self) -> DataLoader:

Update: all the tests pass locally, this seems good to go!

jandom added the safe-to-test label (internal-only label indicating PRs that are ready for automated CI testing) on Mar 16, 2026
sdvillal (Author) commented Mar 17, 2026

Good stuff @jandom

I was surprised the new multiprocessing config affected your box (apologies!). I have narrowed the change down so that it only applies on osx (in fact, I will feel more comfortable if we do not hardcode "spawn" there). Can you try again?

On a positive note, tests also pass nicely on Blackwell + linux-64.

jandom (Collaborator) commented Mar 17, 2026

> I was surprised the new multiprocessing config affected your box (apologies!). I have narrowed the change down so that it only applies on osx (in fact, I will feel more comfortable if we do not hardcode "spawn" there). Can you try again?

I'm not sure that is what is impacting me; I haven't tried on my osx yet, this was on a DGX.

jandom (Collaborator) commented Mar 17, 2026

Still broken on linux without additional changes; this time only a config change was needed:

diff --git a/examples/example_runner_yamls/low_mem.yml b/examples/example_runner_yamls/low_mem.yml
index 78f8acc9..1273ed97 100644
--- a/examples/example_runner_yamls/low_mem.yml
+++ b/examples/example_runner_yamls/low_mem.yml
@@ -1,6 +1,8 @@
-# Model changes for low memory 
+# Model changes for low memory
 model_update:
-  presets: 
+  presets:
     - predict
     - low_mem
-    - pae_enabled
\ No newline at end of file
+    - pae_enabled
+data_module_args:
+  num_workers: 0

Claude then proceeded to propose this delightful change (I'm not impressed):

        # macOS MPS requires fork: https://github.com/pytorch/pytorch/issues/87688
        # Linux requires spawn: fork segfaults when CUDA threads are already running
        # (Python 3.12+ warns, 3.14 will change the default)
        multiprocess_context = None
        if num_workers:
            if platform.system() == "Darwin" and torch.backends.mps.is_available():
                multiprocess_context = "fork"
            else:
                multiprocess_context = "spawn"

Update: everything is passing for me locally, yay!

Passed (4):
  ✓ openfold3-cuda12
  ✓ openfold3-cuda12-pypi
  ✓ openfold3-cuda13
  ✓ openfold3-cuda13-pypi

The only thing I see missing here is the Dockerfile updates? (I'm happy to add those; pixi activation is even trickier than conda activation in docker.) Here is an idea – let's merge this into a dummy branch on this repo, so that we can start iterating on the docker image stuff (which would be comparatively hard on your fork).

sdvillal force-pushed the modernize-conda-environment branch from 1fb8bf2 to 610ae56 on March 20, 2026 08:47
We would still need to document this for users
sdvillal force-pushed the modernize-conda-environment branch from 610ae56 to 23811a4 on March 20, 2026 09:31
sdvillal (Author) commented Mar 20, 2026

Thanks a lot for the tests @jandom. Good stuff.

I guess the pae_enabled problem will be taken care of by #142

I have created a more sophisticated solution to the multiprocessing_context issue in 23811a4 (apologies for the force-push, and note a capitalization error fixed later) that:

  • allows the setting to be overridden by the user
  • provides sensible defaults (note that I have set "forkserver" instead of "spawn" in linux to match the new python 3.14 default, we can go back to "spawn" if it still fails)
  • and gives informative logging if a misconfiguration is detected.

If you like it, we will need to document it. We can also revert to your solution if we prefer simplicity. Let me know :-)

As we discussed in our call, I removed our example dockerfiles. I would suggest doing the docker work in a subsequent PR, but we can also try to bring it into this PR. In any case, if you need help, we can lend a hand.

As we are removing "hacks.py" and the newer deepspeed versions have fixed the CUTLASS configuration problems, we will need to update the docs introduced here. Happy to do that too.

Should we open an issue to document the failing tests for cuEquivariance > 0.7? I do not know if these are legit issues or if the tests need update.
