Add seperate msa command #461

Open
Jnelen wants to merge 2 commits into jwohlwend:main from Jnelen:msa

Jnelen (Contributor) commented Jul 10, 2025

Background
A few users have asked how to generate MSAs on a local system with the msa-server API and then transfer the resulting CSVs to an HPC cluster with limited internet access (see issues #409, #447). I also found this awkward to deal with, so I added an msa subcommand to make it more convenient. It performs only the MSA generation step and does not require all model weights to be downloaded, which makes it more lightweight:

  • New msa subcommand
    • Lets you run only the MSA step (boltz msa …)
    • Supports multiple input files, which makes it much more convenient to generate MSAs for several complexes.
    • Downloads or reuses the CCD cache (currently model-dependent), calls ColabFold’s MMseqs2 server, and writes out CSVs
    • Offers options for cache path, server URL, pairing strategy, threading, max sequences, and model choice
    • Also includes the features introduced in "Adding MSA server security" (#466)

Open question
Right now boltz1 still downloads a full ccd.pkl, while boltz2 uses mols.tar. If the MSA generation workflow is identical for both, we could simplify this by always using the smaller pickle file.

I’m happy to adjust anything here; just let me know and I will try to improve it. If you like the changes, I can also update the README to briefly mention the new functionality.

rubenalv commented Jul 10, 2025

Do you know of an option to run the MSA generation with mmseqs2 (or maybe mmseqs2-GPU) locally? With enough computational resources, and on closed systems, this would be very helpful.
That said, I see that a custom MSA can be supplied. Would it be as simple as running mmseqs2 locally and then providing the custom MSA in the msa field of the .yaml file?

Jnelen (Contributor, Author) commented Jul 10, 2025

Hi there,

A fully offline, local MSA generation workflow would be a great complement to the server-based approach I implemented here.
I can try to look into this, but no promises!

Regarding the usage of msa, Boltz accepts custom MSAs in two ways:

  • YAML config
    In your YAML file, set the msa field to the path of your alignment. For example, see line 6 of examples/ligand.yaml.

  • FASTA input
    If you use a FASTA-style input, add the path to your alignment as the last field of the >A|protein| header. See how it’s done in examples/ligand.fasta.
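
To make the two options concrete, here is a minimal Python sketch that renders both input styles for a single protein chain. The YAML layout and the `>CHAIN|protein|PATH` header are assumptions modeled on the example files referenced above, not taken from this PR; the helper names, the sequence, and the path `./msa/A.csv` are hypothetical placeholders.

```python
def yaml_with_msa(chain_id: str, sequence: str, msa_path: str) -> str:
    """Render a one-protein YAML config whose msa field points at a precomputed MSA.

    Assumed layout (modeled on examples/ligand.yaml); check it against the real file.
    """
    return (
        "version: 1\n"
        "sequences:\n"
        "  - protein:\n"
        f"      id: {chain_id}\n"
        f"      sequence: {sequence}\n"
        f"      msa: {msa_path}\n"
    )


def fasta_with_msa(chain_id: str, sequence: str, msa_path: str) -> str:
    """Render a FASTA record that carries the MSA path in its header.

    Assumed header format: >CHAIN_ID|protein|MSA_PATH
    """
    return f">{chain_id}|protein|{msa_path}\n{sequence}\n"


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(yaml_with_msa("A", "MKTAYIAK", "./msa/A.csv"))
    print(fasta_with_msa("A", "MKTAYIAK", "./msa/A.csv"))
```

Either output could be saved to a file and passed to Boltz as the input, provided the MSA file exists at the given path on the machine that runs the prediction.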

The MSA exporter implemented here writes CSV files in the format that Boltz expects by default. I have run Boltz with these files and it seems to work perfectly. MSA files in A3M format are also supported and should work fine, though I’ve used them less often.
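
For anyone who produces an A3M file locally and wants it in CSV form, the following is a rough sketch of such a conversion, not the exporter from this PR. It assumes the CSV has two columns, `key` and `sequence`, and that a key of -1 marks an unpaired sequence; both are assumptions that should be verified against the files the exporter actually writes. The function names are hypothetical.

```python
import csv
import io


def read_a3m(text: str) -> list[tuple[str, str]]:
    """Parse A3M text into (header, sequence) pairs, joining wrapped sequence lines."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' comment lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def a3m_to_csv(text: str, key: int = -1) -> str:
    """Write every A3M sequence as one CSV row under the assumed key,sequence header."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["key", "sequence"])  # assumed column names
    for _header, seq in read_a3m(text):
        writer.writerow([key, seq])  # assumed: -1 means "unpaired"
    return out.getvalue()


if __name__ == "__main__":
    example = ">query\nMKTAYI\n>hit1\nMK-aAYI\n"
    print(a3m_to_csv(example), end="")
```

The sketch keeps sequences exactly as they appear in the A3M file (including gaps and lowercase insertion characters) and does no pairing across chains, so it only covers the single-chain, unpaired case.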

Hope that helps!

Jnelen (Contributor, Author) commented Jul 15, 2025

I cleaned up this PR and included the features from #466. I also updated the README and split the other minor changes out into #480 and #481, respectively. Any feedback or thoughts are welcome!

Jnelen changed the title from "Add msa command, improve versioning and input checks" to "Add seperate msa command" on Jul 15, 2025