Hi @thanhnamitit, this looks really great! Thank you especially for providing such detailed examples, complete with CIF files, example queries, documentation, and initial results. It is greatly appreciated and will be super helpful for future users. I will review this in greater depth over the next few days, but a few quick comments:
Thank you!
5de99e9 to 253e5c8
Hi @jnwei, thank you so much for reviewing the MR and for your kind words! I've addressed all your comments:
1. MSA-Free Testing with CIF Direct Templates
I created a comprehensive E2E test script (
Bug Found & Fixed: During testing, I discovered an issue when running multimers in MSA-free mode. In
    if not msa_arrays_to_pair_i:
        continue
All 8 tests now pass successfully. You can take a look at the attached log file!
2. HuggingFace Examples PR
I've submitted the examples to HuggingFace: (https://huggingface.co/OpenFold/OpenFold3/discussions/12#694272355826ce8d56b0aaac)
3. Documentation Updates
Updated the following documentation with the new
Please review when you have a chance. Let me know if you need any changes!
Thank you so much @thanhnamitit! Overall the changes look great. Thank you also for adding documentation and examples to the HuggingFace repository. We'll review this further in the new year, and we should be able to add this in soon after.
Summary
Add CIF direct template mode to OpenFold3, allowing users to provide template structures as CIF files without pre-computed alignments. The system automatically aligns template chains to query sequences and selects the best match based on sequence identity × coverage.
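As a rough illustration of the selection rule above, the following sketch scores template chains that have already been aligned to the query and picks the one with the highest identity × coverage. It is not the code from this PR: the `AlignedChain` type, the function names, and the exact definitions of identity (matches over aligned columns) and coverage (aligned query residues over query length) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedChain:
    """A template chain already aligned to the query (gaps written as '-').

    Hypothetical container for this sketch, not an OpenFold3 type.
    """
    chain_id: str
    aligned_query: str
    aligned_template: str


def identity_times_coverage(chain: AlignedChain) -> float:
    """Score a chain as sequence identity x query coverage (assumed definitions)."""
    if len(chain.aligned_query) != len(chain.aligned_template):
        raise ValueError("aligned strings must have equal length")
    # Number of query residues, ignoring gap columns in the query row.
    query_len = sum(q != "-" for q in chain.aligned_query)
    # Columns where both the query and the template have a residue.
    paired = [
        (q, t)
        for q, t in zip(chain.aligned_query, chain.aligned_template)
        if q != "-" and t != "-"
    ]
    if not paired or query_len == 0:
        return 0.0
    identity = sum(q == t for q, t in paired) / len(paired)
    coverage = len(paired) / query_len
    return identity * coverage


def select_best_chain(chains):
    """Return the highest-scoring chain, or None if there are no candidates."""
    return max(chains, key=identity_times_coverage, default=None)
```

Under these definitions a short, perfectly matching fragment can lose to a longer, slightly divergent chain, which is the usual reason for multiplying identity by coverage instead of ranking on identity alone.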
Changes
Core Implementation
Template Processing
- template_cif_paths field to Chain model with validation to ensure mutual exclusivity with template_alignment_file_path
- CifDirectParser class (openfold3/core/data/io/sequence/template.py) to parse CIF files directly
- TemplatePreprocessorInputInference (openfold3/core/data/pipelines/preprocessing/template.py) to support both alignment-based and CIF-direct modes
- _parse_templates_from_cif_files() method for CIF-direct processing
Documentation
User Guides (docs/source/Inference.md, docs/source/template_how_to.md)
Example Files
Query JSONs
- query_homomer_with_direct_cif_templates.json - Homomer example
- query_multimer_with_direct_cif_templates.json - Multimer example
Template CIFs (15 files total)
- 1dgc.cif, 1ysa.cif, 1zta.cif, 4dmd.cif, 4dme.cif
- 6l06.cif, 6l07.cif, 7cnw.cif, 7cnx.cif, 7cnz.cif (2 chain groups)
Related Issues
N/A
Testing
I've created a script to test the CIF direct template feature across three template modes: no templates, ColabFold MSA server templates, and CIF direct templates (user-provided). The script runs 6 end-to-end inference tests to compare prediction quality across these modes for both homomer and multimer queries.
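The six runs correspond to every combination of two query types and three template modes. A minimal sketch of how such a test matrix could be enumerated is shown here. The mode labels, the `InferenceCase` type, and the naming scheme are hypothetical and do not come from the actual test script.

```python
from dataclasses import dataclass
from itertools import product

# Labels are illustrative; the real script may name these differently.
QUERY_KINDS = ("homomer", "multimer")
TEMPLATE_MODES = ("no_templates", "colabfold_templates", "cif_direct_templates")


@dataclass(frozen=True)
class InferenceCase:
    """One end-to-end inference run in the comparison."""
    query_kind: str
    template_mode: str

    @property
    def name(self) -> str:
        # Used here only to give each run a distinct, readable label.
        return f"{self.query_kind}__{self.template_mode}"


def build_test_matrix():
    """Return one case per (query kind, template mode) pair."""
    return [InferenceCase(q, m) for q, m in product(QUERY_KINDS, TEMPLATE_MODES)]
```

Enumerating the cases from two small tuples keeps the comparison balanced: adding a new template mode automatically adds one run for each query type.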
Test Script
Test Output
Summary
Test Configuration:
- --use_templates false)
- --use_templates true with automatic template discovery)
Results:
Key Findings:
Technical Validation:
- template_preprocessor_settings.create_logs: true
Other Notes
N/A