Fix check_sequence filter by gnikolenyi · Pull Request #80 · aqlaboratory/openfold-3

gnikolenyi · 2025-12-28T05:46:54Z

Fixes #72

jnwei

Thank you for submitting a fix for this issue, @gnikolenyi! I understand that this fix might be a smaller part of a larger set of fixes planned for the later release. Thank you for adding this fix separately first.

To ensure correctness of the check_sequence logic, it would be extremely helpful to have a few test cases. Simple examples like the one provided by @ECalfeeAdaptive in #72 are perfect for testing this function as it is easy to see the contents of the examples, and easy to check what the expected output should be.

Please let me know if I can provide any assistance with adding the tests / documentation.

jnwei · 2025-12-29T08:25:32Z

openfold3/core/data/primitives/sequence/template.py

@@ -167,13 +167,32 @@ def check_sequence(
        bool:


Could we update the return description to reflect the 3 values that are now being returned?

Also, it doesn't appear that we use the other return values of query_aln and hit_aln. Perhaps it would be easier to add these return values later if/when they are needed?

jnwei · 2025-12-29T08:29:13Z

openfold3/core/data/primitives/sequence/template.py

 # Template cache construction
 def check_sequence(
-    query_seq: str,
+    query: TemplateHit,


Not strictly related to this PR, but could we update the docstring for TemplateHit? Some of the fields seem out of date:

openfold-3/openfold3/core/data/io/sequence/template.py

Lines 69 to 85 in 0b9df93

class TemplateHit(NamedTuple):
    """Tuple containing template hit information.

    Attributes:
        index (str):
            Row index of the hit in the alignment.
        name (str):
            PDB-chain ID of the hit.
        aligned_cols (int):
            Number of
        hit_sequence (str):
            The PDB ID of the hit.
        indices_hit (str):
            The PDB ID of the hit.
        e_value (str):
            The PDB ID of the hit.
    """
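As a starting point for that update, here is a rough sketch of what a consistent docstring could look like. It is a hypothetical stand-in class, not the openfold3 definition: the field names are copied from the quoted docstring, while every type and every description marked "assumed" is a guess that would need to be checked against the actual code.

```python
from typing import NamedTuple


class TemplateHitSketch(NamedTuple):
    """Illustrative stand-in for TemplateHit; NOT the openfold3 class.

    Attributes:
        index (int):
            Row index of the hit in the alignment. (type assumed)
        name (str):
            PDB-chain ID of the hit.
        aligned_cols (int):
            Number of aligned columns between query and hit. (assumed)
        hit_sequence (str):
            Aligned hit sequence, gaps included. (assumed)
        indices_hit (list[int]):
            Hit residue index for each alignment column. (assumed)
        e_value (float):
            E-value reported by the search tool. (assumed)
    """

    index: int
    name: str
    aligned_cols: int
    hit_sequence: str
    indices_hit: list[int]
    e_value: float


# Made-up example values, only to show that each field has its own meaning.
example_hit = TemplateHitSketch(
    index=0,
    name="1abc_A",
    aligned_cols=10,
    hit_sequence="ACDEFGHIKL",
    indices_hit=list(range(10)),
    e_value=1e-20,
)
```

The main point is that each attribute gets its own description instead of the repeated "The PDB ID of the hit."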

jnwei · 2025-12-29T09:03:29Z

openfold3/core/data/primitives/sequence/template.py

@@ -152,8 +152,8 @@ def check_sequence(
    """Applies sequence filters to template hits following AF3 SI Section 2.4.


Could we add a quick description of the template filters from this section? AFAICT from the code, these filters are:

Fails if coverage < min_align threshold.

Fails if coverage >= max_subseq AND covered == identical.

Digging into the second statement, the anticipated outputs are:

covered == identical -- this suggests that the template hit is identical to the query, because the non-gap positions are located in the same places in the query and the template hit. This hit would fail the filter.

covered != identical -- this suggests that some of the gap tokens in the matching sequence are not in the same locations, and thus the sequence is not a perfect match. This hit would pass the filter.

If the above understanding is correct, I am not sure this would resolve the issue raised in the test example given in #72. In that case, we have a hit which has 100% coverage but a different sequence. In that test case, I believe the hit would still fail the checks in this function.
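The concern can be made concrete with a small sketch. This is not the code in the PR: the function names, the definitions of covered / identical / coverage, the threshold defaults (0.1 and 0.95), and the `compare_residues` switch are all assumptions introduced for illustration. With `compare_residues=False` the sketch mimics the gap-pattern reading described above; with `compare_residues=True` it counts a column as identical only when the residues match.

```python
GAP = "-"


def alignment_counts(query_aln: str, hit_aln: str) -> tuple[int, int]:
    """Count covered columns and residue-identical columns."""
    if len(query_aln) != len(hit_aln):
        raise ValueError("aligned strings must have equal length")
    covered = 0
    identical = 0
    for q, h in zip(query_aln, hit_aln):
        if q != GAP and h != GAP:
            covered += 1          # both sequences have a residue here
            if q == h:
                identical += 1    # and the residues are the same
    return covered, identical


def passes_filters(
    query_aln: str,
    hit_aln: str,
    min_align: float = 0.1,    # assumed default
    max_subseq: float = 0.95,  # assumed default
    compare_residues: bool = True,
) -> bool:
    """Return True if the hit survives the two filters listed above."""
    query_len = len(query_aln.replace(GAP, ""))
    if query_len == 0:
        return False
    covered, identical = alignment_counts(query_aln, hit_aln)
    if not compare_residues:
        # Gap-pattern reading: any covered column counts as identical.
        identical = covered
    coverage = covered / query_len
    if coverage < min_align:
        return False
    if coverage >= max_subseq and covered == identical:
        return False
    return True
```

Under these assumptions, a full-coverage hit with one substitution (`"ACDEFGHIKL"` vs. `"ACDEFGHIKV"`) is rejected when only gap positions are compared and kept when residues are compared, which is the distinction the #72 example depends on.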

ECalfeeAdaptive · Dec 30, 2025

Sequence positions that are aligned but not identical (AA substitutions) don't seem to be covered by this PR. I think the template 'duplicate' logic here from openfold could be re-used for openfold3 and would resolve the issue I raised in #72.
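A minimal sketch of such a 'duplicate' check is shown below. It is modeled on the substring test used in AlphaFold 2-style template filtering and is not copied from openfold or openfold3; the function name, argument names, and the 0.95 default are assumptions.

```python
def is_duplicate(query_seq: str, hit_aln: str, max_subseq: float = 0.95) -> bool:
    """Flag a hit as a near-copy of the query.

    A hit counts as a duplicate only if its ungapped sequence occurs
    verbatim inside the query AND it spans more than `max_subseq` of
    the query length. Any substitution breaks the substring match.
    """
    if not query_seq:
        return False
    hit_seq = hit_aln.replace("-", "")  # drop alignment gaps
    length_ratio = len(hit_seq) / len(query_seq)
    return hit_seq in query_seq and length_ratio > max_subseq
```

Because the test is a literal substring match, a hit with full coverage but one or more amino-acid substitutions is not flagged, while an exact copy of the query is.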

jnwei · 2025-12-29T09:04:50Z

openfold3/core/data/primitives/sequence/template.py

@@ -143,7 +143,7 @@ def parse_representatives(

 # Template cache construction
 def check_sequence(


Could we consider making this function name more descriptive? Perhaps something like "check_sequence_similarity_within_range"?

Fix check_sequence filter

0b9df93

gnikolenyi requested a review from jnwei December 28, 2025 05:47

gnikolenyi mentioned this pull request Dec 28, 2025

[BUG] Top templates incorrectly filtered out by training preprocessing script #72

Open

jnwei requested changes Dec 29, 2025


gnikolenyi wants to merge 1 commit into main from bugfix/template-check-sequence


ECalfeeAdaptive Dec 30, 2025 (edited)


3 participants
