Add recombinase assembly algorithm for attB/attP -- Generalized integrase issue #435 #496
areebamomin wants to merge 1 commit into pydna-group:master
|
Dear @areebamomin, thanks for your contribution. It's already looking great, but I have a few comments and suggestions:
Need to have
Nice to have
These are just some ideas in case you want to invest a bit more time in this. They would be quite helpful!
|
|
Hi @manulera Thank you for looking it over and the feedback! I will work on this in the following weeks. |
|
Degenerate nucleotide codes represent a position that could be occupied by more than one nucleotide in a consensus sequence. You have them listed here: https://people.bath.ac.uk/jm2219/biology/degenerate.htm
You don't have to handle how to find these degenerate sequences in your new code. You can use the functions dseqrecord_finditer and compute_regex_site:
from pydna.sequence_regex import dseqrecord_finditer, compute_regex_site
from pydna.dseqrecord import Dseqrecord
seq = Dseqrecord('CTaaaACGTaaaAC')
# Turn degenerate sequence into regex pattern (case insensitive)
regex_pattern = compute_regex_site('ACNT')
print('regex pattern', regex_pattern)
# Find it in the sequence
result = dseqrecord_finditer(regex_pattern, seq)
print([r for r in result])
# Handles circular sequences, note that it
# returns 12,16 as the span for the circular-spanning motif
seq2 = Dseqrecord('CTaaaACGTaaaAC', circular=True)
result2 = dseqrecord_finditer(regex_pattern, seq2)
print([r for r in result2]) |
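To illustrate the idea behind the snippet above without depending on pydna, here is a minimal standalone sketch of how a degenerate site can be turned into a regex and searched in a linear or circular sequence. It is not pydna's implementation: the names `IUPAC`, `degenerate_to_regex` and `find_site` are hypothetical, it works on plain strings, and it only searches the given strand.

```python
import re

# IUPAC degenerate nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def degenerate_to_regex(site: str) -> str:
    """Translate a degenerate site such as 'ACNT' into a regex string."""
    return "".join(IUPAC[base] for base in site.upper())


def find_site(site: str, seq: str, circular: bool = False) -> list[tuple[int, int]]:
    """Return (start, end) spans of the site in seq, case-insensitively.

    For circular sequences the first len(site) - 1 bases are appended so
    that matches spanning the origin are found; their end coordinate is
    then larger than len(seq).
    """
    # A lookahead is used so that overlapping matches are also reported.
    pattern = re.compile(f"(?=({degenerate_to_regex(site)}))", re.IGNORECASE)
    searched = seq + seq[: len(site) - 1] if circular else seq
    return [
        (m.start(), m.start() + len(m.group(1)))
        for m in pattern.finditer(searched)
        if m.start() < len(seq)
    ]


print(find_site("ACNT", "CTaaaACGTaaaAC"))                 # [(5, 9)]
print(find_site("ACNT", "CTaaaACGTaaaAC", circular=True))  # [(5, 9), (12, 16)]
```

The second call reports the origin-spanning motif with the span 12,16, matching the convention described in the comment above.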
|
Any updates on this? |
|
I was traveling for the last month / holidays but hoping to work more on this in the upcoming couple of weeks! |
|
@areebamomin Good to hear. Mind that pydna has undergone some fundamental internal changes: the Dseq class now relies on a single string instead of two. See the last release, v5.5.5. |
|
Hi @areebamomin pinging you here, do you think you will have time to finish this? |
|
Hi @manulera just emailed you! |
This PR implements a recombinase-based assembly algorithm for pydna by adding a new function, make_recombinase_algorithm, to src/pydna/assembly2.py. The function identifies homologous recombination regions by extracting the lowercase core shared between the attB and attP recognition sites, and it returns match tuples in the format expected by Assembly, so that it behaves consistently with the other supported assembly strategies. A corresponding test suite (tests/test_recombinase_overlap.py) was added to verify homology detection, edge cases, multiple matches, and full integration with the Assembly class. All tests pass using both python run_test.py and pytest, and all doctests in assembly2.py also run without errors.
Hopefully this closes or makes some progress on #435!
Thank you for letting me have a go at learning more about the program. Hopefully I can build on this toward a successful contribution!
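As a rough illustration of the approach described above (locating the lowercase core shared by the attB and attP sites and reporting where it falls in each sequence), here is a minimal sketch on plain strings. It is not the code in this PR: `lowercase_core` and `recombinase_matches` are hypothetical names, the example sites are made up, only the given strand is searched, and the `(start_x, start_y, length)` tuple layout is an assumption about what Assembly expects.

```python
import re


def lowercase_core(site: str) -> tuple[int, str]:
    """Return (offset, core) for the single lowercase run in a site.

    Sites are assumed to be written with uppercase flanks and a lowercase
    core, e.g. 'GGCCgtAATT'.
    """
    runs = list(re.finditer(r"[a-z]+", site))
    if len(runs) != 1:
        raise ValueError(f"expected exactly one lowercase core in {site!r}")
    return runs[0].start(), runs[0].group()


def recombinase_matches(seq_x: str, seq_y: str, att_b: str, att_p: str):
    """List (start_x, start_y, length) tuples for the shared core.

    att_b is searched in seq_x and att_p in seq_y (case-insensitively);
    each pair of hits yields one tuple pointing at the core positions.
    """
    offset_b, core_b = lowercase_core(att_b)
    offset_p, core_p = lowercase_core(att_p)
    if core_b != core_p:
        raise ValueError("attB and attP must share the same core")

    hits_x = [m.start() for m in re.finditer(re.escape(att_b), seq_x, re.IGNORECASE)]
    hits_y = [m.start() for m in re.finditer(re.escape(att_p), seq_y, re.IGNORECASE)]
    return [
        (x + offset_b, y + offset_p, len(core_b))
        for x in hits_x
        for y in hits_y
    ]


# Made-up sites and sequences, for illustration only.
att_b = "GGCCgtAATT"
att_p = "TTAAgtCCGG"
print(recombinase_matches("AAAGGCCGTAATTAAA", "CCTTAAGTCCGGCC", att_b, att_p))
# [(7, 6, 2)]
```

A real implementation would also need to consider the reverse strand and circular sequences, which this sketch leaves out.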