Description
This issue describes the procedure used to search all of our contigs against RdRP and presents the results.
(Slack thread: https://hackseq-rna.slack.com/archives/C012H9SDQCA/p1615948152031200)
Input: FASTA files of contigs from one of two assembly sets: micro (all SRA .pro DIAMOND hits, assembled with rnaviralspades) or macro (everything in s3://lovelywater/assembly/contigs/, i.e. all CoV + dicistro + quenya + satellite + a 1k random subset, assembled with either coronaSPAdes or rnaviralspades).
Output: FASTA of all contigs that hit RdRP by HMM search, by palmscan, or by both, i.e. the RdRP+ contigs:
s3://serratus-rayan/pro-assembly/rdrpplus.micro.fa
s3://serratus-rayan/pro-assembly/rdrpplus.macro.fa
total size: 8.2 GB
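As an illustration of the "HMM and/or palmscan" selection, here is a minimal Python sketch that keeps every contig whose ID appears in either hit set. It is not the code that produced the files above: the function names `read_fasta` and `select_rdrp_plus` are hypothetical, and it assumes the hit IDs from both tools have already been reduced to plain contig IDs matching the first word of each FASTA header.

```python
from typing import Iterable, Iterator, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_rdrp_plus(
    fasta_lines: Iterable[str],
    hmm_hits: Set[str],
    palmscan_hits: Set[str],
) -> Iterator[Tuple[str, str]]:
    """Yield contigs hit by the HMM search, by palmscan, or by both."""
    keep = hmm_hits | palmscan_hits  # set union = "and/or"
    for header, seq in read_fasta(fasta_lines):
        # Assumption: the contig ID is the first whitespace-delimited token.
        contig_id = header.split()[0] if header.strip() else ""
        if contig_id in keep:
            yield header, seq
```

In practice `fasta_lines` could be an open file handle on a contig FASTA, and the two sets could be loaded from whatever per-tool hit lists are available.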
hmmsearch was run using an exhaustive collection of RdRP HMMs:
https://gitlab.pasteur.fr/rchikhi_pasteur/serratus-rdrp-analysis/-/blob/master/hmm_macro_micro/RdRP_all.v2.hmm
Alignments were made using this script:
https://gitlab.pasteur.fr/rchikhi_pasteur/serratus-rdrp-analysis/-/blob/master/hmm_macro_micro/align_hmm_to_contigs.sh
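For readers who want to post-process hmmsearch results, the sketch below parses the per-target table that hmmsearch writes with `--tblout` and keeps the best-scoring HMM per target. It is an assumption-laden example, not a summary of the linked script: the function name `parse_tblout` and the `max_evalue` cutoff are hypothetical, and the target names in a real run may be translated ORFs or frames rather than raw contig IDs.

```python
from typing import Dict, Iterable, Tuple


def parse_tblout(
    lines: Iterable[str],
    max_evalue: float = 1e-5,  # hypothetical cutoff, not taken from the issue
) -> Dict[str, Tuple[str, float]]:
    """Map each target name to its (best HMM name, full-sequence E-value).

    Expects the whitespace-delimited `--tblout` layout, where column 1 is
    the target name, column 3 the query (HMM) name, and column 5 the
    full-sequence E-value. Lines starting with '#' are comments.
    """
    hits: Dict[str, Tuple[str, float]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 5:
            continue  # skip malformed rows
        target, query, evalue = fields[0], fields[2], float(fields[4])
        if evalue > max_evalue:
            continue
        best = hits.get(target)
        if best is None or evalue < best[1]:
            hits[target] = (query, evalue)
    return hits
```

The keys of the returned dictionary can then serve as the set of HMM-positive sequence names, once they are mapped back to contig IDs if needed.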