Hi there,
I began exploring serratus-lite to profile RNA viruses in wastewater datasets this week using the RNA Virus RdRP Search workflow described at: https://github.com/ababaian/serratus/wiki/Serratus-Lite
When I looked at the reference files described at https://github.com/ababaian/serratus/wiki/Access-Data-Release, I noticed there is an rdrp5.fa file in addition to the rdrp1.fa file. At a quick glance, rdrp5.fa appears to be non-redundant with rdrp1.fa. Are there any downsides to concatenating rdrp1.fa and rdrp5.fa and using the combined file as the reference for DIAMOND? Also, I wasn't able to find any information on the content of rdrp5.fa. Is this documented somewhere?
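For anyone wanting to check the redundancy more carefully than a quick glance before concatenating, here is a minimal sketch. It assumes both files are plain protein FASTA, treats two records as duplicates when either their first header token or their sequence (case-insensitive, ignoring a trailing `*`) matches, and keeps the first copy. The function names and the output file name are hypothetical and not part of Serratus.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a plain-text FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_fasta(paths, out_path):
    """Concatenate FASTA files, dropping records whose ID or sequence
    was already seen. Returns (kept, dropped) record counts."""
    seen_ids, seen_seqs = set(), set()
    kept = dropped = 0
    with open(out_path, "w") as out:
        for path in paths:
            for header, seq in read_fasta(path):
                tokens = header.split()
                seq_id = tokens[0] if tokens else ""
                # Assumption: compare sequences case-insensitively and
                # ignore a trailing stop symbol.
                key = seq.upper().rstrip("*")
                if seq_id in seen_ids or key in seen_seqs:
                    dropped += 1
                    continue
                seen_ids.add(seq_id)
                seen_seqs.add(key)
                out.write(f">{header}\n{seq}\n")
                kept += 1
    return kept, dropped
```

Calling `merge_fasta(["rdrp1.fa", "rdrp5.fa"], "rdrp1_rdrp5.fa")` returns the number of records kept and dropped; a dropped count of zero would suggest the two files share no identical IDs or sequences. The merged file could then be passed to `diamond makedb --in rdrp1_rdrp5.fa -d rdrp1_rdrp5`. Note that this only catches exact duplicates, not near-identical sequences.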
Also, I'm not sure if this would be helpful, but I had a bit of a challenge getting the psummarizer.py script to run correctly under Python 2. With the help of an LLM, I refactored it for Python 3 and it seems to be running correctly; the output is sane and matches my expectations of the data and the underlying .pro file as best I can tell. I don't want to create a PR for it because I have no way of ensuring that its output is identical to what you see with the existing Python 2 code (since I can't get that to produce output in my environment). Attaching it here if you want to take a look at it. No worries if not.
Thanks,
dave