Add Binette #11563
ochkalova
left a comment
Thank you very much for doing this module! I never got around to finishing it 😅
I left one request for an additional input arg.
Also, my branch has a 1.1 Mb test dataset that allows a test run without the real CheckM2 DB. It is still not super quick and takes a couple of minutes to finish, but I think it's better than downloading the full CheckM2 DB... There are 2 tests for the 2 options of input_type; I think we only need to keep one of them.
The tricky part is pushing this test data to nf-tests (that's why I never finished this module).
If you have the capacity to do that and update the test, that would be great.
| input: | ||
| tuple val(meta) , path(contig2bin), path(contigs), path(proteins) | ||
| tuple val(meta2), path(checkm2_db) |
Binette can process FASTA input as well. I would add one more input to define whether the input is a TSV or a folder with FASTA files. I think it would be convenient.
| input: | |
| tuple val(meta) , path(contig2bin), path(contigs), path(proteins) | |
| tuple val(meta2), path(checkm2_db) | |
| input: | |
| tuple val(meta) , path(input_binning, stageAs: "input_binning/*"), path(contigs), path(proteins) | |
| val input_type | |
| tuple val(meta2), path(checkm2_db) |
Will add this tomorrow!
| """ | ||
| binette \\ | ||
| --contig2bin_tables ${contig2bin} \\ | ||
| --contigs ${contigs} \\ | ||
| ${proteins_input} \\ | ||
| --checkm2_db ${checkm2_db} \\ | ||
| --threads ${task.cpus} \\ | ||
| --prefix ${prefix} \\ | ||
| --outdir . \\ | ||
| ${args} |
| """ | |
| binette \\ | |
| --contig2bin_tables ${contig2bin} \\ | |
| --contigs ${contigs} \\ | |
| ${proteins_input} \\ | |
| --checkm2_db ${checkm2_db} \\ | |
| --threads ${task.cpus} \\ | |
| --prefix ${prefix} \\ | |
| --outdir . \\ | |
| ${args} | |
| def input_arg = "" | |
| if (input_type == 'fasta') { | |
| input_arg = "--bin_dirs" | |
| } else if (input_type == 'tsv') { | |
| input_arg = "--contig2bin_tables" | |
| } else { | |
| error "Invalid input_type: ${input_type}. Must be 'fasta' or 'tsv'" | |
| } | |
| """ | |
| binette \\ | |
| ${input_arg} input_binning/* \\ | |
| --contigs ${contigs} \\ | |
| ${proteins_input} \\ | |
| --checkm2_db ${checkm2_db} \\ | |
| --threads ${task.cpus} \\ | |
| --prefix ${prefix} \\ | |
| --outdir . \\ | |
| ${args} |
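The branching in the suggestion above can be tried outside Nextflow with a small shell function. This is only an illustrative sketch, not part of the module: `binette_input_flag` is a hypothetical helper name, and the two flags are simply the ones quoted in the suggestion.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the suggested Groovy branching:
# map an input_type value to the Binette flag named in the suggestion,
# and reject anything other than 'fasta' or 'tsv'.
binette_input_flag() {
    case "$1" in
        fasta) echo "--bin_dirs" ;;
        tsv)   echo "--contig2bin_tables" ;;
        *)
            echo "Invalid input_type: $1. Must be 'fasta' or 'tsv'" >&2
            return 1
            ;;
    esac
}

binette_input_flag fasta                    # FASTA bin folders
binette_input_flag tsv                      # contig2bin tables
binette_input_flag bam || echo "rejected"   # any other value fails
```

Failing early on an unknown value, as the suggested `error` call does, keeps a typo in `input_type` from silently falling back to one of the two modes.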
| --outdir . \\ | ||
| ${args} | ||
| find final_bins/ -maxdepth 1 -name "*.fa" -type f -exec gzip {} \\; |
Is it an nf-core standard to compress FASTA files? I don't know if gz files are convenient for a regular user 🤔
It's a general nf-core recommendation to gzip files (https://nf-co.re/docs/specifications/components/modules/general#compression-of-input-and-output-files), but perhaps more importantly, all the other nf-core binning modules also write gzipped FASTA, so it's consistent for downstream use.
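On the convenience question, here is a rough sketch of how a gzipped bin can still be read without unpacking it to disk. Everything here is made up for illustration: the scratch directory, the `bin_1.fa` file name, and the two tiny contigs are assumptions, and only the `find ... gzip` line follows the module script (minus Nextflow's backslash escaping).

```shell
#!/usr/bin/env bash
# Scratch directory with a made-up two-contig bin (illustrative data only).
workdir="$(mktemp -d)"
mkdir -p "$workdir/final_bins"
printf '>contig_1\nACGT\n>contig_2\nGGCC\n' > "$workdir/final_bins/bin_1.fa"

# Same compression step as in the module script: gzip every top-level .fa file.
find "$workdir/final_bins/" -maxdepth 1 -name "*.fa" -type f -exec gzip {} \;

# Stream the compressed file and count FASTA headers without writing
# an uncompressed copy.
n_seqs="$(gzip -dc "$workdir/final_bins/bin_1.fa.gz" | grep -c '^>')"
echo "sequences: $n_seqs"
```

Note that `gzip` replaces each `.fa` file with a `.fa.gz` file, so anything downstream has to expect the `.gz` suffix.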
Add Binette module for metagenome bin refinement. Unfortunately, the test depends on the large CheckM2 database...
@ochkalova I only realised as I made this PR that you have a branch for Binette. Happy to collaborate/work on your branch if you prefer!
PR checklist
Closes #XXX
topic: versions - See version_topics
label
nf-core modules test <MODULE> --profile docker
nf-core modules test <MODULE> --profile singularity
nf-core modules test <MODULE> --profile conda
nf-core subworkflows test <SUBWORKFLOW> --profile docker
nf-core subworkflows test <SUBWORKFLOW> --profile singularity
nf-core subworkflows test <SUBWORKFLOW> --profile conda