@pinin4fjords

This PR fixes several issues with the anota2seq module:

Problems Fixed

  1. Missing gene IDs: The anota2seq results tables were missing gene identifiers, making it difficult to interpret results
  2. Missing parameter: The gene_id_col parameter was used but never defined
  3. Empty file handling: When no genes passed significance thresholds, the module would create empty/malformed files
  4. Output expectations: Nextflow expected files that might not be created

Changes Made

  • Add gene_id_col parameter: Defaults to "gene_id" but configurable
  • Include gene IDs: Add gene identifiers as the first column in all results tables, using the configurable column name
  • Conditional output: Only write files when there are actual results to avoid empty files
  • Optional outputs: Mark all results TSV files as optional since they're now conditionally created
  • Update tests: Modified test to use buffering results instead of empty mRNA_abundance, updated snapshots
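
The "include gene IDs" and "conditional output" changes can be sketched together as follows. This is a minimal illustration in Python, not the module's actual code: the function name write_results_table, the dict-based input, and the statistic column names are all hypothetical; only the gene_id_col name and its "gene_id" default come from this PR.

```python
import csv
from pathlib import Path


def write_results_table(rows, out_path, gene_id_col="gene_id"):
    """Write a TSV with gene IDs as the first column (hypothetical helper).

    `rows` maps gene ID -> dict of statistics. Returns True if a file was
    written, False if there was nothing to write.
    """
    if not rows:
        # No genes passed the thresholds: skip the file instead of
        # creating an empty or malformed one.
        return False

    # Gene ID column first, then the statistic columns of the first row.
    first = next(iter(rows.values()))
    fieldnames = [gene_id_col] + list(first.keys())

    with Path(out_path).open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        for gene_id, stats in rows.items():
            writer.writerow({gene_id_col: gene_id, **stats})
    return True
```

Because the file may not exist after the call, whatever consumes it has to treat it as optional, which is why the results TSV outputs are marked optional in this PR.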

Benefits

  • Results tables now include gene identifiers for downstream analysis
  • Consistent with other nf-core modules that include gene IDs
  • Graceful handling of cases where no genes pass significance thresholds
  • More robust error handling

Testing

  • Updated test snapshots with new MD5 hashes reflecting the gene_id column addition
  • Test now uses buffering results instead of empty mRNA_abundance results
  • All outputs marked as optional to handle conditional file creation

The anota2seq results are now much more usable and consistent with expectations from other differential analysis modules.

- Add missing gene_id_col parameter definition (defaults to 'gene_id')
- Include gene IDs as first column in all results tables using configurable column name
- Only write output files when there are significant results to avoid empty files
- Mark all results TSV outputs as optional since they're conditionally created
- Update test to use buffering results instead of empty mRNA_abundance results
- Update test snapshots with new file formats including gene_id column

This ensures anota2seq results are consistent with other modules and include
gene identifiers for downstream analysis, while gracefully handling cases
where no genes pass significance thresholds.

Co-Authored-By: Sebastian Uhrig <[email protected]>
@pinin4fjords pinin4fjords force-pushed the fix/anota2seq-include-gene-ids branch from 62718f0 to 24c04ae Compare December 5, 2025 13:19
@pinin4fjords pinin4fjords added this pull request to the merge queue Dec 5, 2025
Merged via the queue into master with commit 6717b61 Dec 5, 2025
16 checks passed
@pinin4fjords pinin4fjords deleted the fix/anota2seq-include-gene-ids branch December 5, 2025 14:02
vagkaratzas pushed a commit that referenced this pull request Dec 8, 2025