-
Notifications
You must be signed in to change notification settings - Fork 828
Add validation error for incompatible transcript_fasta and additional_fasta params #1632
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add validation error for incompatible transcript_fasta and additional_fasta params #1632
Conversation
…_fasta params When both --transcript_fasta and --additional_fasta are provided, the pipeline cannot append spike-in sequences to the user-provided transcriptome. This causes downstream quantification to fail with confusing errors. Now the pipeline fails fast with a clear error message explaining the issue and suggesting solutions. Closes #1450 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
|
The validation was too strict - it blocked all combinations of transcript_fasta + additional_fasta, but this is only problematic when the pipeline needs to BUILD a pseudo-aligner index (Salmon/Kallisto). If a pre-built index is provided that already contains the spike-ins, the combination is valid. Updated validation logic to check for this condition and updated docs/schema accordingly. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
|
@copilot I need to add a fix to the changes here. The salmon index is only relevant if the pseudoaligner is 'salmon', similarly for kallisto |
|
@pinin4fjords I've opened a new pull request, #1633, to work on those changes. Once the pull request is ready, I'll request review from you. |
Co-authored-by: pinin4fjords <[email protected]>
Co-authored-by: pinin4fjords <[email protected]>
Add validation error for incompatible transcript_fasta and additional_fasta params
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds validation to prevent a confusing error that occurs when users provide both --transcript_fasta and --additional_fasta without a pre-built pseudo-aligner index. The validation ensures the pipeline fails fast with a clear, actionable error message instead of failing later during quantification.
Key changes:
- Added early validation in
validateInputParameters()to detect the incompatible parameter combination - Added a clear error message function with actionable solutions for users
- Updated schema help text for both parameters to document the incompatibility
- Enhanced documentation in usage.md with inline notes and a dedicated warning box
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| subworkflows/local/utils_nfcore_rnaseq_pipeline/main.nf | Adds validation logic to detect incompatible --transcript_fasta and --additional_fasta parameters when building pseudo-aligner indices, and implements a new error function with clear guidance |
| nextflow_schema.json | Updates help_text for both transcript_fasta and additional_fasta parameters to document the incompatibility when building pseudo-aligner indices |
| docs/usage.md | Enhances documentation with inline notes and a dedicated warning box explaining the parameter incompatibility and valid usage patterns |
| CHANGELOG.md | Documents the enhancement with proper attribution to the contributor and links to the relevant issue |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@copilot the new error is firing even with some of the previously succeeding tests here. I think we may need to tweak the conditional |
|
@pinin4fjords I've opened a new pull request, #1634, to work on those changes. Once the pull request is ready, I'll request review from you. |
Co-authored-by: pinin4fjords <[email protected]>
Add validation error for incompatible transcript_fasta and additional_fasta params
The test profile's transcript_fasta already contains spike-in sequences (GFP) from when it was generated. Setting only additional_fasta = null prevents GFP from being added to the GTF, but Kallisto still builds its index from the existing transcript_fasta (which has GFP). This causes a mismatch where Kallisto outputs GFP counts but the GTF-based metadata doesn't include GFP, causing SE_GENE_UNIFIED to fail. The fix is to also set transcript_fasta = null so the pipeline regenerates the transcriptome from the GTF (consistently without spike-ins). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Updated both real and stub test snapshots to reflect changes from disabling additional_fasta and transcript_fasta: Real test: - Task count: 48 → 47 - Removed CUSTOM_CATADDITIONALFASTA and GUNZIP_ADDITIONAL_FASTA - Added MAKE_TRANSCRIPTS_FASTA (pipeline now generates transcriptome) - Removed custom/out/genome_gfp.* from output files - Updated tx2gene.tsv hash (different without spike-ins) Stub test: - Task count: 22 → 21 - Same process changes as real test - Removed custom/out/genome_transcriptome.* from output files - Empty stable_path array 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
8440a84 to
8dc43b9
Compare
JoseEspinosa
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🚀
Summary
--transcript_fastaand--additional_fastaare provided and the pipeline needs to build a pseudo-aligner indexProblem
When users provide both
--transcript_fastaand--additional_fastawithout a pre-built pseudo-aligner index, the pipeline proceeds but fails at the Salmon/Kallisto quantification step with a confusing error about missing transcripts. This is because:--additional_fastasequences get appended to the genome FASTA and GTF--transcript_fastais used as-is, without the spike-in sequencesSolution
Fail fast with a clear error message when:
--transcript_fastaand--additional_fastaare provided--pseudo_aligneris set,--skip_pseudo_alignmentis false)--salmon_indexor--kallisto_index)The combination is valid when a pre-built index is provided that already contains the spike-ins (e.g., the test profile provides
salmon_index).Test plan
salmon_index)--additional_fasta(no--transcript_fasta)--transcript_fasta+--additional_fasta+--salmon_indexCloses #1450
🤖 Generated with Claude Code