@pinin4fjords
Member

@pinin4fjords pinin4fjords commented Nov 25, 2025

Summary

  • Add validation error when both --transcript_fasta and --additional_fasta are provided and the pipeline needs to build a pseudo-aligner index
  • Update documentation to clarify this incompatibility
  • Update schema help text for both parameters

Problem

When users provide both --transcript_fasta and --additional_fasta without a pre-built pseudo-aligner index, the pipeline proceeds but fails at the Salmon/Kallisto quantification step with a confusing error about missing transcripts. This is because:

  1. --additional_fasta sequences get appended to the genome FASTA and GTF
  2. But the user-provided --transcript_fasta is used as-is, without the spike-in sequences
  3. The pseudo-aligner index is built from the transcript FASTA (missing spike-ins)
  4. Alignments to spike-in sequences then can't be quantified
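The four steps above can be illustrated with a minimal Python sketch. It is not the pipeline's code (the pipeline itself is written in Nextflow). The helper `build_references` and the sequence IDs (`ENST0001`, `ENST0002`, `GFP`) are hypothetical, chosen only to show where the spike-in goes missing.

```python
def build_references(genome_transcripts, additional_ids, user_transcript_ids=None):
    """Model which sequence IDs end up in the annotation and in the index.

    Illustrative only: names and structure are assumptions, not pipeline code.
    """
    # Step 1: additional sequences are appended to the genome FASTA and GTF.
    annotation_ids = set(genome_transcripts) | set(additional_ids)

    if user_transcript_ids is None:
        # No user transcriptome: assume it is derived from the extended annotation.
        index_ids = set(annotation_ids)
    else:
        # Steps 2-3: a user-provided transcriptome is used as-is for the index.
        index_ids = set(user_transcript_ids)

    return annotation_ids, index_ids


annotation, index = build_references(
    genome_transcripts=["ENST0001", "ENST0002"],
    additional_ids=["GFP"],
    user_transcript_ids=["ENST0001", "ENST0002"],
)

# Step 4: IDs the annotation expects but the index can never report.
missing_from_index = annotation - index
print(sorted(missing_from_index))
```

With a user-provided transcriptome that lacks the spike-in, `missing_from_index` contains `GFP`. With `user_transcript_ids=None` it is empty, which matches the case where only --additional_fasta is given.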

Solution

Fail fast with a clear error message when:

  • Both --transcript_fasta and --additional_fasta are provided
  • AND pseudo-alignment is enabled (--pseudo_aligner is set, --skip_pseudo_alignment is false)
  • AND no pre-built index is provided (--salmon_index or --kallisto_index)

The combination is valid when a pre-built index is provided that already contains the spike-ins (e.g., the test profile provides salmon_index).
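The three conditions above can be sketched as a single predicate. This is a hedged Python illustration of the logic as described in this PR description, not the Nextflow code in `validateInputParameters()`. The `Params` class, the function names, and the error wording are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Params:
    """Hypothetical stand-in for the pipeline parameters named in this PR."""
    transcript_fasta: Optional[str] = None
    additional_fasta: Optional[str] = None
    pseudo_aligner: Optional[str] = None
    skip_pseudo_alignment: bool = False
    salmon_index: Optional[str] = None
    kallisto_index: Optional[str] = None


def has_fasta_index_conflict(p: Params) -> bool:
    """True when the pipeline would have to build an index without the spike-ins."""
    both_fastas = bool(p.transcript_fasta) and bool(p.additional_fasta)
    pseudo_enabled = bool(p.pseudo_aligner) and not p.skip_pseudo_alignment
    has_prebuilt_index = bool(p.salmon_index) or bool(p.kallisto_index)
    return both_fastas and pseudo_enabled and not has_prebuilt_index


def validate(p: Params) -> None:
    """Fail fast, before any quantification step runs."""
    if has_fasta_index_conflict(p):
        # Illustrative wording only; the real message lives in the pipeline.
        raise ValueError(
            "--transcript_fasta and --additional_fasta cannot be combined "
            "when a pseudo-aligner index has to be built."
        )


# The valid combination: a pre-built index is supplied alongside both FASTAs.
validate(Params(transcript_fasta="tx.fa", additional_fasta="gfp.fa",
                pseudo_aligner="salmon", salmon_index="salmon_idx"))
```

Keeping the check as a pure predicate separate from the raising function makes each of the four test-plan cases easy to assert individually.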

Test plan

  • Verify existing tests pass (test profile has both params but also provides salmon_index)
  • Verify pipeline errors out with clear message when both params are provided without a pre-built index
  • Verify pipeline runs normally with only --additional_fasta (no --transcript_fasta)
  • Verify pipeline runs normally with --transcript_fasta + --additional_fasta + --salmon_index

Closes #1450

🤖 Generated with Claude Code

pinin4fjords and others added 2 commits November 25, 2025 19:59
…_fasta params

When both --transcript_fasta and --additional_fasta are provided, the
pipeline cannot append spike-in sequences to the user-provided
transcriptome. This causes downstream quantification to fail with
confusing errors.

Now the pipeline fails fast with a clear error message explaining the
issue and suggesting solutions.

Closes #1450

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@github-actions

github-actions bot commented Nov 25, 2025

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit ef60e40

+| ✅ 284 tests passed       |+
#| ❔   8 tests were ignored |#
#| ❔   1 tests had warnings |#
!| ❗   9 tests had warnings |!

❗ Test warnings:

  • files_exist - File not found: assets/multiqc_config.yml
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes
  • pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in nextflow.config: Specify any additional parameters here
  • pipeline_if_empty_null - ifEmpty(null) found in /home/runner/work/rnaseq/rnaseq/subworkflows/local/prepare_genome/main.nf: versions = ch_versions.ifEmpty(null) // channel: [ versions.yml ]

❔ Tests ignored:

❔ Tests fixed:

✅ Tests passed:

Run details

  • nf-core/tools version 3.5.1
  • Run at 2025-11-25 22:46:14

pinin4fjords and others added 3 commits November 25, 2025 20:11
The validation was too strict - it blocked all combinations of
transcript_fasta + additional_fasta, but this is only problematic
when the pipeline needs to BUILD a pseudo-aligner index (Salmon/Kallisto).

If a pre-built index is provided that already contains the spike-ins,
the combination is valid. Updated validation logic to check for this
condition and updated docs/schema accordingly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@pinin4fjords
Member Author

@copilot I need to add a fix to the changes here. The Salmon index is only relevant if the pseudo-aligner is 'salmon', and likewise the Kallisto index is only relevant if it is 'kallisto'.
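A minimal Python sketch of the refinement requested here: only the index that matches the selected pseudo-aligner should count as "pre-built". The function name `has_relevant_index` is hypothetical, and this is an illustration of the idea rather than the change that was eventually made in the pipeline.

```python
from typing import Optional


def has_relevant_index(
    pseudo_aligner: Optional[str],
    salmon_index: Optional[str] = None,
    kallisto_index: Optional[str] = None,
) -> bool:
    """Return True only if an index for the *selected* aligner was supplied."""
    index_for_aligner = {
        "salmon": salmon_index,
        "kallisto": kallisto_index,
    }
    # An unknown or unset aligner has no relevant index.
    return bool(index_for_aligner.get(pseudo_aligner))


# A Salmon index does not help when Kallisto is the selected pseudo-aligner.
print(has_relevant_index("kallisto", salmon_index="salmon_idx"))
print(has_relevant_index("salmon", salmon_index="salmon_idx"))
```

Under this sketch, the first call reports no relevant index and the second reports one, so a mismatched index would no longer suppress the validation error.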

Contributor

Copilot AI commented Nov 25, 2025

@pinin4fjords I've opened a new pull request, #1633, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 3 commits November 25, 2025 20:55
Add validation error for incompatible transcript_fasta and additional_fasta params

Copilot AI left a comment

Pull request overview

This PR adds validation to prevent a confusing error that occurs when users provide both --transcript_fasta and --additional_fasta without a pre-built pseudo-aligner index. The validation ensures the pipeline fails fast with a clear, actionable error message instead of failing later during quantification.

Key changes:

  • Added early validation in validateInputParameters() to detect the incompatible parameter combination
  • Added a clear error message function with actionable solutions for users
  • Updated schema help text for both parameters to document the incompatibility
  • Enhanced documentation in usage.md with inline notes and a dedicated warning box

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| subworkflows/local/utils_nfcore_rnaseq_pipeline/main.nf | Adds validation logic to detect incompatible --transcript_fasta and --additional_fasta parameters when building pseudo-aligner indices, and implements a new error function with clear guidance |
| nextflow_schema.json | Updates help_text for both transcript_fasta and additional_fasta parameters to document the incompatibility when building pseudo-aligner indices |
| docs/usage.md | Enhances documentation with inline notes and a dedicated warning box explaining the parameter incompatibility and valid usage patterns |
| CHANGELOG.md | Documents the enhancement with proper attribution to the contributor and links to the relevant issue |

@pinin4fjords
Member Author

@copilot the new error is firing even with some of the previously succeeding tests here. I think we may need to tweak the conditional

Contributor

Copilot AI commented Nov 25, 2025

@pinin4fjords I've opened a new pull request, #1634, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 4 commits November 25, 2025 21:24
Add validation error for incompatible transcript_fasta and additional_fasta params
The test profile's transcript_fasta already contains spike-in sequences
(GFP) from when it was generated. Setting only additional_fasta = null
prevents GFP from being added to the GTF, but Kallisto still builds its
index from the existing transcript_fasta (which has GFP). This causes a
mismatch where Kallisto outputs GFP counts but the GTF-based metadata
doesn't include GFP, causing SE_GENE_UNIFIED to fail.

The fix is to also set transcript_fasta = null so the pipeline
regenerates the transcriptome from the GTF (consistently without
spike-ins).
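The mismatch described in this commit message runs in the opposite direction from the original bug: the index reports a sequence that the GTF-derived metadata does not know about. A small Python sketch of that consistency check follows. The function `unmapped_transcripts` and all IDs are hypothetical and only illustrate the idea; this is not how SE_GENE_UNIFIED detects the problem.

```python
def unmapped_transcripts(quantified_ids, tx2gene):
    """List quantified transcript IDs that have no entry in the tx2gene mapping."""
    return sorted(set(quantified_ids) - set(tx2gene))


# Mapping derived from a GTF to which no spike-in was added (additional_fasta = null).
tx2gene = {"ENST0001": "ENSG0001", "ENST0002": "ENSG0002"}

# IDs reported by an index built from a transcript FASTA that already contains GFP.
quantified = ["ENST0001", "ENST0002", "GFP"]

print(unmapped_transcripts(quantified, tx2gene))

# After also setting transcript_fasta = null, the transcriptome is regenerated
# from the same GTF, so both sides agree again.
regenerated = list(tx2gene)
print(unmapped_transcripts(regenerated, tx2gene))
```

The first call lists `GFP` as unmapped, and the second returns an empty list once both the transcriptome and the metadata come from the same spike-in-free GTF.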

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Updated both real and stub test snapshots to reflect changes from
disabling additional_fasta and transcript_fasta:

Real test:
- Task count: 48 → 47
- Removed CUSTOM_CATADDITIONALFASTA and GUNZIP_ADDITIONAL_FASTA
- Added MAKE_TRANSCRIPTS_FASTA (pipeline now generates transcriptome)
- Removed custom/out/genome_gfp.* from output files
- Updated tx2gene.tsv hash (different without spike-ins)

Stub test:
- Task count: 22 → 21
- Same process changes as real test
- Removed custom/out/genome_transcriptome.* from output files
- Empty stable_path array

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@pinin4fjords pinin4fjords force-pushed the fix/transcript-fasta-additional-fasta-validation branch from 8440a84 to 8dc43b9 Compare November 25, 2025 22:42
Member

@JoseEspinosa JoseEspinosa left a comment

LGTM 🚀

@pinin4fjords pinin4fjords merged commit ba62e56 into dev Nov 26, 2025
62 of 64 checks passed
@pinin4fjords pinin4fjords deleted the fix/transcript-fasta-additional-fasta-validation branch November 26, 2025 09:31
Development

Successfully merging this pull request may close these issues.

Warn user if both transcript_fasta and additional_fasta are provided.

3 participants