2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -13,6 +13,7 @@ Special thanks to the following for their contributions to the release:
- [Elad Herzog](https://github.com/EladH1)
- [Emily Miyoshi](https://github.com/emilymiyoshi)
- [Pontus Höjer](https://github.com/pontushojer)
- [Siddhartha Bagaria](https://github.com/siddharthab)

### Enhancements and fixes

@@ -28,6 +29,7 @@ Special thanks to the following for their contributions to the release:
- [PR #1624](https://github.com/nf-core/rnaseq/pull/1624) - Document RSeQC inner_distance limitation for genomes with large chromosomes (>500 Mb), such as plant genomes
- [PR #1625](https://github.com/nf-core/rnaseq/pull/1625) - Add documentation warning about Qualimap read counting bug ([#1273](https://github.com/nf-core/rnaseq/issues/1273))
- [PR #1628](https://github.com/nf-core/rnaseq/pull/1628) - Template update for nf-core/tools v3.5.1
- [PR #1632](https://github.com/nf-core/rnaseq/pull/1632) - Add validation error for incompatible `--transcript_fasta` and `--additional_fasta` params ([#1450](https://github.com/nf-core/rnaseq/issues/1450))
- [PR #1630](https://github.com/nf-core/rnaseq/pull/1630) - Fix arm64 profile to use pre-built ARM containers and update documentation
- [PR #1631](https://github.com/nf-core/rnaseq/pull/1631) - Fix bbsplit index staging by using symlinks instead of full copy
- [PR #1635](https://github.com/nf-core/rnaseq/pull/1635) - Fix `--gtf_extra_attributes` to support multiple comma-separated values and correct deprecated parameter name in docs ([#1626](https://github.com/nf-core/rnaseq/issues/1626))
6 changes: 5 additions & 1 deletion docs/usage.md
@@ -312,7 +312,7 @@ Notes:

- If `--gff` is provided as input then this will be converted to a GTF file, or the latter will be used if both are provided.
- If `--gene_bed` is not provided then it will be generated from the GTF file.
- If `--additional_fasta` is provided then the features in this file (e.g. ERCC spike-ins) will be automatically concatenated onto both the reference FASTA file as well as the GTF annotation before building the appropriate indices.
- If `--additional_fasta` is provided then the features in this file (e.g. ERCC spike-ins) will be automatically concatenated onto both the reference FASTA file and the GTF annotation before building the appropriate indices. Note: if you need the pipeline to build a pseudo-aligner index (Salmon/Kallisto), `--additional_fasta` cannot be used together with `--transcript_fasta`, because the pipeline cannot append additional sequences to a user-provided transcriptome. Either omit `--transcript_fasta` and let the pipeline generate the transcriptome, or provide a pre-built index that already contains the spike-ins.
- When using `--aligner star_rsem`, the pipeline will build separate STAR and RSEM indices. STAR performs alignment with RSEM-compatible parameters, then RSEM quantifies from the resulting BAM files using `--alignments` mode.
- If the `--skip_alignment` option is used along with `--transcript_fasta`, the pipeline can technically run without providing the genomic FASTA (`--fasta`). However, this approach is **not recommended** with `--pseudo_aligner salmon`, as any dynamically generated Salmon index will lack decoys. To ensure optimal indexing with decoys, it is **highly recommended** to include the genomic FASTA (`--fasta`) with Salmon, unless a pre-existing decoy-aware Salmon index is supplied. For more details on the benefits of decoy-aware indexing, refer to the [Salmon documentation](https://salmon.readthedocs.io/en/latest/salmon.html#preparing-transcriptome-indices-mapping-based-mode).

@@ -346,6 +346,10 @@ In addition to the reference genome sequence and annotation, you can provide a r

We recommend not providing a transcriptome FASTA file and instead allowing the pipeline to create it from the provided genome and annotation. Similar to aligner indexes, you can save the created transcriptome FASTA and BED files to a central location for future pipeline runs. This helps avoid redundant computation and having multiple copies on your system. Ensure that all genome, annotation, transcriptome, and index versions match to maintain consistency.

:::warning
If you are using `--additional_fasta` to add spike-in sequences (e.g. ERCC) and need the pipeline to build a pseudo-aligner index (Salmon/Kallisto), you **must not** provide `--transcript_fasta`. The pipeline needs to generate the transcriptome itself so that it includes the spike-in sequences. This combination will cause the pipeline to exit with an error unless you also provide a pre-built index (`--salmon_index` or `--kallisto_index`) that already contains the spike-in sequences.
:::

#### Indices

By default, indices are generated dynamically by the workflow for tools such as STAR and Salmon. Since indexing is an expensive process in time and resources you should ensure that it is only done once, by retaining the indices generated from each batch of reference files by specifying `--save_reference`.
5 changes: 3 additions & 2 deletions nextflow_schema.json
@@ -111,7 +111,8 @@
"mimetype": "text/plain",
"pattern": "^\\S+\\.fn?a(sta)?(\\.gz)?$",
"fa_icon": "far fa-file-code",
"description": "Path to FASTA transcriptome file."
"description": "Path to FASTA transcriptome file.",
"help_text": "If not provided, the transcriptome will be generated from the genome FASTA and GTF files. Cannot be used together with `--additional_fasta` when building a pseudo-aligner index, because the pipeline cannot append spike-in sequences to a user-provided transcriptome. Either omit this parameter or provide a pre-built index."
},
"additional_fasta": {
"type": "string",
@@ -121,7 +122,7 @@
"pattern": "^\\S+\\.fn?a(sta)?(\\.gz)?$",
"fa_icon": "far fa-file-code",
"description": "FASTA file to concatenate to genome FASTA file e.g. containing spike-in sequences.",
"help_text": "If provided, sequences in this file will be concatenated to the genome FASTA file. A GTF file will be automatically created using these sequences, and alignment indices will be created from the combined files. Use `--save_reference` to reuse these indices in future runs."
"help_text": "If provided, sequences in this file will be concatenated to the genome FASTA file. A GTF file will be automatically created using these sequences, and alignment indices will be created from the combined files. Use `--save_reference` to reuse these indices in future runs. Cannot be used together with `--transcript_fasta` when building a pseudo-aligner index - either omit `--transcript_fasta` or provide a pre-built index that already contains the spike-ins."
},
"splicesites": {
"type": "string",
39 changes: 39 additions & 0 deletions subworkflows/local/utils_nfcore_rnaseq_pipeline/main.nf
@@ -263,6 +263,23 @@ def validateInputParameters() {
}

if (params.transcript_fasta) {
    // Only error if additional_fasta is provided AND we need to build a pseudo-aligner index
    // (i.e., no pre-built salmon/kallisto index provided). If the user provides a pre-built
    // index that already contains the spike-ins, the combination is valid.
    if (params.additional_fasta) {
        def needs_to_build_index = false
        if (!params.skip_pseudo_alignment && params.pseudo_aligner) {
            // Check if the relevant index for the selected pseudo-aligner is missing
            if (params.pseudo_aligner == 'salmon' && !params.salmon_index) {
                needs_to_build_index = true
            } else if (params.pseudo_aligner == 'kallisto' && !params.kallisto_index) {
                needs_to_build_index = true
            }
        }
        if (needs_to_build_index) {
            transcriptFastaAdditionalFastaError()
        }
    }
    transcriptsFastaWarn()
}
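
The nested checks above reduce to a single condition: the run is rejected only when both FASTA parameters are set and the index for the selected pseudo-aligner is missing. The following is a minimal standalone sketch of that condition in plain Groovy, not the pipeline's implementation. The helper names `mustBuildPseudoIndex` and `conflictingFastaParams`, the map-based parameters, and the file names are all hypothetical and used only for illustration.

```groovy
// Hypothetical standalone helpers for illustration; they are not part of the pipeline.

// True when the index for the selected pseudo-aligner would have to be built.
boolean mustBuildPseudoIndex(Map p) {
    if (p.skip_pseudo_alignment || !p.pseudo_aligner) {
        return false
    }
    return (p.pseudo_aligner == 'salmon' && !p.salmon_index) ||
           (p.pseudo_aligner == 'kallisto' && !p.kallisto_index)
}

// True for the parameter combination that the validation rejects.
boolean conflictingFastaParams(Map p) {
    return p.transcript_fasta && p.additional_fasta && mustBuildPseudoIndex(p)
}

// Placeholder file names, assumed for the example only.
def base = [transcript_fasta: 'tx.fa', additional_fasta: 'ercc.fa']
assert conflictingFastaParams(base + [pseudo_aligner: 'salmon'])
assert !conflictingFastaParams(base + [pseudo_aligner: 'salmon', salmon_index: 'salmon_idx'])
```

Note that a pre-built index only counts if it matches the selected pseudo-aligner: under this logic, supplying `salmon_index` while running with `pseudo_aligner = 'kallisto'` still triggers the error.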

@@ -496,6 +513,28 @@ def transcriptsFastaWarn() {
"~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
}

//
// Print an error if using both '--transcript_fasta' and '--additional_fasta' without a pre-built index
//
def transcriptFastaAdditionalFastaError() {
def error_string = "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n" +
" Both '--transcript_fasta' and '--additional_fasta' have been provided,\n" +
" but no pre-built pseudo-aligner index (--salmon_index/--kallisto_index).\n\n" +
" The pipeline cannot append additional sequences (e.g. ERCC spike-ins) to a\n" +
" user-provided transcriptome FASTA file. This would cause quantification to\n" +
" fail because the built index would not contain the additional sequences.\n\n" +
" Please either:\n" +
" - Remove '--transcript_fasta' and let the pipeline generate the\n" +
" transcriptome from the genome FASTA and GTF (recommended), or\n" +
" - Provide a pre-built index (--salmon_index/--kallisto_index) that\n" +
" already contains the additional sequences, or\n" +
" - Remove '--additional_fasta' if you do not need spike-in sequences.\n\n" +
" Please see:\n" +
" https://github.com/nf-core/rnaseq/issues/1450\n" +
"~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
error(error_string)
}

//
// Print a warning if --skip_alignment has been provided
//
10 changes: 10 additions & 0 deletions tests/kallisto.nf.test
@@ -12,6 +12,11 @@ nextflow_pipeline {
pseudo_aligner = 'kallisto'
skip_qc = true
skip_alignment = true
// Disable spike-ins since we don't have a kallisto_index with spike-ins.
// Must also disable transcript_fasta because the test profile's transcriptome
// was generated with spike-ins - we need the pipeline to regenerate it.
additional_fasta = null
transcript_fasta = null
}
}

@@ -46,6 +51,11 @@ nextflow_pipeline {
pseudo_aligner = 'kallisto'
skip_qc = true
skip_alignment = true
// Disable spike-ins since we don't have a kallisto_index with spike-ins.
// Must also disable transcript_fasta because the test profile's transcriptome
// was generated with spike-ins - we need the pipeline to regenerate it.
additional_fasta = null
transcript_fasta = null
}
}

38 changes: 11 additions & 27 deletions tests/kallisto.nf.test.snap
@@ -1,17 +1,14 @@
{
"Params: --pseudo_aligner kallisto --skip_qc --skip_alignment": {
"content": [
48,
47,
{
"BBMAP_BBSPLIT": {
"bbmap": 39.18
},
"CAT_FASTQ": {
"cat": 9.5
},
"CUSTOM_CATADDITIONALFASTA": {
"python": "3.12.2"
},
"CUSTOM_GETCHROMSIZES": {
"getchromsizes": 1.21
},
@@ -30,9 +27,6 @@
"GTF_FILTER": {
"python": "3.9.5"
},
"GUNZIP_ADDITIONAL_FASTA": {
"gunzip": 1.13
},
"GUNZIP_GTF": {
"gunzip": 1.13
},
@@ -42,6 +36,10 @@
"KALLISTO_QUANT": {
"kallisto": "0.51.1"
},
"MAKE_TRANSCRIPTS_FASTA": {
"rsem": "1.3.1",
"star": "2.7.10a"
},
"SALMON_QUANT": {
"salmon": "1.10.3"
},
@@ -70,10 +68,6 @@
"bbsplit/RAP1_UNINDUCED_REP2.stats.txt",
"bbsplit/WT_REP1.stats.txt",
"bbsplit/WT_REP2.stats.txt",
"custom",
"custom/out",
"custom/out/genome_gfp.fasta",
"custom/out/genome_gfp.gtf",
"fastqc",
"fastqc/trim",
"fastqc/trim/RAP1_IAA_30M_REP1_trimmed_1_val_1_fastqc.html",
@@ -248,9 +242,7 @@
"trimgalore/WT_REP2_trimmed_2.fastq.gz_trimming_report.txt"
],
[
"genome_gfp.fasta:md5,e23e302af63736a199985a169fdac055",
"genome_gfp.gtf:md5,c98b12c302f15731bfc36bcf297cfe28",
"tx2gene.tsv:md5,0e2418a69d2eba45097ebffc2f700bfe",
"tx2gene.tsv:md5,1be389a28cc26d94b19ea918959ac72e",
"cutadapt_filtered_reads_plot.txt:md5,6fa381627f7c1f664f3d4b2cb79cce90",
"cutadapt_trimmed_sequences_plot_3_Counts.txt:md5,13dfa866fd91dbb072689efe9aa83b1f",
"cutadapt_trimmed_sequences_plot_3_Obs_Exp.txt:md5,07145dd8dd3db654859b18eb0389046c",
@@ -277,17 +269,14 @@
},
"Params: --pseudo_aligner kallisto --skip_qc --skip_alignment - stub": {
"content": [
22,
21,
{
"BBMAP_BBSPLIT": {
"bbmap": 39.18
},
"CAT_FASTQ": {
"cat": 9.5
},
"CUSTOM_CATADDITIONALFASTA": {
"python": null
},
"CUSTOM_GETCHROMSIZES": {
"getchromsizes": 1.21
},
@@ -300,15 +289,16 @@
"GTF_FILTER": {
"python": "3.9.5"
},
"GUNZIP_ADDITIONAL_FASTA": {
"gunzip": 1.13
},
"GUNZIP_GTF": {
"gunzip": 1.13
},
"KALLISTO_INDEX": {
"kallisto": "0.51.1"
},
"MAKE_TRANSCRIPTS_FASTA": {
"rsem": "1.3.1",
"star": "2.7.10a"
},
"TRIMGALORE": {
"cutadapt": 4.9,
"pigz": 2.8,
@@ -319,10 +309,6 @@
}
},
[
"custom",
"custom/out",
"custom/out/genome_transcriptome.fasta",
"custom/out/genome_transcriptome.gtf",
"fastqc",
"fastqc/trim",
"fq_lint",
Expand All @@ -349,8 +335,6 @@
"trimgalore/WT_REP2_trimmed_2.fastq.gz_trimming_report.txt"
],
[
"genome_transcriptome.fasta:md5,d41d8cd98f00b204e9800998ecf8427e",
"genome_transcriptome.gtf:md5,d41d8cd98f00b204e9800998ecf8427e"
]
],
"meta": {