fix(trimgalore): drop process label to process_low #11531
SPPearce merged 1 commit into nf-core:master on May 5, 2026
trim-galore 2.x is a Rust binary that streams reads, so memory stays flat with input size.

Empirical 30M PE benchmark (rnaseq pipeline):
- peak_rss ~100 MB
- realtime ~1.5 min median, ~2 min max

The previous `process_high` label (12 cpus / 72 GB / 16 h) is massively over-provisioned for the new implementation and starves shared HPC schedulers. `process_low` (2 cpus / 12 GB / 4 h, scaling with task.attempt) gives ~120x memory headroom and ~80x runtime headroom over observed peaks at 30M PE, comfortably absorbing the 200M+ PE inputs that pipelines actually see in production.

The script's own `--cores` calculation derives the worker count from `task.cpus` and caps it at 8, so allocating more than the label's 2 cpus (which yields 1 worker thread for paired-end input) gives diminishing returns; users with bespoke needs can still override `cpus` downstream.
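The memory headroom figure follows from the numbers quoted above. A minimal sketch of that arithmetic, assuming decimal units (1 GB = 1000 MB) and assuming the label's budget scales linearly with `task.attempt` (the message only says that it scales); the function and constant names are hypothetical:

```python
PEAK_RSS_MB = 100          # observed peak_rss at 30M PE, from the benchmark above
PROCESS_LOW_MEM_GB = 12    # process_low memory budget on the first attempt


def memory_headroom(attempt: int = 1) -> float:
    """Ratio of the process_low memory budget to the observed peak RSS.

    Linear scaling with the retry attempt is an assumption of this sketch,
    not something the commit message states.
    """
    budget_mb = PROCESS_LOW_MEM_GB * 1000 * attempt
    return budget_mb / PEAK_RSS_MB


print(memory_headroom())  # first attempt
```

On the first attempt this reproduces the ~120x memory headroom quoted in the message.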
SPPearce
approved these changes
May 5, 2026
Thanks @SPPearce!
As a comment on this, the invocation was
Please note that the threading model is Using
Uh?
Summary
Drops the TRIMGALORE process label from `process_high` (12 cpus / 72 GB / 16 h) to `process_low` (2 cpus / 12 GB / 4 h, scaling with `task.attempt`).

Why
trim-galore 2.x is a Rust binary that streams reads, so memory stays flat with input size rather than scaling with read count. The `process_high` ceiling was inherited from the Perl-based 0.6.x era and is now massively over-provisioned, starving shared HPC schedulers for no benefit.

Empirical data (30M PE on nf-core/rnaseq)
`process_low` budget
The script already auto-derives `--cores` from `task.cpus` and caps the worker count at 8, so over-allocating cpus doesn't help anyway.

Why `process_low` and not `process_single`?
`process_single` (1 cpu / 6 GB / 4 h) is the only smaller standard bucket. For trim_galore's worker-thread math both yield 1 worker (since `cores = max(1, task.cpus - 4)` on paired), so the trimming parallelism is identical. The runtime difference comes from the surrounding I/O pipeline:
`process_single` also semantically signals "single-threaded by nature" (utilities, parsers, R scripts). trim_galore is genuinely multi-process even when only running one trimming worker, so `process_low` reads more honestly for "small resource ceiling, still parallel I/O".
Users with bespoke needs (huge inputs, custom adapter detection, etc.) can still override resources at the pipeline level.
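
The worker-thread arithmetic described above can be sketched in a few lines. This is an illustration of the stated formula only (`max(1, task.cpus - 4)` for paired-end input, capped at 8), not the module's actual script; the function and constant names are hypothetical:

```python
MAX_WORKERS = 8  # cap on the worker count, as stated in the description


def paired_end_workers(task_cpus: int) -> int:
    """Worker count for paired-end input: max(1, task_cpus - 4), capped at 8."""
    return min(MAX_WORKERS, max(1, task_cpus - 4))


# Compare the cpu allocations of process_single (1), process_low (2)
# and process_high (12), plus one value above the cap.
for cpus in (1, 2, 12, 16):
    print(cpus, paired_end_workers(cpus))
```

Under this formula both 1 and 2 cpus yield a single worker, which is why `process_single` and `process_low` give identical trimming parallelism, and anything above 12 cpus stays pinned at 8 workers.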
What's not changing
Test plan