Commit 0991b10
authored
fix(trimgalore): stack process_low_memory on top of process_low (#11541)
* fix(trimgalore): stack process_low_memory on top of process_low
Now that nf-core/tools#4264 has merged and the pipeline-template
`base.config` defines `process_low_memory` (1 GB * task.attempt),
the trimgalore module can drop its memory ceiling without giving up
the cpu/time bands set by process_low.
Composed budget on first attempt:
- cpus = 2 (process_low; 1 worker thread paired)
- memory = 1 GB (process_low_memory; ~10x observed peak_rss of ~100 MB on 30M PE)
- time = 4 h (process_low; ~160x observed runtime of ~1.5 min on 30M PE)
The tool itself caps trim_galore --cores at 8 worker threads, so
allocating more cpus is wasted unless paired with very large inputs;
pipelines that genuinely need that throughput can override per-process
or stack `process_high` instead. This is the conservative default
that satisfies Felix's original 100 MB / 10 min ask while keeping
existing 1-worker-paired throughput intact for typical use.
* fix(trimgalore): drop redundant process_low label
The template's process default is cpus=1 / time=4.h, which is what
process_low_memory already inherits when no other resource label is
applied. process_low's cpus=2 changes nothing useful here:
trim_galore's --cores formula (`cores = max(1, task.cpus - 4)` paired)
saturates at 1 worker thread for any cpus <= 5, so 1 vs 2 cpus gives
the same worker count. Drop the redundant label.
* fix(trimgalore): use process_medium for cpus, keep process_low_memory
Stack process_medium with process_low_memory so the cpu allocation
is enough to give trim_galore 2 paired worker threads on production
data, while keeping the memory ceiling tight at 1 GB.
trim_galore's --cores formula derives worker count from task.cpus:
cores = max(1, task.cpus - 4) // paired
cores = max(1, task.cpus - 3) // single (capped at 8)
Implications:
- cpus = 1 (default) -> 1 worker thread paired
- cpus = 2 (process_low) -> 1 worker thread paired (no gain)
- cpus = 6 (process_medium) -> 2 worker threads paired (~2x throughput)
- cpus = 12 (process_high) -> 8 worker threads paired (saturates the cap)
process_medium is the cheapest band that crosses the 1->2 worker
threshold, where actual trimming throughput first improves. Useful for
real production inputs (100M+ PE) where the extra worker materially
shortens wall time; small test data won't notice the difference. Memory
stays at process_low_memory's 1 GB regardless.1 parent 43fea19 commit 0991b10
1 file changed
Lines changed: 2 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
2 | 2 | | |
3 | | - | |
| 3 | + | |
| 4 | + | |
4 | 5 | | |
5 | 6 | | |
6 | 7 | | |
| |||
0 commit comments