Benchmarks

All benchmarks use 55.8M paired-end reads (SRR24827378, whole-genome bisulfite sequencing). Outputs were verified byte-identical after decompression across all core counts, using md5 checksums.
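
The check compares checksums of the decompressed stream rather than of the .gz files themselves, because different compressors and thread counts can produce different compressed bytes for identical content. A minimal Python sketch of such a check follows. The helper names and file names are hypothetical, and the original benchmark may well have used a different tool for this step.

```python
import gzip
import hashlib


def md5_of_decompressed(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a gzip file's decompressed contents."""
    digest = hashlib.md5()
    with gzip.open(path, "rb") as handle:
        # Stream in chunks so large FASTQ files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_match(path_a, path_b):
    """True when two gzip files decompress to identical bytes."""
    return md5_of_decompressed(path_a) == md5_of_decompressed(path_b)
```

A call such as `outputs_match("tg_out.fq.gz", "oxidized_out.fq.gz")` (placeholder file names) would then be repeated for every output file and core count.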

[Figure: Wall Time Comparison]

[Figure: Thread Overhead & CPU Efficiency]

[Figure: Scaling Comparison]

Server benchmark: Intel Xeon 6975P-C (32 vCPU)

Trim Galore (Perl 5.38 + Cutadapt 5.2 + pigz 2.8 + igzip/ISA-L for decompression)

| -j | Wall time | CPU time | Memory | Threads (observed) |
|---|---|---|---|---|
| 1 | 30:05 (1,805s) | 3,001s | 21 MB | up to ~6 |
| 2 | 8:43 (523s) | 2,009s | 39 MB | up to ~9 |
| 4 | 4:33 (273s) | 2,010s | 42 MB | up to ~15 |
| 8 | 2:51 (171s) | 2,040s | 61 MB | up to ~27 |

Trim Galore Oxidized Edition (Rust, zlib-rs)

| --cores | Wall time | CPU time | Memory | Threads (deterministic) |
|---|---|---|---|---|
| 1 | 9:59 (599s) | 599s | 5 MB | 1 |
| 2 | 6:06 (366s) | 780s | 43 MB | 6 |
| 4 | 3:02 (182s) | 771s | 62 MB | 8 |
| 8 | 1:32 (92s) | 784s | 100 MB | 12 |
| 16 | 0:48 (48s) | 814s | 171 MB | 20 |
| 24 | 0:39 (39s) | 874s | 157 MB | 28 |

| Cores | TG wall | Oxidized wall | Wall speedup | TG CPU | Oxidized CPU | CPU savings |
|---|---|---|---|---|---|---|
| 1 | 1,805s | 599s | 3.0x | 3,001s | 599s | 5.0x |
| 4 | 273s | 182s | 1.5x | 2,010s | 771s | 2.6x |
| 8 | 171s | 92s | 1.9x | 2,040s | 784s | 2.6x |
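
The speedup and savings columns are plain ratios of the Trim Galore figure to the Oxidized figure at the same core setting. The sketch below recomputes them from the measurements listed above. The dictionary layout and the function name are illustrative choices, not part of either tool.

```python
# Measurements in seconds, keyed by core setting: (wall time, CPU time).
TRIM_GALORE = {1: (1805, 3001), 4: (273, 2010), 8: (171, 2040)}
OXIDIZED = {1: (599, 599), 4: (182, 771), 8: (92, 784)}


def ratios(cores):
    """Return (wall speedup, CPU savings) of Oxidized over Trim Galore."""
    tg_wall, tg_cpu = TRIM_GALORE[cores]
    ox_wall, ox_cpu = OXIDIZED[cores]
    return round(tg_wall / ox_wall, 1), round(tg_cpu / ox_cpu, 1)


for cores in sorted(TRIM_GALORE):
    wall_speedup, cpu_savings = ratios(cores)
    print(f"cores={cores}: {wall_speedup}x wall, {cpu_savings}x CPU")
```

Rounding to one decimal place reproduces the table, for example 171 / 92 ≈ 1.86, reported as 1.9x.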

Production comparison: nf-core default (--cores 8)

In nf-core pipelines, Trim Galore is typically allocated 12 CPUs (process_high) and run with -j 8 (the module subtracts 4 for overhead). With -j 8, TG spawns up to ~27 threads across Cutadapt workers, pigz compression, and pigz/igzip decompression. nf-core installs TG from bioconda, which includes igzip (Intel ISA-L) for decompression.

| | TG -j 8 | Oxidized --cores 4 | Oxidized --cores 8 | Oxidized --cores 24 |
|---|---|---|---|---|
| Wall time | 171s | 182s | 92s (1.9x faster) | 39s (4.4x faster) |
| CPU time | 2,040s | 771s (2.6x less) | 784s (2.6x less) | 874s (2.3x less) |
| Threads | up to ~27 | 8 | 12 | 28 |
| Memory | 61 MB | 62 MB | 100 MB | 157 MB |

Three ways to read this:

  • Same speed, fewer resources: Oxidized --cores 4 (8 threads) roughly matches TG -j 8 (up to ~27 threads) in wall time (182s vs 171s), while using 2.6x less CPU and about a third of the threads.
  • Same resources, much faster: Oxidized --cores 8 uses 12 threads (fewer than TG's ~27) and is nearly twice as fast.
  • Comparable thread budget, 4.4x faster: Oxidized --cores 24 (28 threads) vs TG -j 8 (up to ~27 threads). Finishes in 39 seconds vs 171 seconds, using 2.3x less CPU.

Trim Galore (Perl 5.34 + Cutadapt 4.9 + pigz)

| -j | Wall time | CPU time |
|---|---|---|
| 1 | 27:04 (1,624s) | 2,536s |
| 2 | 13:50 (830s) | 2,741s |
| 4 | 7:15 (435s) | 2,868s |

Trim Galore Oxidized Edition

| --cores | Wall time | CPU time | Speedup vs TG -j 1 |
|---|---|---|---|
| 1 | 11:46 (706s) | 695s | 2.3x |
| 2 | 7:16 (436s) | 908s | 3.7x |
| 4 | 3:46 (226s) | 936s | 7.2x |
| 6 | 2:35 (155s) | 957s | 10.5x |
| 8 | 2:03 (123s) | 994s | 13.2x |

CPU time is what cloud providers bill for and what drives energy consumption. The Oxidized Edition uses 2.3x to 5x less CPU time than Trim Galore for the same job:

| Scenario | TG CPU time | Oxidized CPU time | CPU savings |
|---|---|---|---|
| Single-threaded | 3,001s | 599s | 5.0x |
| 8 cores (nf-core default) | 2,040s | 784s | 2.6x |

At an AWS rate of $0.05 per vCPU-hour, trimming 56M paired-end reads costs roughly $0.028 with TG vs $0.011 with Oxidized (at 8 cores), a 2.6x saving per sample. Across a 1000-sample cohort that scales to **$28 vs ~$11**, with proportional savings in carbon footprint and shared-cluster CPU-hour pressure.
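
The per-sample cost follows from converting CPU seconds to vCPU-hours and multiplying by the hourly rate. Here is a small sketch of that arithmetic, using the $0.05 per vCPU-hour rate and the 8-core CPU times quoted in the text. The function name is illustrative, and the model assumes cost tracks CPU time, as the paragraph above does.

```python
RATE_PER_VCPU_HOUR = 0.05  # USD, the rate quoted in the text


def cpu_cost(cpu_seconds, samples=1, rate=RATE_PER_VCPU_HOUR):
    """Cost in USD of `samples` runs that each consume `cpu_seconds` of CPU."""
    vcpu_hours = cpu_seconds / 3600
    return vcpu_hours * rate * samples


tg_cost = cpu_cost(2040)       # Trim Galore, -j 8
oxidized_cost = cpu_cost(784)  # Oxidized, --cores 8

print(f"per sample:   ${tg_cost:.3f} vs ${oxidized_cost:.3f}")
print(f"1000 samples: ${cpu_cost(2040, 1000):.0f} vs ${cpu_cost(784, 1000):.0f}")
```

Changing `rate` or `samples` adapts the estimate to a different instance price or cohort size.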

  • Timing: All wall time, CPU time, and peak memory measured via /usr/bin/time -v.
  • Thread counts (TG): Observed via ps during execution. These are approximate peak values, as threads are spawned across three independent subprocesses (Cutadapt, pigz, pigz/igzip) whose lifetimes may not fully overlap.
  • Thread counts (Oxidized): Deterministic from the architecture: exactly N+4 threads for --cores N (N workers + 2 decompressors + 1 batcher + 1 writer), or exactly 1 thread for --cores 1.
  • igzip: The bioconda Trim Galore installation includes igzip (Intel ISA-L) for fast single-threaded decompression. Benchmarks were re-run with igzip to match the nf-core production environment; the difference was <1% (decompression is not the bottleneck; compression is).
  • Outputs verified: All outputs were confirmed byte-identical (decompressed) between TG and Oxidized across all core counts via md5 checksums.
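
The thread-count rule for the Oxidized Edition can be written as a small function. This is only a restatement in Python of the N+4 formula given above, with a hypothetical function name. It is not code from the tool itself.

```python
def oxidized_threads(cores):
    """Total threads for --cores N according to the rule stated above."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if cores == 1:
        return 1  # --cores 1 runs as exactly one thread
    workers = cores
    decompressors = 2
    batcher = 1
    writer = 1
    return workers + decompressors + batcher + writer


for n in (1, 2, 4, 8, 16, 24):
    print(n, oxidized_threads(n))
```

For the core counts benchmarked on the server this yields 1, 6, 8, 12, 20 and 28 threads, matching the "Threads (deterministic)" column.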

For the architectural reasons behind the numbers (single-pass vs three-pass architecture, worker-pool parallelism, thread-budget breakdown), see Threading model.