
Samtools Benchmarks

RustQC produces samtools-compatible output files (flagstat, idxstats, stats SN section) as part of its single-pass BAM processing. This page compares each RustQC output against the output of the original samtools command on a 10 GB paired-end BAM file (GM12878, ENCODE).

All samtools-compatible outputs are generated in the same single pass over the BAM file that also produces the dupRadar, featureCounts, RSeQC, preseq, TIN, and gene body coverage outputs.

| Tool | Traditional | RustQC |
| --- | --- | --- |
| samtools flagstat | ~1m | |
| samtools idxstats | ~1s | |
| samtools stats | ~5m | |
| All outputs (single pass) | | 5m 13s |

Result: Identical

All 16 flagstat metrics match exactly between samtools flagstat and RustQC:

| Metric | samtools | RustQC |
| --- | --- | --- |
| Total reads | 185,718,543 | 185,718,543 |
| Primary | 175,097,721 | 175,097,721 |
| Secondary | 10,620,822 | 10,620,822 |
| Supplementary | 0 | 0 |
| Duplicates | 133,912,519 | 133,912,519 |
| Primary duplicates | 133,912,519 | 133,912,519 |
| Mapped | 185,718,543 (100.00%) | 185,718,543 (100.00%) |
| Primary mapped | 175,097,721 (100.00%) | 175,097,721 (100.00%) |
| Paired in sequencing | 175,097,721 | 175,097,721 |
| Read 1 | 87,559,392 | 87,559,392 |
| Read 2 | 87,538,329 | 87,538,329 |
| Properly paired | 174,840,354 (99.85%) | 174,840,354 (99.85%) |
| With itself and mate mapped | 174,840,354 | 174,840,354 |
| Singletons | 257,367 (0.15%) | 257,367 (0.15%) |
| Mate on different chr | 0 | 0 |
| Mate on different chr (mapQ>=5) | 0 | 0 |

The output format is fully compatible with MultiQC and other tools that parse samtools flagstat output.
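
To make the counters in the table above concrete, here is a minimal Python sketch of how flagstat-style categories can be derived from the SAM FLAG field. The bit values come from the SAM specification; the function name `tally_flags` and the counter names are hypothetical, only a subset of the 16 metrics is covered, and this is not RustQC's or samtools' actual code.

```python
from collections import Counter

# FLAG bits as defined in the SAM specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
READ1 = 0x40
READ2 = 0x80
SECONDARY = 0x100
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800


def tally_flags(flags):
    """Accumulate a subset of flagstat-style counters (hypothetical helper)."""
    counts = Counter()
    for flag in flags:
        counts["total"] += 1
        if not (flag & UNMAPPED):
            counts["mapped"] += 1
        if flag & DUPLICATE:
            counts["duplicates"] += 1

        if flag & SECONDARY:
            counts["secondary"] += 1
            continue
        if flag & SUPPLEMENTARY:
            counts["supplementary"] += 1
            continue

        # Everything below is counted for primary alignments only.
        counts["primary"] += 1
        if not (flag & PAIRED):
            continue
        counts["paired"] += 1
        if flag & READ1:
            counts["read1"] += 1
        if flag & READ2:
            counts["read2"] += 1
        if (flag & PROPER_PAIR) and not (flag & UNMAPPED):
            counts["properly_paired"] += 1
        if (flag & MATE_UNMAPPED) and not (flag & UNMAPPED):
            counts["singletons"] += 1
    return counts


# Toy input: two proper-pair mates, one secondary, one duplicate, one singleton.
example = tally_flags([99, 147, 355, 1123, 73])
print(dict(example))
```

Because every counter is updated from a single look at each record's flag, this kind of tally fits naturally into a single pass over the file.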

Result: Identical

Per-chromosome read counts in the idxstats output match exactly across all 25 reference sequences. Both files include the same reference names, lengths, mapped counts, and unmapped counts, plus the * row for unplaced reads.
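
A comparison like this can be reproduced with a few lines of Python. The sketch below assumes the standard four-column, tab-separated idxstats layout (reference name, length, mapped, unmapped); the helper names and the sample rows are hypothetical and are not taken from the benchmark files.

```python
def parse_idxstats(text):
    """Parse idxstats text into {reference_name: (length, mapped, unmapped)}."""
    rows = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, length, mapped, unmapped = line.split("\t")
        rows[name] = (int(length), int(mapped), int(unmapped))
    return rows


def diff_idxstats(a_text, b_text):
    """Return the sorted reference names whose rows differ or are missing."""
    a, b = parse_idxstats(a_text), parse_idxstats(b_text)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))


# Made-up rows for illustration only.
samtools_out = "chr1\t1000\t10\t0\nchr2\t2000\t20\t1\n*\t0\t0\t5\n"
rustqc_out = "chr1\t1000\t10\t0\nchr2\t2000\t20\t1\n*\t0\t0\t5\n"
print(diff_idxstats(samtools_out, rustqc_out))  # an empty list means identical
```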

Result: Core counts identical, derived metrics have minor differences

RustQC produces the SN (Summary Numbers) section of samtools stats, which is the section parsed by MultiQC for key alignment statistics.
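
For readers who want to inspect the SN section themselves, the following is a rough Python sketch of a parser. It assumes the usual samtools stats layout, where each summary line starts with `SN`, followed by a tab, a key ending in a colon, a tab, and the value, optionally followed by a tab and a `#` comment. The function name `parse_sn` is hypothetical, and the sample lines are illustrative rather than copied from a real file.

```python
def parse_sn(text):
    """Collect SN (Summary Numbers) lines into a {key: number} dictionary."""
    summary = {}
    for line in text.splitlines():
        if not line.startswith("SN\t"):
            continue  # skip comments and other sections
        fields = line.split("\t")
        key = fields[1].rstrip(":")
        value = fields[2]  # any trailing comment lands in fields[3] and is ignored
        summary[key] = float(value) if "." in value else int(value)
    return summary


# Illustrative input using values reported on this page.
sample = (
    "# header comment line\n"
    "SN\traw total sequences:\t175097721\n"
    "SN\treads MQ0:\t345072\t# trailing comment\n"
    "SN\taverage quality:\t36.4\n"
)
print(parse_sn(sample))
```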

All primary count fields match exactly:

| Metric | Value |
| --- | --- |
| raw total sequences | 175,097,721 |
| filtered sequences | 0 |
| sequences | 175,097,721 |
| 1st fragments | 87,559,392 |
| last fragments | 87,538,329 |
| reads mapped | 175,097,721 |
| reads mapped and paired | 174,840,354 |
| reads properly paired | 174,840,354 |
| reads paired | 175,097,721 |
| reads duplicated | 133,912,519 |
| reads MQ0 | 345,072 |
| non-primary alignments | 10,620,822 |
| total length | 17,319,219,330 |
| bases mapped | 17,319,219,330 |
| bases duplicated | 13,211,215,206 |
| mismatches | 53,181,882 |

The derived metrics differ slightly:

| Metric | samtools | RustQC | Notes |
| --- | --- | --- | --- |
| bases mapped (cigar) | 17,220,065,351 | 17,218,285,734 | 0.01% difference in CIGAR base counting |
| average length | 99 | 98 | Rounding (true average ~98.9) |
| average quality | 36.4 | 36.3 | Rounding difference |
| insert size average | 1553.3 | 3084.6 | Different outlier filtering strategy |
| insert size stdev | 2276.6 | 15028.6 | Different outlier filtering strategy |
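
The "true average ~98.9" figure for average length can be checked directly from the total length and sequences values reported above. The short calculation below also shows that rounding to nearest gives 99 while truncating gives 98. Which tool uses which convention is an inference from the reported values; the source only describes the difference as rounding.

```python
total_length = 17_319_219_330  # "total length" from the SN section
sequences = 175_097_721        # "sequences" from the SN section

average = total_length / sequences
print(f"{average:.2f}")  # 98.91
print(round(average))    # 99 (round to nearest)
print(int(average))      # 98 (truncate toward zero)
```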

The insert size differences are due to samtools applying iterative outlier filtering (removing pairs beyond mean +/- N*stdev, then recomputing), while RustQC uses all properly paired reads. Both approaches are valid; the insert size statistics are informational rather than used for downstream analysis.
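
The effect of the two strategies can be illustrated with a small Python sketch. It compares statistics over all values with statistics after repeatedly dropping values beyond mean +/- N*stdev. The threshold `n_sigma=3.0`, the round limit, the function name, and the toy data are all assumptions chosen for illustration; the value of N and the exact procedure used by samtools are not stated here.

```python
from statistics import mean, pstdev


def trimmed_stats(values, n_sigma=3.0, max_rounds=10):
    """Mean and stdev after iteratively removing outliers (illustrative only)."""
    kept = list(values)
    for _ in range(max_rounds):
        mu, sigma = mean(kept), pstdev(kept)
        inside = [v for v in kept if abs(v - mu) <= n_sigma * sigma]
        # Stop once nothing more is removed (or if filtering would empty the list).
        if not inside or len(inside) == len(kept):
            break
        kept = inside
    return mean(kept), pstdev(kept)


# Toy insert sizes: a tight cluster plus one pair spanning a very long intron.
insert_sizes = [200] * 50 + [100_000]

print(mean(insert_sizes), pstdev(insert_sizes))  # all pairs: inflated by the outlier
print(trimmed_stats(insert_sizes))               # after filtering: the cluster only
```

A single extreme value is enough to pull the unfiltered mean and standard deviation far from the bulk of the distribution, which is consistent with the larger RustQC figures in the table.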

The output format includes the samtools stats header comment required for MultiQC parsing.

Benchmark environment:

  • BAM file: GM12878 ENCODE RNA-Seq, paired-end, ~185M reads, 10 GB
  • samtools version: 1.23 (via Docker)
  • RSeQC version: 5.0.4 (via Docker)
  • RustQC: 10 threads, --gtf mode
  • Hardware: Apple M3 Max, 128 GB RAM