RSeQC Benchmarks

RustQC reimplements 7 RSeQC tools plus TIN (Transcript Integrity Number) analysis as built-in analyses that run in a single pass over the BAM file. It also produces Qualimap-compatible gene body coverage output. This page compares the output of each tool against the originals on a 10 GB paired-end BAM file (GM12878, ENCODE).

All RSeQC analyses run in the same single pass over the BAM file that also produces dupRadar, featureCounts, preseq, samtools, and gene body coverage output. The timings below compare the original Python RSeQC tools, each run individually, with RustQC running all analyses in a single pass with 10 threads.

| Tool | RSeQC (Python) | RustQC |
|---|---|---|
| bam_stat | 6m 07s | |
| infer_experiment | 7s | |
| read_duplication | 29m 43s | |
| read_distribution | 6m 00s | |
| junction_annotation | 4m 37s | |
| junction_saturation | 6m 32s | |
| inner_distance | 1m 09s | |
| tin.py | ~45m | |
| Total (RSeQC only) | ~99m | |
| All outputs | | 5m 13s |

RustQC produces all RSeQC outputs (plus dupRadar, featureCounts, preseq, samtools, and gene body coverage) in 5m 13s, a ~19x speedup over the ~99 minutes needed to run the RSeQC tools alone.
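The ~19x figure can be checked directly from the timings reported above. The sketch below only re-adds those numbers; the variable and function names are illustrative, and the tin.py entry uses the approximate 45-minute value from the table.

```python
# Per-tool RSeQC wall-clock times from the timing table, as (minutes, seconds).
rseqc_times = {
    "bam_stat": (6, 7),
    "infer_experiment": (0, 7),
    "read_duplication": (29, 43),
    "read_distribution": (6, 0),
    "junction_annotation": (4, 37),
    "junction_saturation": (6, 32),
    "inner_distance": (1, 9),
    "tin.py": (45, 0),  # reported only approximately (~45m)
}
rustqc_all_outputs = (5, 13)  # single pass, all outputs


def to_seconds(mmss):
    """Convert a (minutes, seconds) pair to seconds."""
    minutes, seconds = mmss
    return minutes * 60 + seconds


rseqc_total_s = sum(to_seconds(t) for t in rseqc_times.values())
rustqc_total_s = to_seconds(rustqc_all_outputs)
speedup = rseqc_total_s / rustqc_total_s

print(f"RSeQC total:  {rseqc_total_s / 60:.1f} min")
print(f"RustQC total: {rustqc_total_s / 60:.1f} min")
print(f"Speedup:      {speedup:.1f}x")
```

Because the tin.py timing is approximate, the resulting ratio is approximate as well.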

bam_stat summarises key alignment statistics from the BAM file.

All values are identical between RSeQC and RustQC:

| Metric | RSeQC | RustQC |
|---|---|---|
| Total records | 185,718,543 | 185,718,543 |
| QC failed | 0 | 0 |
| Duplicates | 133,912,519 | 133,912,519 |
| Non-primary | 10,620,822 | 10,620,822 |
| Mapped (MAPQ >= cutoff) | 39,827,099 | 39,827,099 |
| Splice reads | 12,013,939 | 12,013,939 |
| Non-splice reads | 27,813,160 | 27,813,160 |
| Proper pairs | 39,797,926 | 39,797,926 |

infer_experiment infers the strandedness of an RNA-Seq experiment by sampling reads and comparing their strand to annotated gene models.

| Metric | RSeQC | RustQC |
|---|---|---|
| Failed to determine | 0.0666 | 0.0667 |
| Fraction sense (1++,1--,2+-,2-+) | 0.0116 | 0.0116 |
| Fraction antisense (1+-,1-+,2++,2--) | 0.9218 | 0.9218 |

The strandedness fractions match to 4 decimal places. The 0.0001 difference in the “failed” fraction is last-digit rounding.
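To make the three fractions concrete, here is a minimal sketch of how sampled reads could be tallied into sense, antisense, and undetermined categories. It assumes each sampled read has already been reduced to a (mate, read strand, gene strand) triple, with the gene strand set to `None` when it cannot be resolved; the function name and input shape are hypothetical and are not taken from either tool.

```python
from collections import Counter

# Key = mate number + read strand + gene strand, as in the table labels.
SENSE = {"1++", "1--", "2+-", "2-+"}
ANTISENSE = {"1+-", "1-+", "2++", "2--"}


def strandedness_fractions(sampled_reads):
    """Tally sampled reads into failed / sense / antisense fractions.

    sampled_reads: iterable of (mate, read_strand, gene_strand) where mate is
    "1" or "2", strands are "+" or "-", and gene_strand is None when the
    overlapping annotation does not give a single strand (an assumption of
    this sketch).
    """
    counts = Counter()
    for mate, read_strand, gene_strand in sampled_reads:
        if gene_strand is None:
            counts["failed"] += 1
            continue
        key = f"{mate}{read_strand}{gene_strand}"
        if key in SENSE:
            counts["sense"] += 1
        elif key in ANTISENSE:
            counts["antisense"] += 1
        else:
            counts["failed"] += 1  # malformed input
    total = sum(counts.values())
    names = ("failed", "sense", "antisense")
    if total == 0:
        return {name: 0.0 for name in names}
    # Report to 4 decimal places, the precision used in the table.
    return {name: round(counts[name] / total, 4) for name in names}
```

Since the values are reported to 4 decimal places, two implementations that differ only in how they round the final digit can disagree by 0.0001.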

read_distribution calculates how mapped reads are distributed over genomic features (CDS, UTR, introns, intergenic regions).

All values are identical:

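| Feature | RSeQC Tags | RustQC Tags | RSeQC Tags/Kb | RustQC Tags/Kb |
|---|---|---|---|---|
| CDS Exons | 33,261,445 | 33,261,445 | 939.13 | 939.13 |
| 5’ UTR Exons | 2,634,396 | 2,634,396 | 76.25 | 76.25 |
| 3’ UTR Exons | 8,459,978 | 8,459,978 | 150.40 | 150.40 |
| Introns | 7,390,637 | 7,390,637 | 5.12 | 5.12 |
| TSS up 1kb | 65,636 | 65,636 | | |
| TSS up 5kb | 149,678 | 149,678 | | |
| TSS up 10kb | 206,943 | 206,943 | | |
| TES down 1kb | 147,759 | 147,759 | | |
| TES down 5kb | 373,552 | 373,552 | | |
| TES down 10kb | 447,114 | 447,114 | | |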
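The Tags/Kb column is a density: the tag count divided by the total feature length in kilobases. The sketch below shows that calculation and its inverse. The function names are illustrative, and the feature length is derived here from the reported values; it is not a number stated in the benchmark.

```python
def tags_per_kb(tag_count, total_bases):
    """Tag density: tags per 1,000 bases of the feature class."""
    return tag_count / (total_bases / 1000)


def implied_feature_kb(tag_count, density):
    """Back out the feature length (in kb) implied by a count and a density."""
    return tag_count / density


# CDS exons row from the table: 33,261,445 tags at 939.13 tags/kb.
cds_kb = implied_feature_kb(33_261_445, 939.13)
print(f"Implied CDS exon length: {cds_kb / 1000:.1f} Mb")
```

This also shows why introns have a low density despite a high tag count: the same order of magnitude of tags is spread over a much larger total length.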

read_duplication calculates read duplication rates at both the sequence and mapping position level.

The RSeQC reference files for this benchmark were empty (0 bytes) because the Python tool ran out of memory on this dataset. RustQC produces complete output without memory issues, generating both position-based and sequence-based duplication rate tables.

junction_annotation classifies observed splice junctions as known, partially novel, or completely novel by comparing against annotated transcript models.

All values are identical:

| Metric | RSeQC | RustQC |
|---|---|---|
| Splice events | | |
| Total events | 13,065,665 | 13,065,665 |
| Known | 12,805,967 | 12,805,967 |
| Partially novel | 146,659 | 146,659 |
| Novel | 91,614 | 91,614 |
| Splice junctions | | |
| Total junctions | 256,466 | 256,466 |
| Known | 178,797 | 178,797 |
| Partially novel | 50,936 | 50,936 |
| Novel | 26,733 | 26,733 |

[Figures: RSeQC splice events plot; RustQC splice events plot]
[Figures: RSeQC splice junction plot; RustQC splice junction plot]

junction_saturation evaluates whether sequencing depth is sufficient to detect all splice junctions by subsampling reads at increasing fractions.

At 100% sampling depth:

| Metric | RSeQC | RustQC |
|---|---|---|
| Total junctions | 256,466 | 256,392 |
| Known junctions | 163,710 | 163,710 |
| Novel junctions | 92,756 | 92,682 |

The known junction count matches exactly. The 74-junction difference in total/novel is due to random sampling order — junction_saturation uses randomized subsampling, and different shuffling produces slightly different totals at each sampling fraction.

[Figures: RSeQC junction saturation plot; RustQC junction saturation plot]
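The general idea of a saturation analysis can be sketched as follows: shuffle the junction observations, take increasing fractions of them, and count how many distinct junctions have been seen at each step. This is a simplified illustration under stated assumptions (one junction identifier per spliced-read observation, a hypothetical `junction_saturation` function, no minimum-support filter). It does not attempt to reproduce the exact sampling scheme of either RSeQC or RustQC.

```python
import random


def junction_saturation(observations, known, fractions, seed=0):
    """Count distinct junctions seen at increasing sampling fractions.

    observations: list of junction identifiers, one per observation.
    known: set of junction identifiers present in the annotation.
    fractions: iterable of sampling fractions in (0, 1].
    Returns a list of (fraction, total, known, novel) tuples.
    """
    rng = random.Random(seed)
    shuffled = list(observations)
    rng.shuffle(shuffled)

    results = []
    for frac in fractions:
        n = int(len(shuffled) * frac)
        seen = set(shuffled[:n])
        n_known = len(seen & known)
        results.append((frac, len(seen), n_known, len(seen) - n_known))
    return results


# Toy example: six observations of four junctions, two of them annotated.
obs = ["a", "a", "b", "c", "c", "d"]
for row in junction_saturation(obs, known={"a", "b"}, fractions=[0.25, 0.5, 0.75, 1.0]):
    print(row)
```

If the curve of distinct junctions flattens before the final fraction, additional sequencing depth is unlikely to reveal many more junctions. In this sketch the counts at intermediate fractions depend on the seed.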

inner_distance calculates the inner distance between paired-end reads for fragment size estimation, classifying pairs by transcript and exon membership.

Read classifications (sameTranscript, sameExon, overlap type) match between RSeQC and RustQC. Inner distance values occasionally differ by ±1 bp because the two tools compute alignment length from the CIGAR string slightly differently. The overall frequency distribution is consistent between the two tools.

[Figures: RSeQC inner distance plot; RustQC inner distance plot]
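To illustrate where a ±1 bp difference can come from, the sketch below computes the reference span of an alignment from its CIGAR string (per the SAM specification, the M, D, N, = and X operations consume reference bases) and a genomic-level inner distance between two mates. The function names are hypothetical, and the sketch ignores the transcript-aware intron handling that the real tools perform.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def reference_length(cigar):
    """Number of reference bases spanned by an alignment."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def inner_distance(read1_start, read1_cigar, read2_start):
    """Gap between the end of the left mate and the start of the right mate.

    Coordinates are 0-based and read1 is assumed to be the leftmost mate.
    The end is exclusive here; treating it as inclusive instead would shift
    every result by 1 bp.
    """
    read1_end = read1_start + reference_length(read1_cigar)
    return read2_start - read1_end


print(reference_length("10M2I5D20M3S"))
print(inner_distance(100, "50M", 200))
```

Choices such as inclusive versus exclusive end coordinates, or whether soft-clipped bases are counted, are the kind of detail that can shift individual values by one base without changing the overall distribution.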

RustQC reimplements RSeQC’s tin.py, producing per-gene TIN scores that measure transcript integrity via Shannon entropy of read coverage uniformity.

| Metric | RustQC |
|---|---|
| Genes analysed | 16,499 |
| TIN mean | 72.55 |
| TIN median | 83.89 |
| TIN stdev | 26.14 |

The TIN analysis completes as part of the single-pass BAM processing. The original RSeQC tin.py requires a separate full BAM pass and typically takes 45+ minutes for a 10 GB file.

RustQC produces gene-level TIN scores (one row per gene, using the longest transcript as representative), while RSeQC’s tin.py produces transcript-level scores. Both formats are compatible with MultiQC. The gene-level approach provides a cleaner summary with one score per gene.
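The entropy-based score and the longest-transcript selection described above can be sketched as follows. This assumes the commonly described form of the score, TIN = 100 · exp(H) / k, where H is the Shannon entropy of the coverage distribution over k sampled positions. The function names and input shapes are hypothetical; this is an illustration, not the code of either tool.

```python
import math


def tin_score(coverage):
    """Entropy-based integrity score for one transcript.

    coverage: read depth at each of k sampled positions along the transcript.
    Uniform coverage gives 100; coverage concentrated at few positions gives
    a low score.
    """
    k = len(coverage)
    positive = [c for c in coverage if c > 0]
    if k == 0 or not positive:
        return 0.0
    total = sum(positive)
    entropy = -sum((c / total) * math.log(c / total) for c in positive)
    return 100.0 * math.exp(entropy) / k


def gene_level_tin(transcripts):
    """Score a gene using its longest transcript as representative.

    transcripts: dict of transcript id -> (length, coverage list).
    """
    _length, coverage = max(transcripts.values(), key=lambda t: t[0])
    return tin_score(coverage)


print(tin_score([5, 5, 5, 5]))   # uniform coverage
print(tin_score([10, 0, 0, 0]))  # all reads at one position
```

Reporting one representative transcript per gene keeps the output to one row per gene, at the cost of not reporting integrity for the remaining isoforms.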

RustQC produces Qualimap-compatible gene body coverage output, including:

  • A 100-bin coverage profile along normalized gene body positions (5’ to 3’)
  • A Qualimap rnaseq_qc_results.txt file parseable by MultiQC

| Metric | Value |
|---|---|
| 5’ bias | 1.20 |
| 3’ bias | 0.92 |
| 5’-3’ bias | 1.24 |

These metrics help identify degradation patterns — a 5’-3’ bias significantly above 1.0 may indicate RNA degradation, while values near 1.0 indicate uniform coverage.
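As an illustration of the 100-bin profile listed above, the sketch below collapses per-base coverage of one transcript into a fixed number of bins along its normalized length, oriented 5’ to 3’. The function name, the strand handling, and the use of the mean within each bin are assumptions of this sketch and are not taken from Qualimap or RustQC.

```python
def coverage_profile(per_base_coverage, strand="+", n_bins=100):
    """Mean coverage in n_bins equal-width bins along a transcript, 5' to 3'.

    per_base_coverage: read depth at each base in genomic order.
    strand: "+" or "-"; minus-strand transcripts are reversed so that bin 0
    is always the 5' end.
    """
    cov = list(per_base_coverage)
    if strand == "-":
        cov.reverse()
    length = len(cov)
    if length < n_bins:
        raise ValueError("transcript is shorter than the number of bins")

    profile = []
    for b in range(n_bins):
        start = b * length // n_bins
        end = (b + 1) * length // n_bins
        window = cov[start:end]
        profile.append(sum(window) / len(window))
    return profile


# Toy transcript whose coverage rises steadily towards the 3' end.
profile = coverage_profile(range(200))
print(profile[0], profile[-1])
```

Averaging such profiles across many transcripts gives a gene body coverage curve; a curve that is flat across the bins corresponds to uniform coverage.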

The benchmarks above were run under the following conditions:

  • BAM file: GM12878 ENCODE RNA-Seq, paired-end, ~185M reads, 10 GB
  • Annotation: Ensembl BED12 gene model (same file used by both RSeQC and RustQC); Ensembl GTF used for TIN and gene body coverage, which require gene-level annotation
  • RSeQC version: 5.0.4 (run via Docker on Apple M3 Max with Rosetta x86 emulation)
  • RustQC: 10 threads, --gtf mode for TIN/genebody, --bed mode for RSeQC tools
  • Hardware: Apple M3 Max, 128 GB RAM