
Samtools Outputs

RustQC produces output files compatible with three core samtools commands: flagstat, idxstats, and stats. These are generated during the same single-pass BAM scan as all other analyses, so they require no additional pass over the input.

The output files are designed to be drop-in replacements that downstream tools (particularly MultiQC) can parse as if they came directly from samtools.

All samtools-compatible output files use the BAM file stem as a prefix and are written to a samtools/ subdirectory under the output directory. Use --flat-output to write all files directly to the output directory instead.

  • samtools/
    • sample.flagstat: Alignment flag summary statistics
    • sample.idxstats: Per-chromosome read counts
    • sample.stats: Summary numbers (SN section)
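
As a rough illustration of this layout, the Python sketch below computes where the three files would be expected for a given BAM file. The helper `expected_samtools_outputs` and its parameters are hypothetical (not part of RustQC), and it assumes the "BAM file stem" is the file name with its final extension removed.

```python
from pathlib import Path


def expected_samtools_outputs(bam_path, outdir, flat_output=False):
    """Return the expected flagstat/idxstats/stats paths for one BAM file.

    Hypothetical helper for illustration only. Assumes the prefix is the
    BAM file name without its final extension.
    """
    stem = Path(bam_path).stem  # "sample.bam" -> "sample"
    # With --flat-output the files sit directly in the output directory;
    # otherwise they go in a samtools/ subdirectory.
    base = Path(outdir) if flat_output else Path(outdir) / "samtools"
    return {ext: base / f"{stem}.{ext}" for ext in ("flagstat", "idxstats", "stats")}


paths = expected_samtools_outputs("data/sample.bam", "results")
flat_paths = expected_samtools_outputs("data/sample.bam", "results", flat_output=True)
```

Here `paths["flagstat"]` points inside `results/samtools/`, while `flat_paths["flagstat"]` points directly inside `results/`.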

File: <sample>.flagstat

A text file matching the samtools flagstat output format. Each line reports a count with the format <count> + 0 <description>. The 16 standard metrics are:

| Line | Description |
|------|-------------|
| 1 | Total reads (QC-passed + QC-failed) |
| 2 | Primary reads |
| 3 | Secondary reads |
| 4 | Supplementary reads |
| 5 | Duplicates |
| 6 | Primary duplicates |
| 7 | Mapped (with percentage of total) |
| 8 | Primary mapped (with percentage of primary) |
| 9 | Paired in sequencing |
| 10 | Read 1 |
| 11 | Read 2 |
| 12 | Properly paired (with percentage of paired) |
| 13 | With itself and mate mapped |
| 14 | Singletons (with percentage of paired) |
| 15 | With mate mapped to a different chr |
| 16 | With mate mapped to a different chr (mapQ>=5) |

The QC-failed column is always 0 (RustQC does not separate QC-pass/fail counts).

Example:

185718543 + 0 in total (QC-passed reads + QC-failed reads)
175097721 + 0 primary
10620822 + 0 secondary
0 + 0 supplementary
133912519 + 0 duplicates
133912519 + 0 primary duplicates
185718543 + 0 mapped (100.00% : N/A)
175097721 + 0 primary mapped (100.00% : N/A)
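
A minimal sketch of reading this format in Python, assuming every line follows the `<count> + 0 <description>` pattern shown above. `parse_flagstat` is a hypothetical helper, not part of RustQC or samtools, and it ignores the QC-failed column because that column is always 0 here.

```python
import re

# <count> + <qc_failed> <description>
FLAGSTAT_LINE = re.compile(r"^(\d+) \+ (\d+) (.+)$")


def parse_flagstat(text):
    """Map each description (without any trailing percentage) to its count."""
    counts = {}
    for line in text.splitlines():
        match = FLAGSTAT_LINE.match(line.strip())
        if not match:
            continue  # skip blank or unrecognised lines
        passed, _failed, description = match.groups()
        # Drop a trailing "(100.00% : N/A)" style suffix, but keep
        # parentheses that belong to the label itself, e.g. "(mapQ>=5)".
        label = re.sub(r"\s*\([^()]*%[^()]*\)$", "", description)
        counts[label] = int(passed)
    return counts


example = """\
185718543 + 0 in total (QC-passed reads + QC-failed reads)
175097721 + 0 primary
10620822 + 0 secondary
185718543 + 0 mapped (100.00% : N/A)
"""
flagstat = parse_flagstat(example)
```

With the example lines, `flagstat["primary"]` plus `flagstat["secondary"]` equals the total on the first line, since there are no supplementary reads.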

File: <sample>.idxstats

A tab-separated file matching samtools idxstats output format. Each line has four columns:

| Column | Description |
|--------|-------------|
| ref_name | Reference sequence name |
| seq_length | Reference sequence length |
| mapped | Number of mapped reads |
| unmapped | Number of unmapped reads |

All reference sequences from the BAM header are included, even those with zero reads. A final line with * as the reference name reports unplaced unmapped reads.

Example:

1 248956422 10019968 0
2 242193529 6988244 0
...
* 0 0 0
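
The four-column layout can be read with a few lines of Python. This is only a sketch: `parse_idxstats` is a hypothetical helper, and it assumes the columns are separated by single tab characters as described above.

```python
def parse_idxstats(text):
    """Return a list of (ref_name, seq_length, mapped, unmapped) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        name, length, mapped, unmapped = fields
        rows.append((name, int(length), int(mapped), int(unmapped)))
    return rows


example = "1\t248956422\t10019968\t0\n2\t242193529\t6988244\t0\n*\t0\t0\t0\n"
rows = parse_idxstats(example)

# Reads assigned to a named reference; the "*" row only holds unplaced reads.
total_mapped = sum(mapped for name, _, mapped, _ in rows if name != "*")
```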

File: <sample>.stats

A text file containing the Summary Numbers (SN) section of samtools stats output, which is the section MultiQC parses for key alignment statistics. The file includes a comment header that MultiQC uses for format detection.

Key SN fields include:

| Metric | Description |
|--------|-------------|
| raw total sequences | Primary reads (excluding supplementary/secondary) |
| reads mapped | Mapped primary reads |
| reads duplicated | Duplicate-flagged reads |
| reads properly paired | Properly paired reads |
| total length | Sum of all read lengths |
| bases mapped (cigar) | Bases consumed by M/I/=/X CIGAR operations |
| mismatches | Mismatches from NM auxiliary tags |
| error rate | Mismatches / bases mapped (cigar) |
| average length | Mean read length |
| average quality | Mean base quality |
| insert size average | Mean insert size (from TLEN) |
| insert size standard deviation | Insert size standard deviation |
| inward oriented pairs | FR-oriented read pairs |
| outward oriented pairs | RF-oriented read pairs |
| pairs on different chromosomes | Inter-chromosomal pairs |

Example:

# This file was produced by samtools stats and RustQC
SN raw total sequences: 175097721
SN filtered sequences: 0
SN sequences: 175097721
SN reads mapped: 175097721
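
The SN lines can be turned into a dictionary with a short Python sketch like the one below. `parse_sn` is a hypothetical helper; it assumes each data line starts with `SN`, that the metric name ends at the first colon, and that a trailing `# ...` comment may follow the value. It accepts either spaces or tabs between the fields.

```python
def _number(value):
    """Convert to int or float where possible, else return the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_sn(text):
    """Return {metric name: value} for the SN lines of a stats file."""
    metrics = {}
    for line in text.splitlines():
        if not line.startswith("SN"):
            continue  # comment header or any other content
        # Drop the "SN" tag and an optional trailing "# ..." comment.
        body = line[2:].split("#", 1)[0]
        name, sep, value = body.partition(":")
        if not sep:
            continue  # no "name: value" pair on this line
        metrics[name.strip()] = _number(value.strip())
    return metrics


example = """\
# This file was produced by samtools stats and RustQC
SN raw total sequences: 175097721
SN filtered sequences: 0
SN sequences: 175097721
SN reads mapped: 175097721
"""
sn = parse_sn(example)
```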

Each samtools-compatible output can be individually enabled or disabled:

flagstat:
  enabled: true # Generate samtools flagstat output
idxstats:
  enabled: true # Generate samtools idxstats output
samtools_stats:
  enabled: true # Generate samtools stats SN output

All three are enabled by default. The underlying BAM statistics accumulator runs whenever any of bam_stat, flagstat, idxstats, or samtools_stats is enabled.
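
The rule in the last sentence amounts to a logical OR over the four sections. The Python sketch below illustrates it on a plain dictionary that mirrors the YAML layout. It is not RustQC's implementation: `accumulator_needed` is a hypothetical name, and treating a missing section as enabled is an assumption (the text above only states the default for the three samtools outputs, not for bam_stat).

```python
ACCUMULATOR_USERS = ("bam_stat", "flagstat", "idxstats", "samtools_stats")


def accumulator_needed(config):
    """True if any output that relies on the BAM statistics accumulator is on.

    A missing section or missing `enabled` key counts as enabled
    (an assumption made for this sketch).
    """
    return any(
        (config.get(name) or {}).get("enabled", True)
        for name in ACCUMULATOR_USERS
    )


all_off = {name: {"enabled": False} for name in ACCUMULATOR_USERS}
only_idxstats = {**all_off, "idxstats": {"enabled": True}}
```

With these inputs, `accumulator_needed(all_off)` is false and `accumulator_needed(only_idxstats)` is true.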

The output files are format-compatible with samtools and can be used in place of samtools output in MultiQC and other downstream tools. All integer counts (total reads, mapped, duplicates, etc.) match samtools exactly.

Minor differences may exist in derived floating-point metrics (insert size statistics, error rate) due to implementation differences in how samtools and RustQC handle edge cases in CIGAR parsing and pair orientation classification.
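
When validating RustQC output against samtools, one way to reflect this distinction is to require exact equality for integers and allow a small tolerance for floats. The sketch below assumes both files have already been parsed into `{metric: value}` dictionaries. `compare_metrics` is a hypothetical helper, the default `rel_tol` is an arbitrary illustrative threshold rather than a documented guarantee, and the sample values are made up.

```python
import math


def compare_metrics(rustqc, samtools, rel_tol=1e-3):
    """Return the names of shared metrics whose values differ.

    Integers must match exactly; floats may differ by `rel_tol`.
    """
    mismatched = []
    for name in sorted(set(rustqc) & set(samtools)):
        a, b = rustqc[name], samtools[name]
        if isinstance(a, int) and isinstance(b, int):
            same = a == b
        else:
            same = math.isclose(a, b, rel_tol=rel_tol, abs_tol=1e-12)
        if not same:
            mismatched.append(name)
    return mismatched


# Made-up values for illustration only.
rustqc_sn = {"reads mapped": 100, "insert size average": 200.0}
samtools_sn = {"reads mapped": 100, "insert size average": 200.1}
differences = compare_metrics(rustqc_sn, samtools_sn)
```

Here `differences` is empty because the counts agree exactly and the insert size averages differ by less than the chosen tolerance.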

  • samtools: Danecek P, Bonfield JK, et al. Twelve years of SAMtools and BCFtools. GigaScience. 2021;10(2):giab008. samtools website