featureCounts Outputs
How it works
RustQC performs gene-level read counting in the same pass as the dupRadar duplication rate analysis and all RSeQC-equivalent metrics. During BAM processing, RustQC simultaneously:
- Counts reads assigned to each gene (using the same algorithm as featureCounts)
- Tracks duplication rates for the dupRadar analysis
- Computes RSeQC-equivalent quality metrics (strandedness, read distribution, junctions, etc.)
- Aggregates counts by gene biotype
This single-pass approach eliminates the need for separate featureCounts and RSeQC runs, which are required in the traditional workflow. The featureCounts output is produced at zero additional runtime cost.
RustQC follows the same algorithm as Subread featureCounts with these defaults:
- Feature type: exon
- Attribute: gene_id
- Overlap detection: at least 1 base overlap
- Multi-mapping reads: counted (for multi columns), excluded (for unique columns)
- Strand-aware counting based on the -s/--strandedness flag
The key difference is that RustQC performs counting and duplicate analysis in a single pass, while the traditional workflow requires running featureCounts separately before dupRadar.
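To make the "at least 1 base overlap" rule concrete, here is a minimal Python sketch of overlap-based gene assignment. It is a toy model, not RustQC's or featureCounts' implementation: the names `overlaps` and `assign_read` are hypothetical, and chromosome, strand, and multi-mapping handling are deliberately left out.

```python
from typing import Dict, List, Tuple

# 1-based, inclusive (start, end) coordinates, as used in GTF files.
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True when two inclusive intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def assign_read(
    read_blocks: List[Interval],
    exons_by_gene: Dict[str, List[Interval]],
) -> str:
    """Return a gene ID, 'Unassigned_NoFeatures', or 'Unassigned_Ambiguous'.

    Hypothetical helper: a read hits a gene if any aligned block of the read
    overlaps any exon of that gene by one base or more.
    """
    hits = sorted(
        gene
        for gene, exons in exons_by_gene.items()
        if any(overlaps(block, exon) for block in read_blocks for exon in exons)
    )
    if not hits:
        return "Unassigned_NoFeatures"
    if len(hits) > 1:
        return "Unassigned_Ambiguous"
    return hits[0]


# Toy annotation: two genes whose exons overlap at positions 190-200.
exons = {"geneA": [(100, 200)], "geneB": [(190, 300)]}
single_base = assign_read([(50, 100)], exons)   # touches geneA at base 100 only
ambiguous = assign_read([(195, 250)], exons)    # overlaps both genes
intergenic = assign_read([(10, 50)], exons)     # overlaps nothing
```

The three outcomes correspond to the Assigned, Unassigned_Ambiguous, and Unassigned_NoFeatures categories reported in the summary file.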
Output files
All featureCounts output files use the BAM file stem as a prefix and are written to a featurecounts/ subdirectory under the output directory. Use --flat-output to write all files directly to the output directory instead. Each output can be individually enabled or disabled via the configuration file.
featurecounts/
- sample.featureCounts.tsv Per-gene read counts
- sample.featureCounts.tsv.summary Assignment summary statistics
- sample.biotype_counts.tsv Per-biotype read counts
- sample.biotype_counts_mqc.tsv MultiQC biotype bargraph
- sample.biotype_counts_rrna_mqc.tsv MultiQC rRNA percentage
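As a convenience for pipeline scripts, the layout above can be sketched in Python. This is an illustrative helper, not part of RustQC: `expected_outputs` and `SUFFIXES` are hypothetical names, and the stem is assumed to be the BAM file name with only its final extension removed (which is what `pathlib.Path.stem` returns).

```python
from pathlib import Path
from typing import List

# File suffixes taken from the listing above.
SUFFIXES = [
    ".featureCounts.tsv",
    ".featureCounts.tsv.summary",
    ".biotype_counts.tsv",
    ".biotype_counts_mqc.tsv",
    ".biotype_counts_rrna_mqc.tsv",
]


def expected_outputs(bam: str, outdir: str, flat: bool = False) -> List[Path]:
    """List the featureCounts output paths expected for one BAM file.

    flat=True mirrors --flat-output: files go directly into outdir
    instead of the featurecounts/ subdirectory.
    """
    stem = Path(bam).stem  # assumption: "sample.bam" -> "sample"
    base = Path(outdir) if flat else Path(outdir) / "featurecounts"
    return [base / f"{stem}{suffix}" for suffix in SUFFIXES]


nested = expected_outputs("data/sample.bam", "results")
flat = expected_outputs("data/sample.bam", "results", flat=True)
```

A script could use such a list to check that all enabled outputs exist before starting downstream steps.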
Counts file
File: <sample>.featureCounts.tsv
A tab-separated counts file compatible with the Subread featureCounts output format. Contains per-gene read counts with annotation columns:
| Column | Description |
|---|---|
| Geneid | Gene identifier |
| Chr | Chromosome(s) |
| Start | Start position(s) |
| End | End position(s) |
| Strand | Strand(s) |
| Length | Gene length |
| <sample> | Read count for this sample |
The file includes a header comment line with the command used to generate it:

    # Program:RustQC v0.1.0; Command: rustqc rna sample.bam --gtf genes.gtf -p
    Geneid Chr Start End Strand Length sample.bam
    ENSG00000000003 chrX 100627108;100629986;... 100636806;100637104;... -;-;... 3768 521
    ENSG00000000005 chrX 100584936;100585053;... 100585091;100599885;... +;+;... 1339 0

This file is directly compatible with downstream tools that accept featureCounts output, such as DESeq2 and MultiQC.
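For quick inspection outside of DESeq2 or MultiQC, the counts file can be read with the Python standard library. The sketch below assumes the layout described above (one `#` comment line, a header row starting with Geneid, tab-separated fields, and the count in the last column); `read_counts` is a hypothetical helper name.

```python
import csv
import io
from typing import Dict, Iterable


def read_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Map each Geneid to the integer read count in the last column."""
    # Drop the "# Program:..." comment line before parsing.
    data_lines = (line for line in lines if not line.startswith("#"))
    rows = csv.reader(data_lines, delimiter="\t")
    header = next(rows)
    if not header or header[0] != "Geneid":
        raise ValueError("expected a header row starting with 'Geneid'")
    return {row[0]: int(row[-1]) for row in rows if row}


# In-memory input mirroring the example above (elided coordinates kept as "...").
example = io.StringIO(
    "# Program:RustQC v0.1.0; Command: rustqc rna sample.bam --gtf genes.gtf -p\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample.bam\n"
    "ENSG00000000003\tchrX\t100627108;100629986;...\t100636806;100637104;...\t-;-;...\t3768\t521\n"
    "ENSG00000000005\tchrX\t100584936;100585053;...\t100585091;100599885;...\t+;+;...\t1339\t0\n"
)
counts = read_counts(example)
```

For a real file, pass an open file handle instead of the `io.StringIO` object.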
Summary file
File: <sample>.featureCounts.tsv.summary
Assignment summary statistics in featureCounts format, reporting the number of reads in each category:

    Status sample.bam
    Assigned 22812
    Unassigned_Unmapped 0
    Unassigned_NoFeatures 1227
    Unassigned_Ambiguous 2395

- Assigned: reads successfully assigned to a gene
- Unassigned_Unmapped: unmapped reads
- Unassigned_NoFeatures: reads not overlapping any gene
- Unassigned_Ambiguous: reads overlapping multiple genes
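A common use of the summary file is computing the fraction of reads that were assigned. The following Python sketch assumes a tab-separated file with a single header row, as in featureCounts format; `read_summary` and `assigned_fraction` are hypothetical helper names, and the input reuses the example values shown above.

```python
from typing import Dict, Iterable


def read_summary(lines: Iterable[str]) -> Dict[str, int]:
    """Parse status/count pairs from a featureCounts-style summary file."""
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    # rows[0] is the header ("Status", BAM name); the rest are status/count pairs.
    return {row[0]: int(row[1]) for row in rows[1:]}


def assigned_fraction(summary: Dict[str, int]) -> float:
    """Fraction of all counted reads that fall in the Assigned category."""
    total = sum(summary.values())
    return summary.get("Assigned", 0) / total if total else 0.0


example = [
    "Status\tsample.bam\n",
    "Assigned\t22812\n",
    "Unassigned_Unmapped\t0\n",
    "Unassigned_NoFeatures\t1227\n",
    "Unassigned_Ambiguous\t2395\n",
]
summary = read_summary(example)
fraction = assigned_fraction(summary)
```

Here the denominator is the sum of every category in the file, so the result is the share of all reported reads rather than of mapped reads only.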
Biotype counts
File: <sample>.biotype_counts.tsv
A tab-separated file with per-biotype read counts, sorted alphabetically by biotype name:

    protein_coding 12345
    lncRNA 678
    rRNA 90

The biotype is extracted from the GTF attribute specified by biotype_attribute (default: gene_biotype for Ensembl, gene_type for GENCODE).
Biotype MultiQC files
- <sample>.biotype_counts_mqc.tsv: Biotype counts formatted as a MultiQC bargraph data file, suitable for visualizing the distribution of reads across biotypes.
- <sample>.biotype_counts_rrna_mqc.tsv: rRNA percentage formatted as a MultiQC general statistics value, reporting the fraction of assigned reads mapping to rRNA genes.
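To illustrate how an rRNA percentage relates to the biotype counts, here is a rough Python sketch. It rests on two assumptions that may not match RustQC exactly: the denominator is taken as the sum of all biotype counts, and only the biotype named rRNA is counted as ribosomal. The function names are hypothetical.

```python
from typing import Dict, Iterable


def read_biotype_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Parse tab-separated 'biotype<TAB>count' lines into a dict."""
    counts: Dict[str, int] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank lines and any row whose second field is not an integer
        # (for example a header row, if the file has one).
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        counts[fields[0]] = int(fields[1])
    return counts


def rrna_percent(counts: Dict[str, int], rrna_key: str = "rRNA") -> float:
    """Percentage of biotype-counted reads that carry the rRNA biotype."""
    total = sum(counts.values())
    return 100.0 * counts.get(rrna_key, 0) / total if total else 0.0


# Values from the biotype counts example in this page.
example = ["protein_coding\t12345\n", "lncRNA\t678\n", "rRNA\t90\n"]
biotypes = read_biotype_counts(example)
percent = rrna_percent(biotypes)
```

For reporting, prefer the value RustQC writes to the rRNA MultiQC file; this sketch is only meant to show the shape of the calculation.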
Biotype attribute detection
RustQC automatically detects the biotype attribute in your GTF file:
- Ensembl GTFs use gene_biotype
- GENCODE GTFs use gene_type
If neither attribute is found, biotype outputs are skipped with a warning. You can override the auto-detection with the --biotype-attribute CLI flag or the featurecounts.biotype_attribute config option.
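To check ahead of time which attribute a GTF file carries, a small Python sketch like the one below can help. It is not RustQC's detection logic: `detect_biotype_attribute` is a hypothetical helper, the order in which candidates are tried is an assumption, and the `max_records` cap is an arbitrary choice to avoid scanning a whole annotation.

```python
import re
from typing import Iterable, Optional

# Candidate attribute names, taken from the list above.
CANDIDATES = ("gene_biotype", "gene_type")


def detect_biotype_attribute(
    gtf_lines: Iterable[str], max_records: int = 1000
) -> Optional[str]:
    """Return the first candidate found in the GTF attributes column, or None."""
    seen = 0
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        for name in CANDIDATES:
            # Match the key at the start of the column or right after a ';'.
            if re.search(rf'(?:^|;)\s*{name}\s+"', fields[8]):
                return name
        seen += 1
        if seen >= max_records:
            break
    return None


# Toy records with made-up identifiers.
ensembl_like = ['1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "G1"; gene_biotype "lncRNA";\n']
gencode_like = ['chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "G1"; gene_type "lncRNA";\n']
bare = ['chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "G1";\n']
```

If the helper returns None for your annotation, expect the biotype outputs to be skipped unless you set the attribute explicitly.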
Configuring outputs
Use a YAML configuration file to control which featureCounts outputs are generated. All outputs are enabled by default. Set any to false to skip generation:

    featurecounts:
      counts_file: true                  # .featureCounts.tsv
      summary_file: true                 # .featureCounts.tsv.summary
      biotype_counts: true               # .biotype_counts.tsv
      biotype_counts_mqc: true           # .biotype_counts_mqc.tsv
      biotype_rrna_mqc: true             # .biotype_counts_rrna_mqc.tsv
      biotype_attribute: "gene_biotype"  # GTF attribute for biotype grouping

Differences from Subread featureCounts
RustQC’s read counting follows the same algorithm as featureCounts with the same default settings (exon features, gene_id attribute, 1 bp overlap, strand-aware counting). The outputs are format-compatible and can be used interchangeably in downstream pipelines.
The primary difference is architectural: RustQC performs counting, dupRadar analysis, and RSeQC-equivalent metrics in a single pass over the BAM file, while the traditional workflow requires running featureCounts, dupRadar, and RSeQC as separate steps. This means the featureCounts output comes at no additional runtime cost when running RustQC.
RustQC also produces biotype count summaries and MultiQC-compatible files that would require additional scripting in the traditional workflow.