Configuration File
The rustqc rna subcommand supports an optional YAML configuration file for
advanced settings that go beyond what CLI flags offer. Pass the config file with
--config / -c:

    rustqc rna sample.bam --gtf genes.gtf -p -c config.yaml -o results/

    # or with BED (RSeQC tools only)
    rustqc rna sample.bam --bed genes.bed -p -c config.yaml -o results/

All sections and fields are optional. Missing fields use their default values. Unknown fields are silently ignored, so config files remain forward-compatible.
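Because every field is optional, a working config can be very small. The following is a minimal sketch that overrides a single documented value and leaves everything else at its default; the value 70 is an arbitrary illustration, not a recommendation.

```yaml
# config.yaml
# Only the fields listed here are changed; all other settings keep their defaults.
junction_annotation:
  min_intron: 70   # illustrative value; the documented default is 50
```

One consequence of unknown fields being ignored silently is that a misspelled key (for example `min_inton`) is skipped without an error, so it is worth checking key names against this page.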
Full example

    # Chromosome name remapping
    chromosome_prefix: "chr"
    chromosome_mapping:
      chrM: "MT"

    # dupRadar output toggles
    dupradar:
      dup_matrix: true
      intercept_slope: true
      density_scatter_plot: true
      boxplot: true
      expression_histogram: true
      multiqc_intercept: true
      multiqc_curve: true

    # featureCounts output toggles
    featurecounts:
      counts_file: true
      summary_file: true
      biotype_counts: true
      biotype_counts_mqc: true
      biotype_rrna_mqc: true
      biotype_attribute: "gene_biotype"

    # RSeQC tool toggles and settings
    bam_stat:
      enabled: true
    infer_experiment:
      enabled: true
      sample_size: 200000
    read_duplication:
      enabled: true
    read_distribution:
      enabled: true
    junction_annotation:
      enabled: true
      min_intron: 50
    junction_saturation:
      enabled: true
      min_intron: 50
      min_coverage: 1
      percentile_floor: 5
      percentile_ceiling: 100
      percentile_step: 5
    inner_distance:
      enabled: true
      sample_size: 1000000
      lower_bound: -250
      upper_bound: 250
      step: 5

    # TIN (Transcript Integrity Number)
    tin:
      enabled: true
      sample_size: 100
      min_coverage: 10

    # Gene body coverage / Qualimap
    genebody_coverage:
      enabled: true

    # Library complexity (preseq lc_extrap)
    preseq:
      enabled: true
      max_extrap: 10000000000
      step_size: 1000000
      n_bootstraps: 100
      confidence_level: 0.95
      seed: 1
      max_terms: 100
      defects: false

    # Samtools-compatible outputs
    flagstat:
      enabled: true
    idxstats:
      enabled: true
    samtools_stats:
      enabled: true

Chromosome name mapping
When the chromosome names in your alignment file differ from those in the GTF, RustQC can remap them automatically. Two mechanisms are available, and they can be combined.
chromosome_prefix
A string prefix to prepend to alignment file chromosome names before matching against GTF names. Applied first, before explicit mapping lookups.

    # Alignment has "1", "2", "X"; GTF has "chr1", "chr2", "chrX"
    chromosome_prefix: "chr"

chromosome_mapping
An explicit mapping from GTF chromosome names (keys) to alignment file chromosome names (values). Applied after the prefix, so explicit entries can override the prefix for specific chromosomes.

    # After adding "chr" prefix, override the mitochondrial chromosome
    chromosome_mapping:
      chrM: "MT"

A common use case is GENCODE GTFs (which use chr1, chr2, …) with Ensembl
alignments (which use 1, 2, …):

    chromosome_prefix: "chr"
    chromosome_mapping:
      chrM: "MT"

dupRadar output toggles
The dupradar: section controls which dupRadar output files are generated.
All outputs are enabled by default.

    dupradar:
      dup_matrix: true            # Duplication matrix TSV
      intercept_slope: true       # Intercept/slope fit results
      density_scatter_plot: true  # Density scatter plot (PNG + SVG)
      boxplot: true               # Duplication rate boxplot (PNG + SVG)
      expression_histogram: true  # Expression histogram (PNG + SVG)
      multiqc_intercept: true     # MultiQC intercept/slope file
      multiqc_curve: true         # MultiQC fitted curve file

Set any field to false to skip generating that output:

    dupradar:
      boxplot: false
      expression_histogram: false

featureCounts output toggles
The featurecounts: section controls which featureCounts-compatible output files
are generated, plus the biotype attribute setting.

    featurecounts:
      counts_file: true                  # featureCounts-compatible counts TSV
      summary_file: true                 # Assignment summary file
      biotype_counts: true               # Biotype counts TSV
      biotype_counts_mqc: true           # Biotype counts MultiQC bargraph file
      biotype_rrna_mqc: true             # Biotype rRNA percentage MultiQC file
      biotype_attribute: "gene_biotype"  # GTF attribute for biotype grouping

biotype_attribute
The GTF attribute name used for biotype grouping. This controls how genes are categorized in the biotype output files.
| GTF source | Typical attribute |
|---|---|
| Ensembl | gene_biotype |
| GENCODE | gene_type |
Default: "gene_biotype"
This can also be set via the --biotype-attribute CLI flag, which takes
precedence over the config file value.
RustQC auto-detects the biotype attribute if the specified one is not found in
the GTF. If neither gene_biotype nor gene_type is present, a warning is
printed and biotype counting is skipped.
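For a GENCODE annotation, the table above suggests setting the attribute explicitly instead of relying on auto-detection. A small sketch using only the documented field:

```yaml
featurecounts:
  biotype_attribute: "gene_type"   # GENCODE GTFs typically use gene_type
```

Passing --biotype-attribute on the command line would still override this value.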
RSeQC tool settings
Each of the 7 RSeQC tools has an enabled toggle (default true) and
tool-specific parameter overrides. Disabling a tool here prevents it from
running even when annotation is provided. CLI flags take precedence over
config file values for all parameters.
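As an illustration of the enabled toggles, the sketch below turns off two tools and leaves the rest untouched. Which tools to disable is an arbitrary choice made for the example.

```yaml
# Skip two RSeQC tools; everything else keeps its default (enabled: true).
read_duplication:
  enabled: false
inner_distance:
  enabled: false
```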
bam_stat

    bam_stat:
      enabled: true  # Set to false to skip bam_stat

No additional parameters. This tool does not require annotation.
infer_experiment

    infer_experiment:
      enabled: true
      sample_size: 200000  # Number of reads to sample (default: 200000)

Requires annotation (--gtf or --bed). The sample_size can also be set via
--infer-experiment-sample-size.
read_duplication

    read_duplication:
      enabled: true  # Set to false to skip read_duplication

No additional parameters. This tool does not require annotation.
read_distribution

    read_distribution:
      enabled: true  # Set to false to skip read_distribution

No additional parameters. Requires annotation (--gtf or --bed).
junction_annotation

    junction_annotation:
      enabled: true
      min_intron: 50  # Minimum intron length in bases (default: 50)

Requires annotation (--gtf or --bed). The min_intron can also be set via --min-intron.
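Since CLI flags take precedence over the config file, a value set in the config acts as a project-wide default that individual runs can override. A sketch of that interaction, with 70 and 30 as arbitrary illustrative values:

```yaml
# config.yaml
junction_annotation:
  min_intron: 70

# With the config alone, min_intron is 70:
#   rustqc rna sample.bam --gtf genes.gtf -p -c config.yaml -o results/
# Adding the CLI flag overrides the config value for that run (min_intron is 30):
#   rustqc rna sample.bam --gtf genes.gtf -p -c config.yaml --min-intron 30 -o results/
```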
junction_saturation

    junction_saturation:
      enabled: true
      min_intron: 50            # Minimum intron length in bases (default: 50)
      min_coverage: 1           # Minimum read count to consider a junction (default: 1)
      percentile_floor: 5       # Sampling start percentage (default: 5)
      percentile_ceiling: 100   # Sampling end percentage (default: 100)
      percentile_step: 5        # Sampling step size (default: 5)

Requires annotation (--gtf or --bed). These parameters can also be set via CLI flags:
--min-intron, --junction-saturation-min-coverage,
--junction-saturation-percentile-floor, --junction-saturation-percentile-ceiling,
--junction-saturation-percentile-step.
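To reduce the number of subsampling points, the percentile settings can be coarsened. The sketch below assumes the RSeQC-style interpretation, in which sampling runs from percentile_floor up to percentile_ceiling in increments of percentile_step. Under that assumption it would sample at 10%, 20%, ..., 100% instead of the default 5%, 10%, ..., 100%.

```yaml
junction_saturation:
  percentile_floor: 10     # illustrative; default is 5
  percentile_ceiling: 100  # same as the default
  percentile_step: 10      # illustrative; default is 5
```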
inner_distance

    inner_distance:
      enabled: true
      sample_size: 1000000  # Number of reads to sample (default: 1000000)
      lower_bound: -250     # Histogram lower bound (default: -250)
      upper_bound: 250      # Histogram upper bound (default: 250)
      step: 5               # Histogram bin width (default: 5)

Requires annotation (--gtf or --bed). These parameters can also be set via CLI flags:
--inner-distance-sample-size, --inner-distance-lower-bound,
--inner-distance-upper-bound, --inner-distance-step.
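For libraries with a wide inner-distance distribution, the histogram range can be widened. The values below are illustrative only. Assuming the bins tile the interval evenly, these bounds with a bin width of 10 give (500 - (-500)) / 10 = 100 bins, the same count as the defaults ((250 - (-250)) / 5 = 100).

```yaml
inner_distance:
  lower_bound: -500  # illustrative; default is -250
  upper_bound: 500   # illustrative; default is 250
  step: 10           # illustrative; default is 5
```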
tin

    tin:
      enabled: true
      sample_size: 100   # Equally-spaced positions to sample per transcript (default: 100)
      min_coverage: 10   # Minimum read-start count to compute TIN (default: 10)

Requires annotation (--gtf or --bed). The TIN (Transcript Integrity Number)
measures transcript integrity via Shannon entropy of read coverage uniformity.
genebody_coverage

    genebody_coverage:
      enabled: true  # Set to false to skip gene body coverage

Requires annotation (--gtf only). Computes coverage profiles along transcript
bodies (100 percentile bins, 5' to 3') and produces Qualimap-compatible output.
preseq

    preseq:
      enabled: true
      max_extrap: 10000000000  # Maximum extrapolation depth (default: 1e10)
      step_size: 1000000       # Step size between extrapolation points (default: 1e6)
      n_bootstraps: 100        # Bootstrap replicates for confidence intervals (default: 100)
      confidence_level: 0.95   # CI confidence level (default: 0.95)
      seed: 1                  # Random seed for reproducibility (default: 1)
      max_terms: 100           # Maximum terms in power series (default: 100)
      defects: false           # Use defects model for problematic histograms (default: false)

Runs in both GTF and BED modes (only needs BAM fragment info). The max_extrap,
step_size, and n_bootstraps can also be set via CLI flags (--preseq-max-extrap,
--preseq-step-size, --preseq-n-bootstraps). Use --skip-preseq to disable entirely.
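Fewer bootstrap replicates generally means less work, at the cost of noisier confidence intervals. A sketch of a lighter configuration for quick exploratory runs, with illustrative values:

```yaml
preseq:
  n_bootstraps: 20        # illustrative; default is 100
  max_extrap: 1000000000  # 1e9 instead of the default 1e10
  seed: 1                 # the default seed, stated explicitly for reproducibility
```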
flagstat / idxstats / samtools_stats

    flagstat:
      enabled: true  # samtools flagstat-compatible output
    idxstats:
      enabled: true  # samtools idxstats-compatible output
    samtools_stats:
      enabled: true  # samtools stats SN-section compatible output

These produce samtools-compatible output files in the samtools/ subdirectory.
They share the same BAM statistics accumulator as bam_stat — enabling any of
them ensures the statistics are collected.