Bioinformatics Tutorial

RNA-seq: Quantification and Differential Expression

RNA-seq asks two questions: how much RNA from each gene or transcript is present in a sample, and which genes or transcripts change in abundance across conditions. Good experimental design (replicates!) matters more than any single tool choice.

Two common strategies
Strategy                   Tools             Notes
Genome alignment           STAR, HISAT2      Great for QC, splice junctions, novel discovery
Transcript quantification  Salmon, Kallisto  Fast, robust; relies on transcriptome + annotation

Basic workflow (conceptual)

  1. QC & trim
  2. Quantify transcripts/genes
  3. Normalize + model counts
  4. Differential expression
  5. Pathway and gene set interpretation
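
Step 3 (normalization) can be illustrated with median-of-ratios size factors, the idea popularized by DESeq2. The following is a simplified sketch in plain Python, not DESeq2's actual implementation; the function name and the toy count matrix are made up for illustration.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors (simplified sketch).

    counts: list of genes, each a list of raw counts, one per sample.
    Genes with a zero count in any sample are skipped, because their
    geometric mean is zero. At least one gene must be nonzero in every sample.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c <= 0 for c in row):
            continue
        # log of the geometric mean of this gene across samples
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # One factor per sample: the median ratio to the per-gene reference
    return [math.exp(median(r)) for r in log_ratios]

# Toy data (invented): sample 2 was sequenced twice as deeply as sample 1
toy_counts = [[10, 20], [100, 200], [5, 10], [0, 3]]
factors = size_factors(toy_counts)
normalized = [[c / f for c, f in zip(row, factors)] for row in toy_counts]
print(factors)
```

Dividing each sample's counts by its size factor puts the samples on a common scale. Using the median makes the estimate less sensitive to a handful of very highly expressed genes than a plain library-size total.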

Example commands (illustrative)

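# Salmon quant (transcriptome-based)
# Build the index once per transcriptome
salmon index -t transcripts.fa -i tx_index
# Quantify one paired-end sample; -l A lets Salmon infer the library type
salmon quant -i tx_index -l A \
  -1 trimmed_R1.fastq.gz -2 trimmed_R2.fastq.gz \
  -p 8 -o sample_salmon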
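
Salmon writes a tab-separated quant.sf file (columns Name, Length, EffectiveLength, TPM, NumReads) into the output directory. As a rough sketch of how transcript-level estimates become gene-level counts, the Python below sums NumReads per gene. The `tx2gene` mapping, the function name, and the example rows are hypothetical; in practice, dedicated importers such as tximport handle this step and also account for transcript length.

```python
import csv
import io
from collections import defaultdict

def gene_level_counts(quant_sf, tx2gene):
    """Sum Salmon NumReads per gene.

    quant_sf: open text handle on a quant.sf file (tab-separated, with header).
    tx2gene:  dict mapping transcript ID -> gene ID (assumed to come from
              your annotation; not produced by Salmon itself).
    """
    totals = defaultdict(float)
    for row in csv.DictReader(quant_sf, delimiter="\t"):
        gene = tx2gene.get(row["Name"])
        if gene is None:
            continue  # transcript absent from the mapping; skipped in this sketch
        totals[gene] += float(row["NumReads"])
    return dict(totals)

# Invented rows standing in for a real quant.sf
example = io.StringIO(
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t1500\t1300.0\t10.0\t120.0\n"
    "tx2\t900\t700.0\t5.0\t30.5\n"
    "tx3\t2000\t1800.0\t2.0\t40.0\n"
)
mapping = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
print(gene_level_counts(example, mapping))
```

For a real run, replace the `io.StringIO` handle with `open("sample_salmon/quant.sf")`.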

Statistical notes you can’t skip
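  • Replicates: differential expression (DE) analysis without biological replicates is usually not defensible, because within-group variability cannot be estimated.
  • Batch effects: sequencing run, prep date, and lane can dominate the biological signal; balance conditions across batches where possible.
  • Multiple testing: control the false discovery rate (FDR), e.g., with Benjamini-Hochberg (BH) adjusted p-values.
  • Compositionality: counts are relative, so library size normalization is necessary but not sufficient in all contexts (a few very highly expressed genes can shift the apparent abundance of everything else).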
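
To make the multiple-testing note concrete, here is a minimal sketch of the Benjamini-Hochberg adjustment in plain Python. The function name and the example p-values are invented for illustration; DE packages compute adjusted p-values for you, so this is only meant to show what the adjustment does.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Invented p-values for four genes
pvals = [0.01, 0.04, 0.03, 0.005]
print(bh_adjust(pvals))
```

Each raw p-value is scaled by (number of tests / rank), then capped so that adjusted values never decrease as the raw p-values increase. Calling a gene significant at adjusted p < 0.05 then targets an expected false discovery rate of 5% among the genes called.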

Good sign

Samples cluster by biology, not by batch/lane.

Red flag

One sample is an outlier across many QC metrics.

Volcano plot (example)

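Plots effect size (log2 fold change) against statistical significance (typically −log10 of the p-value), highlighting genes with large effect size and strong statistical support.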
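
A sketch of how the points on such a plot are typically derived from a DE results table, in plain Python. The cutoffs (|log2 fold change| ≥ 1, adjusted p < 0.05) are common conventions used here as assumptions, not values taken from this tutorial, and the function name is hypothetical.

```python
import math

def volcano_point(log2_fc, padj, lfc_cutoff=1.0, alpha=0.05):
    """Return (x, y, label) for one gene on a volcano plot.

    x is the log2 fold change, y is -log10 of the adjusted p-value,
    and label is "up", "down", or "ns" (not significant).
    """
    y = -math.log10(padj) if padj > 0 else float("inf")
    if padj < alpha and log2_fc >= lfc_cutoff:
        label = "up"
    elif padj < alpha and log2_fc <= -lfc_cutoff:
        label = "down"
    else:
        label = "ns"
    return log2_fc, y, label

# Invented results for three genes: (log2 fold change, adjusted p-value)
for lfc, padj in [(2.0, 0.001), (-1.5, 0.01), (0.3, 0.0001)]:
    print(volcano_point(lfc, padj))
```

The third gene shows why both axes matter: it is strongly significant but has a small effect, so it is labeled "ns" under these cutoffs.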

TPM radar (example)

Useful as a quick visualization for a small marker set (not a general DE method).
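
For reference, TPM (transcripts per million) divides each count by the transcript's effective length and rescales so the values in a sample sum to one million. A minimal sketch in plain Python follows; the function name and the numbers are invented for illustration, and Salmon already reports TPM in its output.

```python
def tpm(counts, effective_lengths):
    """Convert per-transcript counts to TPM within one sample."""
    # Reads per base of transcript
    rates = [c / l for c, l in zip(counts, effective_lengths)]
    total = sum(rates)
    # Rescale so the sample sums to one million
    return [r / total * 1e6 for r in rates]

# Invented counts and effective lengths for three transcripts
values = tpm([100, 200, 50], [1000, 2000, 250])
print(values)
```

Because TPM values are proportions within a sample, they suit side-by-side display of a few markers but do not replace count-based DE modeling.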

Common QC plots for RNA-seq
  • Read distribution across features (exonic/intronic/intergenic)
  • 5′→3′ bias (library/fragmentation issues)
  • Gene body coverage and rRNA contamination
  • Sample-to-sample correlation heatmaps
  • PCA/UMAP of normalized expression
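
As an illustration of the sample-to-sample correlation check, the sketch below computes a Pearson correlation matrix on log2(x + 1) transformed expression values using only the Python standard library. The function names, sample names, and expression values are all invented; real analyses would typically use a numerical library and many more genes.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists (undefined if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def correlation_matrix(samples):
    """samples: dict of sample name -> normalized expression values (same gene order)."""
    # The log transform keeps a few highly expressed genes from dominating
    logged = {name: [math.log2(v + 1) for v in vals] for name, vals in samples.items()}
    names = list(logged)
    return {a: {b: pearson(logged[a], logged[b]) for b in names} for a in names}

# Invented expression values for four genes in three samples
toy = {
    "ctrl_1": [10, 200, 35, 0],
    "ctrl_2": [12, 180, 40, 1],
    "treat_1": [200, 10, 0, 35],
}
matrix = correlation_matrix(toy)
for name, row in matrix.items():
    print(name, {other: round(r, 3) for other, r in row.items()})
```

In a heatmap of such a matrix, replicates of the same condition should correlate more strongly with each other than with other conditions; a sample that correlates poorly with everything is a candidate outlier.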