RNA-seq: Quantification and Differential Expression

RNA-seq asks two questions: how abundant is the RNA from each gene or transcript, and which of those abundances change across conditions? Good experimental design (replicates!) matters more than any single tool choice.

Two common strategies

Strategy	Tools	Notes
Genome alignment	`STAR`, `HISAT2`	Great for QC, splice junctions, and discovery of novel transcripts
Transcript quantification	`Salmon`, `kallisto`	Fast, robust; relies on a reference transcriptome and its annotation, so unannotated transcripts are not quantified

Basic workflow (conceptual)

QC & trim
Quantify transcripts/genes
Normalize + model counts
Differential expression
Pathway and gene set interpretation
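
The "Normalize + model counts" step can be sketched with a median-of-ratios size factor, the idea behind DESeq2-style normalization. This is a minimal illustration only: the function name `size_factors` and the toy counts are assumptions, and real analyses would normally rely on an established package rather than this code.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping sample name -> list of raw counts (same gene order).
    Returns a dict mapping sample name -> size factor.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Keep only genes with a nonzero count in every sample so logs are defined.
    usable = [g for g in range(n_genes)
              if all(counts[s][g] > 0 for s in samples)]
    if not usable:
        raise ValueError("no gene has nonzero counts in every sample")
    # Per-gene log geometric mean across samples acts as a pseudo-reference.
    log_ref = {g: sum(math.log(counts[s][g]) for s in samples) / len(samples)
               for g in usable}
    # Size factor = exp(median log-ratio of the sample to the reference).
    return {s: math.exp(median(math.log(counts[s][g]) - log_ref[g]
                               for g in usable))
            for s in samples}


# Toy data (hypothetical): sample_b is sequenced about twice as deeply,
# and the last gene is strongly induced in sample_b.
toy = {
    "sample_a": [10, 20, 30, 40, 5],
    "sample_b": [20, 40, 60, 80, 500],
}
factors = size_factors(toy)
normalized = {s: [c / factors[s] for c in toy[s]] for s in toy}
```

Because the median is taken over genes, the single strongly induced gene does not distort the depth estimate, which is the point of preferring this over a plain total-count scaling.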

Example commands (illustrative)

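# Salmon quant (transcriptome-based)
# Build the index once per reference transcriptome
salmon index -t transcripts.fa -i tx_index

# Quantify one paired-end sample; -l A lets Salmon infer the library type,
# -p sets the thread count, and results go to sample_salmon/quant.sf
salmon quant -i tx_index -l A \
  -1 trimmed_R1.fastq.gz -2 trimmed_R2.fastq.gz \
  -p 8 -o sample_salmon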
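
Salmon writes per-transcript estimates to a tab-separated `quant.sf` file with the columns `Name`, `Length`, `EffectiveLength`, `TPM`, and `NumReads`. As a rough sketch of the transcript-to-gene step, the Python below sums those estimates per gene. The helper name `gene_level`, the transcript-to-gene mapping, and the example values are all hypothetical; dedicated tools such as tximport also handle length corrections that this sketch ignores.

```python
import csv
import io
from collections import defaultdict


def gene_level(quant_lines, tx2gene):
    """Sum Salmon transcript estimates into gene totals.

    quant_lines: an open quant.sf file (or any iterable of its lines).
    tx2gene: dict mapping transcript ID -> gene ID (user supplied).
    """
    totals = defaultdict(lambda: {"TPM": 0.0, "NumReads": 0.0})
    for row in csv.DictReader(quant_lines, delimiter="\t"):
        gene = tx2gene.get(row["Name"])
        if gene is None:
            continue  # transcript missing from the mapping: skipped here
        totals[gene]["TPM"] += float(row["TPM"])
        totals[gene]["NumReads"] += float(row["NumReads"])
    return dict(totals)


# Hypothetical quant.sf content, standing in for sample_salmon/quant.sf.
example = io.StringIO(
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "txA1\t1000\t800.0\t10.0\t100.0\n"
    "txA2\t500\t300.0\t5.0\t20.0\n"
    "txB1\t2000\t1800.0\t2.5\t60.0\n"
)
mapping = {"txA1": "geneA", "txA2": "geneA", "txB1": "geneB"}
genes = gene_level(example, mapping)
```

With a real run, the `io.StringIO` object would be replaced by `open("sample_salmon/quant.sf")`.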

Statistical notes you can’t skip

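Replicates: DE without biological replicates is usually not defensible, because within-condition variability cannot be estimated from a single sample per group.
Batch effects: sequencing run, prep date, and lane can dominate the signal; record them and, where possible, balance them across conditions and include them in the model.
Multiple testing: thousands of genes are tested at once, so control the FDR (e.g., BH-adjusted p-values) rather than thresholding raw p-values.
Compositionality: counts are relative, so a few highly expressed genes that change can shift the apparent share of every other gene. Library size normalization is necessary, but not sufficient for all contexts.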
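
To make the multiple-testing note concrete, here is a minimal sketch of the Benjamini-Hochberg (BH) adjustment in plain Python. The function name `bh_adjust` and the p-values are illustrative assumptions; DE packages report adjusted p-values themselves, so this only shows the arithmetic.

```python
def bh_adjust(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    # Indices sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling by m / rank and
    # taking a running minimum so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
padj = bh_adjust(raw)  # approximately [0.02, 0.04, 0.04, 0.02]
significant = [p <= 0.05 for p in padj]
```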

Good sign

Samples cluster by biology, not by batch/lane.

Red flag

One sample is an outlier across many QC metrics.

Volcano plot (example)

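Typically plots log2 fold change (x-axis) against -log10 of the p-value or adjusted p-value (y-axis), which highlights genes with both a large effect size and strong statistical support.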
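
A volcano plot needs only two coordinates per gene plus a label. The sketch below computes them from a DE results table; the function name `volcano_points`, the cutoffs (|log2FC| >= 1, adjusted p < 0.05), and the example genes are assumptions chosen for illustration, not recommended thresholds.

```python
import math


def volcano_points(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Turn (gene, log2 fold change, adjusted p) rows into plot points.

    Returns a list of (gene, x, y, label) with x = log2FC,
    y = -log10(adjusted p), and label in {"up", "down", "ns"}.
    """
    points = []
    for gene, log2fc, padj in results:
        # An adjusted p-value of exactly 0 would make log10 undefined.
        y = -math.log10(padj) if padj > 0 else math.inf
        if padj < padj_cutoff and log2fc >= lfc_cutoff:
            label = "up"
        elif padj < padj_cutoff and log2fc <= -lfc_cutoff:
            label = "down"
        else:
            label = "ns"  # not significant or effect too small
        points.append((gene, log2fc, y, label))
    return points


# Hypothetical DE results: (gene, log2FC, adjusted p-value).
table = [
    ("geneA", 2.5, 0.001),
    ("geneB", -1.8, 0.01),
    ("geneC", 0.3, 0.0001),   # significant but small effect
    ("geneD", 3.0, 0.40),     # large effect but weak support
]
points = volcano_points(table)
```

The resulting `(x, y, label)` tuples can be passed to any plotting library as a colored scatter plot.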

TPM radar (example)

Useful as a quick visualization for a small marker set (not a general DE method).
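
For reference, TPM (transcripts per million) divides each count by the feature's effective length and then rescales so the values in a sample sum to one million. A minimal sketch, with the function name `tpm` and the numbers chosen purely for illustration:

```python
def tpm(counts, effective_lengths):
    """Convert read counts to TPM for one sample.

    counts and effective_lengths are parallel lists (one entry per feature).
    """
    # Reads per base of effective length.
    rates = [c / l for c, l in zip(counts, effective_lengths)]
    total = sum(rates)
    if total == 0:
        raise ValueError("all counts are zero; TPM is undefined")
    # Rescale so the sample sums to one million.
    return [r / total * 1e6 for r in rates]


# Hypothetical marker set: counts and effective lengths in bases.
values = tpm([100, 200, 300], [1000, 2000, 1000])
```

Because every sample is rescaled to the same total, TPM values are relative shares within a sample, which is one reason they suit quick visual comparisons better than formal DE testing.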

Common QC plots for RNA-seq

Read distribution across features (exonic/intronic/intergenic)
5′→3′ bias (library/fragmentation issues)
Gene body coverage and rRNA contamination
Sample-to-sample correlation heatmaps
PCA/UMAP of normalized expression
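
The numbers behind a sample-to-sample correlation heatmap can be sketched as below: log-transform normalized expression, then compute pairwise Pearson correlations. The function names, the pseudocount of 1, and the toy samples are assumptions for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)


def correlation_matrix(expr, pseudocount=1.0):
    """expr: dict sample -> list of normalized expression values."""
    # log2(x + pseudocount) tames the influence of very high expressers.
    logged = {s: [math.log2(v + pseudocount) for v in vals]
              for s, vals in expr.items()}
    return {a: {b: pearson(logged[a], logged[b]) for b in logged}
            for a in logged}


# Hypothetical normalized expression for four genes in three samples.
expr = {
    "ctrl_1": [10, 200, 35, 0],
    "ctrl_2": [12, 180, 40, 1],
    "treat_1": [150, 20, 30, 5],
}
corr = correlation_matrix(expr)
```

In a healthy dataset, replicates of the same condition (here `ctrl_1` and `ctrl_2`) should correlate more strongly with each other than with samples from another condition.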