RNA-seq: Quantification and Differential Expression
RNA-seq asks two questions: how much RNA is produced from each gene or transcript, and which of those genes or transcripts change across conditions. Good experimental design (replicates!) matters more than any single tool choice.
Two common strategies
| Strategy | Tools | Notes |
|---|---|---|
| Genome alignment | STAR, HISAT2 | Great for QC, splice junctions, novel transcript discovery |
| Transcript quantification | Salmon, kallisto | Fast, robust; relies on a reference transcriptome and its annotation |
Basic workflow (conceptual)
- QC & trim
- Quantify transcripts/genes
- Normalize + model counts
- Differential expression
- Pathway and gene set interpretation
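The "normalize" step above can be illustrated with a median-of-ratios size factor, the idea behind DESeq-style normalization. This is a minimal sketch under stated assumptions, not the implementation of any particular package: the function name `size_factors`, the input layout (a dict of per-sample count lists in the same gene order), and the toy counts are all made up for illustration.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping sample name -> list of raw counts,
            with genes in the same order for every sample (assumed layout).
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Keep only genes with a nonzero count in every sample so logs are defined.
    usable = [g for g in range(n_genes)
              if all(counts[s][g] > 0 for s in samples)]
    if not usable:
        raise ValueError("no gene has nonzero counts in all samples")
    # Log of the per-gene geometric mean across samples (a pseudo-reference).
    log_ref = {g: sum(math.log(counts[s][g]) for s in samples) / len(samples)
               for g in usable}
    # Each sample's factor is the median ratio to that pseudo-reference.
    return {s: math.exp(median(math.log(counts[s][g]) - log_ref[g]
                               for g in usable))
            for s in samples}


# Toy data: sample B was sequenced about twice as deeply as sample A.
toy_counts = {"A": [10, 20, 30, 0], "B": [20, 40, 60, 5]}
factors = size_factors(toy_counts)
normalized = {s: [c / factors[s] for c in vals]
              for s, vals in toy_counts.items()}
print(factors)
```

Dividing each sample's counts by its factor puts samples on a comparable scale. Real analyses would normally rely on an established DE package for this step and for the count model that follows.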
Example commands (illustrative)
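# Salmon quant (transcriptome-based)
# Build the index once per transcriptome
salmon index -t transcripts.fa -i tx_index
# Quantify one paired-end sample; -l A lets Salmon infer the library type
salmon quant -i tx_index -l A \
  -1 trimmed_R1.fastq.gz -2 trimmed_R2.fastq.gz \
  -p 8 -o sample_salmon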
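Salmon writes its per-transcript estimates to a tab-separated `quant.sf` file in the output directory (here `sample_salmon/`), with the columns `Name`, `Length`, `EffectiveLength`, `TPM`, and `NumReads`. The sketch below shows one simple way to collapse those estimates to gene level. The function name `gene_counts_from_quant`, the `tx2gene` mapping, and the numbers in the toy table are assumptions for illustration; in practice, dedicated import tools such as tximport handle this step and also account for transcript length.

```python
import csv
import io
from collections import defaultdict


def gene_counts_from_quant(quant_sf_text, tx2gene):
    """Sum Salmon's estimated read counts (NumReads) per gene.

    quant_sf_text: contents of a quant.sf file as a string.
    tx2gene: dict mapping transcript ID -> gene ID (hypothetical mapping).
    """
    totals = defaultdict(float)
    reader = csv.DictReader(io.StringIO(quant_sf_text), delimiter="\t")
    for row in reader:
        # Transcripts missing from the mapping are kept under their own ID.
        gene = tx2gene.get(row["Name"], row["Name"])
        totals[gene] += float(row["NumReads"])
    return dict(totals)


# Toy quant.sf content with made-up values, standing in for a real file.
example_quant = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t1500\t1300.0\t10.0\t120.0\n"
    "tx2\t900\t700.0\t5.0\t30.5\n"
    "tx3\t2000\t1800.0\t2.0\t40.0\n"
)
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
gene_counts = gene_counts_from_quant(example_quant, tx2gene)
print(gene_counts)
```

To use it on real output, read the file first, for example `open("sample_salmon/quant.sf").read()`, and pass the text in place of `example_quant`.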
Statistical notes you can’t skip
- Replicates: DE without biological replicates is usually not defensible.
- Batch effects: sequencing run, prep date, and lane can dominate signal.
- Multiple testing: use FDR (e.g., BH-adjusted p-values).
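- Compositionality: library size normalization is necessary, but not always sufficient. When a few highly expressed genes dominate a library, simple depth scaling distorts every other gene, which is why methods such as TMM or median-of-ratios are used.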
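The multiple-testing note can be made concrete with the Benjamini-Hochberg (BH) adjustment. The following is a small plain-Python sketch; the function name `bh_adjust` and the example p-values are illustrative assumptions, and DE packages already report adjusted p-values, so this is only meant to show what the adjustment does.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling each by m / rank and
    # taking a running minimum so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for four genes.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = bh_adjust(raw_p)
print(adj_p)
```

Genes are then called significant when the adjusted value falls below the chosen FDR level (0.05 is a common but arbitrary choice).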
Good sign
Samples cluster by biology, not by batch/lane.
Red flag
One sample is an outlier across many QC metrics.
Volcano plot (example)
Highlights genes with large effect size and strong statistical support.
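A volcano plot typically puts log2 fold change on the x-axis and -log10 of the (adjusted) p-value on the y-axis. The sketch below prepares those coordinates and a simple up/down/not-significant label for each gene. The cutoffs (|log2FC| of at least 1, adjusted p below 0.05), the function name `volcano_points`, and the example results are assumptions for illustration, not recommended thresholds.

```python
import math


def volcano_points(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Turn (gene, log2FC, padj) rows into volcano-plot coordinates.

    Returns a list of (gene, x, y, label) with x = log2FC and
    y = -log10(padj); label is "up", "down", or "ns" (not significant).
    """
    points = []
    for gene, lfc, padj in results:
        if padj < padj_cutoff and lfc >= lfc_cutoff:
            label = "up"
        elif padj < padj_cutoff and lfc <= -lfc_cutoff:
            label = "down"
        else:
            label = "ns"
        # Clamp padj so a reported value of exactly 0 does not break log10.
        y = -math.log10(max(padj, 1e-300))
        points.append((gene, lfc, y, label))
    return points


# Made-up DE results: (gene, log2 fold change, adjusted p-value).
de_results = [
    ("geneA", 2.5, 0.001),
    ("geneB", -1.8, 0.01),
    ("geneC", 0.3, 0.0005),   # significant but small effect
    ("geneD", 3.0, 0.4),      # large effect but weak support
]
points = volcano_points(de_results)
labels = {gene: label for gene, _, _, label in points}
print(labels)
```

The resulting `points` list can be passed to any plotting library; only genes that clear both cutoffs end up highlighted.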
TPM radar (example)
Useful as a quick visualization for a small marker set (not a general DE method).
Common QC plots for RNA-seq
- Read distribution across features (exonic/intronic/intergenic)
- 5′→3′ bias (library/fragmentation issues)
- Gene body coverage and rRNA contamination
- Sample-to-sample correlation heatmaps
- PCA/UMAP of normalized expression
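As a rough illustration of the sample-to-sample correlation check, the sketch below computes Pearson correlations on log2(x + 1) expression values and reports each sample's mean correlation with the others, so that a sample with an unusually low value stands out. The function names, the sample names, and the expression values are made up for illustration. It assumes every sample has some variation across genes; a constant sample would cause a division by zero.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mean_correlations(expr):
    """Mean correlation of each sample with all other samples.

    expr: dict mapping sample name -> list of normalized expression
          values, with genes in the same order for every sample.
    """
    logged = {s: [math.log2(v + 1) for v in vals] for s, vals in expr.items()}
    result = {}
    for s in logged:
        others = [pearson(logged[s], logged[t]) for t in logged if t != s]
        result[s] = sum(others) / len(others)
    return result


# Toy expression values: three similar samples and one that looks different.
toy_expr = {
    "ctrl_1": [100, 50, 10, 500, 5],
    "ctrl_2": [110, 45, 12, 480, 6],
    "treat_1": [95, 55, 9, 520, 4],
    "odd": [5, 400, 300, 8, 90],
}
mean_corr = mean_correlations(toy_expr)
suspect = min(mean_corr, key=mean_corr.get)
print(suspect, mean_corr)
```

A low mean correlation is a prompt to inspect that sample's other QC metrics, not an automatic reason to drop it.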