RNA-seq: quantification, design, and differential expression
RNA-seq is not just about finding a list of significant genes. The quality of the biological design, the consistency of the library preparation, and the clarity of the model formula usually matter more than the choice of one aligner or one plotting style.
Two common analysis strategies
Strategy | Typical tools | Why choose it
Genome alignment | STAR, HISAT2 | Excellent for splice junctions, coverage QC, and novel feature discovery
Transcript quantification | Salmon, Kallisto | Fast, efficient, and robust when a good transcriptome reference exists
Design decides what you are allowed to conclude
Before thinking about software, define your comparison: treatment vs control, paired vs unpaired, donor effects, time points, and replicates. A weak design cannot be repaired statistically afterward.
Include biological replicates in every condition whenever you intend to make differential expression claims.
Write the experimental factors explicitly before you run the pipeline.
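As a minimal sketch of writing the design down before running anything, the snippet below counts replicates per condition and checks whether every donor contributes each condition, which is what a paired design requires. The sample sheet, the column names (`sample`, `condition`, `donor`), and the helper functions are hypothetical and exist only for illustration; the formula strings follow the R-style notation used by tools such as DESeq2.

```python
from collections import Counter

# Hypothetical sample sheet: names and values are illustrative only.
samples = [
    {"sample": "S1", "condition": "control", "donor": "D1"},
    {"sample": "S2", "condition": "treated", "donor": "D1"},
    {"sample": "S3", "condition": "control", "donor": "D2"},
    {"sample": "S4", "condition": "treated", "donor": "D2"},
    {"sample": "S5", "condition": "control", "donor": "D3"},
    {"sample": "S6", "condition": "treated", "donor": "D3"},
]


def replicates_per_condition(sheet):
    """Count how many samples belong to each condition."""
    return Counter(row["condition"] for row in sheet)


def is_fully_paired(sheet, block="donor", factor="condition"):
    """True if every block (e.g. donor) contributes each factor level exactly once."""
    levels = sorted({row[factor] for row in sheet})
    by_block = {}
    for row in sheet:
        by_block.setdefault(row[block], []).append(row[factor])
    return all(sorted(found) == levels for found in by_block.values())


counts = replicates_per_condition(samples)
paired = is_fully_paired(samples)

# A paired design suggests modeling the donor as a blocking factor.
formula = "~ donor + condition" if paired else "~ condition"
print(dict(counts), paired, formula)
```

If the pairing check fails, that is a prompt to revisit the metadata or the design, not something to patch in the model afterward.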
Quantification converts reads into comparable summaries
Gene-level counts are commonly used for DE, while transcript-level abundance can be useful for isoform questions. Choose a summary that matches the question, not just the easiest tool.
Raw (unnormalized) counts are the expected input for count-based DE tools such as DESeq2 or edgeR.
TPM is useful for within-sample expression profiles, not as a drop-in replacement for DE counts.
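To make the TPM point concrete, here is a small sketch of the standard TPM calculation for a single sample: divide each count by the feature length in kilobases, then rescale so the values sum to one million. The function name and the toy numbers are assumptions for illustration, and real quantifiers such as Salmon use effective lengths, which this sketch does not model.

```python
def tpm(counts, lengths_bp):
    """Convert raw counts for ONE sample to TPM.

    rate_i = count_i / (length_i in kb); TPM_i = rate_i / sum(rates) * 1e6
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    rates = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(counts)
    return [r / total * 1e6 for r in rates]


# Toy example: counts scale with length, so all three genes get the same TPM.
toy_counts = [100, 200, 300]
toy_lengths = [1000, 2000, 3000]
toy_tpm = tpm(toy_counts, toy_lengths)
print(toy_tpm)
```

Because TPM always sums to the same total within a sample, a large increase in one gene pushes every other gene's TPM down. That is one reason it describes within-sample profiles well but does not replace counts in a DE model.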
Normalization
Libraries differ in sequencing depth and RNA composition. Normalization aims to make expression values comparable across samples without erasing true biological differences.
Large composition shifts can distort naive interpretations of raw counts.
Always inspect sample-level plots after normalization.
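The composition problem can be illustrated with a simplified sketch of the median-of-ratios idea behind DESeq2-style size factors. This is not the library's implementation; the function name, the gene-by-sample list layout, and the toy counts are assumptions. In the toy data, the second sample is sequenced twice as deep and also has one gene strongly induced.

```python
import math
from statistics import median


def size_factors(count_matrix):
    """Median-of-ratios size factors.

    count_matrix: list of genes, each a list of raw counts per sample.
    """
    n_samples = len(count_matrix[0])
    # Keep only genes with nonzero counts in all samples so logs are defined.
    log_rows = [
        [math.log(c) for c in row]
        for row in count_matrix
        if all(c > 0 for c in row)
    ]
    if not log_rows:
        raise ValueError("no gene has nonzero counts in every sample")
    # Log of the per-gene geometric mean across samples.
    log_geo_means = [sum(row) / n_samples for row in log_rows]
    factors = []
    for j in range(n_samples):
        log_ratios = [row[j] - g for row, g in zip(log_rows, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy data: sample 2 is twice as deep, plus one strongly induced gene.
toy_matrix = [
    [10, 20],
    [100, 200],
    [50, 100],
    [5, 400],   # strongly induced gene
    [0, 3],     # excluded from the calculation (zero in sample 1)
]
factors = size_factors(toy_matrix)
depth_ratio = factors[1] / factors[0]
naive_ratio = sum(r[1] for r in toy_matrix) / sum(r[0] for r in toy_matrix)
print(round(depth_ratio, 3), round(naive_ratio, 3))
```

The median-based ratio recovers the twofold depth difference, while the ratio of total counts is inflated by the single induced gene.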
Batch structure can overwhelm biology
If samples cluster by run date or operator rather than condition, the model and interpretation must account for that. Ignoring batch effects can create elegant but false results.
PCA is often the fastest way to see hidden sample structure.
Do not blindly remove variation you do not understand.
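A quick numerical companion to a PCA plot is to ask whether samples sit closer to their batch mates or to their condition mates. The sketch below is not PCA; it is a cruder substitute that compares mean pairwise distances on log-transformed values. The expression values, labels, and function names are invented for illustration.

```python
import math
from itertools import combinations


def log_profile(values):
    """log2(x + 1) transform of one sample's expression values."""
    return [math.log2(v + 1) for v in values]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_distance_sharing(profiles, labels):
    """Mean pairwise distance among sample pairs that share the same label."""
    dists = [
        euclidean(profiles[i], profiles[j])
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    ]
    return sum(dists) / len(dists) if dists else float("nan")


# Hypothetical normalized values for 4 samples x 3 genes.
expression = [
    [100, 50, 10],   # S1
    [110, 55, 12],   # S2
    [400, 5, 80],    # S3
    [420, 6, 85],    # S4
]
condition = ["control", "treated", "control", "treated"]
batch = ["A", "A", "B", "B"]

profiles = [log_profile(sample) for sample in expression]
within_batch = mean_distance_sharing(profiles, batch)
within_condition = mean_distance_sharing(profiles, condition)
print(round(within_batch, 3), round(within_condition, 3))
```

When the within-batch distance is much smaller than the within-condition distance, as in this toy data, batch structure is a plausible driver and should be represented in the model before any DE result is interpreted.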
Differential expression is the start of interpretation, not the end
A volcano plot is only an overview. The real work is understanding whether the changing genes fit the biology, known pathways, cell composition shifts, or technical confounding.
Read effect size and FDR together.
Validate key genes with annotation and biological knowledge.
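Reading effect size and FDR together can be sketched as a two-condition filter. The snippet below implements the Benjamini-Hochberg adjustment by hand and then keeps genes that pass both an adjusted p-value cutoff and a fold-change cutoff. The gene names, p-values, fold changes, and thresholds are all hypothetical; DE tools such as DESeq2 or edgeR report adjusted p-values themselves.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical DE results: (gene, log2 fold change, raw p-value).
results = [
    ("geneA", 2.5, 0.001),
    ("geneB", 0.2, 0.04),
    ("geneC", -1.8, 0.03),
    ("geneD", 3.0, 0.5),
]
padj = benjamini_hochberg([p for _, _, p in results])

# Illustrative thresholds only; choose them to fit the experiment.
selected = [
    gene
    for (gene, lfc, _), q in zip(results, padj)
    if q < 0.1 and abs(lfc) >= 1.0
]
print(padj, selected)
```

In this toy table, geneB passes the FDR cutoff but barely changes, and geneD changes a lot but has no statistical support, so neither is selected on a single criterion alone.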
Volcano plot (example)
Genes with large effect size and strong statistical support appear in the upper corners, but biology still decides which of them matter.
Marker overview (example)
A small marker set can be visualized quickly, but this is a communication aid, not a substitute for proper modeling.
PCA-style sample structure (example)
The safest RNA-seq projects show samples separating primarily by biology, not by library date or lane.
MA plot (example)
MA plots help you see whether expression changes are balanced, intensity-dependent, or dominated by low-count noise.
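The coordinates behind an MA plot are simple to compute. This sketch uses the classic definition, where M is the log2 ratio between two conditions and A is the average log2 expression. The function name and pseudocount are assumptions, and plotting functions in DE packages may define the x-axis differently (for example as the mean of normalized counts).

```python
import math


def ma_values(mean_a, mean_b, pseudocount=1.0):
    """Return (A, M) for one gene from its mean expression in two conditions.

    M = log2(b) - log2(a)          (log fold change, y-axis)
    A = (log2(a) + log2(b)) / 2    (average log expression, x-axis)
    A pseudocount keeps the logarithm defined when a mean is zero.
    """
    log_a = math.log2(mean_a + pseudocount)
    log_b = math.log2(mean_b + pseudocount)
    return (0.5 * (log_a + log_b), log_b - log_a)


# Two hypothetical genes with the same fold change but different abundance.
well_expressed = ma_values(7, 31)   # log2(8) = 3, log2(32) = 5
low_count = ma_values(0, 3)         # log2(1) = 0, log2(4) = 2
print(well_expressed, low_count)
```

Both genes have the same M, but the second sits at a very low A, where a few reads are enough to produce an apparent fourfold change. That is the low-count noise an MA plot makes visible.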
Design checklist before you trust a DE result
Replicates and metadata
Every sample should have condition labels, batch information, and any paired/blocked structure clearly recorded.
QC consistency
Outlier samples with bad QC can dominate DE results if left in without explanation.
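One hedged way to surface such samples is a robust outlier screen on a QC metric. The sketch below flags samples whose value is far from the median in units of scaled median absolute deviation (MAD). The metric (mapping rate), the sample names, and the threshold of 3 are assumptions for illustration; a flagged sample is a reason to investigate, not an automatic exclusion.

```python
from statistics import median


def flag_outliers(metric_by_sample, threshold=3.0):
    """Return sample names whose metric deviates strongly from the median.

    Deviation is measured in units of scaled MAD; the constant 1.4826 makes
    the MAD comparable to a standard deviation for normally distributed data.
    """
    values = list(metric_by_sample.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    return [
        name
        for name, v in metric_by_sample.items()
        if abs(v - med) / (1.4826 * mad) > threshold
    ]


# Hypothetical mapping rates per sample.
mapping_rate = {
    "S1": 0.92, "S2": 0.94, "S3": 0.91,
    "S4": 0.93, "S5": 0.55, "S6": 0.92,
}
flagged = flag_outliers(mapping_rate)
print(flagged)
```

Whatever is decided about a flagged sample, record the reason alongside the metadata so the DE result can be traced back to that decision.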
Interpretation guardrails
Significant genes should be checked for annotation quality, known pathways, and plausibility within the experiment.