QC is not a beauty contest for plots. The real question is whether your data are trustworthy enough for the biological task ahead, and if not, which intervention is justified. Good QC combines plot reading, sample context, and a bias toward minimal but effective preprocessing.
What to inspect first
Total yield and read count per sample or lane
Per-base quality decay across cycles
Adapter or primer contamination signatures
GC distribution and unusual secondary peaks
Overrepresented sequences and duplication patterns
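The first item on this list, yield and read count, can be checked before any report is generated. The sketch below is a minimal illustration, assuming uncompressed FASTQ with strict four-line records and no blank lines; the function names `fastq_records` and `yield_summary` are hypothetical and not part of any tool named here.

```python
import io

def fastq_records(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ stream.

    Assumes no wrapped sequence lines and no blank lines between records.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual

def yield_summary(handle):
    """Return read count, total bases, and mean read length."""
    n_reads = 0
    n_bases = 0
    for _, seq, _ in fastq_records(handle):
        n_reads += 1
        n_bases += len(seq)
    mean_len = n_bases / n_reads if n_reads else 0.0
    return {"reads": n_reads, "bases": n_bases, "mean_length": mean_len}

# Two toy records stand in for a real file handle.
example = io.StringIO("@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGT\n+\nIIII\n")
summary = yield_summary(example)
print(summary)
```

For a real file, the `io.StringIO` object would be replaced by an open text handle. Comparing these numbers across samples or lanes is usually more informative than any single value.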
Common tools
fastqc and multiqc for standardized reports
cutadapt, fastp, and trimmomatic for trimming and filtering
FastQLab for quick on-device FASTQ inspection and educational review when a workstation is not available
Phred scores compress error probabilities into a logarithmic scale:
Q = -10 * log10(p_error)
That means Q30 is not a little better than Q20: it corresponds to a ten times lower error probability (1 in 1,000 versus 1 in 100). Still, trimming solely to maximize average Q can destroy useful data. Think in terms of downstream risk, not cosmetic improvement.
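The formula can be made concrete with a few lines of code. This is a minimal sketch, assuming the common Phred+33 ASCII encoding of quality strings; the helper names are hypothetical.

```python
import math

def phred_to_error(q):
    """Invert Q = -10 * log10(p_error)."""
    return 10 ** (-q / 10)

def error_to_phred(p):
    """Q = -10 * log10(p_error)."""
    return -10 * math.log10(p)

def decode_phred33(quality_string):
    """Convert a Phred+33 quality string into integer Q scores."""
    return [ord(c) - 33 for c in quality_string]

def expected_errors(quality_string):
    """Sum of per-base error probabilities for one read."""
    return sum(phred_to_error(q) for q in decode_phred33(quality_string))

scores = decode_phred33("I5+")           # 'I' -> 40, '5' -> 20, '+' -> 10
mean_q = sum(scores) / len(scores)       # arithmetic mean of Q scores
mean_p = expected_errors("I5+") / len(scores)
q_of_mean_error = error_to_phred(mean_p)  # Q equivalent of the mean error rate

print(scores, round(mean_q, 2), round(q_of_mean_error, 2))
```

In this toy read the arithmetic mean of the Q scores is well above the Q value that corresponds to the mean error probability, because the single Q10 base dominates the expected errors. That is one reason average Q alone is a weak guide for trimming decisions.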
Usually safe
Trim adapters decisively because they create artificial sequence content.
Use judgment
Quality trimming should be conservative when read length is already precious.
FastQC-style report reader
Work through the modules below. The goal is to learn what each panel means, what common failure patterns look like, and when the correct answer is "investigate more" instead of trimming blindly.
Per-base quality
Look for where median quality falls and whether the lower tail becomes dangerously broad near the end of reads.
Adapter content
A late-cycle increase usually means inserts were shorter than the read length, so the sequencer reads through the insert into adapter sequence.
GC content
A single clean peak is not always required, but unexpected extra peaks often justify contamination checks or metadata review.
Duplication levels
High duplication can reflect targeted sequencing, over-amplification, or low-complexity libraries. Interpretation depends on context.
Overrepresented sequences
These may be adapters, primers, contamination, or genuinely abundant biological reads. Sequence identity matters.
How to read per-base quality
A gentle end-of-read decline is normal. What matters is whether the low tail becomes severe enough to disrupt mapping, overlap merging, or variant confidence.
If the drop is late and modest, trimming may be minimal or unnecessary.
If the whole run looks weak, trimming will not rescue a fundamentally poor library.
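To see the shape of the decline rather than a single average, quality can be summarized per cycle. The sketch below assumes Phred+33 quality strings and uses a crude floor-index quantile, which is not necessarily how FastQC computes its box plots; `per_base_quality` is a hypothetical helper.

```python
from statistics import median

def per_base_quality(quality_strings, offset=33):
    """Return (position, lower_decile, median, upper_decile) per cycle, 1-based.

    Quantiles use a simple floor index into the sorted scores (an assumption
    made for brevity, not a reproduction of any tool's method).
    """
    max_len = max(len(q) for q in quality_strings)
    rows = []
    for pos in range(max_len):
        # Only reads long enough to cover this cycle contribute.
        scores = sorted(ord(q[pos]) - offset for q in quality_strings if len(q) > pos)
        lo = scores[int(0.10 * (len(scores) - 1))]
        hi = scores[int(0.90 * (len(scores) - 1))]
        rows.append((pos + 1, lo, median(scores), hi))
    return rows

# Four toy reads: quality holds for two cycles, then a subset deteriorates.
quals = ["IIII", "III5", "II5+", "III+"]
rows = per_base_quality(quals)
for row in rows:
    print(row)
```

In the toy data the median at cycle 3 is still high while the lower decile has already dropped, which is the "subset of reads deteriorating" pattern; by cycle 4 the median falls as well.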
Adapter content is often more actionable than average quality
Adapter sequence introduces false bases. Removing it is usually high value because it directly improves alignment and feature assignment.
Rising late-cycle adapter content suggests inserts shorter than read length.
Choose a minimum length so heavily trimmed fragments do not become misleading noise.
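The two points above, removing read-through adapter and enforcing a minimum length, can be sketched in a few lines. This is an illustration only: it uses exact matching, whereas tools such as cutadapt, fastp, and trimmomatic tolerate mismatches and handle paired reads. The adapter string is the commonly cited Illumina adapter prefix; the function names and the `min_overlap` and `min_length` defaults are assumptions chosen for the example.

```python
def trim_adapter(seq, qual, adapter, min_overlap=3):
    """Cut the read at a full adapter match, or at a partial adapter prefix at the 3' end."""
    idx = seq.find(adapter)
    if idx == -1:
        # Look for the longest adapter prefix hanging off the end of the read.
        for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                idx = len(seq) - k
                break
    if idx == -1:
        return seq, qual  # no adapter evidence, read left untouched
    return seq[:idx], qual[:idx]

def trim_and_filter(reads, adapter, min_length=20):
    """Trim each (seq, qual) pair and drop reads shorter than min_length."""
    kept = []
    for seq, qual in reads:
        seq, qual = trim_adapter(seq, qual, adapter)
        if len(seq) >= min_length:
            kept.append((seq, qual))
    return kept

ADAPTER = "AGATCGGAAGAGC"  # assumed adapter; confirm against the library kit
raw = [
    "ACGT" * 6 + "A" + ADAPTER,   # 25-base insert followed by full adapter
    "TTGACCATGG" * 3 + "AGATC",   # 30-base insert with a partial adapter tail
    "ACGTACGT" + ADAPTER,         # 8-base insert, too short after trimming
    "GGGTTTCCCAAA" * 2,           # no adapter present
]
reads = [(s, "I" * len(s)) for s in raw]
kept = trim_and_filter(reads, ADAPTER)
print([len(s) for s, _ in kept])
```

After trimming, it is worth plotting the resulting length distribution: a large pile of reads just above the minimum length suggests the library was dominated by short inserts.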
GC content should be interpreted relative to experiment type
Amplicon, capture, metagenomic, and RNA-seq libraries naturally have different GC behavior. The question is whether the observed profile matches the expected biology and protocol.
Unexpected bimodality often suggests mixed organisms or contamination.
Highly shifted distributions can also reflect targeted enrichment bias.
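A per-read GC histogram is simple to compute, which makes it easy to compare an observed profile against what the protocol should produce. The sketch below is a minimal version, assuming reads are plain strings; the function name and the 10% bin width are arbitrary choices for illustration.

```python
from collections import Counter

def gc_histogram(seqs, bin_width=10):
    """Count reads per GC-percent bin, keyed by the lower edge of each bin.

    Ambiguous bases such as N are ignored; reads with no called bases are skipped.
    """
    hist = Counter()
    for seq in seqs:
        seq = seq.upper()
        gc = seq.count("G") + seq.count("C")
        called = gc + seq.count("A") + seq.count("T")
        if called == 0:
            continue
        pct = 100 * gc // called  # integer percent avoids float rounding at bin edges
        bin_start = min(pct // bin_width * bin_width, 100 - bin_width)
        hist[bin_start] += 1
    return dict(sorted(hist.items()))

reads = [
    "ATATATATGC", "AATTAATTGG", "ATATGCATAT",  # low-GC population
    "ACGTACGTAC",                              # mid GC
    "GCGCGCGCAT", "GGCCGGCCAA",                # high-GC population
]
hist = gc_histogram(reads)
print(hist)
```

Two well-separated occupied regions, as in this toy input, are the kind of pattern that would prompt a contamination screen or a second look at the sample metadata. The histogram alone does not say which explanation is correct.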
Duplication is a context-dependent warning, not a universal failure
Whole-genome sequencing, RNA-seq, targeted sequencing, and amplicon data have different expectations. Use duplication together with library complexity and biological context.
Extremely high duplication in unbiased libraries often means low input or over-PCR.
In targeted assays, duplication may reflect true enrichment rather than failure.
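Duplication can be estimated directly from read sequences before alignment. The sketch below counts exact sequence duplicates only; it is an assumption-laden simplification and not equivalent to position-aware duplicate marking after mapping. `duplication_summary` is a hypothetical helper.

```python
from collections import Counter

def duplication_summary(seqs):
    """Summarize exact-sequence duplication in a list of reads."""
    counts = Counter(seqs)
    total = sum(counts.values())
    distinct = len(counts)
    # Fraction of reads that are extra copies of an already seen sequence.
    dup_fraction = 1 - distinct / total if total else 0.0
    # How many distinct sequences occur exactly k times.
    levels = Counter(counts.values())
    return {
        "total": total,
        "distinct": distinct,
        "duplicate_fraction": dup_fraction,
        "levels": dict(sorted(levels.items())),
    }

reads = ["AAAA"] * 6 + ["CCCC"] * 2 + ["GGGG", "TTTT"]
summary = duplication_summary(reads)
print(summary)
```

The `distinct` count is a rough proxy for effective unique information. Whether a given duplicate fraction is alarming still depends on the assay: the same number can be expected in an amplicon panel and a warning sign in whole-genome data.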
Overrepresented sequences are clues
Always identify what the sequences are before deciding what to do. Some are technical artifacts; others may be real biology such as rRNA or highly abundant transcripts.
Match overrepresented sequences to adapter or primer databases when possible.
Cross-check with experiment type so you do not trim away expected signal.
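Identification can start with a simple lookup before anything is trimmed. The sketch below flags sequences above a frequency threshold and labels those containing a known technical sequence. The two entries in `KNOWN_ARTIFACTS` are commonly cited adapter prefixes included for illustration; a real check would use the full list from the library kit or a contaminant database. The threshold value and function name are assumptions.

```python
from collections import Counter
from itertools import product

# Illustrative subset only; extend with the sequences for your own kit.
KNOWN_ARTIFACTS = {
    "Illumina adapter (prefix)": "AGATCGGAAGAGC",
    "Nextera transposase sequence (prefix)": "CTGTCTCTTATA",
}

def overrepresented(seqs, min_fraction=0.001):
    """Return (sequence, count, fraction, label) for sequences above min_fraction."""
    counts = Counter(seqs)
    total = len(seqs)
    hits = []
    for seq, n in counts.most_common():
        frac = n / total
        if frac < min_fraction:
            break  # most_common is sorted, so nothing later can qualify
        label = "unidentified: check rRNA, abundant transcripts, or contaminants"
        for name, artifact in KNOWN_ARTIFACTS.items():
            if artifact in seq:
                label = name
                break
        hits.append((seq, n, frac, label))
    return hits

# Toy library: 5 adapter-containing reads, 3 copies of an unknown read, 92 unique reads.
unique = ["TTTT" + "".join(k) for k in list(product("ACGT", repeat=4))[:92]]
reads = ["AGATCGGAAGAGCACACGTC"] * 5 + ["GGGGTTTTCCCCAAAAGGGG"] * 3 + unique
hits = overrepresented(reads, min_fraction=0.02)
for seq, n, frac, label in hits:
    print(seq, n, frac, label)
```

The unidentified hit is deliberately not treated as an artifact. It is the case where experiment type decides the next step, since it could be expected signal.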
Per-base quality summary (example)
Median, upper, and lower quantiles reveal when only a subset of reads is deteriorating versus when the whole run degrades.
Adapter signal across cycles (example)
A late-cycle surge is typical when fragments are shorter than the read length and sequencing enters adapter sequence.
Duplication profile (example)
This view helps separate healthy complexity from libraries dominated by repeated molecules.
Decision matrix
Observation | Likely cause | Good next step
Quality collapses only at the tail | Typical end-of-read decay | Consider light trimming or leave untouched if downstream tools tolerate it
Strong adapter rise late in reads | Short inserts | Trim adapters and confirm post-trim length distribution
Unexpected GC peak | Contamination or mixed library composition | Screen contamination and revisit sample metadata
Very high duplication in unbiased library | Low complexity / over-amplification | Assess whether re-sequencing or better library prep is needed
Same overrepresented sequence across many samples | Technical contaminant | Map the sequence identity before trimming or discarding
When re-sequencing is more honest than heroic trimming
Systemic run failure
If quality is poor from the beginning or oscillates in unusual patterns, no amount of trimming will create trustworthy data.
Library complexity collapse
If most reads are duplicates, your effective unique information is far smaller than the total read count suggests.
Severe contamination
If contaminant signatures dominate large fractions of the library, downstream interpretation may become more misleading than useful.