Practical • Command-line first
Bioinformatics, from raw reads to results
A modern, English tutorial site covering common NGS workflows: QC, alignment, variant calling, RNA-seq, metagenomics, and phylogenetics. Every lesson focuses on what the files mean, what the tools do, and how to sanity-check outputs.
14+ interactive charts
Mental model
Most pipelines repeat the same pattern:
- Validate inputs (format, metadata)
- QC and trim (clean the data)
- Map or assemble
- Call features (variants/genes)
- Interpret with statistics
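The first step, validating inputs, can be sketched with a small structural check on a FASTQ stream. This is a minimal sketch, not a full validator: the function name `validate_fastq` is hypothetical, and it assumes plain four-line records (no wrapped sequences) arriving on standard input.

```shell
# Hypothetical helper: basic structural check for 4-line FASTQ records on stdin.
# Assumes unwrapped records; an empty stream passes trivially.
validate_fastq() {
  awk '
    NR % 4 == 1 && substr($0, 1, 1) != "@" { print "line " NR ": header does not start with @"; bad = 1; exit }
    NR % 4 == 2 { seqlen = length($0) }
    NR % 4 == 3 && substr($0, 1, 1) != "+" { print "line " NR ": separator does not start with +"; bad = 1; exit }
    NR % 4 == 0 && length($0) != seqlen { print "line " NR ": quality length differs from sequence length"; bad = 1; exit }
    END {
      # A line count that is not a multiple of 4 suggests a truncated file
      if (!bad && NR % 4 != 0) { print "truncated record: " NR " lines is not a multiple of 4"; bad = 1 }
      exit (bad ? 1 : 0)
    }
  '
}

# Tiny in-memory demo record
printf '@r1\nACGT\n+\nIIII\n' | validate_fastq && echo "FASTQ structure looks consistent"
```

On real data the same check could be run as `zcat sample_R1.fastq.gz | validate_fastq`. It only reads the stream and does not modify any file.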
Key file types
| Type | Use |
|---|---|
| FASTQ | raw reads + quality |
| BAM/CRAM | aligned reads |
| VCF | variants |
| GTF | gene models |
What you’ll learn
- How Phred scores relate to error rates
- How mapping quality differs from base quality
- Why duplicates happen and when to care
- How to read a VCF like a pro
- How to avoid common statistical traps
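As a preview of the first two topics: a Phred score Q corresponds to an error probability of 10^(-Q/10). Base quality applies this scale to a single base call, while mapping quality applies it to the placement of a whole read. The sketch below converts scores to probabilities and decodes one quality character. The function names are hypothetical, and the character decoding assumes the Phred+33 ASCII offset; older data may use a different offset.

```shell
# Hypothetical helper: convert a Phred score to an error probability, P = 10^(-Q/10)
phred_to_error() {
  awk -v q="$1" 'BEGIN { printf "%.4g\n", 10 ^ (-q / 10) }'
}

# Hypothetical helper: decode one FASTQ quality character.
# Assumption: Phred+33 encoding (ASCII code minus 33).
qual_char_to_phred() {
  echo $(( $(printf '%d' "'$1") - 33 ))
}

# Print a small lookup table of common scores
for q in 10 20 30 40; do
  printf 'Q%s\t%s\n' "$q" "$(phred_to_error "$q")"
done

# The character "I" decodes to Q40 under Phred+33
qual_char_to_phred I
```

Each step of 10 in Q is a tenfold drop in error probability, so Q20 is 1 error in 100 and Q30 is 1 in 1,000.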
Read length distribution (example)
Trimmed reads often show a peak near the planned read length with a tail of shorter reads.
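A distribution like this can be tabulated straight from a FASTQ stream. The following is a minimal sketch that assumes four-line records on standard input; the function name `read_length_histogram` is hypothetical.

```shell
# Hypothetical helper: print "length<TAB>count" for every read length seen on stdin.
# Assumes 4-line FASTQ records, so the sequence is line 2 of each record.
read_length_histogram() {
  awk 'NR % 4 == 2 { n[length($0)]++ }
       END { for (len in n) printf "%d\t%d\n", len, n[len] }' | sort -n
}

# Tiny in-memory demo: two reads of length 4 and one of length 3
printf '@r1\nACGT\n+\nIIII\n@r2\nACG\n+\nIII\n@r3\nACGT\n+\nIIII\n' | read_length_histogram
```

On real data this could be run as `zcat sample_R1.fastq.gz | read_length_histogram`, and the two columns pasted into any plotting tool.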
GC content distribution (example)
Strong deviations from the expected GC profile can indicate contamination or biased libraries.
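Per-read GC percentages, the raw material for such a plot, can be computed with a short awk filter. This is a sketch under stated assumptions: four-line FASTQ records on standard input, and ambiguous bases such as N counted in the read length but not as G or C. The function name `gc_percent_per_read` is hypothetical.

```shell
# Hypothetical helper: print the GC percentage of each read, one value per line.
# Assumes 4-line FASTQ records; N bases count toward length but not toward GC.
gc_percent_per_read() {
  awk 'NR % 4 == 2 {
    seq = toupper($0)
    total = length(seq)
    gc = gsub(/[GC]/, "", seq)   # gsub returns the number of G/C characters removed
    if (total > 0) printf "%.1f\n", 100 * gc / total
  }'
}

# Tiny in-memory demo: reads with 50%, 100%, and 0% GC
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n@r3\nATAT\n+\nIIII\n' | gc_percent_per_read
```

Piping the output through `sort -n | uniq -c` gives a rough text histogram to compare against the GC profile expected for the organism.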
Where time goes in a typical DNA pipeline
Alignment is often the dominant cost. Profiling helps you choose the best speed/accuracy tradeoff.
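One lightweight way to see where the time goes is to wrap each pipeline step in a timer. The wrapper below is a rough sketch: `time_step` is a hypothetical name, it measures wall-clock time at whole-second resolution only, and `sleep 1` stands in for a real pipeline command.

```shell
# Hypothetical helper: run a command and report its wall-clock time on stderr.
# Resolution is whole seconds; the wrapped command's own output is passed through.
time_step() {
  _ts_label=$1
  shift
  _ts_start=$(date +%s)
  "$@"
  _ts_status=$?
  _ts_end=$(date +%s)
  printf '%s\t%ds\n' "$_ts_label" "$(( _ts_end - _ts_start ))" >&2
  return "$_ts_status"   # preserve the wrapped command's exit status
}

# Demo with a placeholder command; replace "sleep 1" with a real step
time_step "demo-step" sleep 1
```

Because the timings go to stderr, they can be collected with `2>> timings.tsv` without mixing into the step's normal output.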
Quick start: “hello world” commands
These are safe sanity checks that do not modify files.
# Inspect FASTQ header patterns (8 lines = 2 complete records)
zcat sample_R1.fastq.gz | head -n 8
# Count reads (FASTQ has 4 lines/read)
expr $(zcat sample_R1.fastq.gz | wc -l) / 4
# Check reference index presence (varies by aligner)
ls -1 reference.*