Glossary
Short definitions of common bioinformatics terms. (A good habit: always map a term to a file type and a QC check.)
| Term | Meaning |
|---|---|
| Phred score | Log-scaled estimate of base-calling error probability: $Q=-10\log_{10}(p)$. |
| MAPQ | Mapping quality: confidence that a read is placed at the correct genomic locus. |
| CIGAR | Compact encoding of alignment operations (matches, insertions, deletions, clipping). |
| Duplicate | Reads likely originating from the same original molecule (PCR/optical); can bias variant calling. |
| Depth (coverage) | Number of reads overlapping a genomic position; affects variant-calling sensitivity and confidence. |
| Variant (SNP/indel) | Difference from reference: single-nucleotide polymorphism or insertion/deletion. |
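| Ti/Tv | Transition/transversion ratio; plausibility metric for variant sets (typically about 2.0–2.1 for human whole-genome SNP calls, higher in exomes; much lower values suggest false positives). |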
| TPM | Transcripts per million; expression estimate normalized for transcript length and sequencing depth (useful for within-sample comparisons). |
| Counts | Raw read counts per feature; used for differential expression modeling. |
| FDR | False discovery rate; expected fraction of false positives among declared discoveries. |
| Alpha diversity | Within-sample diversity; often summarized by Shannon/Simpson indices. |
| Beta diversity | Between-sample differences; distances like Bray–Curtis, UniFrac. |
| Compositional data | Data that represent parts of a whole (sum constraint); requires special care in statistics. |
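
To make the Phred definition above concrete, here is a minimal Python sketch that converts between quality scores and error probabilities and decodes a FASTQ quality string. It assumes the Phred+33 ASCII encoding (Sanger / Illumina 1.8+); the function names and the example quality string are illustrative, not taken from any particular tool.

```python
import math

# Assumption: Phred+33 ASCII offset (Sanger / Illumina 1.8+ FASTQ files).
PHRED_OFFSET = 33


def phred_to_error(q: float) -> float:
    """Invert Q = -10 * log10(p) to get the error probability p."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Apply Q = -10 * log10(p) to an error probability p."""
    return -10 * math.log10(p)


def decode_quality(qual: str, offset: int = PHRED_OFFSET) -> list[int]:
    """Turn a FASTQ quality string into a list of integer Phred scores."""
    return [ord(ch) - offset for ch in qual]


def mean_error(scores: list[int]) -> float:
    """Average the per-base error probabilities (not the Q values)."""
    return sum(phred_to_error(q) for q in scores) / len(scores)


# Hypothetical four-base quality string, for illustration only.
scores = decode_quality("II5+")
print(scores)  # [40, 40, 20, 10]
print(error_to_phred(mean_error(scores)))
```

Averaging in probability space and converting back gives a lower effective quality than the arithmetic mean of the Q values, because the scale is logarithmic and a few poor bases dominate the expected error.
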
Practical “translate it to a check” examples
- Low MAPQ → inspect repeat regions, try stricter mapping/filtering.
- High duplicates → check library complexity; consider UMIs or deduplication.
- Unexpected GC peak → run contamination screen; validate sample sheet.
- Batch effect → include batch covariates; rerun with balanced design if possible.
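
The first three mappings above could be encoded as a small rule table that turns summary metrics into suggested follow-ups. The sketch below is only an illustration: the `QcMetrics` fields, the `suggest_checks` function, and both numeric thresholds are hypothetical and would need tuning for a given assay and organism.

```python
from dataclasses import dataclass

# Hypothetical cut-offs, chosen only for illustration; tune per assay.
LOW_MAPQ_THRESHOLD = 20
HIGH_DUPLICATE_FRACTION = 0.30


@dataclass
class QcMetrics:
    """Hypothetical per-sample summary; field names are assumptions."""
    median_mapq: float
    duplicate_fraction: float
    secondary_gc_peak: bool


def suggest_checks(m: QcMetrics) -> list[str]:
    """Map each flagged metric to the follow-up named in the list above."""
    actions = []
    if m.median_mapq < LOW_MAPQ_THRESHOLD:
        actions.append("inspect repeat regions; tighten mapping/filtering")
    if m.duplicate_fraction > HIGH_DUPLICATE_FRACTION:
        actions.append("check library complexity; consider UMIs or deduplication")
    if m.secondary_gc_peak:
        actions.append("run contamination screen; validate sample sheet")
    return actions


# Made-up metrics for one sample, for illustration only.
sample = QcMetrics(median_mapq=12, duplicate_fraction=0.45, secondary_gc_peak=False)
for action in suggest_checks(sample):
    print("-", action)
```

Batch effects are left out on purpose: they are a property of the study design rather than of a single sample, so they are better handled in the statistical model than in a per-sample rule.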