Bioinformatics Tutorial

Bioinformatics Tools Reference

A curated catalog of commonly used tools organized by workflow step. Each entry links to the official source so you can evaluate and install what fits your pipeline. Some tools listed here are developed by the same team behind this tutorial site.

Quality Control & Preprocessing

QC is the first step in nearly every pipeline. These tools help you assess raw read quality, detect adapter contamination, and trim/filter reads before downstream analysis.

Tool | Platform | Description
FastQC | Desktop / CLI | Widely used QC report generator for FASTQ files. Reports per-base quality, GC content, adapter content, and overrepresented sequences.
MultiQC | CLI | Aggregates results from many tools (FastQC, STAR, Salmon, etc.) into a single interactive HTML report. Essential for multi-sample projects.
fastp | CLI | All-in-one preprocessor: quality trimming, adapter removal, filtering, and QC reporting in one pass. Very fast (multi-threaded C++).
Cutadapt | CLI | Flexible adapter and quality trimming tool with a rich set of filtering options. Well suited to non-standard adapter schemes.
Trimmomatic | CLI (Java) | Illumina-specific trimming with sliding-window quality filtering. Mature and widely cited.
FastQLab (Android · iOS) | Mobile (Android / iOS) | FASTQ quality control for quick field or classroom use. Runs entirely on-device with no server needed, which is useful when you want to preview QC metrics on sequencing data without setting up a full workstation.
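To make two of the metrics above concrete, here is a minimal Python sketch that computes mean per-base quality and GC content from FASTQ text. It assumes four-line FASTQ records and Phred+33 quality encoding. The names `parse_fastq` and `qc_summary` are illustrative and do not belong to any tool listed here, and real QC tools do considerably more.

```python
import io
from typing import Dict, Iterator, List, TextIO, Tuple


def parse_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality_string) from four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual


def qc_summary(handle: TextIO, offset: int = 33) -> Dict[str, object]:
    """Compute mean Phred quality per read position and the overall GC fraction."""
    qual_sums: List[int] = []
    qual_counts: List[int] = []
    gc = total = 0
    for _, seq, qual in parse_fastq(handle):
        for i, ch in enumerate(qual):
            if i == len(qual_sums):  # first read that reaches this position
                qual_sums.append(0)
                qual_counts.append(0)
            qual_sums[i] += ord(ch) - offset  # assumed Phred+33 decoding
            qual_counts[i] += 1
        gc += sum(base in "GCgc" for base in seq)
        total += len(seq)
    return {
        "mean_quality_per_base": [s / c for s, c in zip(qual_sums, qual_counts)],
        "gc_fraction": gc / total if total else 0.0,
    }


# Two made-up reads, for illustration only.
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\n!!II\n")
summary = qc_summary(example)
print(summary)
```

Reads of different lengths are handled by tracking a separate count per position, so later positions are averaged only over the reads that reach them.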

Read Alignment

Aligners map sequencing reads to a reference genome or transcriptome. The choice depends on your experiment type (DNA vs RNA), speed requirements, and memory constraints.

Tool | Type | Description
BWA-MEM2 | DNA short-read | Faster successor to BWA-MEM, the de facto standard for whole-genome and exome alignment. Fast, accurate, and widely benchmarked.
Minimap2 | Long-read / DNA / RNA | Versatile aligner for PacBio, ONT, and short reads. Also supports splice-aware RNA mapping.
STAR | RNA-seq (splice-aware) | High-throughput splice-aware aligner for RNA-seq. Very fast but requires significant RAM (≥30 GB for human).
HISAT2 | RNA-seq / DNA | Memory-efficient splice-aware aligner using a graph FM index. Good balance of speed and RAM for RNA-seq.
Bowtie2 | DNA short-read | Fast, memory-efficient aligner. Common for ChIP-seq, ATAC-seq, and other assays where reads are short.
JapalitySplice (GitHub · Android · iOS) | RNA-seq (splice-aware) | A sparse k-mer splice-aware RNA-seq aligner written in Rust, designed for resource-constrained environments. Supports annotation-guided junction detection and runs on laptops or even mobile devices. Particularly suitable for teaching, small genomes (yeast, Arabidopsis), and edge computing scenarios. Also available as a mobile app with on-device deep-learning splice site prediction for multiple model organisms.

Transcript Quantification

Pseudoalignment and lightweight mapping approaches that quantify transcript abundance without full genome alignment.

Tool | Description
Salmon | Fast transcript quantification using selective alignment. Widely used for bulk RNA-seq DE analysis via tximeta/DESeq2.
Kallisto | Near-optimal pseudoalignment for RNA-seq quantification. Extremely fast with a small memory footprint.
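Both tools report abundance in transcripts per million (TPM). The sketch below shows how TPM follows from estimated counts and effective transcript lengths. The function name `tpm` and the input numbers are made up for illustration. This is only the final normalization step, not either tool's quantification model.

```python
from typing import Dict


def tpm(counts: Dict[str, float], eff_lengths: Dict[str, float]) -> Dict[str, float]:
    """Convert per-transcript read counts to transcripts per million (TPM)."""
    # Reads per base: normalize each count by its transcript's effective length.
    rate = {t: counts[t] / eff_lengths[t] for t in counts}
    total = sum(rate.values())
    if total == 0:
        return {t: 0.0 for t in counts}
    # Rescale so that the values sum to one million.
    return {t: r / total * 1e6 for t, r in rate.items()}


# Hypothetical counts and effective lengths for three transcripts.
counts = {"tx1": 100.0, "tx2": 300.0, "tx3": 0.0}
eff_lengths = {"tx1": 1000.0, "tx2": 1500.0, "tx3": 500.0}
result = tpm(counts, eff_lengths)
print(result)
```

Here `tx2` has three times the reads of `tx1` but only twice the TPM, because it is longer. Removing that length bias is the point of the normalization.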

Variant Calling

Tools for identifying SNPs, indels, and structural variants from aligned reads.

Tool | Description
GATK | Industry-standard variant calling toolkit (HaplotypeCaller, Mutect2, VQSR). Comprehensive but complex.
BCFtools | Lightweight variant caller and VCF manipulation suite. Great for quick analyses and scripting.
DeepVariant | CNN-based variant caller from Google. High accuracy, especially on whole-genome short-read data.

SAM/BAM Utilities

Essential utilities for manipulating, filtering, and summarizing alignment files.

Tool | Description
Samtools | The standard tool for sorting, indexing, filtering, and inspecting SAM/BAM/CRAM files.
Picard | Java-based BAM utilities for deduplication, metrics collection, and format conversion.
BEDTools | Genomic interval arithmetic: intersect, merge, and subtract intervals across BED/BAM/VCF files.
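Filtering alignment files usually comes down to testing bits of the FLAG field defined in the SAM specification. The following sketch decodes a FLAG value into its named bits. The helper names `decode_flag` and `is_primary_mapped` and the label strings are illustrative choices. It is meant for understanding the field, not as a replacement for Samtools.

```python
from typing import List

# Bit values follow the SAM specification; the label strings are our own.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> List[str]:
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def is_primary_mapped(flag: int) -> bool:
    """True if the record is mapped and neither secondary nor supplementary."""
    return not flag & (0x4 | 0x100 | 0x800)


print(decode_flag(99))
print(is_primary_mapped(99), is_primary_mapped(4))
```

The mask in `is_primary_mapped` combines to 0x904, the same set of bits one would exclude with `samtools view -F` to keep only primary mapped records.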

Differential Expression

Tool | Description
DESeq2 | R/Bioconductor package for count-based DE using negative binomial models.
edgeR | R/Bioconductor DE package that also uses negative binomial models, with empirical Bayes dispersion estimation.
limma-voom | Linear-model DE framework originally developed for microarrays; the voom transformation adapts it to RNA-seq count data.

Metagenomics & Taxonomy

Tool | Description
Kraken2 | Ultra-fast k-mer-based taxonomic classification for metagenomic reads.
MetaPhlAn | Marker-gene based metagenomic profiling at species level.
QIIME 2 | Comprehensive microbiome analysis platform for amplicon and shotgun data.
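As a rough intuition for k-mer-based classification, the toy sketch below assigns a read to whichever reference shares the most k-mers with it. This is a deliberate simplification and not Kraken2's actual algorithm, which relies on a compact database and taxonomy-aware assignment. The reference sequences, taxon names, and function names are all invented for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional, Set


def kmers(seq: str, k: int) -> List[str]:
    """Return all overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(references: Dict[str, str], k: int) -> Dict[str, str]:
    """Map each k-mer to the single taxon it occurs in; drop shared k-mers."""
    index: Dict[str, str] = {}
    ambiguous: Set[str] = set()
    for taxon, seq in references.items():
        for kmer in kmers(seq, k):
            if kmer in ambiguous:
                continue
            if kmer in index and index[kmer] != taxon:
                del index[kmer]  # seen in two taxa, so not informative here
                ambiguous.add(kmer)
            else:
                index[kmer] = taxon
    return index


def classify(read: str, index: Dict[str, str], k: int) -> Optional[str]:
    """Return the taxon with the most matching k-mers, or None if none match."""
    votes = Counter(index[kmer] for kmer in kmers(read, k) if kmer in index)
    return votes.most_common(1)[0][0] if votes else None


# Made-up reference sequences for two hypothetical taxa.
references = {"taxon_A": "ACGTACGTGA", "taxon_B": "TTGGCCAATT"}
index = build_index(references, k=4)
print(classify("GTACGTG", index, k=4))
print(classify("GGCCAA", index, k=4))
print(classify("AAAAAA", index, k=4))
```

Reads with no indexed k-mers come back as `None`, which corresponds loosely to the "unclassified" bucket that classifiers report.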

Workflow Managers & Reproducibility

As pipelines grow in complexity, workflow managers help you orchestrate tools, handle dependencies, and ensure reproducibility.

Tool | Description
Nextflow | DSL-based workflow manager with excellent container and cloud support. Powers nf-core community pipelines.
Snakemake | Python-based workflow engine with automatic dependency resolution and cluster support.
CWL | Common Workflow Language: a platform-agnostic specification for describing analysis workflows.
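Dependency handling in a workflow manager means deriving a valid execution order from declared relationships between steps. The sketch below illustrates the idea with Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+). The step names are hypothetical, and this is not how Nextflow or Snakemake work internally.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must finish before it can run.
pipeline = {
    "trim": set(),
    "align": {"trim"},
    "sort_index": {"align"},
    "quantify": {"trim"},
    "report": {"sort_index", "quantify"},
}

# static_order() yields the steps so that every dependency precedes its dependents.
order = list(TopologicalSorter(pipeline).static_order())
print(order)
```

Several orders can be valid (for example, `quantify` may run before or after `align`), which is what lets real workflow managers run independent steps in parallel.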

Mobile & On-Device Bioinformatics

A growing category of tools that run directly on phones, tablets, or laptops, with no server infrastructure required. Useful for teaching, fieldwork, quick previews, and environments with limited connectivity.

Tool | Platform | Description
FastQLab (Android · iOS) | Android / iOS | Perform FASTQ quality control directly on your phone or tablet. Provides per-base quality, GC content, and read-length distribution charts with zero cloud dependency. Free.
JapalitySplice (GitHub (CLI) · Android · iOS) | CLI (Rust) / Android / iOS | Splice-aware RNA-seq aligner with sparse k-mer indexing, built for resource-constrained devices. The mobile version adds deep-learning splice site prediction for multiple model organisms (human, Drosophila, C. elegans, yeast). Open-source CLI under GPLv3.

Note: FastQLab and JapalitySplice are developed by Japality Limited, the same team that maintains this tutorial site.

How to choose a tool
Consider
  • Community & citations: widely used tools have more documentation and community support.
  • Hardware constraints: STAR needs ≥30 GB RAM for human; lighter tools exist for laptops/mobile.
  • Experiment type: DNA vs RNA vs metagenomics often dictates the pipeline.
  • Reproducibility: prefer tools with version-pinnable installs (Conda, containers).
Keep in mind
  • No single tool is best for every dataset and question.
  • Benchmarks on one species may not transfer to another.
  • Mobile/lightweight tools are great for teaching and previews, but production-scale work usually runs on servers.
  • Always validate outputs with independent QC checks.