BWA Tutorial: Read Mapping for Genome Analysis

Last updated: March 13, 2026

Introduction

BWA (Burrows-Wheeler Aligner) is a tool for mapping read sequences (FASTQ files) obtained from next-generation sequencers to a reference genome. BWA is widely used for DNA sequencing data mapping and is especially common in genome resequencing and ChIP-Seq analyses.

BWA includes several different algorithms:

BWA-backtrack: An algorithm suited for Illumina reads of 100 bp or shorter.
BWA-SW: Handles reads ranging from 70 bp to 1 Mbp, with support for long reads and split alignment.
BWA-MEM: Also handles reads from 70 bp to 1 Mbp and supports long reads and split alignment like BWA-SW, but offers greater speed and accuracy. It even outperforms BWA-backtrack for 70-100 bp Illumina reads, making it the most recommended algorithm today.
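
Since the choice of algorithm depends on read length, it can help to check the lengths in a FASTQ file before mapping. The following is a minimal sketch, not part of BWA: read_length_stats is a hypothetical helper name, and it assumes standard four-line FASTQ records (sequences not wrapped across lines).

```shell
# read_length_stats: hypothetical helper, not part of BWA.
# Prints the number of reads and their min/mean/max length.
# Assumes 4-line FASTQ records; accepts gzip-compressed input
# (with GNU gzip, -f also lets uncompressed input pass through).
read_length_stats() {
    gzip -cdf "$1" | awk 'NR % 4 == 2 {
        n++; len = length($0); sum += len
        if (n == 1 || len < min) min = len
        if (len > max) max = len
    } END {
        if (n == 0) { print "no reads"; exit 1 }
        printf "reads=%d min=%d mean=%.1f max=%d\n", n, min, sum / n, max
    }'
}

# Example usage:
# read_length_stats reads1.fastq.gz
```

Reads that fall within the 70 bp to 1 Mbp range described above are candidates for BWA-MEM.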

This page primarily explains how to use BWA-MEM, the most widely used algorithm today.

For RNA-Seq mapping, splice-junction-aware tools such as HISAT2 and STAR are typically used. Because BWA does not account for splice junctions, it is mainly used for DNA sequencing data such as whole genome sequencing (WGS), exome sequencing, and ChIP-Seq.

Installation

BWA can be installed via Bioconda.

$ conda install -c bioconda bwa

Run bwa without arguments to display the help message and verify the installation.

$ bwa

If you see output like the following, the installation was successful.

Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.17-r1188
Contact: Heng Li <lh3@sanger.ac.uk>

Usage:   bwa <command> [options]

Command: index         index sequences in the FASTA format
         mem           BWA-MEM algorithm
         fastmap       identify super-maximal exact matches
         pemerge       merge overlapping paired ends (EXPERIMENTAL)
         aln           gapped/ungapped alignment
         samse         generate alignment (single ended)
         sampe         generate alignment (paired ended)
         bwasw         BWA-SW for long queries

         shm           manage indices in shared memory
         fa2pac        convert FASTA to PAC format
         pac2bwt       generate BWT from PAC
         pac2bwtgen    alternative algorithm for generating BWT
         bwtupdate     update .bwt to the new format
         bwt2sa        generate SA from BWT and Occ

Note: To use BWA, you need to first index the genome with `bwa index`.
      There are three alignment algorithms in BWA: `mem`, `bwasw`, and
      `aln/samse/sampe`. If you are not sure which to use, try `bwa mem`
      first. Please `man ./bwa.1` for the manual.

Building the Index

Before performing mapping, you need to build an index of the reference sequence.

$ bwa index genome.fa

genome.fa is the FASTA file of the reference sequence you want to map reads to.

This operation produces five files: genome.fa.amb, genome.fa.ann, genome.fa.bwt, genome.fa.pac, and genome.fa.sa. Index files are essential for fast string searching and must be built in advance for virtually all mapping tools, not just BWA.
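
Because mapping fails when any of these files is missing, it can be useful to check for them before running a long job. The sketch below is one way to do that; check_bwa_index is a hypothetical helper name, not a BWA command, and it only tests that each of the five files exists and is non-empty.

```shell
# check_bwa_index: hypothetical helper, not part of BWA.
# Returns 0 if all five index files listed above exist and are
# non-empty next to the given reference FASTA, 1 otherwise.
check_bwa_index() {
    local ref="$1"
    local missing=0
    local ext
    for ext in amb ann bwt pac sa; do
        if [ ! -s "${ref}.${ext}" ]; then
            echo "missing index file: ${ref}.${ext}" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example usage: build the index only when it is incomplete
# (requires bwa on PATH).
# check_bwa_index genome.fa || bwa index genome.fa
```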

Mapping (BWA-MEM)

Run the following command to map paired-end reads.

$ bwa mem -t 4 genome.fa reads1.fastq.gz reads2.fastq.gz > output.sam

The -t option specifies the number of threads. BWA-MEM writes the alignments to standard output in SAM format, which is redirected here to output.sam.

For single-end reads, specify only one FASTQ file.

$ bwa mem -t 4 genome.fa reads.fastq.gz > output.sam

SAM files are large and the alignments are not sorted by position, so it is usual to convert the output to a BAM file, sort it, and index it. Run the following commands.

$ samtools view -bS output.sam > output.bam
$ samtools sort output.bam > output.sorted.bam
$ samtools index output.sorted.bam

The index created by samtools index is needed for viewing data in genome browsers like IGV and is also required by many downstream analysis tools.
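
The mapping and conversion steps can also be chained with a pipe so that no intermediate SAM file is written to disk. The following is a minimal sketch under stated assumptions: map_and_sort is a hypothetical helper name, the inputs are paired-end reads, and both bwa and samtools are on PATH.

```shell
# map_and_sort: hypothetical helper, not part of BWA or samtools.
# Maps paired-end reads with bwa mem, pipes the SAM stream straight
# into samtools sort, then indexes the sorted BAM.
# Arguments: reference, reads 1, reads 2, output BAM, threads (default 4).
# Note: the pipeline's exit status is that of samtools sort; in bash,
# `set -o pipefail` would also surface a failure of bwa mem.
map_and_sort() {
    local ref="$1"
    local r1="$2"
    local r2="$3"
    local out="$4"
    local threads="${5:-4}"
    bwa mem -t "$threads" "$ref" "$r1" "$r2" \
        | samtools sort -@ "$threads" -o "$out" - \
        && samtools index "$out"
}

# Example usage:
# map_and_sort genome.fa reads1.fastq.gz reads2.fastq.gz output.sorted.bam 4
```

Here the trailing "-" tells samtools sort to read from standard input, and -o names the sorted BAM file.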

About BWA-MEM2

BWA-MEM2 is the successor to BWA-MEM and performs mapping significantly faster. Its usage is nearly identical to BWA-MEM.

It can be installed via Bioconda.

$ conda install -c bioconda bwa-mem2

Build the index and perform mapping as follows.

$ bwa-mem2 index genome.fa
$ bwa-mem2 mem -t 4 genome.fa reads1.fastq.gz reads2.fastq.gz > output.sam

BWA-MEM2 is designed to produce the same alignments as BWA-MEM while running substantially faster. Consider using it especially when working with large-scale datasets.
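
Because the two tools take the same subcommands and arguments, a script can fall back to BWA-MEM when BWA-MEM2 is not installed. This is a sketch only; pick_aligner is a hypothetical helper name. Note that each tool needs an index built with its own index command, so the chosen program is used for both steps.

```shell
# pick_aligner: hypothetical helper, not part of either tool.
# Prints "bwa-mem2" if it is on PATH, otherwise "bwa";
# returns 1 if neither program is found.
pick_aligner() {
    if command -v bwa-mem2 >/dev/null 2>&1; then
        echo bwa-mem2
    elif command -v bwa >/dev/null 2>&1; then
        echo bwa
    else
        echo "neither bwa-mem2 nor bwa found on PATH" >&2
        return 1
    fi
}

# Example usage: index and map with whichever aligner was found.
# aligner=$(pick_aligner) || exit 1
# "$aligner" index genome.fa
# "$aligner" mem -t 4 genome.fa reads1.fastq.gz reads2.fastq.gz > output.sam
```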


About the Author

BxINFO LLC

A research support company specializing in bioinformatics.

We provide tools and information to support life science research, with a focus on RNA-Seq analysis.
