BWA Tutorial: Read Mapping for Genome Analysis
Introduction
BWA (Burrows-Wheeler Aligner) is a tool for mapping read sequences (FASTQ files) obtained from next-generation sequencers to a reference genome. BWA is widely used for DNA sequencing data mapping and is especially common in genome resequencing and ChIP-Seq analyses.
BWA includes several different algorithms:
- BWA-backtrack: An algorithm suited for Illumina reads of 100 bp or shorter.
- BWA-SW: Handles reads ranging from 70 bp to 1 Mbp, with support for long reads and split alignment.
- BWA-MEM: Also handles reads from 70 bp to 1 Mbp and, like BWA-SW, supports long reads and split alignment, but is faster and more accurate. It also outperforms BWA-backtrack on 70-100 bp Illumina reads, which makes it the generally recommended algorithm today.
This page therefore focuses on how to use BWA-MEM.
For RNA-Seq mapping, splice-junction-aware tools such as HISAT2 and STAR are typically used. Because BWA does not account for splice junctions, it is mainly used for DNA sequencing data such as whole genome sequencing (WGS), exome sequencing, and ChIP-Seq.
Installation
BWA can be installed via Bioconda.
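A minimal sketch of the installation, assuming conda is already installed and the Bioconda and conda-forge channels are reachable; the channel order shown is the one Bioconda generally recommends:

```shell
# Install BWA from the Bioconda channel into the currently active conda environment.
# Assumption: conda is already set up; add -y to skip the confirmation prompt.
conda install -c conda-forge -c bioconda bwa
```

If you prefer to keep BWA separate from other tools, install it into a dedicated environment instead of the active one.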
Run bwa with no arguments to display the help message and verify the installation.
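The check can be sketched as follows. The variable name BWA_ON_PATH is a hypothetical helper introduced only for this example. Because bwa writes its usage message to standard error, the sketch merges standard error into standard output before paging it:

```shell
# Show the first lines of the usage message if bwa is installed.
if command -v bwa >/dev/null 2>&1; then
    # bwa prints its usage to stderr when run without arguments.
    bwa 2>&1 | head -n 20
    BWA_ON_PATH=yes
else
    echo "bwa not found on PATH; is the conda environment active?" >&2
    BWA_ON_PATH=no
fi
```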
If BWA prints its usage message, which shows the program version and lists the available subcommands such as index and mem, the installation was successful.
Building the Index
Before performing mapping, you need to build an index of the reference sequence.
The index is built with the bwa index subcommand. Here, genome.fa is the FASTA file of the reference sequence you want to map reads to.
This operation produces five files: genome.fa.amb, genome.fa.ann, genome.fa.bwt, genome.fa.pac, and genome.fa.sa. Index files are essential for fast string searching and must be built in advance for virtually all mapping tools, not just BWA.
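The indexing step and a check for the five output files can be sketched as below. The file name genome.fa follows the text above, and count_missing_index_files is a hypothetical helper written for this example, not part of BWA:

```shell
REF=genome.fa   # reference FASTA (name taken from the text above)

# Hypothetical helper: count how many of the five BWA index files
# are missing for the FASTA path given as the first argument.
count_missing_index_files() {
    missing=0
    for ext in amb ann bwt pac sa; do
        [ -f "$1.$ext" ] || missing=$((missing + 1))
    done
    echo "$missing"
}

# Build the index only when bwa is available and the reference exists.
if command -v bwa >/dev/null 2>&1 && [ -f "$REF" ]; then
    bwa index "$REF"
fi

echo "index files missing for $REF: $(count_missing_index_files "$REF")"
```

A count of 0 means all five index files are in place next to the reference.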
Mapping (BWA-MEM)
Run the following command to map paired-end reads.
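A minimal sketch of a paired-end run, assuming the reference has already been indexed; the FASTQ and output file names and the thread count are placeholders chosen for this example:

```shell
THREADS=8               # assumed thread count; adjust to your machine
REF=genome.fa           # reference FASTA, already indexed with bwa index
R1=reads_1.fastq.gz     # assumed file name for read 1
R2=reads_2.fastq.gz     # assumed file name for read 2
OUT=aligned.sam         # assumed output file name

if command -v bwa >/dev/null 2>&1 && [ -f "$R1" ] && [ -f "$R2" ]; then
    # bwa mem writes SAM to standard output, so redirect it to a file.
    bwa mem -t "$THREADS" "$REF" "$R1" "$R2" > "$OUT"
else
    echo "skipping: bwa or the input FASTQ files were not found" >&2
fi
```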
The -t option specifies the number of threads. BWA-MEM writes the alignments in SAM format to standard output, so redirect the output to a file to obtain a SAM file.
For single-end reads, specify only one FASTQ file.
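The single-end case differs only in the number of FASTQ arguments. As a sketch, with placeholder file names:

```shell
THREADS=8                 # assumed thread count
REF=genome.fa             # reference FASTA, already indexed with bwa index
READS=reads.fastq.gz      # assumed single-end FASTQ file name
OUT_SE=aligned_se.sam     # assumed output file name

if command -v bwa >/dev/null 2>&1 && [ -f "$READS" ]; then
    # One FASTQ file after the reference means single-end mapping.
    bwa mem -t "$THREADS" "$REF" "$READS" > "$OUT_SE"
else
    echo "skipping: bwa or the input FASTQ file was not found" >&2
fi
```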
Since it is convenient to convert the output SAM file to a BAM file and sort it, run the following commands.
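The conversion, sorting, and indexing steps can be sketched with samtools as follows. This assumes samtools 1.x is installed (it is also available from Bioconda), and the file names are placeholders:

```shell
THREADS=8                    # assumed thread count
SAM=aligned.sam              # SAM file produced by bwa mem (assumed name)
BAM=aligned.bam              # assumed name for the unsorted BAM
SORTED=aligned.sorted.bam    # assumed name for the sorted BAM

if command -v samtools >/dev/null 2>&1 && [ -f "$SAM" ]; then
    # Convert SAM to BAM.
    samtools view -@ "$THREADS" -b -o "$BAM" "$SAM"
    # Sort the BAM by genomic coordinate.
    samtools sort -@ "$THREADS" -o "$SORTED" "$BAM"
    # Create the index file (aligned.sorted.bam.bai) for the sorted BAM.
    samtools index "$SORTED"
else
    echo "skipping: samtools or $SAM was not found" >&2
fi
```

Since samtools sort also accepts SAM input, the separate conversion step can be dropped if the unsorted BAM is not needed.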
The index created by samtools index is needed for viewing data in genome browsers like IGV and is also required by many downstream analysis tools.
About BWA-MEM2
BWA-MEM2 is the successor to BWA-MEM and performs mapping significantly faster. Its usage is nearly identical to BWA-MEM.
It can be installed via Bioconda.
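As with BWA, a minimal sketch assuming conda is already set up; the Bioconda package name is bwa-mem2:

```shell
# Install BWA-MEM2 from Bioconda into the currently active conda environment.
# Assumption: conda is already set up; add -y to skip the confirmation prompt.
conda install -c conda-forge -c bioconda bwa-mem2
```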
Build the index and perform mapping as follows.
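A sketch of both steps for paired-end reads, with placeholder file names and thread count. Note that BWA-MEM2 uses its own index format, so the index must be built with bwa-mem2 index rather than reused from bwa index:

```shell
THREADS=8                  # assumed thread count
REF=genome.fa              # reference FASTA
R1=reads_1.fastq.gz        # assumed file name for read 1
R2=reads_2.fastq.gz        # assumed file name for read 2
OUT_MEM2=aligned.mem2.sam  # assumed output file name

if command -v bwa-mem2 >/dev/null 2>&1 && [ -f "$REF" ] && [ -f "$R1" ] && [ -f "$R2" ]; then
    # Build the BWA-MEM2 index for the reference.
    bwa-mem2 index "$REF"
    # Map the reads; SAM goes to standard output, as with bwa mem.
    bwa-mem2 mem -t "$THREADS" "$REF" "$R1" "$R2" > "$OUT_MEM2"
else
    echo "skipping: bwa-mem2 or the input files were not found" >&2
fi
```

The resulting SAM file can be converted, sorted, and indexed with samtools in the same way as BWA-MEM output.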
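BWA-MEM2 is designed to produce the same alignments as BWA-MEM while running substantially faster; its authors report a speedup of roughly 1.3 to 3.1 times, depending on the dataset and machine. Consider using BWA-MEM2 especially when working with large-scale datasets.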
About the Author
BxINFO LLC
A research support company specializing in bioinformatics.
We provide tools and information to support life science research, with a focus on RNA-Seq analysis.