BWA Tutorial: Read Mapping for Genome Analysis
Introduction
BWA (Burrows-Wheeler Aligner) is a software tool for mapping read sequences (FASTQ files) obtained from next-generation sequencers to a reference genome. BWA is widely used for mapping DNA sequencing data, and is particularly common in genome resequencing and ChIP-Seq analysis.
BWA includes several algorithms:
- BWA-backtrack: An algorithm suitable for Illumina reads of 100 bp or shorter.
- BWA-SW: Supports reads from 70 bp to 1 Mbp in length, with support for long reads and split alignment.
- BWA-MEM: Supports reads from 70 bp to 1 Mbp in length and, like BWA-SW, handles long reads and split alignment, but is faster and more accurate. It also outperforms BWA-backtrack for 70-100 bp Illumina reads, so it is generally the recommended algorithm.
This page focuses on how to use BWA-MEM, the most commonly used algorithm.
For mapping in RNA-Seq, splice-junction-aware tools such as HISAT2 and STAR are commonly used. Since BWA does not account for splice junctions, it is primarily used for mapping DNA sequencing data such as whole genome sequencing (WGS), exome sequencing, and ChIP-Seq.
Installation
You can install BWA via Bioconda.
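A minimal sketch of the installation, assuming conda (for example Miniconda or Miniforge) is already installed. The channel order and the -y flag are choices made here for illustration, not taken from the original page.

```shell
PKG=bwa   # Bioconda package name

# Assumption: conda is already installed and on PATH.
if command -v conda >/dev/null 2>&1; then
    # conda-forge is listed before bioconda; -y skips the confirmation prompt
    conda install -y -c conda-forge -c bioconda "$PKG" || echo "conda install failed" >&2
else
    echo "conda not found; install Miniconda or Miniforge first" >&2
fi
```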
Run bwa with no arguments to display the help message.
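As a quick check, something like the following should work. The head -n 5 truncation is only there to keep the output short and is not part of the original tutorial.

```shell
TOOL=bwa

# Run without arguments, bwa prints its usage text to stderr and exits
# with a non-zero status, so merge stderr into stdout and ignore the status.
if command -v "$TOOL" >/dev/null 2>&1; then
    "$TOOL" 2>&1 | head -n 5 || true
else
    echo "$TOOL is not on PATH" >&2
fi
```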
If BWA prints its usage message (program name, version, and the list of available commands such as index and mem), the installation was successful.
Building the Index
Before mapping, you need to build an index of the reference sequence.
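A minimal sketch of the indexing step, assuming the reference is stored as genome.fa in the current directory:

```shell
REF=genome.fa   # reference FASTA to index

if command -v bwa >/dev/null 2>&1 && [ -f "$REF" ]; then
    # Build the BWA index next to the FASTA file
    bwa index "$REF"
    # List the index files that were written
    ls "$REF".amb "$REF".ann "$REF".bwt "$REF".pac "$REF".sa
else
    echo "bwa or $REF not found; nothing was indexed" >&2
fi
```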
genome.fa is the FASTA file of the reference sequence you want to map to.
This operation creates five files: genome.fa.amb, genome.fa.ann, genome.fa.bwt, genome.fa.pac, and genome.fa.sa. Index files are necessary for fast string searching, and pre-building them is required for virtually all mapping software, not just BWA.
Mapping (BWA-MEM)
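To map paired-end reads, pass the reference (the same name used when building the index) followed by the two FASTQ files to bwa mem.
The -t option specifies the number of threads. bwa mem writes the alignments in SAM format to standard output, so redirect the output to a file (or use the -o option).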
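A sketch of a paired-end run. The FASTQ names, the output name, and the thread count are hypothetical placeholders, and the index is assumed to have been built already for genome.fa.

```shell
REF=genome.fa           # indexed reference
R1=reads_1.fastq.gz     # hypothetical forward reads
R2=reads_2.fastq.gz     # hypothetical reverse reads
THREADS=8               # adjust to the number of available cores
OUT=aligned.sam         # hypothetical output name

if command -v bwa >/dev/null 2>&1 && [ -f "$REF.bwt" ] && [ -f "$R1" ] && [ -f "$R2" ]; then
    # SAM is written to stdout, so redirect it to a file
    bwa mem -t "$THREADS" "$REF" "$R1" "$R2" > "$OUT"
else
    echo "bwa, the index for $REF, or the FASTQ files are missing" >&2
fi
```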
For single-end reads, specify only one FASTQ file.
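The single-end case looks the same with one input file. Again, reads.fastq.gz and the output name are hypothetical.

```shell
REF=genome.fa            # indexed reference
READS=reads.fastq.gz     # hypothetical single-end reads
THREADS=8
OUT=aligned_se.sam       # hypothetical output name

if command -v bwa >/dev/null 2>&1 && [ -f "$REF.bwt" ] && [ -f "$READS" ]; then
    # Only one FASTQ file is given for single-end data
    bwa mem -t "$THREADS" "$REF" "$READS" > "$OUT"
else
    echo "bwa, the index for $REF, or $READS is missing" >&2
fi
```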
It is convenient to convert the output SAM file to a BAM file, sort it by coordinate, and index it. This is typically done with samtools view, samtools sort, and samtools index.
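One way to do this, assuming samtools is installed and the mapping output is named aligned.sam (a hypothetical name):

```shell
SAM=aligned.sam                  # hypothetical bwa mem output
BAM=aligned.bam
SORTED=aligned.sorted.bam

if command -v samtools >/dev/null 2>&1 && [ -f "$SAM" ]; then
    # Convert SAM to BAM (-b selects BAM output)
    samtools view -b -o "$BAM" "$SAM"
    # Sort by coordinate using 4 additional threads
    samtools sort -@ 4 -o "$SORTED" "$BAM"
    # Create the index (aligned.sorted.bam.bai)
    samtools index "$SORTED"
else
    echo "samtools or $SAM not found; nothing was converted" >&2
fi
```

The intermediate unsorted BAM can be deleted once the sorted file exists.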
The index created by samtools index is required for viewing in genome browsers such as IGV and for many downstream analysis tools.
About BWA-MEM2
BWA-MEM2 is the successor to BWA-MEM and can perform mapping faster than BWA-MEM. The usage is nearly identical to BWA-MEM.
You can install it via Bioconda.
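A minimal sketch, under the same assumptions as the BWA installation (conda already set up; channel order and -y chosen here for illustration):

```shell
PKG2=bwa-mem2   # Bioconda package name

if command -v conda >/dev/null 2>&1; then
    conda install -y -c conda-forge -c bioconda "$PKG2" || echo "conda install failed" >&2
else
    echo "conda not found; install Miniconda or Miniforge first" >&2
fi
```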
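Build the index with bwa-mem2 index and run mapping with bwa-mem2 mem. BWA-MEM2 uses its own index format, so an index built with the original bwa index cannot be reused.
According to the project, it produces alignments identical to BWA-MEM while running faster (the README reports roughly 1.3x to 3.1x, depending on the use case, dataset, and machine). Consider using BWA-MEM2 especially when working with large-scale data.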
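A sketch of both steps for paired-end data. The FASTQ names, output name, and thread count are hypothetical.

```shell
REF=genome.fa           # reference FASTA
R1=reads_1.fastq.gz     # hypothetical forward reads
R2=reads_2.fastq.gz     # hypothetical reverse reads
THREADS=8
OUT2=aligned.mem2.sam   # hypothetical output name

if command -v bwa-mem2 >/dev/null 2>&1 && [ -f "$REF" ] && [ -f "$R1" ] && [ -f "$R2" ]; then
    # Build the BWA-MEM2 index (separate from the bwa index files)
    bwa-mem2 index "$REF"
    # Map the reads; SAM goes to stdout, as with bwa mem
    bwa-mem2 mem -t "$THREADS" "$REF" "$R1" "$R2" > "$OUT2"
else
    echo "bwa-mem2 or the input files are missing" >&2
fi
```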
About the Author
BxINFO LLC
A research support company specializing in bioinformatics.
We provide tools and information to support life science research, with a focus on RNA-Seq analysis.