TopHat2 Tutorial: Splice-Aware Mapping for RNA-Seq

Last updated: March 5, 2026


Introduction

When quantifying gene expression from RNA-Seq data, a mapping step is generally required. Mapping is the process of aligning read sequences (FASTQ files) to their matching positions on a reference sequence. Commonly used mapping software for RNA-Seq includes HISAT2, STAR, and Bowtie2. This page explains how to use TopHat2, a splice-aware mapper that runs on top of Bowtie2.

Note that TopHat2 is an older tool that has largely been superseded by HISAT2, so in most cases HISAT2 or STAR is recommended instead.

If you would like to understand the overall workflow of RNA-Seq analysis, please refer to the RNA-Seq analysis workflow overview.

Installation

To use TopHat2, a Bowtie2 index is required. If you install TopHat2 using Conda, Bowtie2 will be installed automatically.

$ conda install -c bioconda tophat

Let’s check the Bowtie2 help message.

$ bowtie2 -h

If the following output is displayed, the installation was successful.

Bowtie 2 version 2.4.1 by Ben Langmead (langmea@cs.jhu.edu, www.cs.jhu.edu/~langmea)
Usage:
  bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r> | --interleaved <i> | -b <bam>} [-S <sam>]
...

Next, display the TopHat2 help message.

$ tophat

If the following output is shown, TopHat2 is ready to use.

tophat:
TopHat maps short sequences from spliced transcripts to whole genomes.
Usage:
  tophat [options] <bowtie_index> <reads1[,reads2,...]> [reads1[,reads2,...]]
...
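The two checks above can also be combined into one small script. The following is a minimal sketch, assuming a POSIX shell; `require_cmd` is a hypothetical helper name introduced here for illustration and is not part of TopHat2 or Bowtie2.

```shell
# require_cmd is a hypothetical helper (not part of TopHat2 or Bowtie2).
# It succeeds if the named command is found on PATH and fails otherwise.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "not found: $1" >&2
    return 1
  fi
}

# Report any tool used in this tutorial that is missing from the environment.
for tool in bowtie2 bowtie2-build tophat; do
  require_cmd "$tool" || echo "Install $tool before continuing." >&2
done
```

This only confirms that the commands are on PATH. It does not verify that they run correctly, so the help-message checks above are still worth doing once.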

Index Construction (Build)

First, create an index of the reference genome using the following command. Here, Bowtie2 is used.

$ bowtie2-build -f genome.fa genome

genome.fa is the reference sequence you want to map reads to, provided as a FASTA file. Gzip-compressed files can also be used.

This command generates several files such as genome.1.bt2 through genome.4.bt2, as well as genome.rev.1.bt2 and genome.rev.2.bt2. Index files are required for fast sequence searching and must be created in advance for almost all mapping software, not just Bowtie2.
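Before mapping, it can help to confirm that all six index files were written. The following is a rough sketch, assuming a POSIX shell and the `.bt2` file names listed above; `index_files` and `check_index` are hypothetical helper names, not Bowtie2 commands.

```shell
# index_files and check_index are hypothetical helpers for illustration.
# Print the six file names expected for the given index prefix.
index_files() {
  for suffix in 1 2 3 4 rev.1 rev.2; do
    printf '%s.%s.bt2\n' "$1" "$suffix"
  done
}

# Succeed only if every expected index file exists.
# Assumes the prefix contains no whitespace.
check_index() {
  status=0
  for f in $(index_files "$1"); do
    if [ ! -e "$f" ]; then
      echo "missing index file: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_index genome && echo "index is complete"` prints the message only when all six files are present. Very large genomes are written as a large index with the `.bt2l` extension instead, which this sketch does not handle.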

Mapping

Next, map the read sequences to the reference genome using TopHat2.

$ tophat -o output genome reads1.fastq.gz reads2.fastq.gz

As a result, a BAM file named accepted_hits.bam is generated in the output directory.
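When the same mapping command is run for several samples, a small wrapper can make the arguments explicit. The following is a sketch under stated assumptions: `run_tophat` and the `DRY_RUN` variable are hypothetical names introduced here, and the default of 4 threads is an arbitrary choice. The `-p` (number of threads) and `-o` (output directory) options are TopHat2 options.

```shell
# run_tophat is a hypothetical wrapper; the default of 4 threads is an assumption.
# Arguments: <index_prefix> <reads1> <reads2> <output_dir> [threads]
run_tophat() {
  index="$1"; reads1="$2"; reads2="$3"; outdir="$4"; threads="${5:-4}"
  # -p sets the number of threads, -o sets the output directory.
  set -- tophat -p "$threads" -o "$outdir" "$index" "$reads1" "$reads2"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$@"   # print the command instead of running it
  else
    "$@"
  fi
}

# Preview the command for the files used in this tutorial without running it.
DRY_RUN=1
run_tophat genome reads1.fastq.gz reads2.fastq.gz output 8
```

With `DRY_RUN=1` this prints `tophat -p 8 -o output genome reads1.fastq.gz reads2.fastq.gz`. Setting `DRY_RUN=0` runs the command for real. If a gene annotation file is available, TopHat2 also accepts it through the `-G` option, which is not included in this sketch.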

You can visualize the mapping results using a genome browser such as IGV.
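IGV expects a BAM index (.bai) next to the BAM file. The following is a minimal sketch, assuming samtools is installed separately and that accepted_hits.bam is coordinate-sorted, which should be the case with TopHat2's default settings; `index_bam` is a hypothetical helper name.

```shell
# index_bam is a hypothetical helper; it assumes samtools is installed.
# It creates <bam>.bai next to the BAM file so that IGV can load it.
index_bam() {
  bam="$1"
  if [ ! -f "$bam" ]; then
    echo "BAM file not found: $bam" >&2
    return 1
  fi
  samtools index "$bam"
}
```

For the mapping above, `index_bam output/accepted_hits.bam` would write output/accepted_hits.bam.bai, after which the BAM file can be opened in IGV together with the same reference genome.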


About the Author

BxINFO LLC

A research support company specializing in bioinformatics.

We provide tools and information to support life science research, with a focus on RNA-Seq analysis.
