STAR Aligner Tutorial: Fast RNA-Seq Read Mapping
Introduction
Quantifying gene expression from RNA-Seq data typically requires a mapping step. Mapping is the process of aligning reads (FASTQ files) to their corresponding positions on a reference sequence. Popular mapping tools for RNA-Seq include HISAT2, STAR, and Bowtie2. This page walks through how to use STAR.
For an overview of the entire RNA-Seq data analysis workflow, see the RNA-Seq analysis workflow guide.
Installation
According to the official documentation, STAR requires at least 16 GB of RAM (ideally 32 GB) to handle mammalian genomes, so keep this in mind before getting started.
You can install STAR through Bioconda:
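A minimal sketch of the install step, assuming conda is already available on the machine. The package name star is the Bioconda recipe name; the function name install_star and the environment name star-env are arbitrary choices made for illustration.

```shell
# Install STAR into a dedicated conda environment.
# Assumptions: conda is installed; "star-env" is a hypothetical environment name.
install_star() {
  local env_name="${1:-star-env}"
  # conda-forge is listed before bioconda so that dependencies resolve
  # in the channel order Bioconda recommends.
  conda create -y -n "$env_name" -c conda-forge -c bioconda star
}

# Usage:
#   install_star              # creates the environment "star-env"
#   conda activate star-env   # makes the STAR executable available
```

Installing into a separate environment keeps STAR and its dependencies isolated from other tools, at the cost of having to activate the environment before each session.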
Verify the installation by displaying the help message:
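One way to run this check from a shell, sketched under the assumption that STAR should be on the PATH of the active environment. The helper name check_star is hypothetical.

```shell
# Confirm that the STAR executable is reachable, then show its version
# and the beginning of its help text.
check_star() {
  if ! command -v STAR >/dev/null 2>&1; then
    echo "STAR not found on PATH; activate the environment it was installed into" >&2
    return 1
  fi
  STAR --version              # prints the installed version string
  STAR --help | head -n 20    # first lines of the usage message
}

# Usage:
#   check_star
```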
If STAR prints its usage message and list of parameters, it has been installed successfully.
Index Construction (Build)
Generate the genome index with the following command:
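A minimal sketch of such a command, wrapped in a small shell function so that the paths stay explicit. The function name, the directory and file names, and the thread count are assumptions for illustration; the STAR options themselves are standard ones.

```shell
# Build a STAR genome index.
# Assumptions: all paths are placeholders, and the FASTA and GTF files are
# uncompressed (STAR reads them as plain text).
build_star_index() {
  local genome_dir="$1" fasta="$2" gtf="$3"
  local threads="${4:-8}"
  mkdir -p "$genome_dir"    # the output directory must exist before STAR runs
  # --sjdbOverhang 100 is STAR's default; the ideal value is read length minus 1.
  STAR --runMode genomeGenerate \
       --runThreadN "$threads" \
       --genomeDir "$genome_dir" \
       --genomeFastaFiles "$fasta" \
       --sjdbGTFfile "$gtf" \
       --sjdbOverhang 100
}

# Usage (hypothetical file names):
#   build_star_index genome_index genome.fa genes.gtf 8
```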
Index generation is selected with --runMode genomeGenerate. The reference FASTA file is passed via --genomeFastaFiles, and the gene annotation (GTF file) is passed via --sjdbGTFfile. The resulting index files are written to the directory given by --genomeDir.
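Note that when building the index for the human genome on a machine with only 16 GB of RAM, index generation may fail partway through. In that case, adding the --limitGenomeGenerateRAM and --genomeSAsparseD options can help the process complete successfully. --limitGenomeGenerateRAM sets the maximum RAM (in bytes) available for index generation, and --genomeSAsparseD makes the suffix array sparser, which lowers memory use at the cost of slower mapping.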
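The low-memory variant can be sketched as follows. The function name and paths are hypothetical, and the default values used here (a 15,000,000,000-byte limit and a sparsity of 3) are illustrative starting points rather than tuned recommendations; adjust them to the memory actually available.

```shell
# Build a STAR genome index on a machine with limited RAM.
# Assumptions: paths are placeholders; the RAM limit and sparsity defaults
# are illustrative values, not official recommendations.
build_star_index_lowmem() {
  local genome_dir="$1" fasta="$2" gtf="$3"
  local ram_bytes="${4:-15000000000}"   # upper bound on RAM, in bytes
  local sparsity="${5:-3}"              # larger value = less RAM, slower mapping
  mkdir -p "$genome_dir"
  STAR --runMode genomeGenerate \
       --genomeDir "$genome_dir" \
       --genomeFastaFiles "$fasta" \
       --sjdbGTFfile "$gtf" \
       --genomeSAsparseD "$sparsity" \
       --limitGenomeGenerateRAM "$ram_bytes"
}

# Usage (hypothetical file names):
#   build_star_index_lowmem genome_index genome.fa genes.gtf
```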
Index files enable fast sequence lookup and must be generated ahead of time for virtually all mapping tools, not just STAR.
Mapping
Run the alignment with the following command:
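A minimal sketch of a mapping command, assuming an index has already been built. The function name, index directory, output prefix, and FASTQ file names are placeholders for illustration.

```shell
# Map single-end or paired-end reads with STAR.
# Arguments: index directory, output prefix, thread count, then one FASTQ
# file (single-end) or two (paired-end). All names are placeholders.
run_star_mapping() {
  if [ "$#" -lt 4 ]; then
    echo "usage: run_star_mapping GENOME_DIR PREFIX THREADS READS_1 [READS_2]" >&2
    return 1
  fi
  local genome_dir="$1" prefix="$2" threads="$3"
  shift 3    # the remaining arguments are the FASTQ files
  STAR --runThreadN "$threads" \
       --genomeDir "$genome_dir" \
       --readFilesIn "$@" \
       --outSAMtype BAM SortedByCoordinate \
       --outFileNamePrefix "$prefix"
}

# Usage (paired-end, uncompressed FASTQ):
#   run_star_mapping genome_index sample1_ 8 sample1_R1.fastq sample1_R2.fastq
# For gzipped FASTQ files, STAR additionally needs --readFilesCommand zcat.
```

With the prefix sample1_, the sorted alignments end up in sample1_Aligned.sortedByCoord.out.bam.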
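This produces the mapping results. STAR writes its output to the current directory, or under the prefix given by --outFileNamePrefix; the files include the alignments, a summary of mapping statistics (Log.final.out), and a table of detected splice junctions (SJ.out.tab).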
By specifying the --outSAMtype BAM SortedByCoordinate option, STAR directly outputs a coordinate-sorted BAM file, saving you an extra sorting step.
You can inspect the results in a genome browser such as IGV by loading the sorted BAM file; the aligned reads are displayed along the reference genome.
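IGV expects a BAM index (.bai) next to the BAM file, and STAR does not create one. A small sketch using samtools, assuming samtools is installed and that the mapping step was run with the output prefix passed in here; the function name is hypothetical.

```shell
# Index the coordinate-sorted BAM produced by STAR so IGV can load it.
# Argument: the prefix that was given to --outFileNamePrefix (may be empty).
index_star_bam() {
  local bam="${1}Aligned.sortedByCoord.out.bam"
  if [ ! -f "$bam" ]; then
    echo "BAM file not found: $bam" >&2
    return 1
  fi
  samtools index "$bam"    # writes the index as ${bam}.bai
}

# Usage (hypothetical prefix):
#   index_star_bam sample1_
```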
About the Author
BxINFO LLC
A research support company specializing in bioinformatics.
We provide tools and information to support life science research, with a focus on RNA-Seq analysis.