STAR Aligner Tutorial: Fast RNA-Seq Read Mapping

Last updated: February 2, 2026

Introduction

When quantifying gene expression from RNA-Seq data, a mapping step is generally required. Mapping is the process of aligning read sequences (FASTQ files) to their matching positions on a reference sequence. Commonly used mapping software for RNA-Seq includes HISAT2, STAR, and Bowtie2. This page explains how to use STAR.

If you would like to understand the overall workflow of RNA-Seq analysis, please refer to the RNA-Seq analysis workflow overview.

Installation

According to the official STAR documentation, mammalian genomes require at least 16 GB of memory, and ideally 32 GB, so check that your machine meets this requirement before you begin.
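As a quick way to check the memory requirement, the following sketch detects the total RAM and compares it with the 16 GB minimum. The helper name has_enough_ram is hypothetical, and the detection assumes either Linux (/proc/meminfo) or macOS (sysctl hw.memsize); on other systems it reports 0 GB.

```shell
# Hypothetical helper: succeeds when the given RAM (in GB) reaches the 16 GB minimum.
has_enough_ram() {
  [ "$1" -ge 16 ]
}

# Detect total RAM in GB. Linux reports kB in /proc/meminfo; macOS reports bytes via sysctl.
if [ -r /proc/meminfo ]; then
  # Round to the nearest GB, because a 16 GB machine usually reports slightly less.
  total_gb=$(awk '/^MemTotal:/ {printf "%d", $2 / 1024 / 1024 + 0.5}' /proc/meminfo)
else
  mem_bytes=$(sysctl -n hw.memsize 2>/dev/null || echo 0)
  total_gb=$(( mem_bytes / 1024 / 1024 / 1024 ))
fi

if has_enough_ram "$total_gb"; then
  echo "Detected ${total_gb} GB of RAM: meets the 16 GB minimum"
else
  echo "Detected ${total_gb} GB of RAM: below the 16 GB minimum"
fi
```
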

You can install STAR via Bioconda.

$ conda install -c bioconda star

Let’s check the help message.

$ STAR --help

If the following output is displayed, the installation was successful.

Usage: STAR [options]... --genomeDir /path/to/genome/index/ --readFilesIn R1.fq R2.fq
Spliced Transcripts Alignment to a Reference (c) Alexander Dobin, 2009-2020

STAR version=2.7.10a
STAR compilation time,server,dir= :/Users/travis/build/alexdobin/travis-tests/STARcompile/source
For more details see:
<https://github.com/alexdobin/STAR>
<https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf>
### versions
versionGenome 2.7.4a
    string: earliest genome index version compatible with this STAR release. Please do not change this value!

### Parameter Files
parametersFiles -
    string: name of a user-defined parameters file, "-": none. Can only be defined on the command line.
...

Index Construction (Build)

Create the index using the following command.

$ mkdir genome
$ STAR --runThreadN 4 --runMode genomeGenerate --genomeDir genome --genomeFastaFiles genome.fa --sjdbGTFfile annotation.gtf

The reference FASTA file is specified with --genomeFastaFiles, and the annotation (GTF file) is specified with --sjdbGTFfile. The index files will be created in the genome directory.

Note that when I ran this command on the human genome on a machine with 16 GB of memory, the process failed partway through. I was able to generate the index by adding the --limitGenomeGenerateRAM and --genomeSAsparseD options, which cap the RAM used for index generation and make the suffix array sparser, respectively.
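For the low-memory case described above, a minimal sketch of the index command might look like the following. The 1 GB of headroom and the sparsity value of 2 are illustrative assumptions, not the settings the author used, so tune them for your machine. --limitGenomeGenerateRAM takes a value in bytes, and a larger --genomeSAsparseD reduces memory use at the cost of slower mapping. The script only runs STAR when the binary and both input files are found.

```shell
# Illustrative values (assumptions, not the author's settings).
total_ram_gb=16   # RAM installed on the machine
headroom_gb=1     # memory left free for the operating system
ram_limit_bytes=$(( (total_ram_gb - headroom_gb) * 1000000000 ))
sa_sparsity=2     # suffix array sparsity; larger uses less RAM but maps more slowly

if command -v STAR >/dev/null 2>&1 && [ -f genome.fa ] && [ -f annotation.gtf ]; then
  mkdir -p genome
  STAR --runThreadN 4 \
    --runMode genomeGenerate \
    --genomeDir genome \
    --genomeFastaFiles genome.fa \
    --sjdbGTFfile annotation.gtf \
    --limitGenomeGenerateRAM "$ram_limit_bytes" \
    --genomeSAsparseD "$sa_sparsity"
else
  echo "STAR or the input files were not found; skipping index generation" >&2
fi
```

If the build still runs out of memory, raising the sparsity value further is one thing to try, though mapping will become slower.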

Index files are required for fast sequence searching and must be created in advance for almost all mapping software, not just STAR.

Mapping

Run the following command to perform the mapping.

$ STAR --runThreadN 4 --genomeDir genome --readFilesIn read_1.fastq.gz read_2.fastq.gz --readFilesCommand gunzip -c --outSAMtype BAM SortedByCoordinate --outFileNamePrefix sample1

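When the command finishes, the mapping results are written to the current directory, with file names that begin with the sample1 prefix.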

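The --outSAMtype BAM SortedByCoordinate option makes STAR write a coordinate-sorted BAM file instead of the default SAM output, and --readFilesCommand gunzip -c lets STAR read the gzip-compressed FASTQ files directly.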
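As a follow-up to the mapping command, the sketch below locates the sorted BAM file and the summary log from the prefix, prints the unique mapping rate, and creates a BAM index. STAR appends its fixed file names directly to the prefix, so the prefix sample1 yields sample1Aligned.sortedByCoord.out.bam. The use of samtools here is an assumption (it is not part of STAR and must be installed separately); genome browsers generally expect an index file next to the BAM.

```shell
prefix="sample1"   # same value as passed to --outFileNamePrefix
bam="${prefix}Aligned.sortedByCoord.out.bam"
summary="${prefix}Log.final.out"

# Print the unique mapping rate from STAR's summary log, if the log exists.
if [ -f "$summary" ]; then
  grep "Uniquely mapped reads %" "$summary"
fi

# Index the sorted BAM so it can be loaded in a genome browser; writes ${bam}.bai.
if [ -f "$bam" ] && command -v samtools >/dev/null 2>&1; then
  samtools index "$bam"
fi
```
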

You can inspect the mapped reads in a genome browser such as IGV.

About the Author

BxINFO LLC

A research support company specializing in bioinformatics.

We provide tools and information to support life science research, with a focus on RNA-Seq analysis.