How to Use StringTie: Gene Expression Quantification in RNA-Seq Analysis

Last updated: February 1, 2026

Introduction

In RNA-Seq analysis, the sequencer produces raw reads in FASTQ format. After the reads are mapped to a reference genome and the alignments are saved as coordinate-sorted BAM files, gene expression levels are quantified by counting the reads mapped to each gene.

This page explains how to use StringTie, a software tool for identifying novel isoforms and estimating expression levels for each isoform from RNA-Seq alignments.

Installation

Precompiled binaries are available on the StringTie website (http://ccb.jhu.edu/software/stringtie/). Download the binary for your platform. The example below uses StringTie v2.2.1 on macOS:

$ wget http://ccb.jhu.edu/software/stringtie/dl/stringtie-2.2.1.OSX_x86_64.tar.gz
$ tar -zxvf stringtie-2.2.1.OSX_x86_64.tar.gz

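Check the installation by printing the help message. The stringtie binary is inside the extracted directory, so either add that directory to your PATH or run ./stringtie -h from inside it: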

$ stringtie -h

If you see output like the following, the installation succeeded:

StringTie v2.2.1 usage:
stringtie <in.bam ..> [-G <guide_gff>] [-l <prefix>] [-o <out.gtf>] [-p <cpus>] [-v] [-a <min_anchor_len>] [-m <min_len>] [-j <min_anchor_cov>] [-f <min_iso>] [-c <min_bundle_cov>] [-g <bdist>] [-u] [-L] [-e] [--viral] [-E <err_margin>] [--ptf <f_tab>] [-x <seqid,..>] [-A <gene_abund.out>] [-h] {-B|-b <dir_path>} [--mix] [--conservative] [--rf] [--fr]
Assemble RNA-Seq alignments into potential transcripts.
Options:
...

Identification of Novel Isoforms

Use the following commands to identify novel isoforms and estimate isoform-level expression. First, run StringTie for each sample (sample1, sample2, sample3, and sample4):

$ stringtie sample1.bam -G annotation.gtf -o sample1.gtf
$ stringtie sample2.bam -G annotation.gtf -o sample2.gtf
$ stringtie sample3.bam -G annotation.gtf -o sample3.gtf
$ stringtie sample4.bam -G annotation.gtf -o sample4.gtf

The -G option supplies the reference annotation as a guide during isoform assembly, and -o names the output file.

These commands produce one assembly per sample: sample1.gtf through sample4.gtf.
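Running the same command for many samples is easy to script. The Python sketch below builds the four assembly commands as argument lists and, by default, only prints them (a dry run). It assumes the BAM files are named <sample>.bam in the working directory and that stringtie is on your PATH; the helper names assembly_command and run_assembly are hypothetical.

```python
# Minimal sketch: build (and optionally run) the per-sample StringTie assembly
# commands. Assumes <sample>.bam files in the working directory and `stringtie`
# on PATH; the helper names are hypothetical.
import subprocess

SAMPLES = ["sample1", "sample2", "sample3", "sample4"]


def assembly_command(sample: str, guide: str = "annotation.gtf") -> list[str]:
    """Return the argument list for one reference-guided assembly run."""
    return ["stringtie", f"{sample}.bam", "-G", guide, "-o", f"{sample}.gtf"]


def run_assembly(samples: list[str], dry_run: bool = True) -> list[list[str]]:
    """Print the commands (dry run) or execute them one sample at a time."""
    commands = [assembly_command(s) for s in samples]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the loop with CalledProcessError if StringTie fails
            subprocess.run(cmd, check=True)
    return commands


run_assembly(SAMPLES)  # dry run: prints the four commands without running them
```

Call run_assembly(SAMPLES, dry_run=False) to actually execute the commands. Passing each command as a list, rather than as one string through a shell, avoids quoting problems with unusual file names.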

The output GTF file has the following tab-separated columns:

Column  Description
1st     Chromosome name
2nd     Source; always "StringTie"
3rd     Feature type (StringTie writes "transcript" and "exon" records)
4th     Feature start position (1-based)
5th     Feature end position (1-based, inclusive)
6th     Score; always 1000
7th     Strand
8th     Frame; always "."
9th     Attributes, separated by semicolons
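To work with these columns in your own scripts, each line can be split on tabs into its nine fields. The sketch below is one possible approach; the GtfRecord and parse_gtf_line names are hypothetical and the example line is invented. Because both coordinates are 1-based and inclusive, a feature's length is end - start + 1. StringTie's output begins with comment lines starting with #, which should be skipped before calling the parser.

```python
# Minimal sketch: split one StringTie GTF line into its nine columns.
# GtfRecord/parse_gtf_line are hypothetical names; the example line is invented.
from typing import NamedTuple


class GtfRecord(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    score: str
    strand: str
    frame: str
    attributes: str

    @property
    def length(self) -> int:
        # Both coordinates are inclusive, so add 1.
        return self.end - self.start + 1


def parse_gtf_line(line: str) -> GtfRecord:
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attrs = fields
    return GtfRecord(seqname, source, feature, int(start), int(end),
                     score, strand, frame, attrs)


example = ('chr1\tStringTie\texon\t1001\t1200\t1000\t+\t.\t'
           'gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "1";')
record = parse_gtf_line(example)
print(record.feature, record.length)  # exon 200
```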

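The 9th column contains the following attributes:

Name           Description
gene_id        Gene ID
transcript_id  Transcript ID
exon_number    Order of the exon within the transcript (exon records only)
reference_id   Transcript ID in the reference annotation
ref_gene_id    Gene ID in the reference annotation
ref_gene_name  Gene name in the reference annotation
cov            Average per-base read coverage
FPKM           FPKM value (transcript records only)
TPM            TPM value (transcript records only)

The reference_id, ref_gene_id, and ref_gene_name attributes appear only when an assembled transcript matches a transcript in the guide annotation.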
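The attribute column can be turned into a dictionary for lookups such as reading a transcript's TPM. This is a minimal sketch assuming the usual StringTie layout of key "value"; pairs; it would break on values that themselves contain a semicolon. The function name parse_attributes and the attribute values below are invented for illustration.

```python
# Minimal sketch: convert the 9th GTF column into a dict of attribute -> value.
# Assumes the StringTie layout: key "value"; pairs separated by semicolons.
def parse_attributes(column9: str) -> dict[str, str]:
    attrs: dict[str, str] = {}
    for item in column9.strip().split(";"):
        item = item.strip()
        if not item:
            continue  # skip the empty piece after the trailing semicolon
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


# Hypothetical transcript record attributes (all values invented).
col9 = ('gene_id "STRG.1"; transcript_id "STRG.1.1"; reference_id "TX0001"; '
        'cov "12.5"; FPKM "3.2"; TPM "5.1";')
attrs = parse_attributes(col9)
tpm = float(attrs["TPM"])  # values are strings; convert numeric ones explicitly
print(attrs["transcript_id"], tpm)  # STRG.1.1 5.1
```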

Merging

Each sample now has its own set of assembled isoforms, and these sets differ between samples (an isoform may be assembled in one sample but not in another), so expression cannot be compared directly across samples. To obtain a single, consistent set of transcripts, merge the per-sample assemblies:

$ stringtie --merge -G annotation.gtf -o merged.gtf sample1.gtf sample2.gtf sample3.gtf sample4.gtf

This produces a merged annotation file named merged.gtf.
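Before moving on, a quick sanity check on merged.gtf can be useful. The sketch below counts transcript records and distinct gene_id values; it assumes a standard 9-column, tab-separated GTF with comment lines starting with #, and the function name summarize_gtf is hypothetical.

```python
# Minimal sketch: count transcript records and distinct gene IDs in a GTF file.
# Assumes tab-separated 9-column lines; lines starting with "#" are comments.
import re

GENE_ID = re.compile(r'gene_id "([^"]+)"')


def summarize_gtf(path: str) -> tuple[int, int]:
    """Return (number of transcript records, number of distinct gene_id values)."""
    transcripts = 0
    genes: set[str] = set()
    with open(path) as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 9 or fields[2] != "transcript":
                continue  # only transcript lines are counted
            transcripts += 1
            match = GENE_ID.search(fields[8])
            if match:
                genes.add(match.group(1))
    return transcripts, len(genes)
```

For example, summarize_gtf("merged.gtf") returns a (transcripts, genes) tuple that you can compare with the counts for the original annotation.gtf.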

Estimating Isoform-Level Expression

Finally, estimate isoform-level expression based on the merged annotation file:

$ stringtie sample1.bam -G merged.gtf -o result/sample1/sample1.gtf -e -B
$ stringtie sample2.bam -G merged.gtf -o result/sample2/sample2.gtf -e -B
$ stringtie sample3.bam -G merged.gtf -o result/sample3/sample3.gtf -e -B
$ stringtie sample4.bam -G merged.gtf -o result/sample4/sample4.gtf -e -B

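The -e option turns off novel isoform assembly, so StringTie only estimates the abundance of transcripts present in the -G file (here, merged.gtf). The -B option writes Ballgown input tables (*.ctab) to the same directory as the -o output file, which is why each sample gets its own directory under result/.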
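The same per-sample loop can be scripted for this step. This Python sketch prints the commands by default; in a real run it creates result/<sample>/ itself, so it does not depend on whether StringTie creates missing directories. It assumes <sample>.bam files and stringtie on PATH; quant_command and run_quant are hypothetical names.

```python
# Minimal sketch: quantify each sample against merged.gtf with -e -B, one
# output directory per sample. Helper names are hypothetical.
import subprocess
from pathlib import Path

SAMPLES = ["sample1", "sample2", "sample3", "sample4"]


def quant_command(sample: str, merged: str = "merged.gtf",
                  outdir: str = "result") -> list[str]:
    """Argument list for one -e -B run writing <outdir>/<sample>/<sample>.gtf."""
    out_gtf = Path(outdir) / sample / f"{sample}.gtf"
    return ["stringtie", f"{sample}.bam", "-G", merged,
            "-o", out_gtf.as_posix(), "-e", "-B"]


def run_quant(samples: list[str], outdir: str = "result",
              dry_run: bool = True) -> list[list[str]]:
    """Print the commands (dry run) or create each output directory and run them."""
    commands = []
    for sample in samples:
        cmd = quant_command(sample, outdir=outdir)
        commands.append(cmd)
        if dry_run:
            print(" ".join(cmd))
            continue
        # Create result/<sample>/ ourselves before StringTie writes into it.
        Path(outdir, sample).mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)
    return commands


run_quant(SAMPLES)  # dry run: prints the commands only
```

Keeping one subdirectory per sample also matches the layout that the count-matrix step below reads with -i result.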

Preparing Files for DESeq2 / edgeR

The steps above produce Ballgown-compatible files, but differential expression tools such as DESeq2 and edgeR cannot use them directly: these tools expect a matrix of read counts, whereas StringTie reports coverage, FPKM, and TPM.

To enable analysis in DESeq2/edgeR, create count matrix CSV files that combine results across all samples.

Use the prepDE.py3 script for this. prepDE.py3 is the Python 3 version of prepDE.py, and both produce the same results.

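The -i option points to the directory containing one subdirectory per sample (result in this example), and -l gives the average read length (default 75) that prepDE uses to convert coverage into read counts. Here the reads are 150 bp, so we specify -l 150: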

$ prepDE.py3 -i result -l 150

This generates two files: gene_count_matrix.csv and transcript_count_matrix.csv.
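The StringTie manual describes how prepDE derives these counts from coverage: reads_per_transcript = coverage * transcript_len / read_len. The sketch below applies that formula to one hypothetical transcript to show why -l matters. Rounding up to an integer is an assumption of this sketch; check prepDE.py3 for the exact rounding it applies. The function names are hypothetical.

```python
# Minimal sketch of the coverage-to-count formula described in the StringTie
# manual: reads = coverage * transcript_len / read_len.
# Rounding up (math.ceil) is an assumption; prepDE.py3 may round differently.
import math


def transcript_length(exons: list[tuple[int, int]]) -> int:
    """Sum exon lengths; GTF coordinates are 1-based and inclusive."""
    return sum(end - start + 1 for start, end in exons)


def estimated_reads(cov: float, exons: list[tuple[int, int]],
                    read_len: int = 150) -> int:
    return math.ceil(cov * transcript_length(exons) / read_len)


# Hypothetical transcript: two exons, 300 bp in total, average coverage 10.
exons = [(1001, 1100), (2001, 2200)]
print(transcript_length(exons), estimated_reads(10.0, exons))  # 300 20
```

With prepDE's default read length of 75, the same transcript would get 10 * 300 / 75 = 40 reads, twice as many, which is why -l should match your actual read length.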

(Figure: count matrix files for DESeq2/edgeR)

As discussed in https://github.com/gpertea/stringtie/issues/126, the counts in these matrices are estimated from coverage rather than counted directly, so for paired-end reads the summed counts may not match the actual number of reads. In practice, however, the matrices are usually used as-is.
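To see how far the totals differ for your own data, you can sum each sample column of gene_count_matrix.csv and compare it with the number of reads or fragments you know were mapped. This sketch assumes the first column holds the gene ID and every remaining column is one sample's integer counts; the function name column_totals is hypothetical.

```python
# Minimal sketch: total the counts in each sample column of a prepDE-style
# count matrix. Assumes column 1 is the gene ID and the rest are integer counts.
import csv


def column_totals(path: str) -> dict[str, int]:
    with open(path, newline="") as fh:
        reader = csv.reader(fh)
        header = next(reader)
        samples = header[1:]
        totals = dict.fromkeys(samples, 0)
        for row in reader:
            for sample, value in zip(samples, row[1:]):
                totals[sample] += int(value)
    return totals
```

For example, column_totals("gene_count_matrix.csv") returns a dictionary mapping each sample name to its summed count.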


BxINFO LLC

A research support company specializing in bioinformatics.

We provide tools and information to support life science research, with a focus on RNA-Seq analysis.
