How to Use Ballgown: Detecting Differentially Expressed Genes in RNA-Seq Analysis

Last updated: February 2, 2026

Introduction

RNA-Seq analysis with next-generation sequencing produces raw read data in the form of FASTQ files. After the reads are mapped to a reference genome, gene expression levels are quantified by counting the reads mapped to each gene. Comparing these expression levels between groups of samples then makes it possible to detect differentially expressed genes.

This page explains how to use Ballgown, a software tool for detecting differentially expressed genes. If you would like to understand the overall workflow of RNA-Seq analysis, please refer to the RNA-Seq analysis workflow overview.

Installation

First, if R is not installed on your system, install it. On macOS with Homebrew, for example, use the following command:

$ brew install r

Launch R and run the following:

> if (!requireNamespace("BiocManager", quietly=TRUE))
>     install.packages("BiocManager")
> BiocManager::install("ballgown")

If the following command runs without errors, the installation was successful:

> library(ballgown)

In my environment, however, the installation failed with the following error:

Error: package or namespace load failed for ‘RCurl’ in dyn.load(file, DLLpath = DLLpath, ...)

This appeared to be caused by an active conda environment, so I deactivated it:

$ conda deactivate

After deactivating conda and retrying, Ballgown was installed successfully.

Preparing the Data

Use StringTie to generate transcript expression data that can be imported into Ballgown. For more details, please refer to this page about StringTie.

After running the analysis, the output directory will have the following structure. Although GTF files are included, Ballgown uses only the *.ctab files.

.
└── result
    ├── sample1
    │   ├── e2t.ctab
    │   ├── e_data.ctab
    │   ├── i2t.ctab
    │   ├── i_data.ctab
    │   ├── sample1.gtf
    │   └── t_data.ctab
    ├── sample2
    │   ├── e2t.ctab
    │   ├── e_data.ctab
    │   ├── i2t.ctab
    │   ├── i_data.ctab
    │   ├── sample2.gtf
    │   └── t_data.ctab
    ├── sample3
    │   ├── e2t.ctab
    │   ├── e_data.ctab
    │   ├── i2t.ctab
    │   ├── i_data.ctab
    │   ├── sample3.gtf
    │   └── t_data.ctab
    └── sample4
        ├── e2t.ctab
        ├── e_data.ctab
        ├── i2t.ctab
        ├── i_data.ctab
        ├── sample4.gtf
        └── t_data.ctab

Loading the Data

Run the following command to load the data:

> bg = ballgown(dataDir='./result', samplePattern='sample', meas='all')

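Specify the results directory with dataDir, and give a regular expression that matches the sample folder names with samplePattern. Setting meas='all' loads every available expression measurement rather than a single one such as FPKM.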
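Because samplePattern is a regular expression, a loose pattern can pick up folders you did not intend to load. The following is a minimal sketch for previewing a pattern before calling ballgown(). It assumes that sample folders are selected by matching the pattern against the names under dataDir, in the same way base R's list.files() does. The directory names here (result_demo, old_sample1, logs) are hypothetical and are created in a temporary location.

```r
# Build a throwaway directory layout that mimics ./result (hypothetical names).
data_dir <- file.path(tempdir(), "result_demo")
for (d in c("sample1", "sample2", "old_sample1", "logs")) {
  dir.create(file.path(data_dir, d), recursive = TRUE, showWarnings = FALSE)
}

# A loose pattern matches any name that contains "sample".
matched <- list.files(path = data_dir, pattern = "sample")
print(matched)

# An anchored pattern only matches names of the form sample<number>.
strict <- list.files(path = data_dir, pattern = "^sample[0-9]+$")
print(strict)
```

If the loose pattern returns more folders than expected, passing the anchored pattern as samplePattern is one way to narrow the selection.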

Let’s display transcript-level FPKM values:

> texpr(bg)

Let’s display gene-level FPKM values:

> gexpr(bg)

Extracting Differentially Expressed Genes

In this example, we perform a comparison between two groups. Add group information as follows, listing the group values in the same order as the sample names returned by sampleNames(bg):

> pData(bg) <- data.frame(id=sampleNames(bg), group=c(0,0,1,1))
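Assigning groups by position works only as long as the sample order is what you expect. As a hedged alternative, the sketch below builds the same table by looking up each sample's group by name. The vector sample_ids stands in for the output of sampleNames(bg), and the lookup table group_of is a hypothetical helper, not part of Ballgown.

```r
# Stand-in for sampleNames(bg); in a real session, use that call instead.
sample_ids <- c("sample1", "sample2", "sample3", "sample4")

# Hypothetical lookup table: group label for each sample, in any order.
group_of <- c(sample3 = 1, sample1 = 0, sample4 = 1, sample2 = 0)

# Look groups up by sample name so the rows follow the order of sample_ids.
pheno <- data.frame(id = sample_ids, group = unname(group_of[sample_ids]))

# Stop early if any sample has no group assigned.
stopifnot(!anyNA(pheno$group))
print(pheno)

# With a loaded ballgown object, the table would then be attached with:
# pData(bg) <- pheno
```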

Next, perform the statistical test. This example tests at the transcript level; to test at the gene level instead, set feature="gene".

> stat_results = stattest(bg, feature="transcript", meas="FPKM", covariate="group")

Sort the results by p-value:

> stat_results[order(stat_results$pval),]

The results will look like this:

          feature     id         pval      qval
71628  transcript  71628 9.481568e-06 0.4774591
230911 transcript 230911 3.129703e-05 0.4774591
50751  transcript  50751 3.244529e-05 0.4774591
...
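The qval column is the p-value adjusted for multiple testing, so it is usually the column to filter on. The sketch below re-enters the three rows shown above by hand and applies a cutoff. The 0.05 threshold is an assumption chosen for illustration, not a value prescribed by Ballgown.

```r
# The three result rows shown above, typed in by hand for illustration.
stat_results <- data.frame(
  feature = "transcript",
  id = c("71628", "230911", "50751"),
  pval = c(9.481568e-06, 3.129703e-05, 3.244529e-05),
  qval = c(0.4774591, 0.4774591, 0.4774591)
)

# Assumed significance threshold on the adjusted p-value.
alpha <- 0.05

# Keep only features whose q-value is below the threshold.
significant <- subset(stat_results, qval < alpha)
print(nrow(significant))
```

With these rows, no transcript passes the cutoff: the raw p-values are small, but the q-values are all about 0.48. With only two samples per group, as in this example, such an outcome is not unusual.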
