Ballgown Tutorial: Differential Expression Analysis in R

Last updated: March 13, 2026

Introduction

When you perform RNA-Seq analysis with a next-generation sequencer, the raw data arrives as FASTQ files. After these reads are mapped to a reference genome, gene expression levels are quantified by counting the reads that align to each gene. Differentially expressed genes (DEGs) are then detected by comparing expression levels between groups of samples.

This page explains how to use Ballgown, a tool for detecting differentially expressed genes. For an overview of the complete RNA-Seq analysis workflow, please refer to this page.

Installation

First, if R is not yet installed on your system, install it. With Homebrew (for example, on macOS), use the following command:

$ brew install r

Launch R and run the following commands:

> if (!requireNamespace("BiocManager", quietly=TRUE))
>     install.packages("BiocManager")
> BiocManager::install("ballgown")

If the following command runs without errors, the installation was successful:

> library(ballgown)
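
As a complement to the check above, the following minimal sketch tests whether the package can be loaded without attaching it, and prints the installed version if it can. The helper name ballgown_installed is hypothetical and only serves this illustration.

```r
# Hypothetical helper: report whether ballgown can be loaded, without
# attaching it to the search path.
ballgown_installed <- function() {
  requireNamespace("ballgown", quietly = TRUE)
}

if (ballgown_installed()) {
  message("ballgown ", as.character(utils::packageVersion("ballgown")), " is available")
} else {
  message("ballgown is not installed; rerun BiocManager::install(\"ballgown\")")
}
```

Because requireNamespace returns FALSE instead of stopping with an error, this form is convenient at the top of a script that should fail with a readable message.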

Note: in my environment, the installation initially failed with the following error:

Error: package or namespace load failed for 'RCurl' in dyn.load(file, DLLpath = DLLpath, ...)

This appeared to be caused by a conflict with an active conda environment, so I deactivated it:

$ conda deactivate

After deactivating conda and retrying the installation, Ballgown installed successfully.

Preparing the Data

Use StringTie to generate the expression quantification data that Ballgown will import. StringTie writes the Ballgown input tables (*.ctab files) when it is run with the -B option. For detailed instructions, see the StringTie tutorial.

After completing the StringTie analysis, the output directory will have the following structure. Each sample directory also contains a GTF file, but Ballgown reads only the *.ctab files.

.
└── result
    ├── sample1
    │   ├── e2t.ctab
    │   ├── e_data.ctab
    │   ├── i2t.ctab
    │   ├── i_data.ctab
    │   ├── sample1.gtf
    │   └── t_data.ctab
    ├── sample2
    │   ├── e2t.ctab
    │   ├── e_data.ctab
    │   ├── i2t.ctab
    │   ├── i_data.ctab
    │   ├── sample2.gtf
    │   └── t_data.ctab
    ├── sample3
    │   ├── e2t.ctab
    │   ├── e_data.ctab
    │   ├── i2t.ctab
    │   ├── i_data.ctab
    │   ├── sample3.gtf
    │   └── t_data.ctab
    └── sample4
        ├── e2t.ctab
        ├── e_data.ctab
        ├── i2t.ctab
        ├── i_data.ctab
        ├── sample4.gtf
        └── t_data.ctab
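
Before loading, it can help to confirm that every sample directory really contains the five *.ctab files listed above. The sketch below uses base R only; the helper name missing_ctab and the throwaway demo directory are assumptions made for illustration and are not part of Ballgown.

```r
# The five table files listed in the directory tree above.
required_ctab <- c("e2t.ctab", "e_data.ctab", "i2t.ctab", "i_data.ctab", "t_data.ctab")

# Hypothetical helper: for each sample directory under data_dir, return the
# names of any required *.ctab files that are missing.
missing_ctab <- function(data_dir, files = required_ctab) {
  sample_dirs <- list.dirs(data_dir, full.names = TRUE, recursive = FALSE)
  missing <- lapply(sample_dirs, function(d) files[!file.exists(file.path(d, files))])
  names(missing) <- basename(sample_dirs)
  missing
}

# Demonstration on a throwaway directory that mimics the layout above,
# with one file deliberately removed from sample2.
demo_dir <- file.path(tempdir(), "result_demo")
for (s in c("sample1", "sample2")) {
  dir.create(file.path(demo_dir, s), recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(demo_dir, s, required_ctab))
}
file.remove(file.path(demo_dir, "sample2", "i2t.ctab"))

print(missing_ctab(demo_dir))
```

On real data, calling missing_ctab("./result") should return an empty character vector for every sample if the StringTie output is complete.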

Loading the Data

Run the following command to load the data into Ballgown:

> bg = ballgown(dataDir='./result', samplePattern='sample', meas='all')

Use dataDir to specify the directory containing the results, and samplePattern to provide a regular expression that matches the sample folder names. Setting meas='all' loads every available expression measurement.
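
Because samplePattern is a regular expression, an unanchored pattern such as 'sample' matches any folder whose name contains that string. The sketch below shows the difference on a vector of folder names using base R's grep; the names old_sample_backup and notes are invented for the illustration.

```r
# Hypothetical folder names; "old_sample_backup" and "notes" are invented
# to show what an unanchored pattern does and does not match.
folders <- c("sample1", "sample2", "sample3", "sample4", "old_sample_backup", "notes")

# An unanchored pattern matches anywhere in the name, so the backup folder
# is selected as well.
loose <- grep("sample", folders, value = TRUE)
print(loose)

# Anchoring the pattern restricts it to names of the form sample<number>.
strict <- grep("^sample[0-9]+$", folders, value = TRUE)
print(strict)
```

To preview the match against the real directory, list.files("./result", pattern = "sample") applies the same kind of regular-expression match to the folder names.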

Display transcript-level FPKM values:

> texpr(bg)

Display gene-level FPKM values:

> gexpr(bg)

Extracting Differentially Expressed Genes

In this example, we compare two groups of two samples each. Add group labels as follows; the labels must be listed in the same order as sampleNames(bg):

> pData(bg) <- data.frame(id=sampleNames(bg), group=c(0,0,1,1))
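
Typing the group vector by hand is error-prone once there are more than a few samples. One way to reduce that risk, sketched below with base R only, is to look up each label by sample name so the rows follow the sample order automatically. The vector sample_names stands in for sampleNames(bg), and the lookup table group_of is a hypothetical name.

```r
# Stand-in for sampleNames(bg); with real data, use sampleNames(bg) directly.
sample_names <- c("sample1", "sample2", "sample3", "sample4")

# Hypothetical lookup table: which group each sample belongs to.
group_of <- c(sample1 = 0, sample2 = 0, sample3 = 1, sample4 = 1)

# Every sample must have a label before the table is built.
stopifnot(all(sample_names %in% names(group_of)))

# Index the lookup by sample name so rows follow the order of sample_names.
pheno <- data.frame(id = sample_names, group = unname(group_of[sample_names]))
print(pheno)
```

With a loaded object, the resulting table can be attached in the same way as above, with pData(bg) <- pheno.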

Run the statistical test:

> stat_results = stattest(bg, feature="transcript", meas="FPKM", covariate="group")

Sort the results by p-value to see the most significant hits:

> stat_results[order(stat_results$pval),]

The output will look like this:

          feature     id         pval      qval
71628  transcript  71628 9.481568e-06 0.4774591
230911 transcript 230911 3.129703e-05 0.4774591
50751  transcript  50751 3.244529e-05 0.4774591
...
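
A small p-value does not by itself mean a transcript is significant: the qval column holds the p-value adjusted for multiple testing, and it is usually the column to filter on. The sketch below rebuilds the three rows shown above as a mock data frame and applies a q-value cutoff; the cutoff of 0.05 is a common convention and an assumption here, not something prescribed by Ballgown.

```r
# Mock of the first three rows shown above (values copied from the output);
# a real stat_results has one row per transcript.
stat_results <- data.frame(
  feature = "transcript",
  id      = c("71628", "230911", "50751"),
  pval    = c(9.481568e-06, 3.129703e-05, 3.244529e-05),
  qval    = c(0.4774591, 0.4774591, 0.4774591)
)

# Keep transcripts below the q-value cutoff, ignoring rows with a missing qval.
q_cutoff <- 0.05
significant <- stat_results[!is.na(stat_results$qval) & stat_results$qval < q_cutoff, ]

nrow(significant)  # 0: none of the three rows passes the cutoff
```

In this example, even the top-ranked transcripts have a q-value of about 0.48, so none of them would be reported as differentially expressed at this cutoff. With only two samples per group, this outcome is not unusual.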

BxINFO LLC

A research support company specializing in bioinformatics.

We provide tools and information to support life science research, with a focus on RNA-Seq analysis.