edgeR Tutorial: Differential Expression Analysis in R

Last updated: March 13, 2026

What is edgeR?

Performing RNA-Seq analysis with a next-generation sequencer yields expression levels for each gene. By comparing these expression levels across multiple samples, you can detect statistically significant differentially expressed genes (DEGs).

edgeR is a widely used software package for detecting differentially expressed genes, and is one of the most popular tools in this field along with DESeq2.

In this article, we walk through how to install edgeR and how to use it for a basic analysis.

For an overview of the entire RNA-Seq data analysis workflow, see the RNA-Seq analysis workflow guide.

Installing edgeR

First, if you do not already have R installed, you will need to install it. (The example below uses Homebrew.)

$ brew install r

Start R and run the following commands to install BiocManager and edgeR.

> if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager")
> BiocManager::install("edgeR")

Run the command below to load the package. If no errors appear, the installation was successful.

> library(edgeR)

Preparing Your Data

Use a quantification tool such as featureCounts, StringTie, or RSEM to obtain gene expression counts.

Organize the results into a comma-separated (CSV) file like the one shown below. Note that edgeR requires raw read counts as input, not normalized values such as FPKM/RPKM or TPM.

[Figure: Example count matrix for edgeR]
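As a rough illustration of the expected layout, the following sketch builds a tiny count matrix in base R, writes it to a CSV file, and reads it back. The gene IDs, sample names, count values, and file name are all made up for this example; a real matrix would come from a quantification tool, with one row per gene and one column per sample.

```r
set.seed(1)

# Hypothetical toy data: 3 genes x 8 samples of raw integer read counts.
# The values are invented purely to show the layout.
toy <- matrix(
  rpois(3 * 8, lambda = 50),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("sample", 1:8))
)

# Put the gene IDs in the first column, then one column per sample.
toy_df <- data.frame(gene_id = rownames(toy), toy)
path <- file.path(tempdir(), "counts_example.csv")
write.csv(toy_df, path, row.names = FALSE, quote = FALSE)

# Show the raw CSV text.
cat(readLines(path), sep = "\n")

# Read it back with the gene IDs as row names.
counts_example <- read.csv(path, row.names = 1)
dim(counts_example)
```

Every value is a raw integer count, which is the form edgeR expects.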

Running edgeR

> counts <- read.csv("counts.csv", sep=",", row.names=1)
> group <- factor(c("A", "A", "A", "A", "B", "B", "B", "B"))
> y <- DGEList(counts=counts, group=group)

Here we read in the count data and combine it with group information to create a DGEList object. Since we want to perform a two-group comparison between samples 1 through 4 and samples 5 through 8, we assign them to groups A and B.

> nrow(y)
[1] 35627
> keep <- filterByExpr(y)
> y <- y[keep, , keep.lib.sizes=FALSE]
> nrow(y)
[1] 14698

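We use filterByExpr to filter out lowly expressed genes, which carry too few reads to be tested reliably. Setting keep.lib.sizes=FALSE recalculates the library sizes from the genes that remain. In this example, the number of genes was reduced from 35,627 to 14,698.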
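To give a feel for what this kind of filter does, here is a simplified base R sketch that keeps genes reaching a minimum counts-per-million level in at least as many samples as the smallest group. It is an approximation for illustration only, not a reimplementation of filterByExpr, whose exact rule has additional details. The toy counts, the min_count value of 10, and the variable names are assumptions.

```r
set.seed(42)

# Hypothetical toy data: 200 genes x 8 samples.
# Rows 1-100 are lowly expressed (mean 1), rows 101-200 are well expressed (mean 200).
n_genes <- 200
mu <- rep(c(1, 200), each = n_genes / 2)
counts <- matrix(rnbinom(n_genes * 8, mu = mu, size = 5), nrow = n_genes)
group <- factor(rep(c("A", "B"), each = 4))

# Counts per million for each sample.
lib_size <- colSums(counts)
cpm <- t(t(counts) / lib_size) * 1e6

# Turn an assumed minimum count into a CPM cutoff using the median library size.
min_count <- 10
cpm_cutoff <- min_count / median(lib_size) * 1e6

# Require the cutoff to be met in at least as many samples as the smallest group.
min_samples <- min(table(group))
keep <- rowSums(cpm >= cpm_cutoff) >= min_samples

table(keep)
filtered <- counts[keep, , drop = FALSE]
dim(filtered)
```

Filtering on CPM rather than raw counts keeps the threshold comparable between samples that were sequenced to different depths.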

> y <- calcNormFactors(y)
> y$samples
        group lib.size norm.factors
sample1     A 20190676    0.5252614
sample2     A 17815578    0.7264956
sample3     A 16858297    0.8385346
sample4     A 17080450    0.5972758
sample5     B 18305317    1.4576576
sample6     B 23123425    1.5620988
sample7     B 23260262    1.6452239
sample8     B 19145522    1.3967109

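We use calcNormFactors to perform TMM normalization, which corrects for systematic composition biases between samples. The resulting norm.factors rescale each sample's library size into an effective library size that the later steps use.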

Finally, we run a quasi-likelihood F-test to identify differentially expressed genes.

> design <- model.matrix(~group)
> y <- estimateDisp(y, design)
> fit <- glmQLFit(y, design)
> qlf <- glmQLFTest(fit, coef=2)
> topTags(qlf)
Coefficient: groupB
                        logFC    logCPM         F       PValue          FDR
ENSMUSG00000038738  -3.934239  4.352090 191.17683 4.406800e-08 0.0006477115
ENSMUSG00000021250   2.543736  6.859874 111.07055 6.409002e-07 0.0027772690
ENSMUSG00000032487   3.859926  1.891284 103.08209 9.187893e-07 0.0027772690
ENSMUSG00000070495  -3.377550  3.217322  99.99038 1.063542e-06 0.0027772690
ENSMUSG00000064356  -3.325373 11.863375  96.88620 1.237031e-06 0.0027772690
ENSMUSG00000033453  -1.469414  5.458883  95.42623 1.330180e-06 0.0027772690
ENSMUSG00000100862 -15.490014  7.031386 169.49446 1.993728e-06 0.0027772690
ENSMUSG00000096887  -3.332279  8.621508  85.88952 2.194494e-06 0.0027772690
ENSMUSG00000054942  -1.453327  5.837918  83.01079 2.577827e-06 0.0027772690
ENSMUSG00000004842  -5.072918  2.248245  82.51494 2.651651e-06 0.0027772690

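The differentially expressed genes have been successfully identified. Because coef=2 corresponds to groupB, logFC is the log2 fold change of group B relative to group A, so a negative value means lower expression in group B. For a detailed explanation of logFC, see our logFC explanation page. The CPM in logCPM stands for Counts Per Million.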
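As a small worked illustration of these two columns, the sketch below computes counts per million for a made-up gene and converts log2 fold changes back to plain ratios. The numbers and variable names are hypothetical. edgeR's own logCPM values will differ somewhat, because they are computed with normalized library sizes and a small prior count.

```r
# Hypothetical raw counts for one gene across four samples,
# and the total number of reads (library size) in each sample.
gene_counts <- c(150, 300, 75, 600)
lib_size    <- c(10e6, 20e6, 5e6, 40e6)

# Counts per million: scale each count by its sample's library size.
cpm <- gene_counts / lib_size * 1e6
cpm
log2(cpm)

# A log2 fold change converts back to a ratio with 2^logFC.
# For example, logFC = -2 means one quarter of the reference level.
logFC <- c(-2, 0, 1)
fold_change <- 2^logFC
fold_change
```

Although the raw counts differ eightfold between samples, the CPM values are identical, because each count is in proportion to its library size.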

You can extract all genes with FDR < 0.05 as follows:

> result <- as.data.frame(topTags(qlf, n=nrow(y)))
> result[result$FDR<0.05,]
                         logFC      logCPM         F       PValue          FDR
ENSMUSG00000038738  -3.9342393  4.35208965 191.17683 4.406800e-08 0.0006477115
ENSMUSG00000021250   2.5437356  6.85987389 111.07055 6.409002e-07 0.0027772690
ENSMUSG00000032487   3.8599263  1.89128388 103.08209 9.187893e-07 0.0027772690
ENSMUSG00000070495  -3.3775501  3.21732235  99.99038 1.063542e-06 0.0027772690
ENSMUSG00000064356  -3.3253735 11.86337535  96.88620 1.237031e-06 0.0027772690
ENSMUSG00000033453  -1.4694139  5.45888339  95.42623 1.330180e-06 0.0027772690
ENSMUSG00000100862 -15.4900144  7.03138596 169.49446 1.993728e-06 0.0027772690
ENSMUSG00000096887  -3.3322793  8.62150780  85.88952 2.194494e-06 0.0027772690
ENSMUSG00000054942  -1.4533272  5.83791761  83.01079 2.577827e-06 0.0027772690
ENSMUSG00000004842  -5.0729179  2.24824466  82.51494 2.651651e-06 0.0027772690
...
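A common next step is to split the significant genes by direction and save them to a file. The sketch below shows one way this might be done in base R. To stay self-contained it rebuilds a small data frame by hand from four rows of the output above; the object names and the output file name are assumptions, and in a real session you would apply the same steps to result directly.

```r
# Four rows copied from the output above, rebuilt by hand for illustration.
result_demo <- data.frame(
  logFC = c(-3.9342393, 2.5437356, 3.8599263, -3.3775501),
  FDR   = c(0.0006477115, 0.0027772690, 0.0027772690, 0.0027772690),
  row.names = c("ENSMUSG00000038738", "ENSMUSG00000021250",
                "ENSMUSG00000032487", "ENSMUSG00000070495")
)

# Keep significant genes, then split by the sign of logFC.
sig  <- result_demo[result_demo$FDR < 0.05, ]
up   <- sig[sig$logFC > 0, ]   # higher in group B than in group A
down <- sig[sig$logFC < 0, ]   # lower in group B than in group A
c(up = nrow(up), down = nrow(down))

# Save the significant genes; the file name is a hypothetical choice.
out_path <- file.path(tempdir(), "deg_fdr05.csv")
write.csv(sig, out_path)
```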

FDR stands for False Discovery Rate. Filtering at FDR < 0.05 means that, among the extracted genes, the expected proportion of genes that are not truly differentially expressed (false positives) is at most about 5%.
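By default, the FDR column reported by topTags is a Benjamini-Hochberg adjusted p-value. The base R sketch below applies the same kind of adjustment to a handful of invented p-values with p.adjust, so the relationship between raw p-values and FDR can be inspected directly. The p-values and variable names are hypothetical.

```r
# Hypothetical raw p-values for ten genes, sorted for readability.
p_values <- c(0.0001, 0.0004, 0.0019, 0.0095, 0.0201,
              0.0278, 0.0298, 0.0344, 0.0459, 0.3240)

# Benjamini-Hochberg adjustment.
fdr <- p.adjust(p_values, method = "BH")
data.frame(PValue = p_values, FDR = round(fdr, 4))

# Compare how many genes pass each kind of threshold.
n_raw <- sum(p_values < 0.05)
n_fdr <- sum(fdr < 0.05)
c(raw = n_raw, fdr = n_fdr)
```

The adjusted values are never smaller than the raw p-values, so an FDR threshold is the stricter of the two when many genes are tested at once.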

What is TMM Normalization?

TMM (trimmed mean of M-values) normalization is one of the methods for correcting gene expression levels in RNA-Seq analysis, and it is the default method used by calcNormFactors in edgeR.

What RNA-Seq measures is not absolute expression levels but relative expression levels. Because of this, when a small number of genes are very highly expressed in one sample, they take up a larger share of the reads, and the expression levels of the other genes appear to decrease in relative terms. TMM normalization addresses this by estimating a scaling factor for each sample from the log expression ratios (M-values) of genes relative to a reference sample, after trimming away the genes with the most extreme values. This method produces reliable corrections as long as the majority of genes in the dataset are not differentially expressed across samples.
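The idea can be sketched in a few lines of base R. The example below simulates two samples that are identical except for 20 strongly up-regulated genes, then estimates a scaling factor from a plain trimmed mean of the log ratios. This is a deliberately simplified illustration and not the calcNormFactors algorithm, which additionally trims on average expression, weights genes by precision, and rescales the factors across samples. The simulated data, the 30% trim, and all variable names are assumptions.

```r
set.seed(7)

# Hypothetical toy data: 1000 genes with the same underlying expression in
# both samples, except that 20 genes are 50-fold higher in sample 2.
n_genes <- 1000
base_expr <- rgamma(n_genes, shape = 2, scale = 50) + 5
expr2 <- base_expr
expr2[1:20] <- expr2[1:20] * 50

# Both samples are sequenced to the same expected depth.
s1 <- rpois(n_genes, base_expr)
s2 <- rpois(n_genes, expr2 / sum(expr2) * sum(base_expr))

# M-values: log2 ratio of library-size-scaled counts, sample 2 vs sample 1.
ok <- s1 > 0 & s2 > 0
m <- log2((s2[ok] / sum(s2)) / (s1[ok] / sum(s1)))

# Trimmed mean of M: drop the most extreme 30% at each end, then average.
tmm_log2 <- mean(m, trim = 0.3)
scale_factor <- 2^tmm_log2
scale_factor
```

In this simulation the factor comes out well below 1: the 20 up-regulated genes absorb a large share of sample 2's reads, so the unchanged genes look under-expressed until the factor is taken into account.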

Note that TMM normalization does not correct for factors that are common across all samples. For example, gene length is known to correlate with read counts -- longer genes tend to accumulate more reads -- but TMM normalization does not adjust for this. Since edgeR is focused on identifying differentially expressed genes between groups, corrections for differences between genes are not necessary, making TMM normalization sufficient for this purpose.

In contrast, normalization methods such as FPKM/RPKM and TPM do include gene length corrections, as they are designed with cross-gene expression comparisons in mind.
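To make the gene length correction concrete, here is a minimal base R sketch that computes FPKM/RPKM and TPM for three made-up genes in a single sample. The counts, gene lengths, and variable names are hypothetical, and the sketch uses total gene length rather than the effective length that quantification tools typically use.

```r
# Hypothetical counts for three genes in one sample, with gene lengths in bases.
counts    <- c(geneA = 1000, geneB = 1000, geneC = 4000)
length_bp <- c(geneA = 1000, geneB = 4000, geneC = 4000)

# FPKM/RPKM: reads per kilobase of gene per million reads in the sample.
fpkm <- counts / (length_bp / 1e3) / (sum(counts) / 1e6)

# TPM: correct for length first, then scale so the values sum to one million.
rate <- counts / length_bp
tpm <- rate / sum(rate) * 1e6

round(rbind(fpkm, tpm))
```

geneA and geneC end up with the same length-corrected value even though geneC has four times as many reads, while geneB, with the same count as geneA over a gene four times longer, drops to a quarter of it. These are values you would compare across genes, whereas edgeR itself still needs the raw counts.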
