edgeR Tutorial: Differential Expression Analysis in R
What is edgeR?
Performing RNA-Seq analysis with a next-generation sequencer yields expression levels for each gene. By comparing these expression levels across multiple samples, you can detect statistically significant differentially expressed genes (DEGs).
edgeR is a widely used software package for detecting differentially expressed genes, and is one of the most popular tools in this field along with DESeq2.
In this article, we walk through how to install edgeR and how to use it for a basic analysis.
For an overview of the entire RNA-Seq data analysis workflow, see the RNA-Seq analysis workflow guide.
Installing edgeR
First, install R if you do not already have it. (On macOS, one option is Homebrew: brew install r.)
Start R and run the following commands to install BiocManager and edgeR.
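The installation is commonly written as shown here. This is a sketch of the usual Bioconductor procedure and needs an internet connection; the requireNamespace guard and the explicit CRAN mirror URL are additions so that the snippet also runs non-interactively.

```r
# Install BiocManager from CRAN only if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager", repos = "https://cloud.r-project.org")
}

# Install edgeR from Bioconductor
BiocManager::install("edgeR")
```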
Run the command below to load the package. If no errors appear, the installation was successful.
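A minimal version of the load step looks like this. The packageVersion call is an optional extra that simply prints which edgeR version was loaded.

```r
# Load edgeR (this also loads limma, which edgeR depends on)
library(edgeR)

# Optional: confirm which version is installed
packageVersion("edgeR")
```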
Preparing Your Data
Use a quantification tool such as featureCounts, StringTie, or RSEM to obtain gene expression counts.
Organize the results into a comma-separated (CSV) file like the one shown below. Note that edgeR requires raw read counts as input, not normalized values such as FPKM/RPKM or TPM.
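As an illustration of the expected layout, the sketch below writes a tiny CSV to a temporary file and reads it back. The gene IDs, sample names, and counts are made up for this example; the point is only the shape: one row per gene, one column per sample, raw integer counts in the cells.

```r
# Hypothetical toy count table (made-up IDs and values, layout only)
csv_text <- c(
  "gene_id,sample1,sample2,sample3,sample4,sample5,sample6,sample7,sample8",
  "geneA,120,98,135,110,240,260,225,251",
  "geneB,0,2,1,0,3,1,0,2",
  "geneC,530,610,480,555,90,75,110,82"
)
csv_path <- tempfile(fileext = ".csv")
writeLines(csv_text, csv_path)

# Read the file with gene IDs as row names
counts <- read.csv(csv_path, row.names = 1)
print(counts)

# Sanity check: raw counts are non-negative whole numbers
m <- as.matrix(counts)
stopifnot(all(m >= 0), all(m == round(m)))
```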
Running edgeR
Here we read in the count data and combine it with group information to create a DGEList object. Since we want to perform a two-group comparison between samples 1 through 4 and samples 5 through 8, we assign them to groups A and B.
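The step described above can be sketched as follows. To keep the snippet runnable on its own, it uses simulated counts rather than a real file; the file name in the comment is hypothetical.

```r
library(edgeR)

# Simulated stand-in for the count table (2,000 genes x 8 samples), not real data.
# With a real file you would instead use something like:
#   counts <- read.csv("counts.csv", row.names = 1)   # hypothetical file name
set.seed(42)
counts <- matrix(rnbinom(2000 * 8, mu = 100, size = 10), ncol = 8,
                 dimnames = list(paste0("gene", 1:2000), paste0("sample", 1:8)))

# Samples 1-4 belong to group A, samples 5-8 to group B
group <- factor(rep(c("A", "B"), each = 4))

# Bundle the counts and the group labels into a DGEList
y <- DGEList(counts = counts, group = group)
y$samples
```

The order of the group labels must match the order of the sample columns in the count table.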
We use filterByExpr to filter out lowly expressed genes. In this example, the number of genes was reduced from 35,627 to 14,698.
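A minimal sketch of the filtering step, again on simulated data (half of the genes are deliberately given very low counts so that the filter has something to remove). The gene numbers it prints will not match the 35,627 and 14,698 reported for the article's dataset.

```r
library(edgeR)

# Simulated counts: genes 1-1000 are barely expressed, genes 1001-2000 are well expressed
set.seed(42)
mu <- rep(c(1, 100), each = 1000)
counts <- matrix(rnbinom(2000 * 8, mu = mu, size = 10), ncol = 8)
group <- factor(rep(c("A", "B"), each = 4))
y <- DGEList(counts = counts, group = group)

n_before <- nrow(y)
keep <- filterByExpr(y)                   # logical vector, one entry per gene
y <- y[keep, , keep.lib.sizes = FALSE]    # drop filtered genes, recompute library sizes
n_after <- nrow(y)
cat("Genes before:", n_before, "after:", n_after, "\n")
```

When no design matrix is supplied, filterByExpr uses the group labels stored in the DGEList to decide how many samples must show adequate expression.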
We use calcNormFactors to perform TMM normalization, which corrects for systematic biases between samples.
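The normalization call can be sketched like this, using simulated counts so the snippet runs on its own. The factors are stored in the samples table and later combined with the library sizes; the count matrix itself is not modified.

```r
library(edgeR)

# Simulated counts (not real data)
set.seed(42)
counts <- matrix(rnbinom(2000 * 8, mu = 100, size = 10), ncol = 8)
group <- factor(rep(c("A", "B"), each = 4))
y <- DGEList(counts = counts, group = group)

# TMM is the default method; it is named here only for clarity
y <- calcNormFactors(y, method = "TMM")
y$samples[, c("lib.size", "norm.factors")]

# Effective library sizes used by downstream functions
eff_lib_size <- y$samples$lib.size * y$samples$norm.factors
eff_lib_size
```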
Finally, we fit a quasi-likelihood negative binomial model and run a quasi-likelihood F-test on it to identify differentially expressed genes.
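One common way to write this step is sketched below. The data are simulated, with genes 1 to 100 made 4-fold higher in group B so that there is something to detect; the design formula and coef = 2 are assumptions that fit a simple two-group comparison.

```r
library(edgeR)

# Simulated counts: genes 1-100 are 4-fold higher in group B (samples 5-8)
set.seed(42)
mu <- matrix(100, nrow = 2000, ncol = 8)
mu[1:100, 5:8] <- 400
counts <- matrix(rnbinom(length(mu), mu = mu, size = 10), nrow = 2000,
                 dimnames = list(paste0("gene", 1:2000), paste0("sample", 1:8)))
group <- factor(rep(c("A", "B"), each = 4))

y <- DGEList(counts = counts, group = group)
y <- y[filterByExpr(y), , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)

design <- model.matrix(~ group)    # column 2 ("groupB") is B relative to A
y <- estimateDisp(y, design)       # estimate negative binomial dispersions
fit <- glmQLFit(y, design)         # fit the quasi-likelihood model
qlf <- glmQLFTest(fit, coef = 2)   # F-test for the group effect
topTags(qlf)                       # top 10 genes by p-value
```

With this design, a positive logFC means higher expression in group B than in group A.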
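The result lists the genes with the strongest evidence of differential expression. The logFC column is the log2 fold change between the two groups; for a detailed explanation, see our logFC explanation page. The logCPM column is the average log2 expression level, where CPM stands for Counts Per Million.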
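To make the two quantities concrete, here is a small base R calculation with made-up numbers. It shows the plain definitions only; the logCPM values reported by edgeR also involve a small prior count and averaging over samples, so they will not match this arithmetic exactly.

```r
# Hypothetical values: 250 reads for one gene in a library of 20 million reads
count <- 250
library_size <- 20e6

# Counts Per Million: reads for the gene per one million reads in the library
cpm_value <- count / library_size * 1e6
log2_cpm <- log2(cpm_value)
cat("CPM:", cpm_value, " log2 CPM:", log2_cpm, "\n")

# Converting log2 fold changes back to fold changes
logFC <- c(-2, -1, 0, 1, 2)
fold_change <- 2^logFC
print(data.frame(logFC, fold_change))
```

A logFC of 1 therefore corresponds to a doubling, and a logFC of -1 to a halving.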
You can extract all genes with FDR < 0.05 as follows:
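The extraction can be sketched as follows. The snippet repeats the whole analysis on simulated data so that it runs on its own; the names res and deg are chosen for this example.

```r
library(edgeR)

# Simulated counts: genes 1-100 are 4-fold higher in group B (samples 5-8)
set.seed(42)
mu <- matrix(100, nrow = 2000, ncol = 8)
mu[1:100, 5:8] <- 400
counts <- matrix(rnbinom(length(mu), mu = mu, size = 10), nrow = 2000,
                 dimnames = list(paste0("gene", 1:2000), paste0("sample", 1:8)))
group <- factor(rep(c("A", "B"), each = 4))

y <- DGEList(counts = counts, group = group)
y <- y[filterByExpr(y), , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)
design <- model.matrix(~ group)
y <- estimateDisp(y, design)
qlf <- glmQLFTest(glmQLFit(y, design), coef = 2)

# Full results table for every tested gene, sorted by p-value
res <- topTags(qlf, n = Inf)$table

# Keep only the genes with FDR < 0.05
deg <- res[res$FDR < 0.05, ]
cat("Genes with FDR < 0.05:", nrow(deg), "\n")
head(deg)
```

Setting n = Inf matters here: by default topTags returns only the top 10 genes.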
FDR stands for False Discovery Rate. Filtering with FDR < 0.05 means that, among the extracted genes, the expected proportion of genes that are not truly differentially expressed (false positives) is kept at or below 5%.
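The FDR values in edgeR's results are p-values adjusted with the Benjamini-Hochberg method by default. The base R sketch below applies the same adjustment to a made-up set of eight p-values, to show how the FDR threshold differs from a raw p-value threshold.

```r
# Hypothetical p-values for eight genes
p <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205)

# Benjamini-Hochberg adjustment
fdr <- p.adjust(p, method = "BH")

print(data.frame(p, fdr = round(fdr, 4), significant = fdr < 0.05))
cat("p < 0.05:", sum(p < 0.05), " FDR < 0.05:", sum(fdr < 0.05), "\n")
```

In this toy set, five genes have p < 0.05 but only two pass FDR < 0.05, because the adjustment accounts for the number of genes tested.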
What is TMM Normalization?
TMM (trimmed mean of M-values) normalization is a method for correcting gene expression levels in RNA-Seq analysis, and it is the default normalization method in edgeR.
RNA-Seq measures relative, not absolute, expression levels: each gene's reads are a share of the sample's total. Because of this, when a small number of genes are very highly expressed in one sample, they take up a larger share of the reads, and the other genes appear to decrease even if their true expression is unchanged. TMM normalization addresses this by computing a scaling factor for each sample from a trimmed mean of the gene-wise log expression ratios (M-values) relative to a reference sample, after discarding the genes with the most extreme ratios and abundances. The resulting factors are reliable as long as the majority of genes in the dataset are not differentially expressed across samples.
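The effect can be seen in a small simulation, sketched here under artificial assumptions: two samples with the same expected counts for 1,000 genes, except that gene 1 is given an enormous count in the second sample. For the unchanged genes, the ratio of CPM values between the samples should come out near 0.5 with plain library-size scaling and near 1 once the TMM factors are applied.

```r
library(edgeR)

# Two simulated samples with the same expected expression for every gene
set.seed(1)
n_genes <- 1000
counts <- matrix(rpois(2 * n_genes, lambda = 100), ncol = 2,
                 dimnames = list(paste0("gene", 1:n_genes), c("s1", "s2")))

# One gene dominates sample 2 and roughly doubles its library size
counts["gene1", "s2"] <- 100000

y <- DGEList(counts = counts)
cpm_raw <- cpm(y, normalized.lib.sizes = FALSE)   # library-size scaling only

y <- calcNormFactors(y)                           # TMM scaling factors
cpm_tmm <- cpm(y)                                 # uses the effective library sizes

# Median s2/s1 ratio over the 999 unchanged genes
ratio_raw <- median(cpm_raw[-1, "s2"] / cpm_raw[-1, "s1"])
ratio_tmm <- median(cpm_tmm[-1, "s2"] / cpm_tmm[-1, "s1"])
cat("Median ratio without TMM:", round(ratio_raw, 2),
    " with TMM:", round(ratio_tmm, 2), "\n")
```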
Note that TMM normalization does not correct for factors that are common across all samples. For example, gene length is known to correlate with read counts -- longer genes tend to accumulate more reads -- but TMM normalization does not adjust for this. Since edgeR is focused on identifying differentially expressed genes between groups, corrections for differences between genes are not necessary, making TMM normalization sufficient for this purpose.
In contrast, normalization methods such as FPKM/RPKM and TPM do include gene length corrections, as they are designed with cross-gene expression comparisons in mind.
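As an illustration of what a gene length correction does, the base R sketch below computes RPKM and TPM from their textbook definitions for three made-up genes whose read counts are exactly proportional to their lengths. The counts differ 7-fold, but the length-corrected values are identical.

```r
# Hypothetical genes: read counts proportional to gene length
counts    <- c(geneA = 100,  geneB = 200,  geneC = 700)
length_bp <- c(geneA = 1000, geneB = 2000, geneC = 7000)

# RPKM: reads per kilobase of gene, per million reads in the library
rpkm <- counts / (length_bp / 1e3) / (sum(counts) / 1e6)

# TPM: length-normalize first, then rescale so the values sum to one million
reads_per_kb <- counts / (length_bp / 1e3)
tpm <- reads_per_kb / sum(reads_per_kb) * 1e6

print(data.frame(counts, length_bp, rpkm, tpm))
```

These values are suited to comparing genes with each other; as noted above, edgeR itself still needs the raw counts as input.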
About the Author
BxINFO LLC
A research support company specializing in bioinformatics.
We provide tools and information to support life science research, with a focus on RNA-Seq analysis.