DESeq2: A Tutorial for Differential Expression Analysis in RNA-Seq Data

What is DESeq2?

RNA-Seq analysis using next-generation sequencing measures the expression level of each gene. Comparing these expression levels between sample groups makes it possible to identify differentially expressed genes.

This page explains how to install and use DESeq2, a software package for identifying differentially expressed genes.

If you find the following procedure difficult, we also offer web-based software that lets you identify differentially expressed genes easily.

Installing DESeq2

First, install R if it is not already installed. (The following example uses Homebrew.)

$ brew install r

Launch R and execute the following to install BiocManager and DESeq2.

> if (!requireNamespace("BiocManager", quietly=TRUE))
+     install.packages("BiocManager")
> BiocManager::install("DESeq2")

Execute the following, and if no errors are displayed, the installation is successful.

> library(DESeq2)

Preparing Data

Use software such as featureCounts, StringTie, or RSEM to quantify gene expression levels.

Organize the results into a comma-separated (CSV) file as shown below. Note that the file passed to DESeq2 must contain raw read counts, not normalized values such as FPKM/RPKM or TPM.

[Figure: Preparing data for DESeq2]
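As a quick sanity check before running DESeq2, the following sketch reads a count table and confirms that it looks like raw counts (non-negative whole numbers). The tiny table, the gene IDs, and the sample names are made up for illustration, and `is_raw_counts` is a hypothetical helper, not a DESeq2 function.

```r
# Toy count table written to a temporary file (made-up values, illustration only).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "gene_id,sample1,sample2,sample3,sample4",
  "geneA,10,12,0,3",
  "geneB,250,310,198,275",
  "geneC,0,0,1,0"
), csv_path)

# Gene IDs in the first column become the row names.
counts <- read.csv(csv_path, row.names = 1)

# Hypothetical helper: TRUE only if every value is a non-negative whole number,
# which is what raw read counts look like (FPKM/RPKM/TPM values usually are not).
is_raw_counts <- function(x) {
  m <- as.matrix(x)
  is.numeric(m) && !anyNA(m) && all(m >= 0) && all(m == round(m))
}

is_raw_counts(counts)      # the toy raw counts pass the check
is_raw_counts(counts / 3)  # scaled, fractional values fail it
```

Some quantifiers (RSEM, for example) report fractional expected counts; such values would likely need rounding, or import through a package such as tximport, before DESeqDataSetFromMatrix accepts them.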

How to Use DESeq2

> counts <- read.csv("counts.csv", sep=",", row.names=1)
> coldata <- data.frame(condition = factor(c("A", "A", "A", "A", "B", "B", "B", "B")))
> dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata, design = ~ condition)
> dds <- DESeq(dds)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing

The count data and sample information are passed to the DESeqDataSetFromMatrix function. In this analysis, the samples are divided into two groups for comparison: samples 1 to 4 form Group A and samples 5 to 8 form Group B.

The results can be displayed as follows.
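Because the group labels are matched to the count columns by position, it can help to give the sample table explicit row names and check the ordering before building the dataset. The sketch below uses only base R; the sample names `sample1` to `sample8` are assumptions standing in for the column names of your own count table.

```r
# Made-up sample names standing in for the column names of the count table.
sample_names <- paste0("sample", 1:8)

# One row per sample, in the same order as the count columns:
# samples 1-4 belong to Group A, samples 5-8 to Group B.
coldata <- data.frame(
  condition = factor(rep(c("A", "B"), each = 4), levels = c("A", "B")),
  row.names = sample_names
)

# The first factor level is the reference, so fold changes are reported as B vs A.
levels(coldata$condition)

# Check the ordering before building the DESeqDataSet; with a real table,
# compare against colnames(counts) instead of sample_names.
stopifnot(identical(rownames(coldata), sample_names))

# Number of samples per group.
table(coldata$condition)
```

Setting `levels` explicitly makes the reference group independent of alphabetical order, which matters once the groups have names such as "treated" and "control".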

> res <- results(dds)
> res
log2 fold change (MLE): condition B vs A
Wald test p-value: condition B vs A
DataFrame with 62696 rows and 8 columns
                   baseMean log2FoldChange     lfcSE      stat    pvalue      padj
                  <numeric>      <numeric> <numeric> <numeric> <numeric> <numeric>
ENSG00000290825.1    0.0000             NA        NA        NA        NA        NA
ENSG00000223972.6    0.0000             NA        NA        NA        NA        NA
ENSG00000227232.5   11.4438      -0.619619  0.758925 -0.816443  0.414247        NA
ENSG00000278267.1    1.6048      -0.826152  1.974423 -0.418427  0.675635        NA
ENSG00000243485.5    0.0000             NA        NA        NA        NA        NA
...                     ...            ...       ...       ...       ...       ...
ENSG00000198695.2         0             NA        NA        NA        NA        NA
ENSG00000210194.1         0             NA        NA        NA        NA        NA
ENSG00000198727.2         0             NA        NA        NA        NA        NA
ENSG00000210195.2         0             NA        NA        NA        NA        NA
ENSG00000210196.2         0             NA        NA        NA        NA        NA
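In this output, genes with a baseMean of 0 have NA in every other column because there is nothing to test, and genes with a low mean count can have a p-value but an NA in padj because DESeq2's independent filtering excludes them from multiple-testing correction. A typical next step is to keep the genes whose adjusted p-value falls below a cutoff. The sketch below shows that step in base R on a toy table; all values and gene names are made up, and the 0.05 cutoff is a common convention rather than something prescribed above. With real output, `as.data.frame(res)` would take the place of `res_df`.

```r
# Toy stand-in for as.data.frame(res); all values are made up for illustration.
res_df <- data.frame(
  baseMean       = c(0, 11.4, 850.2, 1203.7, 96.5),
  log2FoldChange = c(NA, -0.62, 2.31, -1.75, 0.12),
  padj           = c(NA, NA, 0.0004, 0.0210, 0.8300),
  row.names      = c("gene1", "gene2", "gene3", "gene4", "gene5")
)

# Keep genes with an adjusted p-value below 0.05;
# which() drops the rows whose padj is NA.
sig <- res_df[which(res_df$padj < 0.05), ]

# Sort the significant genes from smallest to largest adjusted p-value.
sig <- sig[order(sig$padj), ]
sig

# Split into genes that are higher (up) or lower (down) in B relative to A.
up   <- rownames(sig)[sig$log2FoldChange > 0]
down <- rownames(sig)[sig$log2FoldChange < 0]
```

Using `which()` rather than a bare logical index avoids the all-NA rows that R otherwise returns for NA comparisons.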

RNA-Seq Data Analysis Software

If you don't have time to study the analysis methods, or lack the high-spec computer the analysis requires, please consider using our RNA-Seq data analysis software.

Overview

Starting from either raw RNA-Seq data (FASTQ files or public data) or expression tables (CSV/TSV files), users can perform gene expression quantification, identification of differentially expressed genes, Gene Ontology (GO) analysis, and pathway analysis, and can draw volcano plots, MA plots, and heatmaps.