PCA in RNA-Seq: How to Interpret Principal Component Analysis

Last updated: March 13, 2026

📖 For an overview of the full process, see RNA-Seq Data Analysis Workflow.

In RNA-Seq analysis, Principal Component Analysis (PCA) is frequently used to visualize how similar or different samples are in terms of their gene expression profiles.

What is Principal Component Analysis (PCA)?

PCA is a dimensionality reduction technique that projects high-dimensional data into fewer dimensions while preserving as much information as possible.

The method works by finding the direction along which the data varies the most; this direction becomes the first principal component (PC1). PC2 is then the direction orthogonal to PC1 that captures the most of the remaining variance. PC3 is orthogonal to both PC1 and PC2 and captures the next largest share of variance, and so on for PC4, PC5, etc.
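The ordering and orthogonality of the components can be checked numerically. The following is a minimal sketch using NumPy's singular value decomposition on centered data, which is one common way to compute PCA. The function name `principal_axes` and the toy data are assumptions made for illustration and do not come from any RNA-Seq dataset.

```python
import numpy as np

def principal_axes(data):
    """Return principal axes (one per row) and the variance along each axis.

    data: array of shape (n_observations, n_features).
    """
    # PCA operates on mean-centered data.
    centered = data - data.mean(axis=0)
    # Rows of `axes` are orthonormal directions, ordered by singular value.
    _, singular_values, axes = np.linalg.svd(centered, full_matrices=False)
    # Variance of the data projected onto each axis.
    variances = singular_values**2 / (data.shape[0] - 1)
    return axes, variances

# Hypothetical toy data: 200 points, 3 features, stretched most along feature 1.
rng = np.random.default_rng(0)
toy = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 0.5])

axes, variances = principal_axes(toy)
print("Variance along PC1, PC2, PC3:", variances)
print("Axes are mutually orthogonal:", np.allclose(axes @ axes.T, np.eye(3)))
```

The variances come out in decreasing order, and the product of the axis matrix with its transpose is the identity, which is what "each PC is orthogonal to the previous ones" means in practice.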

Each principal component has an associated "explained variance ratio," which indicates what fraction of the total variance in the data it accounts for. The sum of the explained variance ratios up to the m-th component is called the "cumulative explained variance ratio." For example, if PC1 explains 50% and PC2 explains 30% of the variance, the cumulative explained variance ratio through PC2 is 80%. This means the first two principal components together capture 80% of the total variance in the original data.
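The arithmetic behind these ratios is short enough to write out. This sketch assumes you already have the variance along each component; the helper name `explained_variance_ratios` and the four variance values are hypothetical, chosen so that the first two ratios match the 50% and 30% example above.

```python
import numpy as np

def explained_variance_ratios(variances):
    """Return (ratios, cumulative ratios) for per-component variances."""
    variances = np.asarray(variances, dtype=float)
    # Each ratio is that component's share of the total variance.
    ratios = variances / variances.sum()
    # The cumulative ratio through component m is the sum of the first m ratios.
    return ratios, np.cumsum(ratios)

# Hypothetical variances for PC1..PC4 (total = 10).
ratios, cumulative = explained_variance_ratios([5.0, 3.0, 1.5, 0.5])
print("Explained variance ratios:", ratios)
print("Cumulative ratio through PC2:", cumulative[1])
```

The cumulative ratio always reaches 1 (100%) once every component is included, so the useful question is how quickly it gets close to that.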

High-dimensional data is inherently difficult to visualize. However, when the cumulative explained variance ratio for the first two principal components is high, you can create a two-dimensional scatter plot that faithfully represents the data.

PCA in RNA-Seq Analysis

An RNA-Seq analysis produces a gene expression table like the one below:

Example of a gene expression table

Each sample has as many values as there are genes, making the data very high-dimensional. (The image shows only 10 genes, but a real dataset can contain tens of thousands of genes depending on the species.)

By running PCA on this data and plotting PC1 on the horizontal axis and PC2 on the vertical axis, you can visualize how similar the samples are to one another. In this plot, samples 1 through 3 cluster together, suggesting they have similar gene expression profiles.

The explained variance ratios are shown in parentheses next to PC1 and PC2. The cumulative explained variance ratio through PC2 is 38.57% + 19.55% = 58.12%, meaning this scatter plot captures 58.12% of the total variance in the original data. The remaining 41.88% lies along PC3 and higher components and is not visible in the plot.

Example of principal component analysis
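To make the workflow concrete, here is a minimal sketch of running PCA on a samples-by-genes expression table with NumPy. Everything in it is an assumption for illustration: the counts are simulated (six samples in two hypothetical groups, 1,000 genes), the `log2(count + 1)` transform is one common preprocessing choice that this article does not prescribe, and the function name `pca_scores` is made up. It is not the procedure used to produce the plot above.

```python
import numpy as np

def pca_scores(expression, n_components=2):
    """Project samples onto the leading principal components.

    expression: array of shape (n_samples, n_genes).
    Returns (scores, explained variance ratios) for the first n_components.
    """
    # Assumption: log-transform counts so highly expressed genes do not dominate.
    logged = np.log2(expression + 1.0)
    # Center each gene across samples.
    centered = logged - logged.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u * s                    # sample coordinates on each PC
    ratios = s**2 / np.sum(s**2)      # explained variance ratio per PC
    return scores[:, :n_components], ratios[:n_components]

# Simulated data: samples 1-3 share a baseline profile; samples 4-6 have
# 100 genes expressed 8-fold higher (hypothetical values, not real results).
rng = np.random.default_rng(42)
n_genes = 1000
base = rng.gamma(shape=2.0, scale=50.0, size=n_genes)
group_effect = np.ones(n_genes)
group_effect[:100] = 8.0
means = np.vstack([np.tile(base, (3, 1)), np.tile(base * group_effect, (3, 1))])
counts = rng.poisson(means)

scores, ratios = pca_scores(counts)
for i, (pc1, pc2) in enumerate(scores, start=1):
    print(f"sample {i}: PC1={pc1:.2f}, PC2={pc2:.2f}")
print(f"PC1 ({ratios[0]:.2%}), PC2 ({ratios[1]:.2%}), "
      f"cumulative {ratios.sum():.2%}")
```

Plotting the first column of `scores` on the horizontal axis and the second on the vertical axis gives the kind of scatter plot described above, with the printed ratios serving as the axis annotations. In this simulated case the two groups should separate along PC1, since the group difference is the largest source of variance.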

BxINFO LLC

A research support company specializing in bioinformatics.

We provide tools and information to support life science research, with a focus on RNA-Seq analysis.