Using featureCounts for Quantification of Gene Expression in RNA-seq Analysis

Introduction

RNA-seq analysis with next-generation sequencing produces raw data in the form of FASTQ files, which contain the sequenced reads. After each read is mapped to a reference sequence, gene expression is quantified by counting the reads mapped to each gene.

This page explains how to use featureCounts, a program for counting reads.

Installing featureCounts

featureCounts is part of the Subread package, so installing Subread also installs featureCounts. The easiest way to do this is through Bioconda.

$ conda install -c bioconda subread

Run featureCounts without arguments to display the help.

$ featureCounts

If output like the following appears, the installation succeeded.

Version 2.0.1

Usage: featureCounts [options] -a <annotation_file> -o <output_file> input_file1 [input_file2] ...

## Mandatory arguments:

  -a <string>         Name of an annotation file. GTF/GFF format by default. See
                      -F option for more format information. Inbuilt annotations
                      (SAF format) is available in 'annotation' directory of the
                      package. Gzipped file is also accepted.

...
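
As a supplementary check, the sketch below tests whether a program is on the PATH before calling it, which helps when the conda environment may not be active. The helper name check_tool is an assumption made for this example and is not part of Subread; the -v option prints the featureCounts version.

```shell
# check_tool is a hypothetical helper for this example, not part of Subread.
# It prints "found" if the named program is on the PATH, otherwise "missing".
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found"
    else
        echo "missing"
    fi
}

# Print the version only when featureCounts is actually installed.
if [ "$(check_tool featureCounts)" = "found" ]; then
    featureCounts -v
else
    echo "featureCounts is not on the PATH; check the conda environment."
fi
```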

Conducting Read Counting

Reads are counted with the following command. This example uses four samples: sample1, sample2, sample3, and sample4.

$ featureCounts -p -t exon -g gene_id -a annotation.gtf -o counts.txt sample1.bam sample2.bam sample3.bam sample4.bam

Explanation of Options

Option  Description
-p      Used with paired-end reads to count fragments instead of individual reads.
-t      Feature type in the GTF file that is targeted for read counting. The default is 'exon'.
-g      Attribute in the GTF file used as the unit of read counting. The default is 'gene_id'.

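In this example, fragments are counted rather than individual reads, only reads mapped to exons are counted, and the counts are summed per gene_id. This behavior of -p applies to version 2.0.1, the version shown above. From version 2.0.2 onward, -p only declares the input as paired-end, and --countReadPairs must be added to count fragments.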
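
When the number of samples grows, the command line can be assembled from a list of sample names. The following is a minimal sketch of that idea as a dry run: it only prints the command and does not execute it. The variable names and the assumption that each BAM file is named after its sample with a .bam suffix are made up for this example.

```shell
# Sample names from the text; the .bam suffix and file layout are assumptions.
samples="sample1 sample2 sample3 sample4"

# Turn each sample name into its BAM file name.
bams=""
for s in $samples; do
    bams="$bams $s.bam"
done

# Dry run: print the command instead of executing it.
# -p: count fragments, -t exon: feature type, -g gene_id: grouping attribute.
cmd="featureCounts -p -t exon -g gene_id -a annotation.gtf -o counts.txt$bams"
echo "$cmd"
```

The printed line matches the command used in this section, so adding a sample only requires editing the samples list.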

Results

The following results were obtained.

[Figure: featureCounts results]

The first line records the version of featureCounts and the command that was run. The read counts for each sample appear from the seventh column onward.
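
Downstream tools often expect a plain count matrix, with one gene ID column followed by one column per sample. Given the layout described above, such a matrix can be sketched with grep and cut. The small input file below is invented for illustration (gene names, coordinates, and counts are not real results), and the first line is a stand-in for the comment line that featureCounts writes.

```shell
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)

# A tiny, made-up file in the featureCounts output layout (tab-separated).
printf '%s\n' '# (version and command line recorded by featureCounts)' > "$workdir/counts.txt"
printf 'Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample1.bam\tsample2.bam\n' >> "$workdir/counts.txt"
printf 'geneA\tchr1\t100\t200\t+\t101\t12\t7\n' >> "$workdir/counts.txt"
printf 'geneB\tchr2;chr2\t500;800\t600;900\t-;-\t202\t0\t3\n' >> "$workdir/counts.txt"

# Drop the comment line, then keep column 1 (Geneid) and columns 7 onward (counts).
grep -v '^#' "$workdir/counts.txt" | cut -f1,7- > "$workdir/matrix.tsv"

cat "$workdir/matrix.tsv"
```

The resulting matrix.tsv has a header line followed by one line per gene, with columns 2 to 6 removed.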

The contents of columns 1 to 6 are as follows.

Column  Name    Description
1       Geneid  Gene ID
2       Chr     Chromosome
3       Start   Start positions of the exons; all exons are listed, separated by semicolons.
4       End     End positions of the exons; all exons are listed, separated by semicolons.
5       Strand  Strand of the exons; all exons are listed, separated by semicolons.
6       Length  Length of the gene; where exons overlap, it is shorter than the sum of the exon lengths.
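
To see why Length can be shorter than the sum of the exon lengths, the sketch below compares the two values for three hypothetical exons, two of which overlap. The coordinates are invented for this example, and the interval merge is only an illustration of the idea, not the featureCounts implementation.

```shell
# Hypothetical exons of one gene as "start end" pairs (1-based, inclusive).
# The first two overlap between positions 150 and 200.
exons='100 200
150 300
500 600'

# Sum of the individual exon lengths (overlapping bases are counted twice).
total=$(printf '%s\n' "$exons" | awk '{ sum += $2 - $1 + 1 } END { print sum }')

# Length after merging: sort by start, then merge overlapping intervals.
merged=$(printf '%s\n' "$exons" | sort -k1,1n | awk '
    NR == 1     { s = $1 + 0; e = $2 + 0; next }
    $1 + 0 <= e { if ($2 + 0 > e) e = $2 + 0; next }   # overlap: extend the current interval
                { len += e - s + 1; s = $1 + 0; e = $2 + 0 }   # gap: close the current interval
    END         { len += e - s + 1; print len }
')

echo "sum of exon lengths: $total"
echo "merged length:       $merged"
```

Here the exon lengths sum to 353, while the merged length is 302, because the 51 bases from 150 to 200 are shared by the first two exons and counted only once.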
