FastQC Tutorial: Quality Control for FASTQ Files

Last updated: March 13, 2026

For an overview of the full pipeline, see the RNA-Seq Data Analysis Workflow article.

Introduction

When you run a next-generation sequencing (NGS) experiment, you obtain raw data as FASTQ files, which contain the base sequence of each read along with its per-base quality scores. The first step after sequencing is to check the quality of these FASTQ files and confirm that there are no problems with the reads. FastQC is one of the most widely used tools for this purpose.

This page explains how to run FastQC from the command line to perform quality checks on your sequencing data.

Installation

FastQC can be downloaded from the Babraham Bioinformatics project page (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/).

If you are on macOS and want to use FastQC from the command line, select the Win/Linux zip file.

You can also download it directly from the command line as shown below (adjust the version number as needed).

$ wget https://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.12.1.zip

Unzip the downloaded file and grant execution permissions.

$ unzip fastqc_v0.12.1.zip
$ cd FastQC/
$ chmod u+x fastqc

Verify that FastQC is working correctly by displaying the help message.

$ ./fastqc -h

If you see the following output, the installation was successful. It is recommended to add FastQC to your system's PATH for easier access.

FastQC - A high throughput sequence QC analysis tool

SYNOPSIS

    fastqc seqfile1 seqfile2 .. seqfileN

    fastqc [-o output dir] [--(no)extract] [-f fastq|bam|sam]
           [-c contaminant file] seqfile1 .. seqfileN

DESCRIPTION

    FastQC reads a set of sequence files and produces from each one a quality
    control report consisting of a number of different modules, each one of
    which will help to identify a different potential type of problem in your
    data.
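The PATH recommendation above can be sketched as follows. This is a minimal example that assumes the unzipped FastQC/ directory sits in your home directory; FASTQC_DIR is a hypothetical variable name, so adjust the path to wherever you actually unzipped the archive.

```shell
# Assumption: the unzipped FastQC/ directory lives in your home directory.
FASTQC_DIR="$HOME/FastQC"

# Append it to PATH for the current shell session only.
export PATH="$PATH:$FASTQC_DIR"

# Report whether the shell can now find the launcher.
if command -v fastqc >/dev/null 2>&1; then
    fastqc --version
else
    echo "fastqc not found in $FASTQC_DIR; adjust FASTQC_DIR" >&2
fi
```

To keep the setting across sessions, the export line would typically go into your shell startup file (for example ~/.bashrc or ~/.zshrc, depending on your shell).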

Running the Quality Check

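Run FastQC with the following commands. The output directory must be created first, because FastQC does not create the directory passed to -o.

$ mkdir results
$ fastqc -o results/ *.fastq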

If an HTML file and a ZIP file for each input file appear in the results folder, the quality check completed successfully.
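When many files are checked at once, opening every HTML report is tedious. The sketch below lists only the modules that FastQC flagged. It assumes FastQC was run with the --extract option shown in the help message, so that each report is unpacked into a results/<name>_fastqc/ directory containing summary.txt; flagged_modules is a hypothetical helper name, not part of FastQC.

```shell
# List every module that FastQC flagged as WARN or FAIL.
# Assumes FastQC was run with --extract, e.g.
#   fastqc --extract -o results/ *.fastq
# so that each report has an unpacked results/<name>_fastqc/summary.txt.
flagged_modules() {
    # $1: FastQC output directory (defaults to results)
    for summary in "${1:-results}"/*_fastqc/summary.txt; do
        [ -e "$summary" ] || continue   # no extracted reports found
        # summary.txt columns (tab-separated): status, module, input file
        awk -F '\t' '$1 != "PASS" { print $3 ": " $1 " - " $2 }' "$summary"
    done
}

flagged_modules results
```

An empty result means either that every module passed or that no extracted reports were found in the directory.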

Understanding the FastQC Report

The HTML report contains the following sections.

Basic Statistics

Provides an overview of basic information about the input file, such as total sequences, sequence length, and GC content.

Per base sequence quality

Shows the quality score at each position along the reads. The horizontal axis represents the position within the read, and the vertical axis represents the quality score.

Per sequence quality scores

Shows the distribution of average quality scores across all reads. The horizontal axis represents the mean quality score, and the vertical axis represents the number of reads.
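To make the "mean quality score per read" concrete, here is a rough awk sketch that computes it directly from a FASTQ file. It assumes the common Phred+33 encoding and the standard 4-line record layout; mean_quality_per_read is a hypothetical helper name, and FastQC's own calculation may differ in detail.

```shell
# Print the mean Phred quality of every read, one value per line.
# Assumes Phred+33 encoding and 4-line FASTQ records.
mean_quality_per_read() {
    awk 'BEGIN {
             # Build a character -> ASCII code lookup for printable characters.
             for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i
         }
         NR % 4 == 0 {                  # every 4th line holds the quality string
             sum = 0; n = length($0)
             for (i = 1; i <= n; i++) sum += ord[substr($0, i, 1)] - 33
             if (n > 0) printf "%.1f\n", sum / n
         }' "$@"
}

# Demo on two tiny reads: "I" encodes Q40 and "5" encodes Q20.
printf '@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n5555\n' | mean_quality_per_read
```

Piping the output through `sort -n | uniq -c` gives a crude text version of the distribution shown in the plot.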

Per base sequence content

Shows the proportion of each base (A, T, G, C) at every position along the reads. The horizontal axis represents the position within the read, and the vertical axis represents the base proportion.

Per sequence GC content

Shows the distribution of GC content across all reads. The horizontal axis represents the GC percentage, and the vertical axis represents the number of reads.
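As an illustration of what is being plotted, the GC percentage of each read can be computed with a short awk sketch. It assumes 4-line FASTQ records; gc_per_read is a hypothetical helper name, and this is only a sanity check alongside the FastQC report, not a replacement for it.

```shell
# Print the GC percentage of every read in a FASTQ file, one value per line.
# Assumes the standard 4-line FASTQ record layout (no wrapped sequences).
gc_per_read() {
    awk 'NR % 4 == 2 {                  # every 2nd line of a record is the sequence
             len = length($0)
             gc = gsub(/[GCgc]/, "")    # gsub returns the number of matches replaced
             if (len > 0) printf "%.1f\n", 100 * gc / len
         }' "$@"
}

# Demo on two tiny reads (GGCC is 100% GC, ATGC is 50% GC).
printf '@r1\nGGCC\n+\nIIII\n@r2\nATGC\n+\nIIII\n' | gc_per_read
```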

Per base N content

Shows the percentage of ambiguous bases (N) at each position along the reads. The horizontal axis represents the position within the read, and the vertical axis represents the proportion of N calls.

Sequence Length Distribution

Shows the distribution of read lengths in the dataset. The horizontal axis represents the read length, and the vertical axis represents the number of reads.
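The same distribution can be tabulated on the command line, which is handy for a quick check before or after trimming. This is a minimal sketch assuming 4-line FASTQ records; read_length_counts is a hypothetical helper name.

```shell
# Print "length<TAB>count" for every read length in a FASTQ file,
# sorted by length. Assumes 4-line FASTQ records.
read_length_counts() {
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (len in n) print len "\t" n[len] }' "$@" | sort -n
}

# Demo: two reads of length 4 and one read of length 6.
printf '@a\nACGT\n+\nIIII\n@b\nACGTAC\n+\nIIIIII\n@c\nTTTT\n+\nIIII\n' | read_length_counts
```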

Sequence Duplication Levels

Shows the degree of duplication among reads. The horizontal axis represents how many times a sequence is duplicated, and the vertical axis represents the percentage of reads at each duplication level.

Overrepresented sequences

Lists sequences that appear at unusually high frequency in the dataset.

Adapter Content

Shows the proportion of adapter sequences detected at each position along the reads. The horizontal axis represents the position within the read, and the vertical axis represents the adapter proportion.
