How to Use fastp: Preprocessing FASTQ Files

Introduction

Raw output from next-generation sequencers (FASTQ files) contains reads that carry adapter sequences as well as low-quality reads. Before downstream analysis, adapter sequences should therefore be trimmed and low-quality reads filtered out of the FASTQ files.

fastp is one of the software tools used for this preprocessing. Other tools exist, but fastp stands out for its speed: it is implemented in C++ and supports multithreading.

Installing fastp

If you are using CentOS or Ubuntu, you can download the prebuilt binary and make it executable with the following commands.

$ wget http://opengene.org/fastp/fastp
$ chmod a+x ./fastp

On a Mac, the downloaded binary is built for Linux, so trying to run it may fail with the following error.

./fastp: cannot execute binary file

In that case, it can be installed via Bioconda.

$ conda install -c bioconda fastp
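
The choice between the two installation routes can also be scripted. The following is a minimal sketch, assuming that `uname -s` reports `Linux` on CentOS and Ubuntu and `Darwin` on a Mac. The function name `fastp_install_hint` is hypothetical, and it only prints the suggested commands from this article rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: print the install commands suggested in this article
# for a given OS name (defaults to the output of `uname -s`).
fastp_install_hint() {
    os="${1:-$(uname -s)}"
    case "$os" in
        Linux)
            # Prebuilt Linux binary, as described above.
            echo "wget http://opengene.org/fastp/fastp && chmod a+x ./fastp"
            ;;
        *)
            # Anything else (e.g. Darwin on a Mac): fall back to Bioconda.
            echo "conda install -c bioconda fastp"
            ;;
    esac
}

# Print the hint for the machine this script runs on.
fastp_install_hint
```

Passing an OS name explicitly, for example `fastp_install_hint Darwin`, shows the hint for another platform without changing machines.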

Let's display the help.

$ fastp --help

If the following is displayed, the installation was successful.

usage: fastp [options] ...
options:
  -i, --in1      read1 input file name (string [=])
  -o, --out1     read1 output file name (string [=])
  -I, --in2      read2 input file name (string [=])
  -O, --out2     read2 output file name (string [=])
  -D, --dedup    enable deduplication to drop the duplicated reads/pairs...
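
The same check can be done non-interactively, for example at the top of a pipeline script. The following is a minimal sketch, assuming a POSIX shell. The function name `check_tool` is hypothetical, and `command -v` is the standard shell way to test whether a command is on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found"
    fi
}

# Check for fastp before starting any preprocessing.
check_tool fastp
```

If the binary was downloaded into the current directory rather than installed on the PATH, this check reports "not found" even though `./fastp --help` works. In that case, either call it as `./fastp` or add its directory to the PATH.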

Performing Preprocessing

For single-end reads, preprocessing can be performed with the following command.

$ fastp -i raw.fastq -o filtered.fastq.gz

For paired-end reads, preprocessing can be performed with the following command.

$ fastp -i raw_1.fastq -I raw_2.fastq -o filtered_1.fastq.gz -O filtered_2.fastq.gz

Because the output file names end in .gz, the filtered FASTQ files are written in GZIP format.
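
When several paired-end samples need the same treatment, the command above can be wrapped in a loop. The following is a minimal sketch, assuming the inputs follow the `<sample>_1.fastq` / `<sample>_2.fastq` naming used above. The function name `fastp_pe_cmd`, the sample names, the output naming scheme, and the default of 4 threads are all hypothetical. The `--html`, `--json`, and `--thread` options give each sample its own report files and set the number of worker threads. The script only prints the commands (a dry run), so it can be inspected before anything is executed.

```shell
#!/bin/sh
# Hypothetical helper: build the paired-end fastp command for one sample,
# assuming inputs are named <sample>_1.fastq and <sample>_2.fastq.
fastp_pe_cmd() {
    sample="$1"
    threads="${2:-4}"   # assumed default; adjust to your machine
    echo "fastp -i ${sample}_1.fastq -I ${sample}_2.fastq" \
         "-o ${sample}.filtered_1.fastq.gz -O ${sample}.filtered_2.fastq.gz" \
         "--html ${sample}.fastp.html --json ${sample}.fastp.json" \
         "--thread ${threads}"
}

# Dry run: print one command per sample (sample names are placeholders).
for s in sampleA sampleB; do
    fastp_pe_cmd "$s"
done
```

To run a printed command instead of only displaying it, pipe it to the shell, for example `fastp_pe_cmd sampleA | sh`. This simple approach assumes the sample names contain no spaces or shell metacharacters.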

A report like the following is also generated (by default, fastp.html and fastp.json in the current directory).

[Figure: fastp summary]

You can view the details of the report here.
