fastp Tutorial: Fast FASTQ Preprocessing & Quality Control
Introduction
Raw data from next-generation sequencers (FASTQ files) often contains reads with adapter sequences and reads of poor quality. Before proceeding with downstream analyses, you need to trim adapter sequences and filter out low-quality reads from your FASTQ files.
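To make "low-quality read" concrete, here is a simplified Python model of per-read filtering. It assumes thresholds that mirror fastp's documented defaults: a base counts as qualified at Phred 15 or above (`-q`), a read fails if more than 40% of its bases are unqualified (`-u`), if it contains more than 5 N bases (`-n`), or if it is shorter than 15 bases (`-l`). This is an illustration of the idea, not fastp's actual implementation, and it ignores adapter trimming entirely.

```python
# Simplified model of per-read quality filtering (illustration only, not fastp's code).
# Thresholds are assumed to mirror fastp's documented defaults.
QUALIFIED_QUALITY = 15     # analogous to fastp -q
UNQUALIFIED_PERCENT = 40   # analogous to fastp -u
N_BASE_LIMIT = 5           # analogous to fastp -n
LENGTH_REQUIRED = 15       # analogous to fastp -l


def phred_scores(quality, offset=33):
    """Decode a Phred+33 FASTQ quality string into integer scores."""
    return [ord(ch) - offset for ch in quality]


def passes_filter(sequence, quality):
    """Return True if the read would be kept under the thresholds above."""
    if len(sequence) < LENGTH_REQUIRED:
        return False
    if sequence.upper().count("N") > N_BASE_LIMIT:
        return False
    scores = phred_scores(quality)
    unqualified = sum(1 for q in scores if q < QUALIFIED_QUALITY)
    # Fail only if the unqualified fraction exceeds the limit (integer arithmetic).
    return unqualified * 100 <= UNQUALIFIED_PERCENT * len(scores)


if __name__ == "__main__":
    good = ("ACGT" * 5, "I" * 20)  # 'I' encodes Phred 40
    poor = ("ACGT" * 5, "#" * 20)  # '#' encodes Phred 2
    print(passes_filter(*good), passes_filter(*poor))  # prints: True False
```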
fastp is one of several tools for preprocessing FASTQ files. It is particularly fast because it is written in C++ and supports multithreading.
Installing fastp
On CentOS or Ubuntu, you can install fastp by downloading the prebuilt Linux binary and making it executable.
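The download-and-chmod step can also be scripted. The sketch below is a Python equivalent of running `wget http://opengene.org/fastp/fastp` followed by `chmod a+x ./fastp`. The URL is the one given in the fastp README at the time of writing and may change; the function name `download_fastp` is hypothetical.

```python
# Python equivalent of: wget <url> && chmod a+x ./fastp
# The URL is assumed from the fastp README; check the project page if it has moved.
import shutil
import stat
import urllib.request
from pathlib import Path

FASTP_URL = "http://opengene.org/fastp/fastp"


def download_fastp(dest_dir=".", url=FASTP_URL):
    """Download the fastp binary into dest_dir and make it executable."""
    dest = Path(dest_dir) / "fastp"
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    # Add execute permission for user, group and others (like chmod a+x).
    mode = dest.stat().st_mode
    dest.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return dest
```

Calling `download_fastp()` places `fastp` in the current directory; move it to a directory on your `PATH` so the `fastp` command is available everywhere.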
On macOS, the binary downloaded above is built for Linux, so running it may fail with an error.
In that case, you can install fastp through Bioconda instead.
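The choice between the two installation routes can be sketched as a small helper that picks commands by operating system: the prebuilt binary on Linux and Bioconda elsewhere, such as macOS. The helper name is hypothetical, it assumes `conda` is already installed, and it only returns the commands without running them.

```python
# Hypothetical helper: choose fastp installation commands by operating system.
# Assumes conda is available for the Bioconda route; commands are not executed.
import platform


def fastp_install_commands(system=None):
    """Return a list of shell commands (as argument lists) for installing fastp."""
    system = system or platform.system()
    if system == "Linux":
        # Prebuilt Linux binary, as in the fastp README.
        return [
            ["wget", "http://opengene.org/fastp/fastp"],
            ["chmod", "a+x", "./fastp"],
        ]
    # macOS ("Darwin") and other systems: install through Bioconda instead.
    return [["conda", "install", "-c", "bioconda", "fastp"]]


if __name__ == "__main__":
    for cmd in fastp_install_commands():
        print(" ".join(cmd))
```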
Try displaying the help message to verify the installation.
If a usage message listing fastp's options is printed, the installation was successful.
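To check the installation from a script, the sketch below looks up `fastp` on `PATH` and asks it for its version. It assumes your fastp build accepts `--version` (a shorter check than `--help`) and reads both stdout and stderr, since the stream used for the version line may vary. The function name is hypothetical.

```python
# Check whether fastp is installed and report its version (sketch).
import shutil
import subprocess


def fastp_version(executable="fastp"):
    """Return fastp's version text, or None if the executable is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # The version line may be written to stderr, so combine both streams.
    return (result.stdout + result.stderr).strip() or None


if __name__ == "__main__":
    print(fastp_version() or "fastp was not found on PATH")
```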
Running Preprocessing
For single-end reads, preprocess your data by passing the input FASTQ file with -i and the output file with -o.
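If you drive fastp from Python, the single-end command can be built as an argument list and passed to `subprocess.run`. This sketch uses fastp's `-i`/`-o` options and the `-w` (worker threads) option; the file names in the example are hypothetical, and the thread count is left to fastp's default unless you set it.

```python
# Build a single-end fastp command as an argument list (sketch).
# Run it with: subprocess.run(cmd, check=True)
def single_end_command(in_fq, out_fq, threads=None):
    """Return the fastp argument list for one single-end FASTQ file."""
    cmd = ["fastp", "-i", in_fq, "-o", out_fq]
    if threads is not None:
        cmd += ["-w", str(threads)]  # number of worker threads
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; only prints the command, does not run fastp.
    print(" ".join(single_end_command("sample.fastq.gz", "sample.trimmed.fastq.gz", threads=4)))
```

Using an argument list rather than a single shell string avoids quoting problems when file names contain spaces.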
For paired-end reads, additionally pass the second input file with -I and the second output file with -O.
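The paired-end case adds the read 2 input and output. By default fastp's adapter auto-detection targets single-end input; the sketch below optionally adds `--detect_adapter_for_pe` to enable it for paired-end data. File names are hypothetical.

```python
# Build a paired-end fastp command as an argument list (sketch).
def paired_end_command(in1, in2, out1, out2, detect_adapter=True):
    """Return the fastp argument list for a pair of FASTQ files."""
    cmd = ["fastp", "-i", in1, "-I", in2, "-o", out1, "-O", out2]
    if detect_adapter:
        # Enable adapter auto-detection for paired-end input.
        cmd.append("--detect_adapter_for_pe")
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; only prints the command.
    print(" ".join(paired_end_command(
        "sample_R1.fastq.gz", "sample_R2.fastq.gz",
        "sample_R1.trimmed.fastq.gz", "sample_R2.trimmed.fastq.gz",
    )))
```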
fastp writes the preprocessed FASTQ files in GZIP-compressed format when the output file names end in .gz.
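A quick way to confirm that an output file is GZIP-compressed and to count how many reads survived is sketched below. It checks the two-byte gzip signature and assumes standard four-line FASTQ records; the function names are hypothetical.

```python
# Inspect a gzip-compressed FASTQ output file (sketch).
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # every gzip file starts with these two bytes


def is_gzip(path):
    """Return True if the file starts with the gzip signature."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def count_fastq_reads(path):
    """Count reads in a gzip-compressed FASTQ file (4 lines per record)."""
    with gzip.open(path, "rt") as fh:
        n_lines = sum(1 for _ in fh)
    return n_lines // 4
```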
fastp also generates an HTML report (fastp.html by default) and a JSON report (fastp.json by default).
You can view a full sample report here.
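The JSON report carries the same summary numbers as the HTML report in machine-readable form, which is handy when processing many samples. The sketch assumes the layout produced by recent fastp versions, where read counts sit under `summary` → `before_filtering` / `after_filtering` → `total_reads`; verify the keys against your own report.

```python
# Summarize a fastp JSON report (sketch; key layout assumed from recent fastp versions).
import json


def summarize_fastp_json(path="fastp.json"):
    """Return (reads before filtering, reads after filtering, fraction kept)."""
    with open(path) as fh:
        report = json.load(fh)
    summary = report["summary"]
    before = summary["before_filtering"]["total_reads"]
    after = summary["after_filtering"]["total_reads"]
    kept = after / before if before else 0.0
    return before, after, kept
```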
RNA-Seq Data Analysis Software
This RNA-Seq data analysis software is recommended for those who are:
✔︎ Seeking to avoid outsourcing or collaborating on RNA-Seq data analysis.
✔︎ Lacking the time to learn RNA-Seq data analysis.
✔︎ Frustrated by the complexity of existing tools.
Users can perform gene expression quantification, identification of differentially expressed genes, Gene Ontology (GO) analysis, and pathway analysis, as well as draw volcano plots, MA plots, and heatmaps.
About the Author
BxINFO LLC
A research support company specializing in bioinformatics.
We provide tools and information to support life science research, with a focus on RNA-Seq analysis.