fasterq-dump: A Tutorial for Retrieving FASTQ Files from a Public Database
Introduction
When a paper based on next-generation sequencing data is submitted, the sequence data are usually deposited in a public database. This page explains how to retrieve FASTQ files from a public database with the fasterq-dump command from the SRA Toolkit.
Installing SRA Toolkit
Prebuilt binaries of the SRA Toolkit are provided, so start by downloading the one for your platform.
The following steps will guide you through downloading and extracting the file (for Mac).
It is recommended to add sratoolkit.*-mac64/bin to your PATH.
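The extraction and PATH steps can be sketched as a small shell function. This is a minimal sketch rather than the exact commands of the original page: the function name setup_sratoolkit, its arguments, and the default extraction location are hypothetical, and it assumes the archive has already been downloaded and unpacks to a directory matching sratoolkit.*-mac64.

```shell
# Sketch: unpack a downloaded SRA Toolkit archive and put its bin/ on PATH.
# Assumption: the archive unpacks to a directory named sratoolkit.<version>-mac64.
setup_sratoolkit() {
  archive="$1"          # path to the downloaded .tar.gz archive
  dest="${2:-$HOME}"    # extraction directory (hypothetical default: home)

  mkdir -p "$dest" || return 1
  tar -xzf "$archive" -C "$dest" || return 1

  # Prepend every matching bin/ directory to PATH for the current session.
  for bin_dir in "$dest"/sratoolkit.*-mac64/bin; do
    if [ -d "$bin_dir" ]; then
      PATH="$bin_dir:$PATH"
    fi
  done
  export PATH
}
```

Call the function with the path to the downloaded archive. The change only lasts for the current shell session; to make it permanent, add the equivalent export line to your shell startup file.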
FASTQ files are retrieved with fasterq-dump, so display the help for fasterq-dump to confirm that it is installed.
If the following output is displayed, the installation is successful.
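The installation check can be sketched as follows. The function name check_fasterq_dump is hypothetical; the sketch only assumes that fasterq-dump accepts --help, and it does not reproduce the help text itself.

```shell
# Sketch: confirm that fasterq-dump is reachable via PATH, then show its help.
check_fasterq_dump() {
  if command -v fasterq-dump >/dev/null 2>&1; then
    fasterq-dump --help
  else
    echo "fasterq-dump not found on PATH" >&2
    return 1
  fi
}
```

If the function reports that the command is not found, the most likely cause is that the toolkit's bin directory has not been added to PATH.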
Retrieving Accession Numbers
First, search for the data you want to download on NCBI SRA. If you already know the accession number, this step is not necessary.
Note down the accession number shown for the record (for example, SRR20791120); it is needed in the next step.
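Before starting a download, a quick check can catch a mistyped accession number. The sketch below assumes that run accessions consist of the prefix SRR, ERR, or DRR followed by digits, as in SRR20791120; the function name is_run_accession is hypothetical.

```shell
# Sketch: check that a string looks like a run accession (SRR/ERR/DRR + digits).
# Assumption: only run accessions are of interest here.
is_run_accession() {
  printf '%s\n' "$1" | grep -Eq '^[SED]RR[0-9]+$'
}

if is_run_accession "SRR20791120"; then
  echo "looks like a run accession"
fi
```

The check is purely syntactic: it does not confirm that the accession exists in the database.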
Retrieving FASTQ Files
To retrieve FASTQ files, we use fasterq-dump. The SRA Toolkit also includes an older tool called fastq-dump; fasterq-dump is its faster successor, which speeds up extraction by using multiple threads and temporary files.
Use the following command to retrieve the FASTQ file.
If the following message is displayed, the retrieval is complete.
"SRR20791120.fastq" has been created in the current directory.
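The retrieval step can be sketched as a small wrapper. The function name fetch_fastq, the optional output directory argument, and the thread count of 4 are illustrative assumptions; --outdir, --threads, and --progress are standard fasterq-dump options, and the simplest form of the command is just fasterq-dump followed by the accession number.

```shell
# Sketch: download one run as FASTQ with fasterq-dump.
# Assumption: fasterq-dump is already on PATH.
fetch_fastq() {
  accession="$1"      # run accession, e.g. SRR20791120
  outdir="${2:-.}"    # output directory; defaults to the current directory

  # --threads and --progress are optional; 4 threads is an arbitrary choice.
  fasterq-dump "$accession" --outdir "$outdir" --threads 4 --progress
}
```

For example, fetch_fastq SRR20791120 writes its output to the current directory.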
For paired-end reads, you can use the following option to retrieve the files separately.
"SRR20791120_1.fastq" and "SRR20791120_2.fastq" have been created.
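A sketch of the paired-end case follows. The option used on the original page is not shown here, so this example assumes --split-files, a fasterq-dump option that writes each read of a spot to its own file. The function name fetch_paired_fastq is hypothetical.

```shell
# Sketch: download a paired-end run and confirm that both mate files exist.
# Assumption: --split-files is the intended option (see the note above).
fetch_paired_fastq() {
  accession="$1"    # run accession, e.g. SRR20791120

  fasterq-dump "$accession" --split-files || return 1

  # Both files are expected in the current directory.
  for mate in 1 2; do
    if [ ! -f "${accession}_${mate}.fastq" ]; then
      echo "missing ${accession}_${mate}.fastq" >&2
      return 1
    fi
  done
}
```

The function returns a non-zero status if either file is missing, which makes it easier to detect an incomplete download in a script.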