fasterq-dump: A Tutorial for Retrieving FASTQ Files from a Public Database

Introduction

When a research paper based on next-generation sequencing data is submitted, the sequence data are commonly deposited in a public database. This page explains how to retrieve FASTQ files from a public database using the fasterq-dump command from the SRA Toolkit.

Installing SRA Toolkit

NCBI provides prebuilt binaries of the SRA Toolkit, so let's download one.

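The following commands download and extract the archive (for Mac). wget is not included with macOS by default; if it is missing, curl -O followed by the same URL downloads the file as well.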

$ wget https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current/sratoolkit.current-mac64.tar.gz
$ tar -vxzf sratoolkit.current-mac64.tar.gz

It is recommended to add sratoolkit.*-mac64/bin to your PATH.
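A minimal sketch of the PATH change for the current shell session. The helper name add_sratoolkit_to_path is hypothetical, and it assumes the tarball was extracted into the current directory with the sratoolkit.*-mac64 layout mentioned above (the exact directory name depends on the release). To make the change permanent, an equivalent export line can go in your shell startup file, such as ~/.zshrc for zsh, the default shell on recent macOS.

```shell
# Hypothetical helper: prepend <toolkit_dir>/bin to PATH for this session.
# Assumes <toolkit_dir> is the directory created by the tar command above.
add_sratoolkit_to_path() {
    toolkit_dir=$1
    if [ ! -d "$toolkit_dir/bin" ]; then
        echo "no bin/ directory under: $toolkit_dir" >&2
        return 1
    fi
    # Resolve to an absolute path (CDPATH cleared so cd prints nothing),
    # keeping PATH valid after later changes of directory.
    abs_dir=$(CDPATH= cd -- "$toolkit_dir" && pwd) || return 1
    PATH="$abs_dir/bin:$PATH"
    export PATH
}

# Usage (assumes exactly one directory matches the pattern):
#   add_sratoolkit_to_path sratoolkit.*-mac64
```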

To retrieve FASTQ files we will use fasterq-dump, so let's display its help to check that the installation worked.

$ fasterq-dump -h

If the following output is displayed, the installation is successful.

Usage:
  fasterq-dump [ options ] [ accessions(s)... ]

Parameters:
  accessions(s)                    list of accessions to process

Options:
  -o|--outfile <path>              full path of outputfile (overrides usage of current directory and given accession)
  -O|--outdir <path>               path for outputfile (overrides usage of current directory, but uses given accession)
  ...
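If the shell instead reports that fasterq-dump is not found, the PATH setting is the usual cause. As a minimal sketch, the POSIX command -v builtin shows which fasterq-dump executable (if any) the shell will run; the function name check_fasterq_dump is hypothetical.

```shell
# Hypothetical helper: report which fasterq-dump the shell would run.
# command -v prints the resolved path and fails if nothing is found.
check_fasterq_dump() {
    if fqd_path=$(command -v fasterq-dump); then
        echo "fasterq-dump found: $fqd_path"
    else
        echo "fasterq-dump not found on PATH; check the PATH setting" >&2
        return 1
    fi
}

# Usage:
#   check_fasterq_dump
```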

Retrieving Accession Numbers

First, search for the data you want to download on NCBI SRA. If you already know the accession number, this step is not necessary.

Note down the accession number shown on the entry page, as in the screenshot below.

[Figure: Accession number]
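The example in this tutorial uses a run accession, SRR20791120. Run accessions consist of the prefix SRR, ERR or DRR (for data submitted through NCBI, EBI/ENA or DDBJ, respectively) followed by digits. A small, hypothetical helper like the one below can catch typos before a download starts; it checks only the shape of the string, not whether the run actually exists.

```shell
# Hypothetical helper: succeed only if $1 looks like an SRA run accession,
# i.e. SRR, ERR or DRR followed by one or more digits (e.g. SRR20791120).
is_run_accession() {
    case $1 in
        SRR*|ERR*|DRR*) ;;
        *) return 1 ;;
    esac
    # Strip the three-letter prefix and require only digits to remain.
    digits=${1#???}
    case $digits in
        ''|*[!0-9]*) return 1 ;;
    esac
    return 0
}

# Usage:
#   is_run_accession SRR20791120 && echo "looks valid"
```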

Retrieving FASTQ Files

To retrieve FASTQ files, we use fasterq-dump. The SRA Toolkit also includes the older fastq-dump, but fasterq-dump is its faster, multi-threaded replacement.

Use the following command to retrieve the FASTQ file.

$ fasterq-dump SRR20791120

If the following message is displayed, the retrieval is complete.

spots read      : 24,448,654
reads read      : 48,897,308
reads written   : 24,448,654
reads 0-length  : 24,448,654

"SRR20791120.fastq" has been created in the current directory.

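To double-check a download, the number of reads in the FASTQ file can be counted and compared with the counts fasterq-dump reports. A minimal sketch, assuming an uncompressed file in the four-line FASTQ layout (header, sequence, +, quality) without line wrapping; count_fastq_reads is a hypothetical name.

```shell
# Hypothetical helper: count reads in an uncompressed FASTQ file.
# Assumes 4 lines per record (header, sequence, "+", quality), no wrapping.
count_fastq_reads() {
    n_lines=$(wc -l < "$1") || return 1
    # wc -l may pad its output with spaces; inside $(( )) that is still valid.
    echo $(( $n_lines / 4 ))
}

# Usage:
#   count_fastq_reads SRR20791120.fastq
```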
For paired-end data, the --split-files option writes the reads of each pair to separate files.

$ fasterq-dump --split-files SRR20791120

"SRR20791120_1.fastq" and "SRR20791120_2.fastq" have been created.

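For a typical paired-end run, the _1 and _2 files are expected to hold the same number of records, with mates in the same order. A minimal sketch of a consistency check, assuming the four-line FASTQ layout without line wrapping; check_pair is a hypothetical name, and a mismatch simply means the files deserve a closer look before analysis.

```shell
# Hypothetical helper: compare the record counts of two paired FASTQ files.
# Assumes 4 lines per FASTQ record (no line wrapping).
check_pair() {
    lines_1=$(wc -l < "$1") || return 1
    lines_2=$(wc -l < "$2") || return 1
    reads_1=$(( $lines_1 / 4 ))
    reads_2=$(( $lines_2 / 4 ))
    if [ "$reads_1" -ne "$reads_2" ]; then
        echo "mismatch: $1 has $reads_1 reads, $2 has $reads_2" >&2
        return 1
    fi
    echo "read pairs: $reads_1"
}

# Usage:
#   check_pair SRR20791120_1.fastq SRR20791120_2.fastq
```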
RNA-Seq Data Analysis Software

With our RNA-Seq data analysis software, you won't need to outsource or rely on collaborators. You can start analyzing the data yourself right away, without the need for high-spec computers or knowledge of Linux commands.

[Figure: Overview]

Users can perform gene expression quantification, identification of differentially expressed genes, Gene Ontology (GO) analysis, and pathway analysis, as well as draw volcano plots, MA plots, and heatmaps.