SnpEff Tutorial: Variant Annotation & Effect Prediction

Last updated: February 16, 2026

Introduction

Whole genome sequencing (WGS) and whole exome sequencing (WES) typically identify a large number of variants. To find particularly important variants among them, it is necessary to perform variant annotation.

One of the most commonly used software tools for variant annotation is SnpEff. Given variant calling results (a VCF file), SnpEff outputs a VCF file in which annotations have been added to each variant.

Installation

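Download and unpack SnpEff using the following commands. The archive unpacks into a snpEff directory that contains snpEff.jar, so change into it before running the commands below.
$ wget https://snpeff.blob.core.windows.net/versions/snpEff_latest_core.zip
$ unzip snpEff_latest_core.zip
$ cd snpEff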
Let's display the help message to verify the installation.
$ java -jar snpEff.jar --help
If the following output is displayed, the installation was successful.
SnpEff version SnpEff 5.1d (build 2022-04-19 15:49), by Pablo Cingolani
Usage: snpEff [command] [options] [files]

Run 'java -jar snpEff.jar command' for help on each specific command

Available commands:
    [eff|ann]      : Annotate variants / calculate effects (you can use either 'ann' or 'eff', they mean the same). Default: ann (no command or 'ann').
    build          : Build a SnpEff database.
    buildNextProt  : Build a SnpEff for NextProt (using NextProt's XML files).
    cds            : Compare CDS sequences calculated form a SnpEff database to the one in a FASTA file. Used for checking databases correctness.
    closest        : Annotate the closest genomic region.
    count          : Count how many intervals (from a BAM, BED or VCF file) overlap with each genomic interval.
    ...

Adding Annotations

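Run the following command to annotate the variants in test.vcf. GRCh37.75 is the name of the SnpEff genome database to annotate against; SnpEff downloads it automatically on first use if it is not already installed. The -Xmx8g option lets Java use up to 8 GB of memory, -v prints progress messages, and the annotated VCF is written to standard output, which is redirected here to test.ann.vcf.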
$ java -Xmx8g -jar snpEff.jar -v GRCh37.75 test.vcf > test.ann.vcf

Before Annotation (test.vcf)

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
22	17071756	.	T	C	.	.	.
22	17072035	.	C	T	.	.	.
22	17072258	.	C	A	.	.	.
22	17072674	.	G	A	.	.	.
22	17072747	.	T	C	.	.	.
22	17072781	.	C	T	.	.	.

After Annotation (test.ann.vcf)

##SnpEffVersion="5.1d (build 2022-04-19 15:49), by Pablo Cingolani"
##SnpEffCmd="SnpEff GRCh37.75 snpEff/examples/test.chr22.vcf "
##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ">
##INFO=<ID=LOF,Number=.,Type=String,Description="Predicted loss of function effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected'">
##INFO=<ID=NMD,Number=.,Type=String,Description="Predicted nonsense mediated decay effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected'">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
22	17071756	.	T	C	.	.	ANN=C|3_prime_UTR_variant|MODIFIER|CCT8L2|ENSG00000198445|transcript|ENST00000359963|protein_coding|1/1|c.*11A>G|||||11|,C|downstream_gene_variant|MODIFIER|FABP5P11|ENSG00000240122|transcript|ENST00000430910|processed_pseudogene||n.*4223A>G|||||4223|
22	17072035	.	C	T	.	.	ANN=T|missense_variant|MODERATE|CCT8L2|ENSG00000198445|transcript|ENST00000359963|protein_coding|1/1|c.1406G>A|p.Gly469Glu|1666/2034|1406/1674|469/557||,T|downstream_gene_variant|MODIFIER|FABP5P11|ENSG00000240122|transcript|ENST00000430910|processed_pseudogene||n.*3944G>A|||||3944|
... 256/557||,A|downstream_gene_variant|MODIFIER|FABP5P11|ENSG00000240122|transcript|ENST00000430910|processed_pseudogene||n.*3305C>T|||||3305|
22	17072747	.	T	C	.	.	ANN=C|missense_variant|MODERATE|CCT8L2|ENSG00000198445|transcript|ENST00000359963|protein_coding|1/1|c.694A>G|p.Met232Val|954/2034|694/1674|232/557||,C|downstream_gene_variant|MODIFIER|FABP5P11|ENSG00000240122|transcript|ENST00000430910|processed_pseudogene||n.*3232A>G|||||3232|
22	17072781	.	C	T	.	.	ANN=T|synonymous_variant|LOW|CCT8L2|ENSG00000198445|transcript|ENST00000359963|protein_coding|1/1|c.660G>A|p.Pro220Pro|920/2034|660/1674|220/557||,T|downstream_gene_variant|MODIFIER|FABP5P11|ENSG00000240122|transcript|ENST00000430910|processed_pseudogene||n.*3198G>A|||||3198|

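SnpEff has added its own header lines (##SnpEffVersion, ##SnpEffCmd, and the definitions of the ANN, LOF, and NMD INFO fields) to the beginning of the VCF file, and each variant now carries an ANN entry in the INFO column. Within ANN, annotations are separated by commas, and the subfields of each annotation are separated by | in the order listed in the ANN header line.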
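The ANN entries in the annotated VCF can be flattened into one line per annotation, which makes them easier to scan or filter. The following is a minimal sketch and not part of SnpEff: ann_summary is a hypothetical helper name, and the sketch assumes tab-separated VCF columns and the ANN subfield order given in the ##INFO=<ID=ANN,...> header line (Allele, Annotation, Annotation_Impact, Gene_Name, ...).

```shell
#!/usr/bin/env bash
# ann_summary (hypothetical helper): read an annotated VCF on stdin and print
# one tab-separated line per ANN entry:
#   CHROM, POS, Allele, Annotation, Annotation_Impact, Gene_Name
# Assumes tab-separated VCF columns and the ANN subfield order from the header.
ann_summary() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { next }                                  # skip header lines
    {
      n = split($8, info, ";")                     # INFO is column 8
      for (i = 1; i <= n; i++) {
        if (info[i] !~ /^ANN=/) continue           # only the ANN entry
        m = split(substr(info[i], 5), anns, ",")   # comma-separated annotations
        for (j = 1; j <= m; j++) {
          split(anns[j], f, "|")                   # pipe-separated subfields
          print $1, $2, f[1], f[2], f[3], f[4]
        }
      }
    }'
}
```

For example, once the function is defined in the current shell, annotations with MODERATE or HIGH impact (column 5 of the helper's output) could be listed like this:
$ ann_summary < test.ann.vcf | awk -F'\t' '$5 == "MODERATE" || $5 == "HIGH"'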


BxINFO LLC

A research support company specializing in bioinformatics.

We provide tools and information to support life science research, with a focus on RNA-Seq analysis.
