SnpEff Tutorial: Variant Annotation & Effect Prediction


Last updated: March 13, 2026

Introduction

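Whole genome sequencing (WGS) and whole exome sequencing (WES) typically uncover a large number of variants: tens of thousands per exome and millions per genome. Only a few of these are likely to be biologically significant. To find them, you first need to perform variant annotation, which determines where each variant falls (gene, transcript, region) and what effect it is predicted to have.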

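SnpEff is one of the most widely used tools for variant annotation. Given a set of variant calls in VCF format, SnpEff produces an annotated VCF file. In that file, each variant carries functional information such as the affected gene and transcript, the predicted effect (for example, missense or synonymous), and an impact category.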

Installation

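SnpEff is a Java program, so a Java runtime must be installed. Download and unpack SnpEff, then move into the extracted directory, by running the following commands.
$ wget https://snpeff.blob.core.windows.net/versions/snpEff_latest_core.zip
$ unzip snpEff_latest_core.zip
$ cd snpEff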
Display the help message to confirm that the installation was successful.
$ java -jar snpEff.jar --help
If you see output similar to the following, the installation is complete. The version number will depend on the release you downloaded.
SnpEff version SnpEff 5.1d (build 2022-04-19 15:49), by Pablo Cingolani
Usage: snpEff [command] [options] [files]

Run 'java -jar snpEff.jar command' for help on each specific command

Available commands:
	[eff|ann]       : Annotate variants / calculate effects (you can use either 'ann' or 'eff', they mean the same). Default: ann (no command or 'ann').
	build           : Build a SnpEff database.
	buildNextProt   : Build a SnpEff for NextProt (using NextProt's XML files).
	cds             : Compare CDS sequences calculated form a SnpEff database to the one in a FASTA file. Used for checking databases correctness.
	closest         : Annotate the closest genomic region.
	count           : Count how many intervals (from a BAM, BED or VCF file) overlap with each genomic interval.
	...

Annotating Variants

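Run the following command to annotate the variants in your VCF file. The options work as follows:

- -Xmx8g lets Java use up to 8 GB of memory.
- -v turns on verbose progress messages.
- GRCh37.75 is the name of the SnpEff database to use (human assembly GRCh37, Ensembl release 75).

If that database is not yet installed locally, SnpEff downloads it automatically. The annotated VCF is written to standard output, so it is redirected to test.ann.vcf. By default, SnpEff also writes summary files (snpEff_summary.html and snpEff_genes.txt) to the current directory.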
$ java -Xmx8g -jar snpEff.jar -v GRCh37.75 test.vcf > test.ann.vcf
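SnpEff adds header lines and INFO annotations, but you would expect it to keep every variant record. A quick sanity check is therefore to compare the number of data records in the input and the output. The following is a minimal Bash sketch assuming grep is available. count_records and check_annotation are hypothetical helper names, not SnpEff commands.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of SnpEff) for a post-annotation sanity check.

# Count VCF data records: every line that does not start with '#'.
count_records() {
  # grep -c prints the count; '|| true' keeps a count of 0 from signalling failure.
  grep -vc '^#' "$1" || true
}

# Compare record counts of the input VCF ($1) and the annotated VCF ($2).
check_annotation() {
  local n_in n_out
  n_in=$(count_records "$1")
  n_out=$(count_records "$2")
  if [ "$n_in" -eq "$n_out" ]; then
    echo "OK: $n_in records in, $n_out records out"
  else
    echo "MISMATCH: $n_in records in, $n_out records out" >&2
    return 1
  fi
}
```

Usage: `check_annotation test.vcf test.ann.vcf`. A mismatch can indicate that the run stopped early, for example because Java ran out of memory. The -v messages should help narrow that down.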

Before Annotation (test.vcf)

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
22	17071756	.	T	C	.	.	.
22	17072035	.	C	T	.	.	.
22	17072258	.	C	A	.	.	.
22	17072674	.	G	A	.	.	.
22	17072747	.	T	C	.	.	.
22	17072781	.	C	T	.	.	.
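VCF columns must be separated by tabs, so a listing copied from a web page with spaces may not be a valid input file. Below is a minimal Bash sketch that recreates test.vcf with proper tab separators. Note that the ##fileformat=VCFv4.2 line is an addition not present in the listing above. The VCF specification expects such a line first, and the version number chosen here is an assumption.

```shell
#!/usr/bin/env bash
# Recreate test.vcf with the tab-separated columns that VCF requires.
# The ##fileformat line is an addition; VCFv4.2 is an assumed version.
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  # Each input line: chromosome, position, REF, ALT. ID/QUAL/FILTER/INFO are '.'.
  while read -r chrom pos ref alt; do
    printf '%s\t%s\t.\t%s\t%s\t.\t.\t.\n' "$chrom" "$pos" "$ref" "$alt"
  done <<'EOF'
22 17071756 T C
22 17072035 C T
22 17072258 C A
22 17072674 G A
22 17072747 T C
22 17072781 C T
EOF
} > test.vcf
```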

After Annotation (test.ann.vcf)

##SnpEffVersion="5.1d (build 2022-04-19 15:49), by Pablo Cingolani"
##SnpEffCmd="SnpEff GRCh37.75 snpEff/examples/test.chr22.vcf "
##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ">
##INFO=<ID=LOF,Number=.,Type=String,Description="Predicted loss of function effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected'">
##INFO=<ID=NMD,Number=.,Type=String,Description="Predicted nonsense mediated decay effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected'">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
22	17071756	.	T	C	.	.	ANN=C|3_prime_UTR_variant|MODIFIER|CCT8L2|ENSG00000198445|transcript|ENST00000359963|protein_coding|1/1|c.*11A>G|||||11|,C|downstream_gene_variant|MODIFIER|FABP5P11|ENSG00000240122|transcript|ENST00000430910|processed_pseudogene||n.*4223A>G|||||4223|
22	17072035	.	C	T	.	.	ANN=T|missense_variant|MODERATE|CCT8L2|ENSG00000198445|transcript|ENST00000359963|protein_coding|1/1|c.1406G>A|p.Gly469Glu|1666/2034|1406/1674|469/557||,T|downstream_gene_variant|MODIFIER|FABP5P11|ENSG00000240122|transcript|ENST00000430910|processed_pseudogene||n.*3944G>A|||||3944|
...
256/557||,A|downstream_gene_variant|MODIFIER|FABP5P11|ENSG00000240122|transcript|ENST00000430910|processed_pseudogene||n.*3305C>T|||||3305|
22	17072747	.	T	C	.	.	ANN=C|missense_variant|MODERATE|CCT8L2|ENSG00000198445|transcript|ENST00000359963|protein_coding|1/1|c.694A>G|p.Met232Val|954/2034|694/1674|232/557||,C|downstream_gene_variant|MODIFIER|FABP5P11|ENSG00000240122|transcript|ENST00000430910|processed_pseudogene||n.*3232A>G|||||3232|
22	17072781	.	C	T	.	.	ANN=T|synonymous_variant|LOW|CCT8L2|ENSG00000198445|transcript|ENST00000359963|protein_coding|1/1|c.660G>A|p.Pro220Pro|920/2034|660/1674|220/557||,T|downstream_gene_variant|MODIFIER|FABP5P11|ENSG00000240122|transcript|ENST00000430910|processed_pseudogene||n.*3198G>A|||||3198|

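SnpEff adds header lines at the top of the file: ##SnpEffVersion, ##SnpEffCmd, and ##INFO definitions for the ANN, LOF and NMD fields. It also appends an ANN entry to the INFO column of each variant. Within one annotation, values are separated by "|" in the order declared in the ANN header. A variant with several annotations (for example, one per affected transcript or nearby gene) has them separated by commas. For instance, the variant at 22:17072035 (C>T) is annotated as a missense_variant with MODERATE impact in CCT8L2 (c.1406G>A, p.Gly469Glu). It is also annotated as a downstream_gene_variant of FABP5P11.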
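Each ANN entry follows the field order declared in the ##INFO=<ID=ANN,...> header, so it can be split with standard text tools. The following is a minimal sketch assuming Bash and a POSIX awk. It prints one tab-separated row per annotation: CHROM, POS, Allele, Annotation, Annotation_Impact, Gene_Name and HGVS.p. ann_table is a hypothetical helper name, not a SnpEff command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SnpEff): one tab-separated row per ANN annotation.
# Field positions follow the ANN header: 1 = Allele, 2 = Annotation,
# 3 = Annotation_Impact, 4 = Gene_Name, 11 = HGVS.p.
ann_table() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { next }                                  # skip header lines
    {
      n = split($8, info, ";")                     # INFO holds KEY=VALUE entries
      for (i = 1; i <= n; i++) {
        if (info[i] !~ /^ANN=/) continue           # only the ANN entry
        m = split(substr(info[i], 5), anns, ",")   # annotations are comma-separated
        for (j = 1; j <= m; j++) {
          split(anns[j], f, "[|]")                 # fields are pipe-separated
          print $1, $2, f[1], f[2], f[3], f[4], f[11]
        }
      }
    }' "$1"
}
```

For example, `ann_table test.ann.vcf | awk -F'\t' '$5 == "HIGH" || $5 == "MODERATE"'` keeps only annotations with HIGH or MODERATE impact. For heavier use, SnpSift, SnpEff's companion tool, provides an extractFields command for the same kind of task.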



BxINFO LLC

A research support company specializing in bioinformatics.

We provide tools and information to support life science research, with a focus on RNA-Seq analysis.
