Whole genome sequencing (WGS) and whole exome sequencing (WES) typically identify a large number of variants. To find particularly important variants among them, it is necessary to perform variant annotation.
One of the most commonly used software tools for variant annotation is SnpEff. Given variant calling results in a VCF file, SnpEff outputs a VCF file with annotations added to each variant.
Let's display the help message to verify the installation.
$ java -jar snpEff.jar --help
If the following output is displayed, the installation was successful.
SnpEff version SnpEff 5.1d (build 2022-04-19 15:49), by Pablo Cingolani
Usage: snpEff [command] [options] [files]
Run 'java -jar snpEff.jar command' for help on each specific command
Available commands:
[eff|ann] : Annotate variants / calculate effects (you can use either 'ann' or 'eff', they mean the same). Default: ann (no command or 'ann').
build : Build a SnpEff database.
buildNextProt : Build a SnpEff for NextProt (using NextProt's XML files).
cds : Compare CDS sequences calculated form a SnpEff database to the one in a FASTA file. Used for checking databases correctness.
closest : Annotate the closest genomic region.
count : Count how many intervals (from a BAM, BED or VCF file) overlap with each genomic interval.
...
Adding Annotations
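Run the following command to add annotations to variants. As the SnpEffCmd header in the output records, GRCh37.75 is the database name and snpEff/examples/test.chr22.vcf is the input VCF file; the annotated result is saved as test.ann.vcf.
$ java -jar snpEff.jar GRCh37.75 snpEff/examples/test.chr22.vcf > test.ann.vcf
Before Annotation (test.chr22.vcf)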
#CHROM POS ID REF ALT QUAL FILTER INFO
22 17071756 . T C . . .
22 17072035 . C T . . .
22 17072258 . C A . . .
22 17072674 . G A . . .
22 17072747 . T C . . .
22 17072781 . C T . . .
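The annotation run can also be scripted, which is convenient when many VCF files need the same treatment. The following is a minimal sketch in Python, assuming that SnpEff takes the database name followed by the input VCF file and writes the annotated VCF to standard output. The function names build_snpeff_command and annotate, and all paths passed to them, are hypothetical and only for illustration.

```python
import subprocess


def build_snpeff_command(jar_path, genome, input_vcf):
    """Return the argument list for one SnpEff annotation run.

    Assumption: the positional order is <database name> <input VCF>.
    """
    return ["java", "-jar", jar_path, genome, input_vcf]


def annotate(jar_path, genome, input_vcf, output_vcf):
    """Run SnpEff and save its standard output as the annotated VCF."""
    command = build_snpeff_command(jar_path, genome, input_vcf)
    with open(output_vcf, "w") as out:
        # check=True raises CalledProcessError if SnpEff exits with an error.
        subprocess.run(command, stdout=out, check=True)
    return output_vcf
```

Calling annotate("snpEff.jar", "GRCh37.75", "snpEff/examples/test.chr22.vcf", "test.ann.vcf") would then produce the annotated file, provided Java and SnpEff are installed at those locations.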
After Annotation (test.ann.vcf)
##SnpEffVersion="5.1d (build 2022-04-19 15:49), by Pablo Cingolani"
##SnpEffCmd="SnpEff GRCh37.75 snpEff/examples/test.chr22.vcf "
##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ">
##INFO=<ID=LOF,Number=.,Type=String,Description="Predicted loss of function effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected'">
##INFO=<ID=NMD,Number=.,Type=String,Description="Predicted nonsense mediated decay effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected'">
#CHROM POS ID REF ALT QUAL FILTER INFO
22 17071756 . T C . . ANN=C|3_prime_UTR_variant|MODIFIER|CCT8L2|ENSG00000198445|transcript|ENST00000359963|protein_coding|1/1|c.*11A>G|||||11|,C|downstream_gene_variant|MODIFIER|FABP5P11|ENSG00000240122|transcript|ENST00000430910|processed_pseudogene||n.*4223A>G|||||4223|
22 17072035 . C T . . ANN=T|missense_variant|MODERATE|CCT8L2|ENSG00000198445|transcript|ENST00000359963|protein_coding|1/1|c.1406G>A|p.Gly469Glu|1666/2034|1406/1674|469/557||,T|downstream_gene_variant|MODIFIER|FABP5P11|ENSG00000240122|transcript|ENST00000430910|processed_pseudogene||n.*3944G>A|||||3944|
22 17072674 . G A . . ANN=A|...|256/557||,A|downstream_gene_variant|MODIFIER|FABP5P11|ENSG00000240122|transcript|ENST00000430910|processed_pseudogene||n.*3305C>T|||||3305|
22 17072747 . T C . . ANN=C|missense_variant|MODERATE|CCT8L2|ENSG00000198445|transcript|ENST00000359963|protein_coding|1/1|c.694A>G|p.Met232Val|954/2034|694/1674|232/557||,C|downstream_gene_variant|MODIFIER|FABP5P11|ENSG00000240122|transcript|ENST00000430910|processed_pseudogene||n.*3232A>G|||||3232|
22 17072781 . C T . . ANN=T|synonymous_variant|LOW|CCT8L2|ENSG00000198445|transcript|ENST00000359963|protein_coding|1/1|c.660G>A|p.Pro220Pro|920/2034|660/1674|220/557||,T|downstream_gene_variant|MODIFIER|FABP5P11|ENSG00000240122|transcript|ENST00000430910|processed_pseudogene||n.*3198G>A|||||3198|
Header lines describing the added fields (ANN, LOF, and NMD) have been inserted at the top of the VCF file, and the INFO column of each variant now carries an ANN annotation.
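To pick out the important variants, the ANN field has to be split into its subfields. The following is a minimal sketch in Python: the field names are copied from the ANN header line of the annotated VCF, while the function names parse_ann and filter_by_impact are hypothetical. It assumes that ANN entries are separated by commas, that subfields are separated by "|", and that the INFO column contains no whitespace.

```python
# Field names in the order declared by the ##INFO=<ID=ANN,...> header line.
ANN_FIELDS = [
    "Allele", "Annotation", "Annotation_Impact", "Gene_Name", "Gene_ID",
    "Feature_Type", "Feature_ID", "Transcript_BioType", "Rank", "HGVS.c",
    "HGVS.p", "cDNA.pos / cDNA.length", "CDS.pos / CDS.length",
    "AA.pos / AA.length", "Distance", "ERRORS / WARNINGS / INFO",
]


def parse_ann(info):
    """Return one dict per ANN entry found in a VCF INFO string."""
    for item in info.split(";"):
        if item.startswith("ANN="):
            entries = item[len("ANN="):].split(",")
            return [dict(zip(ANN_FIELDS, entry.split("|"))) for entry in entries]
    return []  # no ANN key in this INFO string


def filter_by_impact(vcf_lines, impacts=("HIGH", "MODERATE")):
    """Yield (chrom, pos, annotation) for annotations with a selected impact."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        columns = line.split()  # works for tab- or space-separated columns
        chrom, pos, info = columns[0], columns[1], columns[7]
        for ann in parse_ann(info):
            if ann["Annotation_Impact"] in impacts:
                yield chrom, pos, ann
```

Applied to the records above, the default filter keeps the missense_variant annotations (impact MODERATE) and drops the synonymous, UTR, and downstream annotations (impact LOW or MODIFIER).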
About the Author
BxINFO LLC
A research support company specializing in bioinformatics.
We provide tools and information to support life science research, with a focus on RNA-Seq analysis.