What are SAM and BAM Files?
Introduction
SAM and BAM files are formats used to represent the results of mapping reads (nucleotide sequences) generated by a sequencer to a reference sequence. They describe where each read was mapped and how it was mapped.
Reads are usually stored in FASTQ files, and reference sequences in FASTA files. Given these as input, mapping software such as HISAT2, STAR, Bowtie2, or BWA produces SAM or BAM files. SAM is a tab-delimited text format, while BAM is a compressed binary encoding of the same information. Because BAM files are much smaller, mapping results are usually stored in BAM format.
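The text/binary distinction can be seen by looking at the first bytes of a file. The following is a minimal sketch, not a robust detector: it relies on BAM being BGZF-compressed (readable by a gzip decoder) with a decompressed payload that begins with the magic bytes `BAM\1`, as described in the SAM/BAM specification. The function name `detect_alignment_format` is hypothetical, and a SAM file is assumed to start with an `@` header line (headerless SAM files are not recognized).

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of any gzip/BGZF stream
BAM_MAGIC = b"BAM\x01"     # first four bytes of the decompressed BAM payload


def detect_alignment_format(path):
    """Return "BAM", "SAM", or "unknown" based on the first bytes of the file."""
    with open(path, "rb") as fh:
        head = fh.read(2)

    if head == GZIP_MAGIC:
        # Compressed: decompress just enough to look for the BAM magic bytes.
        with gzip.open(path, "rb") as gz:
            if gz.read(4) == BAM_MAGIC:
                return "BAM"
        return "unknown"

    # Assumption: an uncompressed SAM file begins with an "@" header line.
    if head[:1] == b"@":
        return "SAM"
    return "unknown"
```

Calling `detect_alignment_format("sample.bam")` on a file produced by a mapper would be expected to return `"BAM"`; the filename here is only a placeholder.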
SAM / BAM format
Let us consider the following example of mapping results.
| Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | * | * | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Reference | A | G | C | A | T | G | T | T | A | G | A | T | A | A | * | * | G | A | T | A | G | C | T | G | T | G | C | T | A | G | T | A | G | G | C | A | G | T | C | A | G | C | G | C | C | A | T |
| +r001/1 | | | | | | | T | T | A | G | A | T | A | A | A | G | G | A | T | A | * | C | T | G | | | | | | | | | | | | | | | | | | | | | | | |
| +r002 | | | | | | a | a | a | A | G | A | T | A | A | * | G | G | A | T | A | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| +r003 | | | | g | c | c | t | a | A | G | C | T | A | A | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| +r004 | | | | | | | | | | | | | | | | | | A | T | A | G | C | T | . | . | . | . | . | . | . | . | . | . | . | . | . | . | T | C | A | G | C | | | | | |
| -r003 | | | | | | | | | | | | | | | | | | | | | | | | | t | t | a | g | c | t | T | A | G | G | C | | | | | | | | | | | | |
| -r001/2 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | C | A | G | C | G | G | C | A | T |
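Bases written in lowercase are clipped: they lie at the ends of a read and are not aligned to the reference sequence. In the Reference row, * marks padding that makes room for bases inserted in reads. In a read row, * marks a deleted or padded position, and . marks a stretch of the reference that the alignment skips. The + or - before each read name gives the strand. r001/1 and r001/2 are paired reads, r003 is a chimeric read, and r004 represents a split alignment.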
The corresponding SAM file looks like the following.
Example of a SAM file
Lines starting with @ are header lines.
The remaining lines are the mapping results, one alignment per line. Each line consists of 11 required tab-separated columns, followed by optional additional columns. The contents of each column are as follows.
| Column | Name | Description |
|---|---|---|
| Column 1 | QNAME | Read name |
| Column 2 | FLAG | Bitwise flags describing the mapping result |
| Column 3 | RNAME | Reference sequence name |
| Column 4 | POS | Leftmost mapping position (1-based) |
| Column 5 | MAPQ | Mapping quality |
| Column 6 | CIGAR | String representation of the alignment |
| Column 7 | RNEXT | Reference sequence name of the paired read (mate) |
| Column 8 | PNEXT | Mapping position of the paired read (mate) |
| Column 9 | TLEN | Template length (insert size) |
| Column 10 | SEQ | Nucleotide sequence of the read |
| Column 11 | QUAL | Base quality scores |
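To make the column layout concrete, here is a minimal sketch of a parser that splits SAM text into header lines and records with the 11 columns above. The names `SamRecord`, `parse_sam_line`, and `read_sam` are hypothetical, and the two records in `EXAMPLE_SAM` are illustrative values modeled on the read pair r001/1 and r001/2 from the alignment table. They are an assumption, not a copy of this article's example file.

```python
from dataclasses import dataclass, field


@dataclass
class SamRecord:
    """One alignment line; attribute names follow the SAM column names."""
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: list = field(default_factory=list)  # optional columns, kept as raw strings


def parse_sam_line(line):
    """Split one tab-separated alignment line into a SamRecord."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError(f"expected at least 11 columns, got {len(cols)}")
    return SamRecord(
        qname=cols[0], flag=int(cols[1]), rname=cols[2], pos=int(cols[3]),
        mapq=int(cols[4]), cigar=cols[5], rnext=cols[6], pnext=int(cols[7]),
        tlen=int(cols[8]), seq=cols[9], qual=cols[10], tags=cols[11:],
    )


def read_sam(lines):
    """Return (header_lines, records) from an iterable of SAM text lines."""
    header, records = [], []
    for line in lines:
        if line.startswith("@"):        # header lines start with "@"
            header.append(line.rstrip("\n"))
        elif line.strip():              # skip blank lines
            records.append(parse_sam_line(line))
    return header, records


# Illustrative input (assumed values), tab-separated as SAM requires.
EXAMPLE_SAM = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:ref\tLN:45\n"
    "r001\t99\tref\t7\t30\t8M2I4M1D3M\t=\t37\t39\tTTAGATAAAGGATACTG\t*\n"
    "r001\t147\tref\t37\t30\t9M\t=\t7\t-39\tCAGCGGCAT\t*\tNM:i:1\n"
)

if __name__ == "__main__":
    header, records = read_sam(EXAMPLE_SAM.splitlines())
    for rec in records:
        print(rec.qname, rec.rname, rec.pos, rec.cigar)
```

Keeping the optional columns as raw strings is a deliberate simplification; a fuller parser would split each `TAG:TYPE:VALUE` triple.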
For more detailed information about the FLAG and CIGAR fields, please refer to this document.
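As a quick illustration of how these two fields are read, the sketch below decodes a FLAG value into its individual bits and splits a CIGAR string into operations. The bit values and CIGAR operation letters follow the SAM specification. The function names and the short labels such as `"proper_pair"` are hypothetical choices made for this example.

```python
import re

# FLAG bit values as defined in the SAM specification; labels are informal.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")


def decode_flag(flag):
    """Return the labels of all bits set in a FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs; "*" means unavailable."""
    if cigar == "*":
        return []
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def reference_span(cigar):
    """Number of reference bases covered (operations M, D, N, =, X)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")


def query_length(cigar):
    """Number of read bases present in SEQ (operations M, I, S, =, X)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MIS=X")


if __name__ == "__main__":
    print(decode_flag(99))               # bits 0x1, 0x2, 0x20, 0x40
    print(parse_cigar("8M2I4M1D3M"))
    print(reference_span("8M2I4M1D3M"), query_length("8M2I4M1D3M"))
```

For example, FLAG 99 is 1 + 2 + 32 + 64, so it describes a properly paired first read whose mate maps to the reverse strand. The CIGAR `8M2I4M1D3M` covers 16 reference bases (8 + 4 + 1 + 3) while the read itself is 17 bases long (8 + 2 + 4 + 3).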
What is a Sorted BAM?
BAM files written directly by mapping software are usually in the order in which the reads were processed. A Sorted BAM file is a BAM file whose records have been reordered by reference coordinate. Coordinate sorting is required before the file can be indexed and is expected by most downstream tools, so this step is almost always needed before proceeding to the next stage of analysis.
Whether a BAM file is sorted can be determined by checking the SO tag on the @HD line of the header. If the file is sorted by coordinate, the tag reads SO:coordinate. In practice, filenames ending in .sorted.bam are often used to make this clear.
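The header check can be sketched in a few lines. This example works on header text that has already been extracted (for instance with `samtools view -H`), and the function name `sort_order` is hypothetical. Following the SAM specification's default, it reports `"unknown"` when there is no @HD line or no SO tag.

```python
def sort_order(header_lines):
    """Return the SO value from the @HD header line, or "unknown" if absent."""
    for line in header_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "@HD":
            continue
        for tag in fields[1:]:
            if tag.startswith("SO:"):
                return tag[3:]
        break  # @HD present but without an SO tag
    return "unknown"


def is_coordinate_sorted(header_lines):
    """True if the header declares coordinate sorting."""
    return sort_order(header_lines) == "coordinate"


if __name__ == "__main__":
    # Illustrative header (assumed values).
    header = ["@HD\tVN:1.6\tSO:coordinate", "@SQ\tSN:ref\tLN:45"]
    print(sort_order(header), is_coordinate_sorted(header))
```

Note that this only reports what the header declares; it does not verify that the records themselves are actually in coordinate order.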
About the Author
BxINFO LLC
A research support company specializing in bioinformatics.
We provide tools and information to support life science research, with a focus on RNA-Seq analysis.