What are SAM and BAM Files?

Last updated: January 28, 2026

Introduction

SAM and BAM are file formats that record the results of mapping reads (nucleotide sequences) generated by a sequencer to a reference sequence. They describe where each read was mapped and how it was aligned.

Reads are generally stored in FASTQ files, and reference sequences in FASTA files. Using these as input, mapping software such as HISAT2, STAR, Bowtie2, or BWA produces SAM or BAM files. SAM files are plain text, while BAM files are compressed binary files that contain equivalent information. Because BAM files are smaller, mapping results are usually stored in BAM format.
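To make the text versus binary distinction concrete, here is a minimal sketch that guesses whether a file is BAM by inspecting its first bytes. It relies on two properties of the format: BAM data is stored in a gzip-compatible container, and the decompressed stream begins with the four bytes `BAM\x01`. The function name `is_bam` is a hypothetical helper chosen for this illustration, not part of any library.

```python
import gzip


def is_bam(path):
    """Return True if the file at `path` looks like a BAM file.

    Hypothetical helper: it only checks the leading bytes and does not
    validate the rest of the file.
    """
    with open(path, "rb") as handle:
        if handle.read(2) != b"\x1f\x8b":  # gzip magic number
            return False  # plain text such as SAM, or another format
    try:
        with gzip.open(path, "rb") as handle:
            # Decompressed BAM data starts with the magic string BAM\1.
            return handle.read(4) == b"BAM\x01"
    except (OSError, EOFError):
        # Truncated or corrupt gzip data.
        return False
```

A SAM file fails the first check because it is uncompressed text, so the function returns False for it.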

SAM / BAM format

Let us consider the following example of mapping results.

Position   12345678901234  5678901234567890123456789012345
Reference  AGCATGTTAGATAA**GATAGCTGTGCTAGTAGGCAGTCAGCGCCAT
+r001/1          TTAGATAAAGGATA*CTG
+r002           aaaAGATAA*GGATA
+r003         gcctaAGCTAA
+r004                       ATAGCT..............TCAGC
-r003                              ttagctTAGGC
-r001/2                                          CAGCGGCAT

(The Position row shows only the last digit of each coordinate, from 1 to 45.)

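Bases written in lowercase are clipped from the alignment: they lie at the ends of reads and are not aligned to the reference sequence. In the reference row, ** marks padding opposite bases inserted in a read; within a read, * marks a deleted or padded position, and the dots in r004 mark a skipped region of the reference. r001/1 and r001/2 are paired reads, r003 is a chimeric read, and r004 represents a split alignment.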

The corresponding SAM file looks like the following.

Example of a SAM file

@HD VN:1.6 SO:coordinate
@SQ SN:ref LN:45
r001   99 ref  7 30 8M2I4M1D3M = 37  39 TTAGATAAAGGATACTG *
r002    0 ref  9 30 3S6M1P1I4M *  0   0 AAAAGATAAGGATA    *
r003    0 ref  9 30 5S6M       *  0   0 GCCTAAGCTAA       * SA:Z:ref,29,-,6H5M,17,0;
r004    0 ref 16 30 6M14N5M    *  0   0 ATAGCTTCAGC       *
r003 2064 ref 29 17 6H5M       *  0   0 TAGGC             * SA:Z:ref,9,+,5S6M,30,1;
r001  147 ref 37 30 9M         =  7 -39 CAGCGGCAT         * NM:i:1

Lines starting with @ are header lines.

The following lines represent the mapping results. Each line consists of 11 required tab-separated columns, followed by optional additional columns. The contents of each column are as follows.

Column  Name   Description
1       QNAME  Read (query) name
2       FLAG   Bitwise flags describing the mapping result
3       RNAME  Reference sequence name
4       POS    1-based leftmost mapping position
5       MAPQ   Mapping quality
6       CIGAR  String representation of the alignment
7       RNEXT  Reference sequence name of the paired read (= means the same as RNAME)
8       PNEXT  Mapping position of the paired read
9       TLEN   Observed template length (often called insert size)
10      SEQ    Nucleotide sequence of the read
11      QUAL   Base quality scores
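The column layout above can be sketched in code as follows. This is a minimal illustration, not a full SAM parser: the `SamRecord` type and the `parse_sam_line` function are hypothetical names introduced here, and optional columns are kept as strings except for integer tags (type `i`).

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: dict


def parse_sam_line(line):
    """Split one tab-separated alignment line into its 11 required
    columns plus a dict of optional TAG:TYPE:VALUE columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("an alignment line needs at least 11 columns")
    tags = {}
    for item in fields[11:]:
        tag, type_code, value = item.split(":", 2)
        # Only integer tags are converted; everything else stays a string.
        tags[tag] = int(value) if type_code == "i" else value
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]),
        int(fields[4]), fields[5], fields[6], int(fields[7]),
        int(fields[8]), fields[9], fields[10], tags,
    )


# Last record of the example SAM file, written with real tab characters.
line = "r001\t147\tref\t37\t30\t9M\t=\t7\t-39\tCAGCGGCAT\t*\tNM:i:1"
record = parse_sam_line(line)
print(record.qname, record.pos, record.cigar, record.tags)
```

Header lines (those starting with @) have a different layout and would need to be filtered out before calling this function.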

For more detailed information about the FLAG and CIGAR fields, please refer to this document.
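As a rough illustration of how the FLAG and CIGAR values in the example records can be read, the sketch below decodes a FLAG into its set bits and computes how many reference and read bases a CIGAR string covers. The bit values and operator letters follow the SAM specification; the label strings and the function names (`decode_flag`, `parse_cigar`, `reference_span`, `query_length`) are hypothetical choices made for this example.

```python
import re

# FLAG bits as defined in the SAM specification; labels are illustrative.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """List the labels of all bits that are set in a FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def parse_cigar(cigar):
    """Turn a CIGAR string such as '8M2I4M1D3M' into (length, op) pairs."""
    if cigar == "*":
        return []
    if not re.fullmatch(r"(?:\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


def reference_span(cigar):
    """Number of reference bases covered (operators M, D, N, =, X)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")


def query_length(cigar):
    """Number of bases expected in SEQ (operators M, I, S, =, X)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MIS=X")


print(decode_flag(99))               # FLAG of the first r001 record
print(decode_flag(2064))             # FLAG of the second r003 record
print(reference_span("8M2I4M1D3M"))  # 16
print(query_length("8M2I4M1D3M"))    # 17
```

For the first r001 record, which starts at position 7, a reference span of 16 means the alignment ends at position 7 + 16 - 1 = 22, and a query length of 17 matches the length of its SEQ column.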

What is a Sorted BAM?

BAM files output directly from mapping software usually follow the order in which the reads were processed. A Sorted BAM file is a BAM file that has been reordered by reference coordinate. This sorting step is almost always required before proceeding to the next stage of analysis.

Whether a BAM file is sorted can be determined by checking the SO tag on the @HD line of the header. If the file is sorted by coordinate, it will be labeled as SO:coordinate. In practice, filenames ending in .sorted.bam are often used to make this clear.
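The header check described above can be sketched as follows. The function `sort_order` is a hypothetical helper that takes header lines as text. For a SAM file these can be read directly from the file; for a BAM file the header text would first have to be extracted with a tool such as `samtools view -H`.

```python
def sort_order(header_lines):
    """Return the value of the SO tag on the @HD line, or None if the
    header has no @HD line or the line carries no SO tag."""
    for line in header_lines:
        if not line.startswith("@HD"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        for field in line.rstrip("\n").split("\t")[1:]:
            key, _, value = field.partition(":")
            if key == "SO":
                return value
        return None
    return None


# Header of the example SAM file, written with real tab characters.
header = ["@HD\tVN:1.6\tSO:coordinate\n", "@SQ\tSN:ref\tLN:45\n"]
print(sort_order(header) == "coordinate")  # True
```

A return value of None only means that no sort order is declared in the header; it does not prove that the records are unsorted.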

BxINFO LLC

A research support company specializing in bioinformatics.

We provide tools and information to support life science research, with a focus on RNA-Seq analysis.
