-.TH samtools 1 "21 May 2009" "samtools-0.1.4" "Bioinformatics tools"
+.TH samtools 1 "10 November 2009" "samtools-0.1.7" "Bioinformatics tools"
.SH NAME
.PP
samtools - Utilities for the Sequence Alignment/Map (SAM) format
.SH DESCRIPTION
.PP
Samtools is a set of utilities that manipulate alignments in the BAM
-format. It imports from and exports to the SAM (Sequence
-Alignment/Map) format, does sorting, merging and indexing, and
-allows to retrieve reads in any regions swiftly.
+format. It imports from and exports to the SAM (Sequence Alignment/Map)
+format, does sorting, merging and indexing, and allows reads in any
+region to be retrieved swiftly.
+
+Samtools is designed to work on a stream. It regards an input file `-'
+as the standard input (stdin) and an output file `-' as the standard
+output (stdout). Several commands can thus be combined with Unix
+pipes. Samtools always outputs warning and error messages to the standard
+error output (stderr).
+
+Samtools is also able to open a BAM (not SAM) file on a remote FTP or
+HTTP server if the BAM file name starts with `ftp://' or `http://'.
+Samtools checks the current working directory for the index file and
+will download the index if it is absent. Samtools does not retrieve the
+entire alignment file unless it is asked to do so.
.SH COMMANDS AND OPTIONS
+
.TP 10
.B import
samtools import <in.ref_list> <in.sam> <out.bam>
.TP
.B merge
-samtools merge [-n] <out.bam> <in1.bam> <in2.bam> [...]
-
-Merge multiple sorted alignments. The header of
-.I <in1.bam>
+samtools merge [-h inh.sam] [-n] <out.bam> <in1.bam> <in2.bam> [...]
+
+Merge multiple sorted alignments.
+The header reference lists of all the input BAM files, and the @SQ headers of
+.IR inh.sam ,
+if any, must all refer to the same set of reference sequences.
+The header reference list and (unless overridden by
+.BR -h )
+`@' headers of
+.I in1.bam
will be copied to
-.I <out.bam>
+.IR out.bam ,
and the headers of other files will be ignored.
.B OPTIONS:
.RS
.TP 8
+.B -h FILE
+Use the lines of
+.I FILE
+as `@' headers to be copied to
+.IR out.bam ,
+replacing any header lines that would otherwise be copied from
+.IR in1.bam .
+.RI ( FILE
+is actually in SAM format, though any alignment records it may contain
+are ignored.)
+.TP
.B -n
The input alignments are sorted by read names rather than by chromosomal
coordinates
.TP
.B view
-samtools view [-bhHS] [-t in.refList] [-o output] [-f reqFlag] [-F
-skipFlag] [-q minMapQ] <in.bam> [region1 [...]]
+samtools view [-bhuHS] [-t in.refList] [-o output] [-f reqFlag] [-F
+skipFlag] [-q minMapQ] [-l library] [-r readGroup] <in.bam>|<in.sam> [region1 [...]]
Extract/print all or sub alignments in SAM or BAM format. If no region
is specified, all the alignments will be printed; otherwise only
-alignments overlapping with the specified regions will be output. An
+alignments overlapping the specified regions will be output. An
alignment may be given multiple times if it is overlapping several
regions. A region can be presented, for example, in the following
-format: `chr2', `chr2:1000000' or `chr2:1,000,000-2,000,000'.
+format: `chr2' (the whole chr2), `chr2:1000000' (region starting from
+1,000,000bp) or `chr2:1,000,000-2,000,000' (region between 1,000,000 and
+2,000,000bp including the end points). Coordinates are 1-based.
.B OPTIONS:
.RS
.B -b
Output in the BAM format.
.TP
+.B -u
+Output uncompressed BAM. This option saves time spent on
+compression/decompression and is thus preferred when the output is piped
+to another samtools command.
+.TP
.B -h
Include the header in the output.
.TP
.TP
.B -q INT
Skip alignments with MAPQ smaller than INT [0]
+.TP
+.B -l STR
+Only output reads in library STR [null]
+.TP
+.B -r STR
+Only output reads in read group STR [null]
.RE
.TP
.TP
.B pileup
samtools pileup [-f in.ref.fasta] [-t in.ref_list] [-l in.site_list]
-[-iscg] [-T theta] [-N nHap] [-r pairDiffRate] <in.alignment>
+[-iscgS2] [-T theta] [-N nHap] [-r pairDiffRate] <in.bam>|<in.sam>
Print the alignment in the pileup format. In the pileup format, each
line represents a genomic position, consisting of chromosome name,
If option
.B -c
-is applied, the consensus base, consensus quality, SNP quality and RMS
-mapping quality of the reads covering the site will be inserted between
-the `reference base' and the `read bases' columns. An indel occupies an
-additional line. Each indel line consists of chromosome name,
-coordinate, a star, the genotype, consensus quality, SNP quality, RMS
-mapping quality, # covering reads, the first alllele, the second allele,
-# reads supporting the first allele, # reads supporting the second
-allele and # reads containing indels different from the top two alleles.
+is applied, the consensus base, Phred-scaled consensus quality, SNP
+quality (i.e. the Phred-scaled probability of the consensus being
+identical to the reference) and root mean square (RMS) mapping quality
+of the reads covering the site will be inserted between the `reference
+base' and the `read bases' columns. An indel occupies an additional
+line. Each indel line consists of chromosome name, coordinate, a star,
+the genotype, consensus quality, SNP quality, RMS mapping quality, #
+covering reads, the first allele, the second allele, # reads supporting
+the first allele, # reads supporting the second allele and # reads
+containing indels different from the top two alleles.
.B OPTIONS:
.RS
Print the mapping quality as the last column. This option makes the
output easier to parse, although this format is not space efficient.
+.TP
+.B -S
+The input file is in the SAM format.
+
.TP
.B -i
Only output pileup lines containing indels.
will be created if
absent.
+.TP
+.B -M INT
+Cap mapping quality at INT [60]
+
.TP
.B -t FILE
List of reference names ane sequence lengths, in the format described
.B -c
Call the consensus sequence using MAQ consensus model. Options
.B -T,
-.B -N
+.B -N,
+.B -I
and
.B -r
are only effective when
.B -c
+or
+.B -g
is in use.
.TP
.B -r FLOAT
Expected fraction of differences between a pair of haplotypes [0.001]
+.TP
+.B -I INT
+Phred-scaled probability of an indel in sequencing/prep. [40]
+
.RE
.TP
Text alignment viewer (based on the ncurses library). In the viewer,
press `?' for help and press `g' to check the alignment start from a
-region in the format like `chr10:10,000,000'. Note that if the region
-showed on the screen contains no mapped reads, a blank screen will be
-seen. This is a known issue and will be improved later.
-
-.RE
+region in the format like `chr10:10,000,000'.
.TP
.B fixmate
.B ONLY
works with FR orientation and requires ISIZE is correctly set.
-.RE
-
.TP
.B rmdupse
samtools rmdupse <input.srt.bam> <out.bam>
Remove potential duplicates for single-ended reads. This command will
treat all reads as single-ended even if they are paired in fact.
-.RE
-
.TP
.B fillmd
samtools fillmd [-e] <aln.bam> <ref.fasta>
.RE
-
-.SH SAM FORFAM
+.SH SAM FORMAT
SAM is TAB-delimited. Apart from the header lines, which are started
with the `@' symbol, each alignment line consists of:
Unaligned words used in bam_import.c, bam_endian.h, bam.c and bam_aux.c.
.IP o 2
CIGAR operation P is not properly handled at the moment.
+.IP o 2
+In merging, the input files are required to have the same number of
+reference sequences. The requirement can be relaxed. In addition,
+merging does not reconstruct the header dictionaries
+automatically. End users have to provide the correct header. Picard is
+better at merging.
+.IP o 2
+Samtools' rmdup does not work for single-end data and does not remove
+duplicates across chromosomes. Picard is better.
.SH AUTHOR
.PP
.SH SEE ALSO
.PP
-Samtools website: http://samtools.sourceforge.net
+Samtools website: <http://samtools.sourceforge.net>