X-Git-Url: https://git.donarmstrong.com/?a=blobdiff_plain;f=README;h=498f4be06dc0ecc00162c43cdaa65841cea5dff8;hb=f7d86e0fa7161081f69c5c178ee0141bea599f71;hp=6e1a3ab700e8f2054370edfe545836249d7780cf;hpb=89028c85b3db7b4eb55c40fabfcb9c55a0d168d9;p=bamtools.git

diff --git a/README b/README
index 6e1a3ab..498f4be 100644
--- a/README
+++ b/README
@@ -2,387 +2,43 @@
 README : BAMTOOLS
 --------------------------------------------------------------------------------
 
-BamTools: a C++ API & toolkit for reading/writing/manipulating BAM files.
-
-I. Introduction
-    a. The API
-    b. The Toolkit
-
-II. Usage
-    a. The API
-    b. The Toolkit
-
-III. License
-
-IV. Acknowledgements
-
-V. Contact
-
---------------------------------------------------------------------------------
-I. Introduction:
---------------------------------------------------------------------------------
-
 BamTools provides both a programmer's API and an end-user's toolkit for handling
-BAM files.
-
------------------------------------------
-Ia. The API:
------------------------------------------
-
-The API consists of 2 main modules: BamReader and BamWriter. As you would
-expect, BamReader provides read-access to BAM files, while BamWriter handles
-writing data to BAM files. BamReader provides the interface for random-access
-(jumping) in a BAM file, as well as generating BAM index files.
-
-BamMultiReader is an extra module that allows you to manage multiple open BAM
-files for reading. It provides some validation & bookkeeping under the hood to
-keep all files sync'ed up for you.
-
-Additional files used by the API:
+BAM files.
 
-    - BamAlignment.* : implements the BamAlignment data structure
+I. Learn More
 
-    - BamAux.h : contains various constants, data structures and utility
-                 methods used throught the API.
+II. License
 
-    - BamIndex.* : implements both the standard BAM format index (".bai") as
-                   well as a new BamTools-specific index (".bti").
+III. Acknowledgements
 
-    - BGZF.* : contains our implementation of the Broad Institute's BGZF
-               compression format.
-
------------------------------------------
-Ib. The Toolkit:
------------------------------------------
-
-If you've been using the BamTools since the early days, you'll notice that our
-'toy' API examples (BamConversion, BamDump, BamTrim,...) are now gone. We have
-dumped these in favor of a suite of small utilities that we hope both
-developers and end-users find useful:
-
-usage: bamtools [--help] COMMAND [ARGS]
-
-Available bamtools commands:
-
-    convert         Converts between BAM and a number of other formats
-    count           Prints number of alignments in BAM file(s)
-    coverage        Prints coverage statistics from the input BAM file
-    filter          Filters BAM file(s) by user-specified criteria
-    header          Prints BAM header information
-    index           Generates index for BAM file
-    merge           Merge multiple BAM files into single file
-    random          Select random alignments from existing BAM file(s)
-    sort            Sorts the BAM file according to some criteria
-    split           Splits a BAM file on user-specifed property, creating a
-                    new BAM output file for each value found
-    stats           Prints some basic statistics from input BAM file(s)
-
-See 'bamtools help COMMAND' for more information on a specific command.
+IV. Contact
 
 --------------------------------------------------------------------------------
-II. Usage :
+I. Learn More:
 --------------------------------------------------------------------------------
 
-** General usage information - perhaps explain common terms, point to SAM/BAM
-spec, etc **
-
-
------------------------------------------
-IIa. The API
------------------------------------------
-
-The API, as noted above, contains 2 main modules - BamReader & BamWriter - for
-dealing with BAM files. Alignment data is made available through the
-BamAlignment data structure.
-
-A simple (read-only) scenario for accessing BAM data would look like the
-following:
-
-    // open our BamReader
-    BamReader reader;
-    reader.Open("someData.bam", "someData.bam.bai");
-
-    // define our region of interest
-    // in this example: bases 0-500 on the reference "chrX"
-    int id = reader.GetReferenceID("chrX");
-    BamRegion region(id, 0, id, 500);
-    reader.SetRegion(region);
-
-    // iterate through alignments in this region,
-    // ignoring alignments with a MQ below some cutoff
-    BamAlignment al;
-    while ( reader.GetNextAlignment(al) ) {
-        if ( al.MapQuality >= 50 )
-            // do something
-    }
-
-    // close the reader
-    reader.Close();
-
-To use this API in your application, you simply need to do 3 things:
-
-    1 - Drop the BamTools API files somewhere the compiler can find them.
-
-    2 - Import BamTools API with the following lines of code
-        #include "BamReader.h"     // (or "BamMultiReader.h") as needed
-        #include "BamWriter.h"     // as needed
-        using namespace BamTools;  // all of BamTools classes/methods live in
-                                   // this namespace
-
-    3 - Link with '-lz' ('l' as in Lima) to access ZLIB compression library
-        (For MSVC users, I can provide you modified zlib headers - just contact
-        me if needed).
-
-See any included programs and Makefile for more specific compiling/usage
-examples. See comments in the header files for more detailed API documentation.
-
------------------------------------------
-IIb. The Toolkit
------------------------------------------
-
-BamTools provides a small, but powerful suite of command-line utility programs
-for manipulating and querying BAM files for data.
-
----------------------
-Input/Output
----------------------
-
-All BamTools utilities handle I/O operations using a common set of arguments.
-These include:
-
-  -in
-
-The input BAM files(s).
-
-    If a tool accepts multiple BAM files as input, each file gets its own "-in"
-    option on the command line. If no "-in" is provided, the tool will attempt
-    to read BAM data from stdin.
-
-    To read a single BAM file, use a single "-in" option:
-    > bamtools *tool* -in myData1.bam ...ARGS...
-
-    To read multiple BAM files, use multiple "-in" options:
-    > bamtools *tool* -in myData1.bam -in myData2.bam ...ARGS...
-
-    To read from stdin (if supported), omit the "-in" option:
-    > bamtools *tool* ...ARGS...
-
-  -out
-
-The output BAM file.
-
-    If a tool outputs a result BAM file, specify the filename using this option.
-    If none is provided, the tool will typically write to stdout.
-
-    *Note: Not all tools output BAM data (e.g. count, header, etc.)
-
-  -region
-
-A region of interest. See below for accepted 'REGION string' formats.
-
-    Many of the tools accept this option, which allows a user to only consider
-    alignments that overlap this region (whether counting, filtering, merging,
-    etc.).
-
-    An alignment is considered to overlap a region if any part of the alignments
-    intersects the left/right boundaries. Thus, a 50bp alignment at position 70
-    will overlap a region beginning at position 100.
-
-    REGION string format
-    ----------------------
-    A proper REGION string can be formatted like any of the following examples:
-        where 'chr1' is the name of a reference (not its ID)and '' is any valid
-        integer position within that reference.
+Installation steps, tutorial, API documentation, etc. are all now available
+through the BamTools project wiki:
 
-    To read
-    chr1                - only alignments on (entire) reference 'chr1'
-    chr1:500            - only alignments overlapping the region starting at
-                          chr1:500 and continuing to the end of chr1
-    chr1:500..1000      - only alignments overlapping the region starting at
-                          chr1:500 and continuing to chr1:1000
-    chr1:500..chr3:750  - only alignments overlapping the region starting at
-                          chr1:500 and continuing to chr3:750. This 'spanning'
-                          region assumes that the reference specified as the
-                          right boundary will occur somewhere in the file after
-                          the left boundary. On a sorted BAM, a REGION of
-                          'chr4:500..chr2:1500' will produce undefined
-                          (incorrect) results. So don't do it. :)
+https://github.com/pezmaster31/bamtools/wiki
 
-    *Note: Most of the tools that accept a REGION string will perform without an
-           index file, but typically at great cost to performance (having to
-           plow through the entire file until the region of interest is found).
-           For optimum speed, be sure that index files are available for your
-           data.
-
-  -forceCompression
-
-Force compression of BAM output.
-
-    When tools are piped together (see details below), the default behavior is
-    to turn off compression. This can greatly increase performance when the data
-    does not have to be constantly decompressed and recompressed. This is
-    ignored any time an output BAM file is specified using "-out".
-
----------------------
-Piping
----------------------
-
-Many of the tools in BamTools can be chained together by piping. Any tool that
-accepts stdin can be piped into, and any that can output stdout can be piped
-from. For example:
-
-> bamtools filter -in data1.bam -in data2.bam -mapQuality ">50" | bamtools count
-
-will give a count of all alignments in your 2 BAM files with a mapQuality of
-greater than 50. And of course, any tool writing to stdout can be piped into
-other utilities.
-
----------------------
-The Tools
----------------------
-
-    convert         Converts between BAM and a number of other formats
-    count           Prints number of alignments in BAM file(s)
-    coverage        Prints coverage statistics from the input BAM file
-    filter          Filters BAM file(s) by user-specified criteria
-    header          Prints BAM header information
-    index           Generates index for BAM file
-    merge           Merge multiple BAM files into single file
-    random          Select random alignments from existing BAM file(s)
-    sort            Sorts the BAM file according to some criteria
-    split           Splits a BAM file on user-specifed property, creating a new
-                    BAM output file for each value found
-    stats           Prints some basic statistics from input BAM file(s)
-
------------
-convert
------------
-
-Description: converts BAM to a number of other formats
-
-Usage: bamtools convert -format [-in -in ...]
-                        [-out ] [other options]
-
-Input & Output:
-  -in        the input BAM file(s) [stdin]
-  -out       the output BAM file [stdout]
-  -format    the output file format - see below for
-             supported formats
-
-Filters:
-  -region    genomic region. Index file is recommended for
-             better performance, and is read
-             automatically if it exists. See 'bamtools
-             help index' for more details on creating
-             one.
-
-Pileup Options:
-  -fasta     FASTA reference file
-  -mapqual   print the mapping qualities
-
-SAM Options:
-  -noheader  omit the SAM header from output
-
-Help:
-  --help, -h shows this help text
-
-** Notes **
-
-    - Currently supported output formats ( BAM -> X )
-
-        Format type   FORMAT (command-line argument)
-        ------------  -------------------------------
-        BED           bed
-        FASTA         fasta
-        FASTQ         fastq
-        JSON          json
-        Pileup        pileup
-        SAM           sam
-        YAML          yaml
-
-      Usage example:
-        > bamtools convert -format json -in myData.bam -out myData.json
-
-    - Pileup Options have no effect on formats other than "pileup"
-      SAM Options have no effect on formats other than "sam"
-
------------
-count
------------
-
-Description: prints number of alignments in BAM file(s).
-
-Usage: bamtools count [-in -in ...] [-region ]
-
-Input & Output:
-  -in        the input BAM file(s) [stdin]
-
-Filters:
-  -region    genomic region. Index file is required and
-             is read automatically if it exists. See
-             'bamtools help index' for more details
-             on creating one.
-
-Help:
-  --help, -h shows this help text
-
-
------------
-coverage
------------
-
-
------------
-filter
------------
-
-
------------
-header
------------
-
-
------------
-index
------------
-
-
------------
-merge
------------
-
-
------------
-random
------------
-
-
------------
-sort
------------
-
-
------------
-split
------------
-
-
------------
-stats
------------
+Join the mailing list(s) to stay informed of updates or get involved with
+contributing:
 
+https://github.com/pezmaster31/bamtools/wiki/Mailing-lists
 
 --------------------------------------------------------------------------------
-III. License :
+II. License :
 --------------------------------------------------------------------------------
 
 Both the BamTools API and toolkit are released under the MIT License.
-Copyright (c) 2009-2010 Derek Barnett, Erik Garrison, Gabor Marth, 
+Copyright (c) 2009-2010 Derek Barnett, Erik Garrison, Gabor Marth,
     Michael Stromberg
 
 See included file LICENSE for details.
 
 --------------------------------------------------------------------------------
-IV. Acknowledgements :
+III. Acknowledgements :
 --------------------------------------------------------------------------------
 
  * Aaron Quinlan for several key feature ideas and bug fix contributions
@@ -390,17 +46,15 @@ IV. Acknowledgements :
  * Heng Li, author of SAMtools - the original C-language BAM API/toolkit.
 
 --------------------------------------------------------------------------------
-V. Contact :
+IV. Contact :
 --------------------------------------------------------------------------------
 
-Feel free to contact me with any questions, comments, suggestions, bug reports, 
+Feel free to contact me with any questions, comments, suggestions, bug reports,
     etc.
-    
+
 Derek Barnett
 Marth Lab
 Biology Dept., Boston College
 
-Email: barnetde@bc.edu
-Project Websites: http://github.com/pezmaster31/bamtools (ACTIVE SUPPORT)
-                  http://sourceforge.net/projects/bamtools (major updates only)
-    
+Email: derekwbarnett@gmail.com
+Project Website: http://github.com/pezmaster31/bamtools
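
The removed toolkit text lists four REGION string formats: chr1, chr1:500, chr1:500..1000 and chr1:500..chr3:750. It also states an overlap rule. Both can be illustrated with a short C++17 sketch.

This is not BamTools code. Region, parsePos, parseRegion and overlaps are hypothetical names, and the sketch makes these assumptions:
- Positions are plain non-negative decimal integers.
- An omitted start means position 0.
- A right position of -1 stands for "to the end of the reference".
- An alignment covers alnLength consecutive bases starting at its position. This reproduces the README's example of a 50bp alignment at position 70 overlapping a region that begins at 100.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical illustration of the README's REGION string formats; not the BamTools API.
struct Region {
    std::string leftRef;   // reference name of the left bound, e.g. "chr1"
    long leftPos = 0;      // assumed to be 0 when the string gives no start
    std::string rightRef;  // reference name of the right bound
    long rightPos = -1;    // -1 means "continue to the end of rightRef"
};

// Parses a non-empty run of decimal digits; rejects anything else.
bool parsePos(const std::string& text, long& out) {
    if (text.empty()) return false;
    for (char c : text)
        if (c < '0' || c > '9') return false;
    out = std::strtol(text.c_str(), nullptr, 10);
    return true;
}

// Accepts "chr1", "chr1:500", "chr1:500..1000" and "chr1:500..chr3:750".
// Returns std::nullopt for anything it cannot parse.
std::optional<Region> parseRegion(const std::string& s) {
    Region r;
    const std::size_t dots = s.find("..");
    const std::string left = s.substr(0, dots);  // whole string when there is no ".."

    // Left bound: "ref" (whole reference) or "ref:pos".
    const std::size_t colon = left.find(':');
    r.leftRef = left.substr(0, colon);
    if (r.leftRef.empty()) return std::nullopt;
    if (colon != std::string::npos && !parsePos(left.substr(colon + 1), r.leftPos))
        return std::nullopt;

    // Without "..", the region runs to the end of the left reference.
    r.rightRef = r.leftRef;
    if (dots == std::string::npos) return r;

    // Right bound: "pos" on the same reference, or "ref:pos" on another one.
    const std::string right = s.substr(dots + 2);
    const std::size_t rcolon = right.find(':');
    if (rcolon == std::string::npos) {
        if (!parsePos(right, r.rightPos)) return std::nullopt;
    } else {
        r.rightRef = right.substr(0, rcolon);
        if (r.rightRef.empty() || !parsePos(right.substr(rcolon + 1), r.rightPos))
            return std::nullopt;
    }
    return r;
}

// Single-reference overlap test, assuming an alignment covers alnLength bases
// starting at alnStart and that regionEnd == -1 means "open-ended".
bool overlaps(long alnStart, long alnLength, long regionStart, long regionEnd) {
    assert(alnLength > 0);
    const long alnLast = alnStart + alnLength - 1;  // last base covered
    const bool reachesStart = alnLast >= regionStart;
    const bool beforeEnd = regionEnd < 0 || alnStart <= regionEnd;
    return reachesStart && beforeEnd;
}
```

Two examples of how the sketch behaves:
- parseRegion("chr1:500..1000") yields leftRef "chr1", leftPos 500, rightRef "chr1" and rightPos 1000.
- A malformed string such as "chr1:abc" yields std::nullopt.

Spanning regions like chr1:500..chr3:750 would additionally depend on the reference order in the file, which this sketch does not check.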