Erik Garrison [Fri, 18 Jun 2010 19:13:46 +0000 (15:13 -0400)]
integration of SetRegion into BamMultiReader
Also includes update to bamtools_count which uses the BamMultiReader by
default and no longer requires the specification of an index file on the
command line, as this would be very cumbersome to parse for multiple
input files. Added method to check for file existence using stat to
bamtools_utilities.cpp
Derek [Thu, 17 Jun 2010 21:35:27 +0000 (17:35 -0400)]
Modified the Jump() scheme to take better account of the specified region and drill down closer to the region's beginning. Introduced RegionState to BamReaderPrivate (BRP) so that LoadNextAlignment can quit once an alignment is found beyond the region.
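The early-exit idea can be sketched as follows. This is only an illustration under stated assumptions: ToyRegion, ToyAlignment, ToyRegionState, Classify and CountOverlapping are hypothetical names, and the half-open boundary convention is assumed rather than taken from the bamtools source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins, not the bamtools types. Both the region and the
// alignment are treated as half-open intervals (an assumption).
struct ToyRegion {
    int leftRefID, leftPosition;    // inclusive start
    int rightRefID, rightPosition;  // exclusive end
};
struct ToyAlignment {
    int refID, position, endPosition;  // covers [position, endPosition)
};

enum ToyRegionState { BEFORE_REGION, WITHIN_REGION, AFTER_REGION };

// Classify one alignment relative to the region.
ToyRegionState Classify(const ToyRegion& region, const ToyAlignment& al) {
    assert(al.position <= al.endPosition);
    // Ends at or before the region start: not there yet.
    if (al.refID < region.leftRefID ||
        (al.refID == region.leftRefID && al.endPosition <= region.leftPosition))
        return BEFORE_REGION;
    // Starts at or after the region end: past the region.
    if (al.refID > region.rightRefID ||
        (al.refID == region.rightRefID && al.position >= region.rightPosition))
        return AFTER_REGION;
    return WITHIN_REGION;
}

// Count overlapping alignments in coordinate-sorted input, stopping at the
// first alignment that lies beyond the region.
std::size_t CountOverlapping(const ToyRegion& region,
                             const std::vector<ToyAlignment>& sorted) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < sorted.size(); ++i) {
        const ToyRegionState state = Classify(region, sorted[i]);
        if (state == AFTER_REGION) break;  // sorted input: nothing later overlaps
        if (state == WITHIN_REGION) ++count;
    }
    return count;
}
```

The break is only valid for coordinate-sorted input, where every later alignment starts at or after the current one.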
Derek [Thu, 17 Jun 2010 04:01:56 +0000 (00:01 -0400)]
Added concept of a fully specified region of interest to the BamReader API. Added BamRegion struct to BamAux.h. Added SetRegion() methods to BamReader. Reorganized or modified these existing BamReaderPrivate functions: BamReaderPrivate(), Close(), GetNextAlignment/Core(), IsOverlap(), Jump(), & Rewind(). Cleans up a lot of region-checking client code.
Erik Garrison [Thu, 10 Jun 2010 17:33:28 +0000 (13:33 -0400)]
change merger to use GetNextAlignmentCore
This provides a modest performance boost to the merger. A small change
to the BamAlignment copy constructor was required (to copy
BamAlignmentSupportData).
Derek [Thu, 10 Jun 2010 04:50:37 +0000 (00:50 -0400)]
Moved BamAlignmentSupportData into BamAlignment data type. This continues the read/write speedup mentioned in prior commits, but removes the need for clients to manage this additional auxiliary data object. The 'BamAlignment lite' is accessed by calling BamReader::GetNextAlignmentCore() and written by BamWriter::SaveAlignment(), which checks to see how much parsing & packing is needed before writing.
Erik Garrison [Wed, 9 Jun 2010 13:39:07 +0000 (09:39 -0400)]
fixed potential bug with previous commit
The previous commit made assumptions about the ordering of subtags
within @RG header lines. This commit only assumes that the read group
ID is specified by "ID", thus following spec.
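A minimal sketch of order-independent ID lookup, assuming tab-delimited TAG:VALUE fields as in the SAM header format. ReadGroupId is a hypothetical helper name, not the function used in bamtools.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not the bamtools implementation): return the value of
// the ID subtag of an @RG header line, wherever it appears on the line.
// Returns an empty string if the line carries no ID subtag.
std::string ReadGroupId(const std::string& rgLine) {
    assert(rgLine.compare(0, 3, "@RG") == 0);  // caller passes @RG lines only
    std::istringstream fields(rgLine);
    std::string field;
    // Walk the tab-separated fields without assuming any particular order.
    while (std::getline(fields, field, '\t')) {
        if (field.compare(0, 3, "ID:") == 0) return field.substr(3);
    }
    return std::string();
}
```

Scanning every field rather than reading a fixed column is what removes the ordering assumption.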
Erik Garrison [Wed, 9 Jun 2010 13:31:01 +0000 (09:31 -0400)]
fixed bug with @RG handling
Prior to this commit files merged with bamtools merge would have one @RG
tag for each file. This is undesirable behavior. This commit fixes the
issue by tracking unique @RG tags in our unified header
(BamMultiReader::GetHeaderText) and prevents the MultiReader from
observing more than one @RG tag in the header. Future merges will have
the correct header.
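The de-duplication described above might look roughly like this. It is a sketch under assumptions: UniqueReadGroupLines is a hypothetical helper, header texts are taken as newline-separated strings, and whole @RG lines are compared, which may differ from what BamMultiReader::GetHeaderText actually does.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not the bamtools implementation): collect each
// distinct @RG line once, in first-seen order, across several header texts.
std::vector<std::string> UniqueReadGroupLines(
        const std::vector<std::string>& headers) {
    std::set<std::string> seen;       // @RG lines already emitted
    std::vector<std::string> unique;  // output, in first-seen order
    for (std::size_t i = 0; i < headers.size(); ++i) {
        std::istringstream lines(headers[i]);
        std::string line;
        while (std::getline(lines, line)) {
            if (line.compare(0, 3, "@RG") != 0) continue;  // not a read group line
            if (seen.insert(line).second) unique.push_back(line);
        }
    }
    assert(unique.size() == seen.size());  // every kept line is distinct
    return unique;
}
```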
Derek [Wed, 9 Jun 2010 03:29:45 +0000 (23:29 -0400)]
Added GetNextAlignmentCore() to BamReader API as well as a corresponding SaveAlignment() in BamWriter. Both utilize the BamAlignmentSupportData structure, which contains the raw character data and lengths, and which has been bumped to BamAux.h. Exposing these methods should allow for quicker read/writes for tools that are only concerned with alignment/positional data, not the actual sequences.
Erik Garrison [Tue, 8 Jun 2010 20:28:58 +0000 (16:28 -0400)]
BamMultiReader data structure rewrite
Rewrite to improve performance of the MultiReader on large sets of
files. Move tracking of readers, alignments, and positions from several
decoupled vectors into a single multimap, allowing rapid acquisition of
the lowest 'current' alignment among the set of open readers. Expect
some performance boost when running the MultiReader on large numbers of
files, as prior to this rewrite each alignment required roughly 3 x N
ops (where N is the number of files) checking all these vectors.
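The multimap approach can be illustrated with a toy merge. This is a sketch, not the BamMultiReader code: Position, ToyReader and MergeSorted are hypothetical names, and each reader is modelled as an in-memory, coordinate-sorted list.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an alignment position: (reference ID, position).
typedef std::pair<int, int> Position;

// Hypothetical stand-in for an open reader: a coordinate-sorted list of
// positions plus a cursor to the next unread one.
struct ToyReader {
    std::vector<Position> alignments;
    std::size_t next;
    explicit ToyReader(const std::vector<Position>& a) : alignments(a), next(0) {}
};

// Merge all readers into one sorted stream. The multimap holds one entry per
// reader (its current alignment), so the lowest one is always at begin().
std::vector<Position> MergeSorted(std::vector<ToyReader>& readers) {
    std::multimap<Position, std::size_t> heads;  // current alignment -> reader index
    for (std::size_t i = 0; i < readers.size(); ++i) {
        if (!readers[i].alignments.empty()) {
            heads.insert(std::make_pair(readers[i].alignments[0], i));
            readers[i].next = 1;
        }
    }
    std::vector<Position> merged;
    while (!heads.empty()) {
        std::multimap<Position, std::size_t>::iterator lowest = heads.begin();
        const std::size_t index = lowest->second;
        merged.push_back(lowest->first);
        heads.erase(lowest);
        // Refill from the reader that just supplied the lowest alignment.
        ToyReader& reader = readers[index];
        assert(reader.next <= reader.alignments.size());
        if (reader.next < reader.alignments.size()) {
            heads.insert(std::make_pair(reader.alignments[reader.next], index));
            ++reader.next;
        }
    }
    return merged;
}
```

Each step costs one removal at begin() and one logarithmic insert, instead of a scan over every open reader.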
Derek [Mon, 7 Jun 2010 20:18:10 +0000 (16:18 -0400)]
This fixes the out-of-range exception, though there's still a discrepancy with some (but, maddeningly, not all) tags following a string tag. Will look into it in more detail, but at least it shouldn't crash in the meantime.
Derek [Wed, 2 Jun 2010 02:54:30 +0000 (22:54 -0400)]
Implemented Mosaik-style command line parser, instead of former GetOpt parser. Set up an AbstractTool base class for all subtools. Split tools into .h/.cpp pairs.
Derek [Wed, 26 May 2010 20:05:15 +0000 (16:05 -0400)]
Reorganization of toolkit. Split subtools out to own headers. Added custom getopt functionality for subtools arguments. Provided or extended rough implementations for most subtools.
Erik Garrison [Fri, 21 May 2010 21:07:31 +0000 (17:07 -0400)]
Complete prior commit
This commit adds verification that reference sequences
are identical among readers opened by the BamMultiReader. Without this
check the behavior of the MultiReader is undefined.
Erik Garrison [Fri, 21 May 2010 20:53:26 +0000 (16:53 -0400)]
bamtools executable
Merge a number of useful tools into a single executable.
This commit also adds verification that reference sequences
are identical among readers opened by the BamMultiReader. Without this
check the behavior of the MultiReader is undefined.
Moved BamReaderPrivate::CalculateAlignmentEnd() to BamAlignment::GetEndPosition() to expose it to the public API. Reorganized BamAux.h to look cleaner and facilitate quick lookup of available data and methods
barnett [Mon, 11 Jan 2010 15:11:15 +0000 (15:11 +0000)]
Fixed fread() related compiler warnings. Fixed std types [u]intX_t errors (used, but not defined in BamAux.h). Added Aaron's stdin/stdout read/write feature.