In standard indexed BAM files with sparse coverage (our test case was an RNAseq BAM file with roughly 1M reads), a query on an interval may find none of its candidate offsets in the index, as the BAM index only contains bins that have reads.
Without this bail-out, we would get a crash. Returning false silently is the preferred behavior in our view, as it allows our read logic to move on to the next query and does not add noise to stderr.
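
A minimal sketch of the bail-out described above. The helper name firstCandidateOffset and its signature are hypothetical and not taken from the BamTools sources; the only point is that an empty candidate list is reported as false instead of being dereferenced.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: picks the first candidate file offset for a region query.
// Returns false, without printing anything, when the index yielded no candidates.
bool firstCandidateOffset(const std::vector<std::int64_t>& offsets,
                          std::int64_t& result) {
    if (offsets.empty())
        return false;  // bail out: nothing to seek to, caller moves to next query
    result = offsets.front();
    return true;
}
```

A caller would treat a false return as "no reads in this interval" and continue with its next query.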
Removed 'core mode' concept from BamMultiReader internals
* Now char data is only generated if needed by multi-merger
implementation or on-demand by client call to
BamMultiReader::GetNextAlignment()
Basic internal implementation of BamFile & BamPipe
* BgzfStream now working on IBamIODevice instead of FILE*
* BamReaderPrivate now queries stream's IsOpen() method instead of
touching member variable directly
* Empty implementations of BamHttp & BamFtp
* Added global BT_ASSERT_X macro for convenience
Bug discovered. The chunkStop was not being read from the correct offset (rather, it was always being read as the first chunkStart value for the # alignment chunks in that bin of the index).
As a result, chunkStop will never be >= minOffset (or maybe rarely, since it always equals the chunkStart of the first chunk), and thus the linear index doesn't really help in reducing the number of seeks performed.
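
The comparison this fix depends on can be sketched as follows. The Chunk struct and the function name chunksWorthSeeking are assumptions made for illustration; the commit message does not show how the real index code stores chunks or uses the result.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical chunk record: a pair of file offsets read from one bin.
struct Chunk {
    std::uint64_t start;
    std::uint64_t stop;
};

// Keeps only the chunks that end at or after minOffset, the lower bound
// obtained from the linear index. This only works if each chunk's stop
// value was read from its own position in the index.
std::vector<Chunk> chunksWorthSeeking(const std::vector<Chunk>& chunks,
                                      std::uint64_t minOffset) {
    std::vector<Chunk> kept;
    for (const Chunk& c : chunks) {
        if (c.stop >= minOffset)
            kept.push_back(c);
    }
    return kept;
}
```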
derek [Tue, 28 Jun 2011 16:31:25 +0000 (12:31 -0400)]
Added unique-alignment checks for ResolveTool
* Uniqueness determined by comparing MapQuality to 0
* Only pairs with both mates unique are used for the 'makeStats' median
fragment size calculation.
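
A rough sketch of the filtering described above, assuming that a non-zero MapQuality means "unique". The MatePair struct and the function name are hypothetical, and for an even number of lengths this version returns the upper of the two middle values, which may differ from what ResolveTool does.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical record for one read pair.
struct MatePair {
    std::uint16_t mapQuality1;
    std::uint16_t mapQuality2;
    int fragmentLength;
};

// Computes the median fragment length over pairs whose mates are both unique
// (assumed here to mean MapQuality != 0). Returns false if no pair qualifies.
bool medianUniqueFragmentLength(const std::vector<MatePair>& pairs, int& median) {
    std::vector<int> lengths;
    for (const MatePair& p : pairs) {
        if (p.mapQuality1 != 0 && p.mapQuality2 != 0)
            lengths.push_back(p.fragmentLength);
    }
    if (lengths.empty())
        return false;
    auto mid = lengths.begin() + lengths.size() / 2;
    std::nth_element(lengths.begin(), mid, lengths.end());
    median = *mid;
    return true;
}
```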
Alec Chapman [Tue, 28 Jun 2011 01:58:30 +0000 (21:58 -0400)]
Fix Visual Studio compiler errors.
Don't use dynamic stack allocation (variable length arrays).
Rename bamtools target to bamtools_cmd to not conflict with BamTools target (they differ only in case).
bamtools_cmd only compiles if I remove bamtools_filter.cpp, which I haven't committed.
I also had to manually configure the include directory for zlib,
but that's probably due to having multiple copies floating around my machine.
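
For illustration, one common way to avoid a variable length array is to use std::vector for the runtime-sized buffer. The function below is a hypothetical example, not code from this commit.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Copies `text` into a runtime-sized, NUL-terminated scratch buffer.
// A declaration such as `char buf[n];` with a runtime n is a compiler
// extension in C++ and is rejected by Visual Studio; std::vector is portable.
std::vector<char> makeScratchBuffer(const std::string& text) {
    std::vector<char> buffer(text.size() + 1, '\0');
    std::memcpy(buffer.data(), text.data(), text.size());
    return buffer;
}
```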
derek [Thu, 23 Jun 2011 19:35:35 +0000 (15:35 -0400)]
Fixed -fPIC issue for CentOS users.
* Forced compiler flag that was not being automatically set by CMake on
that OS. Had previously set this on API library. Got feedback that it
worked there, so I added the flag to Utils & JsonCPP libs as well.
derek [Fri, 17 Jun 2011 04:09:49 +0000 (00:09 -0400)]
Removed pessimistic warnings when jumping to regions with no data, using
the standard index format (not actually an error case, so no need to
alarm users with scary messages)
derek [Fri, 17 Jun 2011 02:13:26 +0000 (22:13 -0400)]
Added re-calculation of BamAlignment's BinID during
BamWriter::SaveAlignment() in all cases
* Previously, the bin IDs of purely "core-only" alignments were simply
written directly out to output BAM. However, in cases where alignment
Position is changed (re-alignment), the original bin ID may no longer be
correct.
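
To show why a stale bin ID can be wrong after re-alignment, here is a sketch of the bin calculation as given in the SAM format specification (reg2bin), for a 0-based, half-open interval [begin, end). The function name regionToBin is hypothetical; whether BamWriter uses exactly this form is an assumption.

```cpp
// Bin calculation following the reg2bin scheme in the SAM specification.
// begin is 0-based and inclusive, end is exclusive.
int regionToBin(int begin, int end) {
    --end;
    if (begin >> 14 == end >> 14) return ((1 << 15) - 1) / 7 + (begin >> 14);
    if (begin >> 17 == end >> 17) return ((1 << 12) - 1) / 7 + (begin >> 17);
    if (begin >> 20 == end >> 20) return ((1 << 9) - 1) / 7 + (begin >> 20);
    if (begin >> 23 == end >> 23) return ((1 << 6) - 1) / 7 + (begin >> 23);
    if (begin >> 26 == end >> 26) return ((1 << 3) - 1) / 7 + (begin >> 26);
    return 0;
}
```

With this scheme, a 100 bp alignment at position 16000 falls in bin 4681, while the same alignment moved to position 16400 falls in bin 4682, so writing out the original bin ID would leave the record inconsistent with its new position.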
derek [Tue, 14 Jun 2011 17:41:56 +0000 (13:41 -0400)]
Implemented better coupling of unmapped reads with mates during sorting
(assuming assigned same coordinates)
* Used std::stable_sort instead of std::sort, to preserve order
* Add checks at buffer boundary to keep mates from being split into
different temp files. This makes the buffer boundary "softer", but in
practice, shouldn't differ much if at all.
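
The effect of switching to std::stable_sort can be sketched as below. The Read struct and sortByCoordinate are hypothetical stand-ins; the point is that reads with equal (refId, position) keys keep their input order, so an unmapped read that was assigned its mate's coordinates stays next to that mate.

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical, stripped-down alignment record.
struct Read {
    int refId;
    int position;
    std::string name;
};

// Sorts by (refId, position). std::stable_sort guarantees that reads with
// equal keys keep their original relative order; std::sort does not.
void sortByCoordinate(std::vector<Read>& reads) {
    std::stable_sort(reads.begin(), reads.end(),
                     [](const Read& a, const Read& b) {
                         return std::tie(a.refId, a.position) <
                                std::tie(b.refId, b.position);
                     });
}
```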
derek [Sat, 11 Jun 2011 21:05:43 +0000 (17:05 -0400)]
Created 3 modes for ResolveTool: makeStats, markPairs, & twoPass
* "TwoPass" mode (the initial implementation of the tool) effectively
eliminates piped BAMs as an input option, since you can't exactly rewind
stdin and start reading from the beginning.
* To get around this, I separated the two passes into separate "modes"
(-makeStats & -markPairs), that communicate via a simple, human-readable
stats summary file. Data can then be merged, filtered, etc and piped
into each mode if you don't mind the runtime of preprocessing twice but
don't want to physically store the unresolved intermediate BAM file.
Brought API up to compliance with recent SAM Format Spec (v1.4-r962)
* Added support for new "binary array" tag type
* Added support for '=' and 'X' CIGAR ops
* Added support for multiple PG entries in header
* Added support for new RG fields
Major performance boost to startup & random-access - especially for the
use cases involving multiple (hundreds of) BAMs with BAI index files.
* This did require some changes to the BamIndex interface. I doubt many
people are writing custom index format classes, but if you are one of
them and have any problems, feel free to contact me with questions.
derek [Fri, 24 Dec 2010 03:33:33 +0000 (22:33 -0500)]
Added SAM header-handling classes for read/write/validate.
* Not fully connected to the BamReader/Writer API yet, but will be
phased in soon.
* Will enable clients to query, modify & validate a BAM file's SAM
header data using the BamTools API, instead of having to use hand-rolled
string-parsing code on the result of BamReader::GetHeaderText().
derek [Fri, 24 Dec 2010 02:14:49 +0000 (21:14 -0500)]
Implemented proper -byname sorting (finally).
* BamMultiReader used to merge the "next" alignment based on (refID,
position). Extracted this and generalized to support merging on either
position OR alignment name.
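
A simplified sketch of merging on either key. Candidate, MergeOrder, comesBefore and pickNext are all hypothetical names, and the plain lexicographic name comparison is an assumption; the real BamMultiReader may order names differently and handles details (such as unmapped reads) that are ignored here.

```cpp
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical "next alignment" from one of the open readers.
struct Candidate {
    int refId;
    int position;
    std::string name;
};

enum class MergeOrder { ByPosition, ByName };

// Single comparison point, so the merge loop does not care which key is used.
bool comesBefore(const Candidate& a, const Candidate& b, MergeOrder order) {
    if (order == MergeOrder::ByName)
        return a.name < b.name;
    return std::tie(a.refId, a.position) < std::tie(b.refId, b.position);
}

// Returns the index of the candidate to emit next.
// The caller must pass a non-empty list.
std::size_t pickNext(const std::vector<Candidate>& heads, MergeOrder order) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < heads.size(); ++i) {
        if (comesBefore(heads[i], heads[best], order))
            best = i;
    }
    return best;
}
```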
derek [Wed, 15 Dec 2010 20:39:54 +0000 (15:39 -0500)]
Added creation of include/ folder in bamtools root directory at build time.
* API-related headers are copied here to provide an explicit target for client code.
derek [Mon, 13 Dec 2010 20:28:33 +0000 (15:28 -0500)]
Made BamAlignment flag queries symmetrical
* For example: there is now a SetIsMapped() setter to match the
IsMapped() getter. Before you had to reverse your logic on a few of the
flags (in this case, using SetIsUnmapped()). Not impossible to use, but
not immediately obvious and intuitive, and hard to remember when to use
the opposite setter. These older methods will remain available, but
should be considered deprecated.
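
The symmetry can be illustrated with a toy struct. MiniAlignment is hypothetical and not the real BamAlignment class; it only relies on the SAM FLAG bit 0x4 meaning "segment unmapped".

```cpp
#include <cstdint>

// SAM FLAG bit 0x4: the segment is unmapped.
const std::uint32_t kUnmappedBit = 0x4;

// Toy stand-in for an alignment record, for illustration only.
struct MiniAlignment {
    std::uint32_t flag = 0;

    bool IsMapped() const { return (flag & kUnmappedBit) == 0; }

    // Older style: the setter speaks the language of the stored bit.
    void SetIsUnmapped(bool unmapped) {
        if (unmapped)
            flag |= kUnmappedBit;
        else
            flag &= ~kUnmappedBit;
    }

    // Symmetrical style: the setter matches the IsMapped() getter.
    void SetIsMapped(bool mapped) { SetIsUnmapped(!mapped); }
};
```

With the symmetrical setter, client code can write SetIsMapped(false) next to an IsMapped() check without mentally inverting the flag.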
derek [Mon, 6 Dec 2010 04:11:03 +0000 (23:11 -0500)]
Added new RevertTool to the toolkit
* "$ bamtools revert ... " will clear the IsDuplicate flag on
BamAlignments and replace the Qualities with the contents of the OQ tag.
* Suggested by and draft implementation contributed by Al Ward.
derek [Fri, 19 Nov 2010 17:41:54 +0000 (12:41 -0500)]
Extracted BamReaderPrivate & BamWriterPrivate from inner classes.
First step in breaking up the API's monolithic classes. Should allow easier maintenance, testing, and adding features as we go forward.
derek [Fri, 19 Nov 2010 15:32:45 +0000 (10:32 -0500)]
Migrated to CMake build system.
* Please see README: Installation for help in building BamTools toolkit & API and integrating the new shared library into your application
Derek [Thu, 21 Oct 2010 04:19:40 +0000 (00:19 -0400)]
Implemented index cache mode for both BAI & BTI formats
* Client code can now decide between 3 index cache modes:
Full : save entire index data in memory
Limited (default) : save only index data for current reference
None : save no index data - only load data necessary for a single-
* Required a major overhaul to BamIndex interface and derived classes.
Lots of refactoring to move common code up to BamIndex.
Derived classes now share much of the same method names &
organization. Only implementation details differ, as needed.
* Miscellaneous: moved BAMTOOLS_LFS definitions into BamAux.h & cleaned
up BGZF.h
Derek [Sat, 9 Oct 2010 23:28:42 +0000 (19:28 -0400)]
Fixed: bug(s) related to empty references and regions.
* NOTE - This fix does introduce a slight modification to the *.bti index format.
So any existing BTI index files will need to be rebuilt to support the bug fix (apologies).