--- /dev/null
+=Biopiece: rename_keys=
+
+==Synopsis==
+
+Rename keys of records in stream.
+
+==Description==
+
+Sometimes it is necessary to rename record keys so that biopieces that require specific record
+keys can operate on the records. For example, BLAST records contain both a subject ID (S_ID) and a
+subject sequence (S_SEQ) as well as a query ID (Q_ID) and a query sequence (Q_SEQ). If you want to
+write either the query sequence or the subject sequence as FASTA output, you have to rename the
+corresponding record keys accordingly.
+
+==Usage==
+
+{{{
+... | rename_keys [options]
+}}}
+
+==Options==
+
+{{{
+[-k <search,replace> | --keys=<search,replace>] - Keys to find and replace.
+[-I <file> | --stream_in=<file>] - Read input from stream file - Default=STDIN
+[-O <file> | --stream_out=<file>] - Write output to stream file - Default=STDOUT
+}}}
+
+==Examples==
+
+To rename the key Q_ID to SEQ_NAME in all records, do:
+
+{{{
+... | rename_keys -k Q_ID,SEQ_NAME
+}}}
+
+If you need to rename more than one key, pipe the stream through [rename_keys] once per key:
+
+{{{
+... | rename_keys -k Q_ID,SEQ_NAME | rename_keys -k Q_SEQ,SEQ
+}}}
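+
+The renaming itself is simple to sketch. The Python below is a minimal, hypothetical illustration,
+not the Biopieces implementation: it assumes the plain-text record format seen in the [tile_seq]
+output examples ("KEY: value" lines, each record terminated by `---`), and it assumes that records
+lacking the search key pass through unchanged. The helper names `parse_records`, `rename_key` and
+`format_records` and the sample record are made up for this example. Chaining two `rename_key`
+calls mirrors piping the stream through [rename_keys] twice:
+
```python
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, str]


def parse_records(text: str) -> List[Record]:
    """Parse "KEY: value" lines into records; a line '---' ends each record."""
    records: List[Record] = []
    current: Record = {}
    for line in text.splitlines():
        if line == "---":
            records.append(current)
            current = {}
        elif line:
            key, _, value = line.partition(": ")
            current[key] = value
    if current:  # tolerate a final record without a trailing '---'
        records.append(current)
    return records


def rename_key(records: Iterable[Record], search: str, replace: str) -> Iterator[Record]:
    """Yield records with key `search` renamed to `replace` (assumed behaviour)."""
    for record in records:
        record = dict(record)  # copy so the caller's records are untouched
        if search in record:
            record[replace] = record.pop(search)
        yield record


def format_records(records: Iterable[Record]) -> str:
    """Serialise records back into the "KEY: value" / '---' format."""
    return "".join(
        "".join(f"{key}: {value}\n" for key, value in record.items()) + "---\n"
        for record in records
    )


# Hypothetical BLAST-like record, for illustration only.
stream = "Q_ID: query1\nQ_SEQ: ACGT\nS_ID: subject1\nS_SEQ: ACGA\n---\n"
# Equivalent of: ... | rename_keys -k Q_ID,SEQ_NAME | rename_keys -k Q_SEQ,SEQ
renamed = rename_key(rename_key(parse_records(stream), "Q_ID", "SEQ_NAME"), "Q_SEQ", "SEQ")
print(format_records(renamed), end="")
```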
+
+==See also==
+
+[blast_seq]
+
+==Author==
+
+Martin Asser Hansen - Copyright (C) - All rights reserved.
+
+mail@maasha.dk
+
+August 2007
+
+==License==
+
+GNU General Public License version 2
+
+http://www.gnu.org/copyleft/gpl.html
+
+==Help==
+
+[rename_keys] is part of the Biopieces framework.
+
+http://code.google.com/p/biopieces/
--- /dev/null
+=Biopiece: tile_seq=
+
+==Synopsis==
+
+Using the first sequence in the stream as reference, tile all subsequent sequences
+based on pairwise alignments.
+
+==Description==
+
+[tile_seq] creates an alignment of several sequences based on pairwise alignments against a
+reference sequence. This is useful, for example, for matching short sequences such as ESTs or deep
+sequencing reads against a reference. [tile_seq] is more precise than a multiple alignment, in which
+indels introduced into the reference sequence will most likely ruin the alignment. Also, [tile_seq]
+can handle thousands of sequences.
+
+[tile_seq] currently uses Muscle as its alignment engine, so Muscle must be installed for
+[tile_seq] to work.
+
+For more about Muscle:
+
+http://www.drive5.com/muscle/
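+
+The tiling step can be sketched as follows, assuming a pairwise aligner (Muscle, in the case of
+[tile_seq]) has already produced a gapped reference/query pair for each sequence. This hypothetical
+Python sketch, inferred from the example output rather than taken from the [tile_seq] source,
+places each query at the offset where its first residue aligns to the reference, pads it with `-`
+to the reference length, and orders the rows by that offset. Query insertions are kept, so such a
+row is shifted, as test3 is in the examples. The names `offset_on_reference` and `tile` are
+illustrative, and the gapped pair for test3 is hand-made, not real Muscle output:
+
```python
from typing import List, Tuple


def offset_on_reference(ref_aln: str, query_aln: str) -> int:
    """Count reference residues that precede the first aligned query residue."""
    offset = 0
    for ref_char, query_char in zip(ref_aln, query_aln):
        if query_char != "-":
            break
        if ref_char != "-":
            offset += 1
    return offset


def tile(ref_name: str, ref: str,
         pairwise: List[Tuple[str, str, str]]) -> List[Tuple[str, str]]:
    """Tile queries on the reference.

    `pairwise` holds (name, gapped_reference, gapped_query) tuples, one per query.
    """
    rows = []
    for name, ref_aln, query_aln in pairwise:
        offset = offset_on_reference(ref_aln, query_aln)
        query = query_aln.replace("-", "")  # ungapped query, insertions kept
        rows.append((offset, name, ("-" * offset + query).ljust(len(ref), "-")))
    rows.sort(key=lambda row: row[0])  # stable sort by start position on the reference
    return [(ref_name, ref)] + [(name, seq) for _, name, seq in rows]


ref = "ACGACTAGCATCGACTGACA"
pairs = [
    ("test4", ref, "------AGCATCGACT----"),
    ("test3", "ACGA--CTAGCATCGACTGACA", "ACGAAACTAGCATC--------"),
]
for name, seq in tile("ref", ref, pairs):
    print(name, seq)
```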
+
+==Usage==
+
+{{{
+... | tile_seq [options]
+}}}
+
+==Options==
+
+{{{
+[-i <int> | --identity=<int>] - Minimum identity (%) for pairwise alignment - Default=70
+[-s | --supress_indels] - Suppress insertions in the query sequence.
+[-I <file> | --stream_in=<file>] - Read input from stream file - Default=STDIN
+[-O <file> | --stream_out=<file>] - Write output to stream file - Default=STDOUT
+}}}
+
+==Examples==
+
+Consider the following file `test.fna` containing these FASTA entries:
+
+{{{
+>ref
+ACGACTAGCATCGACTGACA
+>test1
+CTAGCTTCGACT
+>test2
+GAATCGACT
+>test3
+ACGAAACTAGCATC
+>test4
+AGCATCGACT
+>test5
+TAACAGGCACT
+}}}
+
+In order to tile the test1, test2 ... test5 sequences against the reference sequence,
+first read in the sequences using [read_fasta] and then pipe them through [tile_seq]:
+
+{{{
+read_fasta -i test.fna | tile_seq
+
+SEQ: ACGACTAGCATCGACTGACA
+SEQ_NAME: ref
+---
+SEQ: ACGAAACTAGCATC------
+SEQ_NAME: test3_+_85.71
+---
+SEQ: ----CTAGCTTCGACT----
+SEQ_NAME: test1_+_91.67
+---
+SEQ: ------AGCATCGACT----
+SEQ_NAME: test4_+_100.00
+---
+SEQ: -------GAATCGACT----
+SEQ_NAME: test2_+_88.89
+---
+}}}
+
+The output lists the reference sequence first, followed by the subsequent sequences sorted by the
+aligned sequence itself, so that they appear in order of their start position on the reference;
+this gives the tiled layout. Two pieces of information are added to the SEQ_NAME key: the
+orientation of the pairwise alignment that gave the highest similarity, and a global identity score
+calculated as the number of matches divided by the length of the shorter sequence in the pairwise
+alignment. For example, `test1_+_91.67` means that test1 aligned in the + orientation with 91.67%
+identity. Use the `-i` switch to change the identity cutoff for the inclusion of alignments:
+
+{{{
+read_fasta -i test.fna | tile_seq -i 60
+
+SEQ: ACGACTAGCATCGACTGACA
+SEQ_NAME: ref
+---
+SEQ: ACGAAACTAGCATC------
+SEQ_NAME: test3_+_85.71
+---
+SEQ: ----CTAGCTTCGACT----
+SEQ_NAME: test1_+_91.67
+---
+SEQ: -----TAACAGGCACT----
+SEQ_NAME: test5_+_63.64
+---
+SEQ: ------AGCATCGACT----
+SEQ_NAME: test4_+_100.00
+---
+SEQ: -------GAATCGACT----
+SEQ_NAME: test2_+_88.89
+---
+}}}
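+
+The identity score described above can be sketched in a few lines of Python. This is a
+hypothetical illustration, not the [tile_seq] source: it assumes the pairwise alignment is given as
+two equal-length gapped strings (`-` for gaps), that the orientation is already known, and that the
+`-i` cutoff is inclusive. The names `identity` and `seq_name` are made up for this example. The
+formula reproduces the scores shown, e.g. test1 has 11 matches over its 12 residues:
+
```python
def identity(ref_aln: str, query_aln: str) -> float:
    """Percent identity: matching columns over the ungapped length of the shorter sequence."""
    matches = sum(1 for r, q in zip(ref_aln, query_aln) if r == q and r != "-")
    shortest = min(len(ref_aln.replace("-", "")), len(query_aln.replace("-", "")))
    return 100.0 * matches / shortest


def seq_name(name: str, strand: str, score: float) -> str:
    """Build a SEQ_NAME label of the form <name>_<orientation>_<identity>."""
    return f"{name}_{strand}_{score:.2f}"


ref = "ACGACTAGCATCGACTGACA"
for name, query in [("test1", "----CTAGCTTCGACT----"), ("test5", "-----TAACAGGCACT----")]:
    score = identity(ref, query)
    # With the default cutoff of 70 test5 is dropped; with -i 60 it is kept.
    print(seq_name(name, "+", score), "kept at -i 70:", score >= 70)
```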
+
+Now test5 is part of the alignment, and the tiled sequences can be written using [write_align]:
+
+{{{
+read_fasta -i test.fna | tile_seq -i 60 | write_align -x
+
+                        .         .
+ref            ACGACTAGCATCGACTGACA
+test3_+_85.71  ACGAAACTAGCATC------
+test1_+_91.67  ----CTAGCTTCGACT----
+test5_+_63.64  -----TAACAGGCACT----
+test4_+_100.00 ------AGCATCGACT----
+test2_+_88.89  -------GAATCGACT----
+Consensus: 50% -------------ACT----
+}}}
+
+To better illustrate mismatches in the alignment use [invert_align]:
+
+{{{
+read_fasta -i test.fna | tile_seq -i 60 | invert_align | write_align -x
+
+                        .         .
+ref            ACGACTAGCATCGACTGACA
+test3_+_85.71  ----AACTAGCATC______
+test1_+_91.67  ____-----T------____
+test5_+_63.64  _____--A--GGC---____
+test4_+_100.00 ______----------____
+test2_+_88.89  _______-A-------____
+Consensus: 50% --------------------
+}}}
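+
+The masking seen in this output can be reproduced with a small function. This Python sketch is
+inferred from the example output only and is not the [invert_align] implementation: residues
+identical to the reference become `-`, gaps become `_`, and mismatching residues are kept, so only
+the differences remain visible. The name `invert_row` is hypothetical:
+
```python
def invert_row(ref: str, row: str) -> str:
    """Mask one aligned row against the reference (behaviour inferred from the example)."""
    masked = []
    for ref_char, char in zip(ref, row):
        if char == "-":
            masked.append("_")   # gap in this row
        elif char == ref_char:
            masked.append("-")   # residue identical to the reference
        else:
            masked.append(char)  # mismatch: keep the residue
    return "".join(masked)


ref = "ACGACTAGCATCGACTGACA"
print(invert_row(ref, "----CTAGCTTCGACT----"))  # prints ____-----T------____
```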
+
+Now we clearly see that an insertion in test3 offsets the alignment. This behaviour can be
+suppressed using the `-s` switch to [tile_seq]:
+
+{{{
+read_fasta -i test.fna | tile_seq -i 60 -s | invert_align | write_align -x
+
+                        .         .
+ref            ACGACTAGCATCGACTGACA
+test3_+_100.00 ------------________
+test1_+_91.67  ____-----T------____
+test5_+_63.64  _____--A--GGC---____
+test4_+_100.00 ______----------____
+test2_+_88.89  _______-A-------____
+Consensus: 50% --------------------
+}}}
+
+Note that the identity score of test3 changes considerably with the `-s` switch, from 85.71 to 100.00.
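+
+One way to picture the effect of `-s` is to drop every alignment column in which the reference has
+a gap, i.e. every insertion in the query, before tiling and scoring. This hypothetical Python sketch
+assumes that behaviour (it is inferred from the example output, not taken from the [tile_seq]
+source) and uses a hand-made pairwise alignment of test3 in which the two extra A residues are an
+insertion. The name `suppress_insertions` is made up for this example:
+
```python
from typing import Tuple


def suppress_insertions(ref_aln: str, query_aln: str) -> Tuple[str, str]:
    """Remove columns where the reference has a gap, i.e. insertions in the query."""
    kept = [(r, q) for r, q in zip(ref_aln, query_aln) if r != "-"]
    return "".join(r for r, _ in kept), "".join(q for _, q in kept)


ref_aln, query_aln = suppress_insertions("ACGA--CTAGCATCGACTGACA", "ACGAAACTAGCATC--------")
print(ref_aln)
print(query_aln)
```
+
+After suppression the query has 12 residues, all matching the reference, so matches over the
+shorter sequence gives 12/12 = 100.00, in line with the test3_+_100.00 label above; with the
+insertion kept, the same formula gives 12/14 = 85.71.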
+
+==See also==
+
+[read_fasta]
+
+[invert_align]
+
+[write_align]
+
+[align_seq]
+
+[write_fasta]
+
+==Author==
+
+Martin Asser Hansen - Copyright (C) - All rights reserved.
+
+mail@maasha.dk
+
+August 2007
+
+==License==
+
+GNU General Public License version 2
+
+http://www.gnu.org/copyleft/gpl.html
+
+==Help==
+
+[tile_seq] is part of the Biopieces framework.
+
+http://code.google.com/p/biopieces/