+++ /dev/null
-Author: Martin Asser Hansen - Copyright (C) - All rights reserved
-
-Contact: mail@maasha.dk
-
-Date: August 2007
-
-License: GNU General Public License version 2 (http://www.gnu.org/copyleft/gpl.html)
-
-Description: Create a weight matrix of the residue composition of an alignment in the stream.
-
-Usage: ... | $script [options]
-
-Options: [-p | --percent] - Output the result in percent - Default=absolute
-Options: [-I <file> | --stream_in=<file>] - Read input from stream file - Default=STDIN
-Options: [-O <file> | --stream_out=<file>] - Write output to stream file - Default=STDOUT
-
-Examples: ... | $script -p - Creates a weight matrix in percent.
-
-Keys out: V0, V1, V2, Vn - Weight for each position.
--- /dev/null
+=Biopiece: create_weight_matrix=
+
+==Synopsis==
+
+Create a residue composition weight matrix of an alignment in the stream.
+
+==Description==
+
+[create_weight_matrix] calculates the frequency of all residues per column in aligned
+sequences from the stream - either as exact residue counts or percentages.
+
+==Usage==
+
+{{{
+... | create_weight_matrix [options]
+}}}
+
+==Options==
+
+{{{
+[-p | --percent] - Output the result in percent - Default=absolute
+[-I <file> | --stream_in=<file>] - Read input from stream file - Default=STDIN
+[-O <file> | --stream_out=<file>] - Write output to stream file - Default=STDOUT
+}}}
+
+==Examples==
+
+Consider the following alignment in the file `aln.fna` in FASTA format:
+
+{{{
+>test5
+---TAACAGGCACT
+>test2
+-----GAATCGACT
+>test1
+--CTAGCTTCGACT
+>test3
+ACGAAACTAGCATC
+>test4
+----AGCATCGACT
+}}}
+
+To create a weight matrix from the above alignment, read it in with [read_fasta] and pipe the
+stream through [create_weight_matrix]:
+
+{{{
+read_fasta -i aln.fna | create_weight_matrix
+}}}
+
+The resulting five records will look like the first one below, which is hard to read in this form:
+
+{{{
+V13: 0
+V11: 0
+V7: 0
+V4: 2
+V3: 3
+V9: 0
+V0: -
+V2: 4
+V8: 0
+V12: 0
+V5: 1
+V10: 0
+V1: 4
+V6: 0
+V14: 0
+---
+}}}
+
+To make sense of the records, pipe the result through [write_tab] like this:
+
+{{{
+read_fasta -i aln.fna | create_weight_matrix | write_tab -x
+
+- 4 4 3 2 1 0 0 0 0 0 0 0 0 0
+A 1 0 0 1 4 2 1 3 1 0 0 5 0 0
+C 0 1 1 0 0 0 4 0 0 3 2 0 4 1
+G 0 0 1 0 0 3 0 0 1 2 3 0 0 0
+T 0 0 0 2 0 0 0 2 3 0 0 0 1 4
+}}}
+
+The above weight matrix shows the count of each residue type (first column) at
+each position of the alignment.
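The per-column counting can be sketched in a few lines of Python. This is only an illustration of the idea, not the Biopieces implementation; the function name `weight_matrix` and the row ordering are assumptions made for the example.

```python
from collections import Counter


def weight_matrix(seqs):
    """Count each residue per alignment column.

    Hypothetical helper for illustration; returns {residue: [count per column]}.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    # zip(*seqs) walks the alignment column by column.
    columns = [Counter(col) for col in zip(*seqs)]
    # Assumption: rows are ordered by character code, so '-' sorts first.
    residues = sorted(set("".join(seqs)))
    return {res: [col[res] for col in columns] for res in residues}


# The alignment from aln.fna above, in file order.
alignment = [
    "---TAACAGGCACT",
    "-----GAATCGACT",
    "--CTAGCTTCGACT",
    "ACGAAACTAGCATC",
    "----AGCATCGACT",
]

matrix = weight_matrix(alignment)
for residue, counts in matrix.items():
    print(residue, *counts)
```

Each printed row corresponds to one row of the table above: the residue followed by its count in every column.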
+
+To obtain the frequencies as percentages, use the `-p` switch of [create_weight_matrix]:
+
+{{{
+read_fasta -i aln.fna | create_weight_matrix -p | write_tab -x
+
+- 80 80 60 40 20 0 0 0 0 0 0 0 0 0
+A 20 0 0 20 80 40 20 60 20 0 0 100 0 0
+C 0 20 20 0 0 0 80 0 0 60 40 0 80 20
+G 0 0 20 0 0 60 0 0 20 40 60 0 0 0
+T 0 0 0 40 0 0 0 40 60 0 0 0 20 80
+}}}
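The percentages are consistent with dividing each count by the number of sequences in the alignment (five here). A minimal Python sketch of that arithmetic follows; the helper name `to_percent` is hypothetical, and the rounding to whole numbers is an assumption, since the example only contains values that divide evenly.

```python
def to_percent(counts, n_seqs):
    """Convert absolute column counts to whole-number percentages.

    Hypothetical helper; rounding behaviour is an assumption, not taken
    from the Biopieces source.
    """
    if n_seqs <= 0:
        raise ValueError("n_seqs must be positive")
    return [round(100 * count / n_seqs) for count in counts]


# Gap row of the absolute matrix, from an alignment of five sequences.
gap_counts = [4, 4, 3, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
gap_percent = to_percent(gap_counts, 5)
print("-", *gap_percent)
```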
+
+==See also==
+
+[read_fasta]
+
+[write_tab]
+
+==Author==
+
+Martin Asser Hansen - Copyright (C) - All rights reserved.
+
+mail@maasha.dk
+
+August 2007
+
+==License==
+
+GNU General Public License version 2
+
+http://www.gnu.org/copyleft/gpl.html
+
+==Help==
+
+[create_weight_matrix] is part of the Biopieces framework.
+
+http://code.google.com/p/biopieces/
+++ /dev/null
-Author: Martin Asser Hansen - Copyright (C) - All rights reserved
-
-Contact: mail@maasha.dk
-
-Date: December 2007
-
-License: GNU General Public License version 2 (http://www.gnu.org/copyleft/gpl.html)
-
-Description: Extract subsequence from genome sequence either explicitly or using BED/PSL/BLAST entries in stream.
-
-Usage: $script [options] -g <genome>
-Usage: ... | $script [options] -g <genome>
-
-Options: [-g <genome> | --genome=<genome>] - Genome to get subsequence from.
-Options: [-c <string> | --chr=<string>] - Chromosome with requested subsequence.
-Options: [-b <int> | --beg=<int>] - Begin position of subsequence (first residue=1).
-Options: [-e <int> | --end=<int>] - End position of subsequence.
-Options: [-l <int> | --len=<int>] - Length of subsequence.
-Options: [-f <int> | --flank=<int>] - Include flanking sequence.
-Options: [-m | --mask] - Softmask non-exonic sequence.
-Options: [-I <file> | --stream_in=<file>] - Read input from stream file - Default=STDIN
-Options: [-O <file> | --stream_out=<file>] - Write output to stream file - Default=STDOUT
-
-Examples: $script -g hg18 -c chr1 -b 1 -e 10 - Get the first 10 nucleotides of human genome chr1.
-Examples: $script -g hg18 -c chr1 -b 1 -l 10 - Get the first 10 nucleotides of human genome chr1.
-Examples: ... | $script -g mm8 -f 50 - Get subsequences including 50nt flanks of mouse BED/PSL/BLAST entries.
-
-Keys in: REC_TYPE - Optional record type (BED, PSL, or BLAST).
-Keys in: CHR - Chromosome (for use with BED record type).
-Keys in: CHR_BEG - Chromosome begin.
-Keys in: CHR_END - Chromosome end.
-Keys in: S_ID - Chromosome (for use with PSL and BLAST record type).
-Keys in: S_BEG - Chromosome begin (for use with PSL and BLAST record type).
-Keys in: S_END - Chromosome end (for use with PSL and BLAST record type).
-Keys in: STRAND - Sequence strand.
-
-Keys out: CHR - Chromosome.
-Keys out: CHR_BEG - Chromosome begin.
-Keys out: CHR_END - Chromosome end.
-Keys out: SEQ - Sequence.
-Keys out: SEQ_LEN - Sequence length.
--- /dev/null
+=Biopiece: get_genome_seq=
+
+==Synopsis==
+
+Extract subsequences from a genome sequence.
+
+==Description==
+
+[get_genome_seq] can be used to get subsequences from a specified genome that has been
+indexed with [index_genome_seq]. The subsequence can be specified explicitly or taken from
+BED/PSL/BLAST entries in the stream.
+
+Use [list_genomes] to see available genome sequences.
+
+==Usage==
+
+{{{
+get_genome_seq [options] -g <genome>
+... | get_genome_seq [options] -g <genome>
+}}}
+
+==Options==
+
+{{{
+[-g <genome> | --genome=<genome>] - Genome to get subsequence from.
+[-c <string> | --chr=<string>] - Chromosome with requested subsequence.
+[-b <int> | --beg=<int>] - Begin position of subsequence (first residue=1).
+[-e <int> | --end=<int>] - End position of subsequence.
+[-l <int> | --len=<int>] - Length of subsequence.
+[-f <int> | --flank=<int>] - Include flanking sequence.
+[-m | --mask] - Softmask non-exonic sequence.
+[-I <file> | --stream_in=<file>] - Read input from stream file - Default=STDIN
+[-O <file> | --stream_out=<file>] - Write output to stream file - Default=STDOUT
+}}}
+
+==Examples==
+
+To get an explicit subsequence from the human genome (currently hg18) do:
+
+{{{
+get_genome_seq -g hg18 -c chrX -b 12000 -l 20
+
+CHR_END: 12018
+SEQ: GGTGCAGTAACACCTGCCGT
+CHR_BEG: 11999
+SEQ_LEN: 20
+CHR: chrX
+---
+}}}
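Note that `-b` counts from 1, while the `CHR_BEG` and `CHR_END` values in the output above are one lower than the requested positions. The Python sketch below reproduces that arithmetic. It is inferred from the example output only: the function name `to_record_coords` is hypothetical, and treating `CHR_BEG`/`CHR_END` as zero-based inclusive positions is an assumption.

```python
def to_record_coords(beg, end=None, length=None):
    """Map a 1-based request (-b with -e or -l) to (CHR_BEG, CHR_END, SEQ_LEN).

    Hypothetical helper; assumes zero-based, inclusive record coordinates,
    as suggested by the example output.
    """
    if (end is None) == (length is None):
        raise ValueError("give exactly one of end or length")
    if beg < 1:
        raise ValueError("beg is 1-based and must be >= 1")
    chr_beg = beg - 1
    chr_end = end - 1 if end is not None else chr_beg + length - 1
    return chr_beg, chr_end, chr_end - chr_beg + 1


# Mirrors: get_genome_seq -g hg18 -c chrX -b 12000 -l 20
coords = to_record_coords(12000, length=20)
print(coords)
```

Under the same assumption, `to_record_coords(1, end=10)` and `to_record_coords(1, length=10)` describe the same ten residues.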
+
+If you have a stream with BED, PSL or BLAST records, you can obtain the subsequences
+simply by piping the stream through [get_genome_seq], which adds the sequence
+to each record. Below is an example for a BED entry:
+
+{{{
+read_bed -i <BED file> -n 1 | get_genome_seq -g hg18
+
+STRAND: +
+CHR_END: 95127728
+Q_ID: gi|108087685|gb|DQ594132.1|_Homo_sapiens_piRNA_piR-60244,_complete_sequence
+CHR_BEG: 95127700
+SCORE: 1
+REC_TYPE: BED
+BED_LEN: 29
+CHR: chr15
+SEQ: TTCACTTCTCCCATGTAGTTCCTGAGTGC
+BED_COLS: 6
+---
+}}}
+
+Note that the sequence is based on the CHR_BEG and CHR_END positions. There is currently
+no switch that obtains subsequences based on record blocks.
+
+==See also==
+
+[index_genome_seq]
+
+[list_genomes]
+
+[read_bed]
+
+[extract_seq]
+
+==Author==
+
+Martin Asser Hansen - Copyright (C) - All rights reserved.
+
+mail@maasha.dk
+
+December 2007
+
+==License==
+
+GNU General Public License version 2
+
+http://www.gnu.org/copyleft/gpl.html
+
+==Help==
+
+[get_genome_seq] is part of the Biopieces framework.
+
+http://code.google.com/p/biopieces/
+++ /dev/null
-Author: Martin Asser Hansen - Copyright (C) - All rights reserved
-
-Contact: mail@maasha.dk
-
-Date: August 2007
-
-License: GNU General Public License version 2 (http://www.gnu.org/copyleft/gpl.html)
-
-Description: Inverts an alignment showing only non-mathing residues using the first sequence as reference.
-
-Usage: ... | $script [options]
-
-Options: [-s | --soft] - Use soft inversion instead of hard inversion.
-Options: [-I <file> | --stream_in=<file>] - Read input from stream file - Default=STDIN
-Options: [-O <file> | --stream_out=<file>] - Write output to stream file - Default=STDOUT
-
-Examples: ... | $script - Invert alignment in stream.
-Examples: ... | $script -s - Soft invert alignment in stream.
-
--- /dev/null
+=Biopiece: invert_align=
+
+==Synopsis==
+
+Inverts an alignment showing only non-matching residues using the first sequence
+as reference.
+
+==Description==
+
+[invert_align] is useful to locate mismatches or other differences in an alignment
+between the reference sequence (the first sequence in the alignment) and the remaining
+sequences. Inversion can be 'hard', where matching residues are shown as `-`, or 'soft',
+where matching residues are shown in lower case. In both cases, mismatches are shown as
+capital letters, and gaps or missing sequence are shown as `_`.
+
+==Usage==
+
+{{{
+... | invert_align [options]
+}}}
+
+==Options==
+
+{{{
+[-s | --soft] - Use soft inversion instead of hard inversion.
+[-I <file> | --stream_in=<file>] - Read input from stream file - Default=STDIN
+[-O <file> | --stream_out=<file>] - Write output to stream file - Default=STDOUT
+}}}
+
+==Examples==
+
+Consider the alignment in the file `aln.fna` in FASTA format:
+
+{{{
+>test1
+CTAGC-TTCGACT
+>test2
+--AGC-TTCGA--
+>test3
+--AGCTTTCGA--
+>test4
+--AG--CTCGA--
+>test5
+--AG--TTCGAC-
+}}}
+
+Reading the alignment using [read_fasta] and writing it using [write_align] results in:
+
+{{{
+read_fasta -i aln.fna | write_align -x
+
+ .
+test1 CTAGC-TTCGACT
+test2 --AGC-TTCGA--
+test3 --AGCTTTCGA--
+test4 --AG--CTCGA--
+test5 --AG--TTCGAC-
+Consensus: 50% --AG--TTCGA--
+}}}
+
+However, if we insert [invert_align] into the pipeline, it becomes clear where the sequence differences are:
+
+{{{
+read_fasta -i aln.fna | invert_align | write_align -x
+
+ .
+test1 CTAGC_TTCGACT
+test2 __---------__
+test3 __---T-----__
+test4 __--_-C----__
+test5 __--_-------_
+Consensus: 50% -------------
+}}}
+
+If, instead of hard inverting the alignment, we use the `-s` switch of [invert_align] to obtain a soft
+inverted alignment, where matching residues are shown in lower case instead of as `-`, we get:
+
+{{{
+read_fasta -i aln.fna | invert_align -s | write_align -x
+
+ .
+test1 CTAGC_TTCGACT
+test2 __agc_ttcga__
+test3 __agcTttcga__
+test4 __ag__Ctcga__
+test5 __ag__ttcgac_
+Consensus: 50% --AG--TTCGA--
+}}}
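The inversion rules can be sketched in Python as below. The rules are inferred from the example output above and are not taken from the Biopieces source; the function name `invert_align` and its `soft` parameter are used here only for illustration.

```python
def invert_align(seqs, soft=False):
    """Invert aligned sequences against the first one (the reference).

    Illustrative sketch; rules inferred from the documented example output.
    """
    ref = seqs[0]
    # The reference keeps its residues; only its gaps are shown as '_'.
    inverted = [ref.replace("-", "_")]
    for seq in seqs[1:]:
        out = []
        for res, ref_res in zip(seq, ref):
            if soft:
                if res == "-":
                    out.append("_")          # every gap is shown as '_'
                elif res == ref_res:
                    out.append(res.lower())  # match: lower case
                else:
                    out.append(res.upper())  # mismatch: capital letter
            elif res == ref_res:
                out.append("-")              # match, including gap against gap
            elif res == "-":
                out.append("_")              # gap against a reference residue
            else:
                out.append(res.upper())      # mismatch: capital letter
        inverted.append("".join(out))
    return inverted


# The alignment from aln.fna above; test1 is the reference.
alignment = [
    "CTAGC-TTCGACT",
    "--AGC-TTCGA--",
    "--AGCTTTCGA--",
    "--AG--CTCGA--",
    "--AG--TTCGAC-",
]

hard = invert_align(alignment)
soft = invert_align(alignment, soft=True)
for line in hard + soft:
    print(line)
```

Note one asymmetry in the example output that the sketch preserves: a gap aligned to a gap in the reference is shown as `-` in hard mode but as `_` in soft mode.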
+
+==See also==
+
+[read_fasta]
+
+[write_align]
+
+==Author==
+
+Martin Asser Hansen - Copyright (C) - All rights reserved.
+
+mail@maasha.dk
+
+August 2007
+
+==License==
+
+GNU General Public License version 2
+
+http://www.gnu.org/copyleft/gpl.html
+
+==Help==
+
+[invert_align] is part of the Biopieces framework.
+
+http://code.google.com/p/biopieces/