git.donarmstrong.com Git - biopieces.git/commitdiff
more wikis
author martinahansen <martinahansen@74ccb610-7750-0410-82ae-013aeee3265d>
Tue, 1 Jul 2008 04:48:23 +0000 (04:48 +0000)
committer martinahansen <martinahansen@74ccb610-7750-0410-82ae-013aeee3265d>
Tue, 1 Jul 2008 04:48:23 +0000 (04:48 +0000)
git-svn-id: http://biopieces.googlecode.com/svn/trunk@91 74ccb610-7750-0410-82ae-013aeee3265d

bp_usage/create_weight_matrix [deleted file]
bp_usage/create_weight_matrix.wiki [new file with mode: 0644]
bp_usage/get_genome_seq [deleted file]
bp_usage/get_genome_seq.wiki [new file with mode: 0644]
bp_usage/invert_align [deleted file]
bp_usage/invert_align.wiki [new file with mode: 0644]

diff --git a/bp_usage/create_weight_matrix b/bp_usage/create_weight_matrix
deleted file mode 100644 (file)
index f1c56ee..0000000
+++ /dev/null
@@ -1,19 +0,0 @@
-Author:         Martin Asser Hansen - Copyright (C) - All rights reserved
-
-Contact:        mail@maasha.dk
-
-Date:           August 2007
-
-License:        GNU General Public License version 2 (http://www.gnu.org/copyleft/gpl.html)
-
-Description:    Create a weight matrix of the residue composition of an alignment in the stream.
-
-Usage:          ... | $script [options]
-
-Options:   [-p        | --percent]            -  Output the result in percent  -  Default=absolute
-Options:   [-I <file> | --stream_in=<file>]   -  Read input from stream file   -  Default=STDIN
-Options:   [-O <file> | --stream_out=<file>]  -  Write output to stream file   -  Default=STDOUT
-
-Examples:  ... | $script -p  -  Creates a weight matrix in percent.
-
-Keys out:   V0, V1, V2, Vn  -  Weight for each position.
diff --git a/bp_usage/create_weight_matrix.wiki b/bp_usage/create_weight_matrix.wiki
new file mode 100644 (file)
index 0000000..6a0a244
--- /dev/null
@@ -0,0 +1,122 @@
+=Biopiece: create_weight_matrix=
+
+==Synopsis==
+
+Create a residue composition weight matrix of an alignment in the stream.
+
+==Description==
+
+[create_weight_matrix] calculates the frequency of all residues per column in aligned
+sequences from the stream - either as exact residue counts or percentages.
+
+==Usage==
+
+{{{
+... | create_weight_matrix [options]
+}}}
+
+==Options==
+
+{{{
+[-p        | --percent]            -  Output the result in percent  -  Default=absolute
+[-I <file> | --stream_in=<file>]   -  Read input from stream file   -  Default=STDIN
+[-O <file> | --stream_out=<file>]  -  Write output to stream file   -  Default=STDOUT
+}}}
+
+==Examples==
+
+Consider the following alignment in the file `aln.fna` in FASTA format:
+
+{{{
+>test5
+---TAACAGGCACT
+>test2
+-----GAATCGACT
+>test1
+--CTAGCTTCGACT
+>test3
+ACGAAACTAGCATC
+>test4
+----AGCATCGACT
+}}}
+
+To create a weight matrix from the above alignment, read it in with [read_fasta] and pipe the
+stream through [create_weight_matrix]:
+
+{{{
+read_fasta -i aln.fna | create_weight_matrix
+}}}
+
+The resulting five records will look like the first one below, which is hard to read on its own:
+
+{{{
+V13: 0
+V11: 0
+V7: 0
+V4: 2
+V3: 3
+V9: 0
+V0: -
+V2: 4
+V8: 0
+V12: 0
+V5: 1
+V10: 0
+V1: 4
+V6: 0
+V14: 0
+---
+}}}
+
+To make sense of the output, pipe the result through [write_tab] like this:
+
+{{{
+read_fasta -i aln.fna | create_weight_matrix | write_tab -x
+
+-   4   4   3   2   1   0   0   0   0   0   0   0   0   0
+A   1   0   0   1   4   2   1   3   1   0   0   5   0   0
+C   0   1   1   0   0   0   4   0   0   3   2   0   4   1
+G   0   0   1   0   0   3   0   0   1   2   3   0   0   0
+T   0   0   0   2   0   0   0   2   3   0   0   0   1   4
+}}}
+
+The above weight matrix shows the count of each residue type (1st column) at
+every position of the alignment.
+
+To obtain the frequencies as percentages, use the `-p` switch of [create_weight_matrix]:
+
+{{{
+read_fasta -i aln.fna | create_weight_matrix -p | write_tab -x
+
+-    80   80   60   40   20   0    0    0    0    0    0    0    0    0
+A    20   0    0    20   80   40   20   60   20   0    0    100  0    0
+C    0    20   20   0    0    0    80   0    0    60   40   0    80   20
+G    0    0    20   0    0    60   0    0    20   40   60   0    0    0
+T    0    0    0    40   0    0    0    40   60   0    0    0    20   80
+}}}
+
+==See also==
+
+[read_fasta]
+
+[write_tab]
+
+==Author==
+
+Martin Asser Hansen - Copyright (C) - All rights reserved.
+
+mail@maasha.dk
+
+August 2007
+
+==License==
+
+GNU General Public License version 2
+
+http://www.gnu.org/copyleft/gpl.html
+
+==Help==
+
+[create_weight_matrix] is part of the Biopieces framework.
+
+http://code.google.com/p/biopieces/
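
Note on the create_weight_matrix example above: the per-column counting can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Biopieces implementation. The function name `weight_matrix` and the use of `round()` for the percent output are hypothetical; the alignment is the one from the wiki example.

```python
from collections import Counter


def weight_matrix(seqs, percent=False):
    """Count each residue per alignment column (hypothetical helper).

    Returns a dict mapping residue -> list with one value per column.
    With percent=True the counts are scaled to the number of sequences.
    """
    if not seqs:
        return {}
    width = len(seqs[0])
    if any(len(s) != width for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    # zip(*seqs) walks the alignment column by column.
    columns = [Counter(col) for col in zip(*seqs)]
    matrix = {}
    for residue in sorted(set("".join(seqs))):
        row = [col[residue] for col in columns]
        if percent:
            # Rounding is an assumption; the real tool may format differently.
            row = [round(100 * n / len(seqs)) for n in row]
        matrix[residue] = row
    return matrix


# Alignment taken from the aln.fna example in the wiki page.
ALIGNMENT = [
    "---TAACAGGCACT",  # test5
    "-----GAATCGACT",  # test2
    "--CTAGCTTCGACT",  # test1
    "ACGAAACTAGCATC",  # test3
    "----AGCATCGACT",  # test4
]

counts = weight_matrix(ALIGNMENT)
percents = weight_matrix(ALIGNMENT, percent=True)

for residue, row in counts.items():
    print(residue, *row, sep="\t")
```

Each printed row corresponds to one row of the `write_tab -x` table shown in the wiki page, with the residue in the first column.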
diff --git a/bp_usage/get_genome_seq b/bp_usage/get_genome_seq
deleted file mode 100644 (file)
index 5586b3b..0000000
+++ /dev/null
@@ -1,41 +0,0 @@
-Author:         Martin Asser Hansen - Copyright (C) - All rights reserved
-
-Contact:        mail@maasha.dk
-
-Date:           December 2007
-
-License:        GNU General Public License version 2 (http://www.gnu.org/copyleft/gpl.html)
-
-Description:    Extract subsequence from genome sequence either explicitly or using BED/PSL/BLAST entries in stream.
-
-Usage:          $script [options] -g <genome>
-Usage:          ... | $script [options] -g <genome>
-
-Options:   [-g <genome> | --genome=<genome>]    -  Genome to get subsequence from.
-Options:   [-c <string> | --chr=<string>]       -  Chromosome with requested subsequence.
-Options:   [-b <int>    | --beg=<int>]          -  Begin position of subsequence (first residue=1).
-Options:   [-e <int>    | --end=<int>]          -  End position of subsequence.
-Options:   [-l <int>    | --len=<int>]          -  Length of subsequence.
-Options:   [-f <int>    | --flank=<int>]        -  Include flanking sequence.
-Options:   [-m          | --mask]               -  Softmask non-exonic sequence.
-Options:   [-I <file>   | --stream_in=<file>]   -  Read input from stream file  -  Default=STDIN
-Options:   [-O <file>   | --stream_out=<file>]  -  Write output to stream file  -  Default=STDOUT
-
-Examples:  $script -g hg18 -c chr1 -b 1 -e 10  -  Get the first 10 nucleotides of human genome chr1.
-Examples:  $script -g hg18 -c chr1 -b 1 -l 10  -  Get the first 10 nucleotides of human genome chr1.
-Examples:  ... | $script -g mm8 -f 50          -  Get subsequences including 50nt flanks of mouse BED/PSL/BLAST entries.
-
-Keys in:  REC_TYPE  -  Optional record type (BED, PSL, or BLAST).
-Keys in:  CHR       -  Chromosome (for use with BED record type).
-Keys in:  CHR_BEG   -  Chromosome begin.
-Keys in:  CHR_END   -  Chromosome end.
-Keys in:  S_ID      -  Chromosome       (for use with PSL and BLAST record type).
-Keys in:  S_BEG     -  Chromosome begin (for use with PSL and BLAST record type).
-Keys in:  S_END     -  Chromosome end   (for use with PSL and BLAST record type).
-Keys in:  STRAND    -  Sequence strand.
-
-Keys out: CHR      -  Chromosome.
-Keys out: CHR_BEG  -  Chromosome begin.
-Keys out: CHR_END  -  Chromosome end.
-Keys out: SEQ      -  Sequence.
-Keys out: SEQ_LEN  -  Sequence length.
diff --git a/bp_usage/get_genome_seq.wiki b/bp_usage/get_genome_seq.wiki
new file mode 100644 (file)
index 0000000..e6425a9
--- /dev/null
@@ -0,0 +1,101 @@
+=Biopiece: get_genome_seq=
+
+==Synopsis==
+
+Extract subsequences from a genome sequence.
+
+==Description==
+
+[get_genome_seq] can be used to get subsequences from a specified genome that has been
+indexed with [index_genome_seq]. The subsequence can be specified explicitly or taken from
+BED/PSL/BLAST entries in the stream.
+
+Use [list_genomes] to see available genome sequences.
+
+==Usage==
+
+{{{
+... | get_genome_seq [options]
+}}}
+
+==Options==
+
+{{{
+[-g <genome> | --genome=<genome>]    -  Genome to get subsequence from.
+[-c <string> | --chr=<string>]       -  Chromosome with requested subsequence.
+[-b <int>    | --beg=<int>]          -  Begin position of subsequence (first residue=1).
+[-e <int>    | --end=<int>]          -  End position of subsequence.
+[-l <int>    | --len=<int>]          -  Length of subsequence.
+[-f <int>    | --flank=<int>]        -  Include flanking sequence.
+[-m          | --mask]               -  Softmask non-exonic sequence.
+[-I <file>   | --stream_in=<file>]   -  Read input from stream file  -  Default=STDIN
+[-O <file>   | --stream_out=<file>]  -  Write output to stream file  -  Default=STDOUT
+}}}
+
+==Examples==
+
+To get an explicit subsequence from the human genome (currently hg18) do:
+
+{{{
+get_genome_seq -g hg18 -c chrX -b 12000 -l 20
+
+CHR_END: 12018
+SEQ: GGTGCAGTAACACCTGCCGT
+CHR_BEG: 11999
+SEQ_LEN: 20
+CHR: chrX
+---
+}}}
+
+If you have a stream with BED, PSL or BLAST records, you can obtain the subsequence
+simply by piping the stream through [get_genome_seq]; the sequence will be
+added to each record. Below is an example for a BED entry:
+
+{{{
+read_bed -i <BED file> -n 1 | get_genome_seq -g hg18
+
+STRAND: +
+CHR_END: 95127728
+Q_ID: gi|108087685|gb|DQ594132.1|_Homo_sapiens_piRNA_piR-60244,_complete_sequence
+CHR_BEG: 95127700
+SCORE: 1
+REC_TYPE: BED
+BED_LEN: 29
+CHR: chr15
+SEQ: TTCACTTCTCCCATGTAGTTCCTGAGTGC
+BED_COLS: 6
+---
+}}}
+
+Note that the sequence is based on the CHR_BEG and CHR_END positions. There is currently
+no switch for obtaining subsequences based on record blocks.
+
+==See also==
+
+[index_genome_seq]
+
+[list_genomes]
+
+[read_bed]
+
+[extract_seq]
+
+==Author==
+
+Martin Asser Hansen - Copyright (C) - All rights reserved.
+
+mail@maasha.dk
+
+December 2007
+
+==License==
+
+GNU General Public License version 2
+
+http://www.gnu.org/copyleft/gpl.html
+
+==Help==
+
+[get_genome_seq] is part of the Biopieces framework.
+
+http://code.google.com/p/biopieces/
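
Note on the get_genome_seq coordinates above: the explicit example asks for `-b 12000 -l 20` and reports `CHR_BEG: 11999` and `CHR_END: 12018`, which suggests that the options are 1-based while the output keys are 0-based with an inclusive end. The Python sketch below encodes that reading. It is inferred from the example output only, not from the Biopieces source, and the function name `explicit_interval` is hypothetical.

```python
def explicit_interval(beg, end=None, length=None):
    """Map 1-based -b/-e/-l style arguments to (CHR_BEG, CHR_END, SEQ_LEN).

    Assumes 0-based, end-inclusive output coordinates, as the wiki
    example output suggests. Exactly one of end/length must be given.
    """
    if beg < 1:
        raise ValueError("begin position is 1-based and must be >= 1")
    if (end is None) == (length is None):
        raise ValueError("give exactly one of end or length")

    chr_beg = beg - 1                      # 1-based -> 0-based
    if length is not None:
        chr_end = chr_beg + length - 1     # inclusive end
    else:
        chr_end = end - 1                  # 1-based inclusive -> 0-based inclusive
    seq_len = chr_end - chr_beg + 1
    if seq_len < 1:
        raise ValueError("interval is empty")
    return chr_beg, chr_end, seq_len


# The explicit example from the wiki page: -b 12000 -l 20
print(explicit_interval(12000, length=20))
# The two forms from the old usage text: -b 1 -e 10 and -b 1 -l 10
print(explicit_interval(1, end=10), explicit_interval(1, length=10))
```

The same inclusive convention is consistent with the BED example, where `CHR_BEG: 95127700` and `CHR_END: 95127728` go with `BED_LEN: 29`.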
diff --git a/bp_usage/invert_align b/bp_usage/invert_align
deleted file mode 100644 (file)
index 9c99b33..0000000
+++ /dev/null
@@ -1,19 +0,0 @@
-Author:         Martin Asser Hansen - Copyright (C) - All rights reserved
-
-Contact:        mail@maasha.dk
-
-Date:           August 2007
-
-License:        GNU General Public License version 2 (http://www.gnu.org/copyleft/gpl.html)
-
-Description:    Inverts an alignment showing only non-mathing residues using the first sequence as reference.
-
-Usage:          ... | $script [options]
-
-Options:        [-s          | --soft]               -  Use soft inversion instead of hard inversion.
-Options:        [-I <file>   | --stream_in=<file>]   -  Read input from stream file  -  Default=STDIN
-Options:        [-O <file>   | --stream_out=<file>]  -  Write output to stream file  -  Default=STDOUT
-
-Examples:       ... | $script     -  Invert alignment in stream.
-Examples:       ... | $script -s  -  Soft invert alignment in stream.
-
diff --git a/bp_usage/invert_align.wiki b/bp_usage/invert_align.wiki
new file mode 100644 (file)
index 0000000..15409c4
--- /dev/null
@@ -0,0 +1,114 @@
+=Biopiece: invert_align=
+
+==Synopsis==
+
+Inverts an alignment showing only non-matching residues using the first sequence
+as reference.
+
+==Description==
+
+[invert_align] is useful to locate mismatches or other differences in an alignment
+between the reference sequence (the first sequence in the alignment) and the remaining
+sequences. Inversion can be 'hard', where matching residues are shown as `-`, or 'soft',
+where matching residues are shown in lower case. In both cases, mismatches are shown as
+capital letters and gaps or missing sequence are shown as `_`.
+
+==Usage==
+
+{{{
+... | invert_align [options]
+}}}
+
+==Options==
+
+{{{
+[-s         | --soft]               -  Use soft inversion instead of hard inversion.
+[-I <file>  | --stream_in=<file>]   -  Read input from stream file  -  Default=STDIN
+[-O <file>  | --stream_out=<file>]  -  Write output to stream file  -  Default=STDOUT
+}}}
+
+==Examples==
+
+Consider the alignment in the file `aln.fna` in FASTA format:
+
+{{{
+>test1
+CTAGC-TTCGACT
+>test2
+--AGC-TTCGA--
+>test3
+--AGCTTTCGA--
+>test4
+--AG--CTCGA--
+>test5
+--AG--TTCGAC-
+}}}
+
+Reading the alignment using [read_fasta] and writing it using [write_align] results in:
+
+{{{
+read_fasta -i aln.fna | write_align -x
+
+                          .   
+test1            CTAGC-TTCGACT
+test2            --AGC-TTCGA--
+test3            --AGCTTTCGA--
+test4            --AG--CTCGA--
+test5            --AG--TTCGAC-
+Consensus: 50%   --AG--TTCGA--
+}}}
+
+However, if we insert [invert_align] into the pipeline, it becomes clear where the sequences differ:
+
+{{{
+read_fasta -i aln.fna | invert_align | write_align -x
+
+                          .   
+test1            CTAGC_TTCGACT
+test2            __---------__
+test3            __---T-----__
+test4            __--_-C----__
+test5            __--_-------_
+Consensus: 50%   -------------
+}}}
+
+If, instead of hard inverting the alignment, we use the `-s` switch of [invert_align] to obtain a soft
+inverted alignment, where the matching residues are shown in lower case instead of as `-`, we get:
+
+{{{
+read_fasta -i aln.fna | invert_align -s | write_align -x
+
+                          .   
+test1            CTAGC_TTCGACT
+test2            __agc_ttcga__
+test3            __agcTttcga__
+test4            __ag__Ctcga__
+test5            __ag__ttcgac_
+Consensus: 50%   --AG--TTCGA--
+}}}
+
+==See also==
+
+[read_fasta]
+
+[write_align]
+
+==Author==
+
+Martin Asser Hansen - Copyright (C) - All rights reserved.
+
+mail@maasha.dk
+
+August 2007
+
+==License==
+
+GNU General Public License version 2
+
+http://www.gnu.org/copyleft/gpl.html
+
+==Help==
+
+[invert_align] is part of the Biopieces framework.
+
+http://code.google.com/p/biopieces/
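
Note on the invert_align examples above: the hard and soft outputs can be reproduced with a short Python sketch. The rule used here is inferred from the example output, not taken from the Biopieces source: gaps are first rewritten to `_`, then each position that equals the reference is masked (`-` for hard, lower case for soft). One consequence, visible in the wiki output, is that a gap aligned to a reference gap appears as `-` in hard mode and as `_` in soft mode. The function name `invert_align` mirrors the tool but is a hypothetical stand-in.

```python
def invert_align(seqs, soft=False):
    """Invert an alignment against its first sequence (illustrative sketch).

    Matching positions become '-' (hard) or lower case (soft);
    mismatches stay upper case; gaps are written as '_'.
    """
    if not seqs:
        return []
    masked = [s.replace("-", "_") for s in seqs]   # gaps -> '_'
    ref = masked[0]
    inverted = [ref]                               # reference is kept as is
    for seq in masked[1:]:
        chars = []
        for ref_char, char in zip(ref, seq):
            if char == ref_char:
                chars.append(char.lower() if soft else "-")
            else:
                chars.append(char.upper())
        inverted.append("".join(chars))
    return inverted


# Alignment taken from the aln.fna example in the wiki page.
ALIGNMENT = [
    "CTAGC-TTCGACT",  # test1 (reference)
    "--AGC-TTCGA--",  # test2
    "--AGCTTTCGA--",  # test3
    "--AG--CTCGA--",  # test4
    "--AG--TTCGAC-",  # test5
]

hard = invert_align(ALIGNMENT)
soft_result = invert_align(ALIGNMENT, soft=True)
print("\n".join(hard))
print()
print("\n".join(soft_result))
```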