From: martinahansen
Date: Tue, 1 Jul 2008 04:48:23 +0000 (+0000)
Subject: more wikis
X-Git-Url: https://git.donarmstrong.com/?a=commitdiff_plain;h=4307b2332c54076784fe935a2cfafe6cbcfbd889;p=biopieces.git

more wikis

git-svn-id: http://biopieces.googlecode.com/svn/trunk@91 74ccb610-7750-0410-82ae-013aeee3265d
---

diff --git a/bp_usage/create_weight_matrix b/bp_usage/create_weight_matrix
deleted file mode 100644
index f1c56ee..0000000
--- a/bp_usage/create_weight_matrix
+++ /dev/null
@@ -1,19 +0,0 @@
-Author: Martin Asser Hansen - Copyright (C) - All rights reserved
-
-Contact: mail@maasha.dk
-
-Date: August 2007
-
-License: GNU General Public License version 2 (http://www.gnu.org/copyleft/gpl.html)
-
-Description: Create a weight matrix of the residue composition of an alignment in the stream.
-
-Usage: ... | $script [options]
-
-Options: [-p | --percent] - Output the result in percent - Default=absolute
-Options: [-I | --stream_in=] - Read input from stream file - Default=STDIN
-Options: [-O | --stream_out=] - Write output to stream file - Default=STDOUT
-
-Examples: ... | $script -p - Creates a weight matrix in percent.
-
-Keys out: V0, V1, V2, Vn - Weight for each position.
diff --git a/bp_usage/create_weight_matrix.wiki b/bp_usage/create_weight_matrix.wiki
new file mode 100644
index 0000000..6a0a244
--- /dev/null
+++ b/bp_usage/create_weight_matrix.wiki
@@ -0,0 +1,122 @@
+=Biopiece: create_weight_matrix=
+
+==Synopsis==
+
+Create a residue composition weight matrix of an alignment in the stream.
+
+==Description==
+
+[create_weight_matrix] calculates the frequency of all residues per column in aligned
+sequences from the stream - either as exact residue counts or percentages.
+
+==Usage==
+
+{{{
+... | create_weight_matrix [options]
+}}}
+
+==Options==
+
+{{{
+[-p | --percent] - Output the result in percent - Default=absolute
+[-I | --stream_in=] - Read input from stream file - Default=STDIN
+[-O | --stream_out=] - Write output to stream file - Default=STDOUT
+}}}
+
+==Examples==
+
+Consider the following alignment in the file `aln.fna` in FASTA format:
+
+{{{
+>test5
+---TAACAGGCACT
+>test2
+-----GAATCGACT
+>test1
+--CTAGCTTCGACT
+>test3
+ACGAAACTAGCATC
+>test4
+----AGCATCGACT
+}}}
+
+To create a weight matrix from the above alignment, read it in with [read_fasta] and pipe the
+stream through [create_weight_matrix]:
+
+{{{
+read_fasta -i aln.fna | create_weight_matrix
+}}}
+
+The resulting five records will look like the first one below, which is not very readable:
+
+{{{
+V13: 0
+V11: 0
+V7: 0
+V4: 2
+V3: 3
+V9: 0
+V0: -
+V2: 4
+V8: 0
+V12: 0
+V5: 1
+V10: 0
+V1: 4
+V6: 0
+V14: 0
+---
+}}}
+
+To make sense of it, pipe the result through [write_tab] like this:
+
+{{{
+read_fasta -i aln.fna | create_weight_matrix | write_tab -x
+
+- 4 4 3 2 1 0 0 0 0 0 0 0 0 0
+A 1 0 0 1 4 2 1 3 1 0 0 5 0 0
+C 0 1 1 0 0 0 4 0 0 3 2 0 4 1
+G 0 0 1 0 0 3 0 0 1 2 3 0 0 0
+T 0 0 0 2 0 0 0 2 3 0 0 0 1 4
+}}}
+
+The above weight matrix shows the frequencies of all residue types (1st column) found at
+all positions throughout the alignment.
+
+To obtain the frequencies as percentages, use the `-p` switch of [create_weight_matrix]:
+
+{{{
+read_fasta -i aln.fna | create_weight_matrix -p | write_tab -x
+
+- 80 80 60 40 20 0 0 0 0 0 0 0 0 0
+A 20 0 0 20 80 40 20 60 20 0 0 100 0 0
+C 0 20 20 0 0 0 80 0 0 60 40 0 80 20
+G 0 0 20 0 0 60 0 0 20 40 60 0 0 0
+T 0 0 0 40 0 0 0 40 60 0 0 0 20 80
+}}}
+
+==See also==
+
+[read_fasta]
+
+[write_tab]
+
+==Author==
+
+Martin Asser Hansen - Copyright (C) - All rights reserved.
+
+mail@maasha.dk
+
+August 2007
+
+==License==
+
+GNU General Public License version 2
+
+http://www.gnu.org/copyleft/gpl.html
+
+==Help==
+
+[create_weight_matrix] is part of the Biopieces framework.
+
+http://code.google.com/p/biopieces/
diff --git a/bp_usage/get_genome_seq b/bp_usage/get_genome_seq
deleted file mode 100644
index 5586b3b..0000000
--- a/bp_usage/get_genome_seq
+++ /dev/null
@@ -1,41 +0,0 @@
-Author: Martin Asser Hansen - Copyright (C) - All rights reserved
-
-Contact: mail@maasha.dk
-
-Date: December 2007
-
-License: GNU General Public License version 2 (http://www.gnu.org/copyleft/gpl.html)
-
-Description: Extract subsequence from genome sequence either explicitly or using BED/PSL/BLAST entries in stream.
-
-Usage: $script [options] -g
-Usage: ... | $script [options] -g
-
-Options: [-g | --genome=] - Genome to get subsequence from.
-Options: [-c | --chr=] - Chromosome with requested subsequence.
-Options: [-b | --beg=] - Begin position of subsequence (first residue=1).
-Options: [-e | --end=] - End position of subsequence.
-Options: [-l | --len=] - Length of subsequence.
-Options: [-f | --flank=] - Include flanking sequence.
-Options: [-m | --mask] - Softmask non-exonic sequence.
-Options: [-I | --stream_in=] - Read input from stream file - Default=STDIN
-Options: [-O | --stream_out=] - Write output to stream file - Default=STDOUT
-
-Examples: $script -g hg18 -c chr1 -b 1 -e 10 - Get the first 10 nucleotides of human genome chr1.
-Examples: $script -g hg18 -c chr1 -b 1 -l 10 - Get the first 10 nucleotides of human genome chr1.
-Examples: ... | $script -g mm8 -f 50 - Get subsequences including 50nt flanks of mouse BED/PSL/BLAST entries.
-
-Keys in: REC_TYPE - Optional record type (BED, PSL, or BLAST).
-Keys in: CHR - Chromosome (for use with BED record type).
-Keys in: CHR_BEG - Chromosome begin.
-Keys in: CHR_END - Chromosome end.
-Keys in: S_ID - Chromosome (for use with PSL and BLAST record type).
-Keys in: S_BEG - Chromosome begin (for use with PSL and BLAST record type).
-Keys in: S_END - Chromosome end (for use with PSL and BLAST record type).
-Keys in: STRAND - Sequence strand.
-
-Keys out: CHR - Chromosome.
-Keys out: CHR_BEG - Chromosome begin.
-Keys out: CHR_END - Chromosome end.
-Keys out: SEQ - Sequence.
-Keys out: SEQ_LEN - Sequence length.
diff --git a/bp_usage/get_genome_seq.wiki b/bp_usage/get_genome_seq.wiki
new file mode 100644
index 0000000..e6425a9
--- /dev/null
+++ b/bp_usage/get_genome_seq.wiki
@@ -0,0 +1,101 @@
+=Biopiece: get_genome_seq=
+
+==Synopsis==
+
+Extract subsequences from a genome sequence.
+
+==Description==
+
+[get_genome_seq] can be used to get subsequences from a specified genome that has been
+indexed with [index_genome_seq]. The subsequence can be obtained explicitly or from
+BED/PSL/BLAST entries in the stream.
+
+Use [list_genomes] to see available genome sequences.
+
+==Usage==
+
+{{{
+... | get_genome_seq [options]
+}}}
+
+==Options==
+
+{{{
+[-g | --genome=] - Genome to get subsequence from.
+[-c | --chr=] - Chromosome with requested subsequence.
+[-b | --beg=] - Begin position of subsequence (first residue=1).
+[-e | --end=] - End position of subsequence.
+[-l | --len=] - Length of subsequence.
+[-f | --flank=] - Include flanking sequence.
+[-m | --mask] - Softmask non-exonic sequence.
+[-I | --stream_in=] - Read input from stream file - Default=STDIN
+[-O | --stream_out=] - Write output to stream file - Default=STDOUT
+}}}
+
+==Examples==
+
+To get an explicit subsequence from the human genome (currently hg18) do:
+
+{{{
+get_genome_seq -g hg18 -c chrX -b 12000 -l 20
+
+CHR_END: 12018
+SEQ: GGTGCAGTAACACCTGCCGT
+CHR_BEG: 11999
+SEQ_LEN: 20
+CHR: chrX
+---
+}}}
+
+If you have a stream with BED, PSL or BLAST records, you can obtain the subsequence
+simply by piping the stream through [get_genome_seq], and the sequence will be
+added to the record. Below is an example for a BED entry:
+
+{{{
+read_bed -i -n 1 | get_genome_seq -g hg18
+
+STRAND: +
+CHR_END: 95127728
+Q_ID: gi|108087685|gb|DQ594132.1|_Homo_sapiens_piRNA_piR-60244,_complete_sequence
+CHR_BEG: 95127700
+SCORE: 1
+REC_TYPE: BED
+BED_LEN: 29
+CHR: chr15
+SEQ: TTCACTTCTCCCATGTAGTTCCTGAGTGC
+BED_COLS: 6
+---
+}}}
+
+Note that the sequence is based on the CHR_BEG and CHR_END positions. There is currently
+no switch that obtains subsequences based on record blocks.
+
+==See also==
+
+[index_genome_seq]
+
+[list_genomes]
+
+[read_bed]
+
+[extract_seq]
+
+==Author==
+
+Martin Asser Hansen - Copyright (C) - All rights reserved.
+
+mail@maasha.dk
+
+August 2007
+
+==License==
+
+GNU General Public License version 2
+
+http://www.gnu.org/copyleft/gpl.html
+
+==Help==
+
+[get_genome_seq] is part of the Biopieces framework.
+
+http://code.google.com/p/biopieces/
diff --git a/bp_usage/invert_align b/bp_usage/invert_align
deleted file mode 100644
index 9c99b33..0000000
--- a/bp_usage/invert_align
+++ /dev/null
@@ -1,19 +0,0 @@
-Author: Martin Asser Hansen - Copyright (C) - All rights reserved
-
-Contact: mail@maasha.dk
-
-Date: August 2007
-
-License: GNU General Public License version 2 (http://www.gnu.org/copyleft/gpl.html)
-
-Description: Inverts an alignment showing only non-matching residues using the first sequence as reference.
-
-Usage: ... | $script [options]
-
-Options: [-s | --soft] - Use soft inversion instead of hard inversion.
-Options: [-I | --stream_in=] - Read input from stream file - Default=STDIN
-Options: [-O | --stream_out=] - Write output to stream file - Default=STDOUT
-
-Examples: ... | $script - Invert alignment in stream.
-Examples: ... | $script -s - Soft invert alignment in stream.
-
diff --git a/bp_usage/invert_align.wiki b/bp_usage/invert_align.wiki
new file mode 100644
index 0000000..15409c4
--- /dev/null
+++ b/bp_usage/invert_align.wiki
@@ -0,0 +1,114 @@
+=Biopiece: invert_align=
+
+==Synopsis==
+
+Inverts an alignment showing only non-matching residues using the first sequence
+as reference.
+
+==Description==
+
+[invert_align] is useful to locate mismatches or other differences in an alignment
+between the reference sequence (the first sequence in the alignment) and the remaining
+sequences. Inversion can be 'hard', where matching residues are shown as `-`, or 'soft',
+where matching residues are shown in lower case. In both cases, mismatches are shown as
+capital letters and gaps or missing sequence are shown as `_`.
+
+==Usage==
+
+{{{
+... | invert_align [options]
+}}}
+
+==Options==
+
+{{{
+[-s | --soft] - Use soft inversion instead of hard inversion.
+[-I | --stream_in=] - Read input from stream file - Default=STDIN
+[-O | --stream_out=] - Write output to stream file - Default=STDOUT
+}}}
+
+==Examples==
+
+Consider the alignment in the file `aln.fna` in FASTA format:
+
+{{{
+>test1
+CTAGC-TTCGACT
+>test2
+--AGC-TTCGA--
+>test3
+--AGCTTTCGA--
+>test4
+--AG--CTCGA--
+>test5
+--AG--TTCGAC-
+}}}
+
+Reading the alignment using [read_fasta] and writing it using [write_align] results in:
+
+{{{
+read_fasta -i aln.fna | write_align -x
+
+                        .
+test1          CTAGC-TTCGACT
+test2          --AGC-TTCGA--
+test3          --AGCTTTCGA--
+test4          --AG--CTCGA--
+test5          --AG--TTCGAC-
+Consensus: 50% --AG--TTCGA--
+}}}
+
+However, if we insert an instance of [invert_align] it is clear where the sequence differences are:
+
+{{{
+read_fasta -i aln.fna | invert_align | write_align -x
+
+                        .
+test1          CTAGC_TTCGACT
+test2          __---------__
+test3          __---T-----__
+test4          __--_-C----__
+test5          __--_-------_
+Consensus: 50% -------------
+}}}
+
+If, instead of hard inverting the alignment, we use the `-s` switch of [invert_align] to obtain a soft
+inverted alignment, where matching residues are shown in lower case letters instead of as `-`, we get:
+
+{{{
+read_fasta -i aln.fna | invert_align -s | write_align -x
+
+                        .
+test1          CTAGC_TTCGACT
+test2          __agc_ttcga__
+test3          __agcTttcga__
+test4          __ag__Ctcga__
+test5          __ag__ttcgac_
+Consensus: 50% --AG--TTCGA--
+}}}
+
+==See also==
+
+[read_fasta]
+
+[write_align]
+
+==Author==
+
+Martin Asser Hansen - Copyright (C) - All rights reserved.
+
+mail@maasha.dk
+
+August 2007
+
+==License==
+
+GNU General Public License version 2
+
+http://www.gnu.org/copyleft/gpl.html
+
+==Help==
+
+[invert_align] is part of the Biopieces framework.
+
+http://code.google.com/p/biopieces/
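
The per-column counting documented for [create_weight_matrix] and the hard/soft inversion documented for [invert_align] can be illustrated with a minimal Python sketch. This is not the Biopieces implementation and is not part of the commit: the function names `weight_matrix` and `invert_align`, the rounding of percentages, and the handling of gap-versus-gap columns are assumptions, inferred only from the example outputs in the wiki pages above.

```python
from collections import Counter

GAP = "-"  # gap symbol used in the example alignments


def weight_matrix(seqs, percent=False):
    """Count residues per alignment column.

    Returns {residue: [value for each column]}, with residues in sorted order.
    Rounding percentages to whole numbers is an assumption of this sketch.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    columns = [Counter(s[i].upper() for s in seqs) for i in range(length)]
    residues = sorted({r for col in columns for r in col})
    scale = 100 / len(seqs) if percent else 1
    return {r: [round(col[r] * scale) for col in columns] for r in residues}


def invert_align(seqs, soft=False):
    """Invert an alignment against its first sequence (the reference).

    Rules inferred from the example output (an assumption, not the real code):
    hard: match -> '-', otherwise gap -> '_', otherwise upper-case residue
    soft: gap -> '_', otherwise match -> lower case, otherwise upper case
    """
    ref = seqs[0]
    out = [ref.replace(GAP, "_")]  # the reference only has its gaps rewritten
    for seq in seqs[1:]:
        chars = []
        for c, r in zip(seq, ref):
            match = c.upper() == r.upper()
            if soft:
                chars.append("_" if c == GAP else c.lower() if match else c.upper())
            else:
                chars.append("-" if match else "_" if c == GAP else c.upper())
        out.append("".join(chars))
    return out


if __name__ == "__main__":
    aln = ["CTAGC-TTCGACT", "--AGC-TTCGA--", "--AGCTTTCGA--",
           "--AG--CTCGA--", "--AG--TTCGAC-"]
    print("\n".join(invert_align(aln)))
    print("\n".join(invert_align(aln, soft=True)))
    for residue, row in weight_matrix(aln, percent=True).items():
        print(residue, *row)
```

Both functions take the aligned sequences as a plain list of equal-length strings, so they can be tried on the `aln.fna` examples without any FASTA parsing.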