From: martinahansen
Date: Tue, 1 Jul 2008 00:32:18 +0000 (+0000)
Subject: update wiki
X-Git-Url: https://git.donarmstrong.com/?a=commitdiff_plain;h=ba0e144d5544e9fc570095b6773888ff9d25147a;p=biopieces.git

update wiki

git-svn-id: http://biopieces.googlecode.com/svn/trunk@85 74ccb610-7750-0410-82ae-013aeee3265d
---

diff --git a/bp_usage/grab.wiki b/bp_usage/grab.wiki
index 4d969b9..5ffbb98 100644
--- a/bp_usage/grab.wiki
+++ b/bp_usage/grab.wiki
@@ -6,7 +6,9 @@ Grab records in stream.
 
 ==Description==
 
-*grab* selects records from the stream by matching keys or values using a pattern, a regular expression, or a numerical evaluation. *grab* is biopieces' equivalent of Unix' grep, however, *grab* is much more versatile.
+[grab] selects records from the stream by matching keys or values using a pattern,
+a regular expression, or a numerical evaluation. [grab] is biopieces' equivalent of
+Unix' grep, however, [grab] is much more versatile.
 
 ==Usage==
 
@@ -33,13 +35,15 @@ Grab records in stream.
 
 ==Examples==
 
-To easily *grab* all records in the stream that has any mentioning of the pattern 'human' just pipe the data stream through *grab* like this:
+To easily [grab] all records in the stream that has any mentioning of the pattern 'human'
+just pipe the data stream through [grab] like this:
 
 {{{
 ... | grab -p human
 }}}
 
-This will search for the pattern 'human' in all keys and all values. The `-p` switch takes a comma separated list of patterns, so in order to match multiple patterns do:
+This will search for the pattern 'human' in all keys and all values. The `-p` switch takes
+a comma separated list of patterns, so in order to match multiple patterns do:
 
 {{{
 ... | grab -p human,mouse
@@ -51,31 +55,42 @@ It is also possible to use the `-P` switch instead of `-p`. `-P` is used to read
 ...
| grab -P patterns.txt
 }}}
 
-If you want the opposite result - to find all records that does not match the patterns, add the `-i` switch, which not only works with the `-p` and `-P` switch, but also with `-r` and `-e`:
+If you want the opposite result - to find all records that does not match the patterns,
+add the `-i` switch, which not only works with the `-p` and `-P` switch, but also with `-r` and `-e`:
 
 {{{
 ... | grab -p human -i
 }}}
 
-If you want to search the record keys only, e.g. to find all records containing the key SEQ you can add the `-K` switch. This will prevent matching of SEQ in any record value, and in fact SEQ is a not uncommon peptide sequence you could get an unwanted record. Also, this will give an increase in speed since only the keys are searched:
+If you want to search the record keys only, e.g. to find all records containing the key SEQ
+you can add the `-K` switch. This will prevent matching of SEQ in any record value, and in
+fact SEQ is a not uncommon peptide sequence you could get an unwanted record. Also, this will
+give an increase in speed since only the keys are searched:
 
 {{{
 ... | grab -p SEQ -K
 }}}
 
-However, if you are interested in finding the peptide sequence SEQ and not the SEQ key, just add the `-V` switch instead:
+However, if you are interested in finding the peptide sequence SEQ and not the SEQ key, just
+add the `-V` switch instead:
 
 {{{
 ... | grab -p SEQ -V
 }}}
 
-Also, if you want to *grab* for certain key/value pairs you can supply a comma separated list of keys whos values will then be searched using the `-k` switch. This is handy if your records contain large genomic sequences and you don't want to search the entire sequence for e.g. the organism name - it is much faster to tell *grab* which keys to search the value for:
+Also, if you want to [grab] for certain key/value pairs you can supply a comma separated list
+of keys whos values will then be searched using the `-k` switch.
This is handy if your records
+contain large genomic sequences and you don't want to search the entire sequence for e.g. the
+organism name - it is much faster to tell [grab] which keys to search the value for:
 
 {{{
 ... | grab -p human -k SEQ_NAME
 }}}
 
-It is also possible to invoke flexible matching using regex (regular expressions) instead of simple pattern matching. In *grab* the regex engine is Perl based, and allows use of different type of wild cards, alternatives, etc. If you want to *grab* records withs the sequence ATCG or GCTA you can do this:
+It is also possible to invoke flexible matching using regex (regular expressions) instead of
+simple pattern matching. In [grab] the regex engine is Perl based, and allows use of different
+type of wild cards, alternatives, etc. If you want to [grab] records withs the sequence ATCG
+or GCTA you can do this:
 
 {{{
 ... | grab -r 'ATCG|GCTA'
@@ -87,7 +102,9 @@ Or if you want to find sequences beginning with ATCG:
 ... | grab -r '^ATCG'
 }}}
 
-You can also use *grab* to locate records that fulfill a numerical property using the `-e` switch witch takes an expression in three parts. The first part is the key that holds the value we want to evaluate, the second part holds one if these eight operators:
+You can also use [grab] to locate records that fulfill a numerical property using the `-e` switch
+witch takes an expression in three parts. The first part is the key that holds the value we want
+to evaluate, the second part holds one if these eight operators:
 
 # Greater than: >
 # Greater than or equal to: >=
@@ -98,27 +115,32 @@ You can also use *grab* to locate records that fulfill a numerical property usin
 # String wise equal to: eq
 # String wise not equal to: ne
 
-And finally comes the number used in the evaluation. So to *grab* all records with a sequence length greater than 30:
+And finally comes the number used in the evaluation. So to [grab] all records with a sequence
+length greater than 30:
 
 {{{
 ...
| grab -e 'SEQ_LEN > 30'
 }}}
 
-If you want to locate all records containing the pattern 'human' and where the sequence length is greater that 30, you do this by running the stream through *grab* twice:
+If you want to locate all records containing the pattern 'human' and where the sequence length
+is greater that 30, you do this by running the stream through [grab] twice:
 
 {{{
 ... | grab -p 'human' | grab -e 'SEQ_LEN > 30'
 }}}
 
-Finally, it is possible to do fast matching of expressions from a file using the `-E` switch. Each of these expressions has to be matched exactly over the entrie length, which if useful if you e.g. have a file with accession numbers, that you want to locate in the stream:
+Finally, it is possible to do fast matching of expressions from a file using the `-E` switch.
+Each of these expressions has to be matched exactly over the entrie length, which if useful if
+you e.g. have a file with accession numbers, that you want to locate in the stream:
 
 {{{
 ... | grab -E acc_no.txt
 }}}
 
-Using `-E` is much faster than using `-P`, because with `-E` the expression has to be complete matches, where `-P` looks for subpatterns.
+Using `-E` is much faster than using `-P`, because with `-E` the expression has to be complete
+matches, where `-P` looks for subpatterns.
 
-NB! To get the best speed performance, use the most restrictive *grab* first.
+NB! To get the best speed performance, use the most restrictive [grab] first.
 
 ==See also==
 
@@ -126,7 +148,9 @@ NB! To get the best speed performance, use the most restrictive *grab* first.
 ==Author==
 
 Martin Asser Hansen - Copyright (C) - All rights reserved.
+
 mail@maasha.dk
+
 August 2007
 
 ==License==
@@ -137,6 +161,6 @@ http://www.gnu.org/copyleft/gpl.html
 
 ==Help==
 
-*grab* is part of the Biopieces framework.
+[grab] is part of the Biopieces framework.
 
 http://code.google.com/p/biopieces/
diff --git a/bp_usage/print_usage.wiki b/bp_usage/print_usage.wiki
index 15af0d1..6f9d7a1 100644
--- a/bp_usage/print_usage.wiki
+++ b/bp_usage/print_usage.wiki
@@ -37,7 +37,9 @@ print_usage -i ~/biopieces/bp_usage/print_usage -h
 ==Author==
 
 Martin Asser Hansen - Copyright (C) - All rights reserved.
+
 mail@maasha.dk
+
 August 2007
 
 ==License==
@@ -48,6 +50,6 @@ http://www.gnu.org/copyleft/gpl.html
 
 ==Help==
 
-*print_usage* is part of the Biopieces framework.
+[print_usage] is part of the Biopieces framework.
 
 http://code.google.com/p/biopieces/
diff --git a/bp_usage/read_fasta.wiki b/bp_usage/read_fasta.wiki
index c76e810..087c8cf 100644
--- a/bp_usage/read_fasta.wiki
+++ b/bp_usage/read_fasta.wiki
@@ -6,8 +6,10 @@ Read FASTA entries from one or more files.
 
 ==Description==
 
-*read_fasta* read in sequence entries from FASTA files. Each sequence entry consists of a sequence name prefixed by a '>' followed by the sequence name on a line of
-its own, followed by one or my lines of sequence until the next entry or the end of the file. The resulting biopiece record consists of the following record type:
+[read_fasta] read in sequence entries from FASTA files. Each sequence entry consists of a
+sequence name prefixed by a '>' followed by the sequence name on a line of its own, followed
+by one or my lines of sequence until the next entry or the end of the file. The resulting
+biopiece record consists of the following record type:
 
 {{{
 SEQ_NAME: test
@@ -65,12 +67,15 @@ read_fasta -i '*.fna'
 ==See also==
 
 [read_align]
+
 [write_fasta]
 
 ==Author==
 
 Martin Asser Hansen - Copyright (C) - All rights reserved.
+
 mail@maasha.dk
+
 August 2007
 
 ==License==
@@ -81,7 +86,7 @@ http://www.gnu.org/copyleft/gpl.html
 
 ==Help==
 
-*read_fasta* is part of the Biopieces framework.
+[read_fasta] is part of the Biopieces framework.
 
 http://code.google.com/p/biopieces/
 
diff --git a/bp_usage/read_tab.wiki b/bp_usage/read_tab.wiki
index 4774a46..7fe0527 100644
--- a/bp_usage/read_tab.wiki
+++ b/bp_usage/read_tab.wiki
@@ -6,7 +6,8 @@ Read tabular data.
 
 ==Description==
 
-Tabular input can be read with *read_tab* which will read in chosen rows and chosen columns (separated by a given delimiter) from a table in ASCII text format.
+Tabular input can be read with [read_tab] which will read in chosen rows and
+chosen columns (separated by a given delimiter) from a table in ASCII text format.
 
 ==Usage==
 
@@ -70,7 +71,8 @@ V1: AAATGCA
 ---
 }}}
 
-However, the first line is a comment line that can be skipped using the `-s` switch which will skip a specified number of lines before reading. So to get the rows with data do:
+However, the first line is a comment line that can be skipped using the `-s` switch which
+will skip a specified number of lines before reading. So to get the rows with data do:
 
 {{{
 read_tab -i test.tab -s 1
@@ -93,7 +95,9 @@ V1: AAATGCA
 ---
 }}}
 
-It is possible to select a subset of columns to read by using the `-c` switch which takes a comma separated list of columns numbers (first column is designated 0) as argument. So to read in only the sequence and the count so that the count comes before the sequence do:
+It is possible to select a subset of columns to read by using the `-c` switch which takes
+a comma separated list of columns numbers (first column is designated 0) as argument.
+So to read in only the sequence and the count so that the count comes before the sequence do:
 
 {{{
 read_tab -i test.tab -s 1 -c 2,1
@@ -138,7 +142,9 @@ COUNT: 2342
 ==Author==
 
 Martin Asser Hansen - Copyright (C) - All rights reserved.
+
 mail@maasha.dk
+
 August 2007
 
 ==License==
@@ -149,6 +155,6 @@ http://www.gnu.org/copyleft/gpl.html
 
 ==Help==
 
-*read_tab* is part of the Biopieces framework.
+[read_tab] is part of the Biopieces framework.
 
 http://code.google.com/p/biopieces/
diff --git a/bp_usage/write_fasta.wiki b/bp_usage/write_fasta.wiki
index dce18b4..bbad19f 100644
--- a/bp_usage/write_fasta.wiki
+++ b/bp_usage/write_fasta.wiki
@@ -6,7 +6,7 @@ Write sequences from stream in FASTA format.
 
 ==Description==
 
-*write_fasta* writes sequence from the data stream in FASTA format. However,
+[write_fasta] writes sequence from the data stream in FASTA format. However,
 a FASTA entry will only be written if a SEQ key and a SEQ_NAME key is present.
 If a SEQ key is present and not SEQ_NAME, then the Q_ID key will be used as
 SEQ_NAME - if such a key is found.
@@ -74,7 +74,7 @@ http://www.gnu.org/copyleft/gpl.html
 
 ==Help==
 
-*write_fasta* is part of the Biopieces framework.
+[write_fasta] is part of the Biopieces framework.
 
 http://code.google.com/p/biopieces/
 
diff --git a/bp_usage/write_tab.wiki b/bp_usage/write_tab.wiki
index f66af45..2c80c95 100644
--- a/bp_usage/write_tab.wiki
+++ b/bp_usage/write_tab.wiki
@@ -6,7 +6,8 @@ Write tabular output from stream.
 
 ==Description==
 
-Outputting the data stream as a table can be done with *write_tab*, which will write generate one row per record with the values as columns.
+Outputting the data stream as a table can be done with [write_tab]
+which will write generate one row per record with the values as columns.
 
 ==Usage==
 
@@ -48,7 +49,8 @@ You can also change the delimiter from the default (tab) to e.g. ',':
 ... | write_tab -d ','
 }}}
 
-If you want the values output in a specific order you have to supply a comma separated list using the `-k` switch that will print only those keys in that order:
+If you want the values output in a specific order you have to supply a comma
+separated list using the `-k` switch that will print only those keys in that order:
 
 {{{
 ... | write_tab -k SEQ_NAME,COUNT
@@ -56,13 +58,19 @@ If you want the values output in a specific order you have to supply a comma sep
 
 Keys from e.g. [read_tab] V0, V1, V2 ... Vn, is automagically sorted numerically.
 
-Alternatively, if you have some keys that you don't want in the tabular output, use the `-K` switch. So to print all keys except SEQ and SEQ_TYPE do:
+Alternatively, if you have some keys that you don't want in the tabular output,
+use the `-K` switch. So to print all keys except SEQ and SEQ_TYPE do:
 
 {{{
 ... | write_tab -K SEQ,SEQ_TYPE
 }}}
 
-Finally, if you have a stream containing a mix of different records types, e.g. records with sequences and records with matches, then you can use *write_tab* to output all the records in tabluar format, however, the `-c`, `-k`, and `-K` switches will only respond to records of the first type encountered. The reason is that outputting mixed records is probably not what you want anyway, and you should remove all the unwanted records from the stream before outputting the table: [grab] is your friend.
+Finally, if you have a stream containing a mix of different records types, e.g.
+records with sequences and records with matches, then you can use [write_tab] to
+output all the records in tabluar format, however, the `-c`, `-k`, and `-K` switches
+will only respond to records of the first type encountered. The reason is that outputting
+mixed records is probably not what you want anyway, and you should remove all the unwanted
+records from the stream before outputting the table: [grab] is your friend.
 
 ==See also==
 
@@ -73,7 +81,9 @@ Finally, if you have a stream containing a mix of different records types, e.g.
 ==Author==
 
 Martin Asser Hansen - Copyright (C) - All rights reserved.
+
 mail@maasha.dk
+
 August 2007
 
 ==License==
@@ -84,6 +94,6 @@ http://www.gnu.org/copyleft/gpl.html
 
 ==Help==
 
-*write_tab* is part of the Biopieces framework.
+[write_tab] is part of the Biopieces framework.
 
 http://code.google.com/p/biopieces/
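
Reviewer note on the write_tab.wiki hunks above: the `-k`, `-K` and `-d` behaviour described there can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Biopieces implementation: the function names `write_tab` and `column_order` are hypothetical, records are modelled as plain dicts, the column layout is taken from the first record only, missing values are written as empty fields, and numeric sorting is applied only when every key has the form V0, V1, ... Vn.

```python
import re

# Keys of the form V0, V1, ... Vn, as produced by read_tab (per the wiki text).
V_KEY = re.compile(r"^V(\d+)$")


def column_order(record, keys=None, no_keys=None):
    """Pick the output columns from one record (hypothetical helper)."""
    if keys:
        # -k: print only these keys, in this order.
        return list(keys)
    columns = list(record)
    if columns and all(V_KEY.match(k) for k in columns):
        # Assumption: V-keys are sorted numerically, so V2 comes before V10.
        columns.sort(key=lambda k: int(V_KEY.match(k).group(1)))
    # -K: drop the unwanted keys.
    skip = set(no_keys or [])
    return [k for k in columns if k not in skip]


def write_tab(records, keys=None, no_keys=None, delimiter="\t"):
    """Return one delimited row per record (sketch, not Biopieces code)."""
    columns = None
    rows = []
    for record in records:
        if columns is None:
            # Assumption: the first record decides the layout for the whole table.
            columns = column_order(record, keys, no_keys)
        rows.append(delimiter.join(str(record.get(k, "")) for k in columns))
    return rows


records = [
    {"SEQ_NAME": "test1", "SEQ": "ATCG", "SEQ_TYPE": "DNA", "COUNT": 4},
    {"SEQ_NAME": "test2", "SEQ": "GCTA", "SEQ_TYPE": "DNA", "COUNT": 7},
]

# Roughly what "... | write_tab -k SEQ_NAME,COUNT" is described as doing.
for row in write_tab(records, keys=["SEQ_NAME", "COUNT"]):
    print(row)
```

Fixing the columns from the first record mirrors the wiki's remark that the switches only respond to records of the first type encountered, which is why a mixed stream is best filtered with grab before tabular output.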