==Description==
-*grab* selects records from the stream by matching keys or values using a pattern, a regular expression, or a numerical evaluation. *grab* is biopieces' equivalent of Unix' grep, however, *grab* is much more versatile.
+[grab] selects records from the stream by matching keys or values using a pattern,
+a regular expression, or a numerical evaluation. [grab] is Biopieces' equivalent of
+Unix grep; however, [grab] is much more versatile.
==Usage==
==Examples==
-To easily *grab* all records in the stream that has any mentioning of the pattern 'human' just pipe the data stream through *grab* like this:
+To easily [grab] all records in the stream that have any mention of the pattern 'human',
+just pipe the data stream through [grab] like this:
{{{
... | grab -p human
}}}
-This will search for the pattern 'human' in all keys and all values. The `-p` switch takes a comma separated list of patterns, so in order to match multiple patterns do:
+This will search for the pattern 'human' in all keys and all values. The `-p` switch takes
+a comma separated list of patterns, so in order to match multiple patterns do:
{{{
... | grab -p human,mouse
... | grab -P patterns.txt
}}}
-If you want the opposite result - to find all records that does not match the patterns, add the `-i` switch, which not only works with the `-p` and `-P` switch, but also with `-r` and `-e`:
+If you want the opposite result - to find all records that do not match the patterns -
+add the `-i` switch, which works not only with the `-p` and `-P` switches, but also with `-r` and `-e`:
{{{
... | grab -p human -i
}}}
-If you want to search the record keys only, e.g. to find all records containing the key SEQ you can add the `-K` switch. This will prevent matching of SEQ in any record value, and in fact SEQ is a not uncommon peptide sequence you could get an unwanted record. Also, this will give an increase in speed since only the keys are searched:
+If you want to search the record keys only, e.g. to find all records containing the key SEQ,
+you can add the `-K` switch. This will prevent matching of SEQ in any record value; since
+SEQ is a not uncommon peptide sequence, you could otherwise get unwanted records. Also, this
+will give an increase in speed since only the keys are searched:
{{{
... | grab -p SEQ -K
}}}
-However, if you are interested in finding the peptide sequence SEQ and not the SEQ key, just add the `-V` switch instead:
+However, if you are interested in finding the peptide sequence SEQ and not the SEQ key, just
+add the `-V` switch instead:
{{{
... | grab -p SEQ -V
}}}
-Also, if you want to *grab* for certain key/value pairs you can supply a comma separated list of keys whos values will then be searched using the `-k` switch. This is handy if your records contain large genomic sequences and you don't want to search the entire sequence for e.g. the organism name - it is much faster to tell *grab* which keys to search the value for:
+Also, if you want to [grab] for certain key/value pairs you can use the `-k` switch to supply
+a comma separated list of keys whose values will then be searched. This is handy if your records
+contain large genomic sequences and you don't want to search the entire sequence for e.g. the
+organism name - it is much faster to tell [grab] which keys' values to search:
{{{
... | grab -p human -k SEQ_NAME
}}}
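The difference between the default search, `-K`, `-V`, and `-k` can be illustrated with a minimal Python sketch. It is not how [grab] is implemented: records are modelled as plain dictionaries, matching is assumed to be a case-sensitive substring test, and the function name `matches` and the sample records are hypothetical.

```python
# Sketch only: models the search scope of -K, -V and -k on dict records.
# Assumption: a pattern matches when it occurs as a case-sensitive substring.

def matches(record, pattern, keys_only=False, vals_only=False, keys=None):
    """Return True if `pattern` occurs in the searched parts of `record`."""
    if keys is not None:
        # Like -k: search only the values of the named keys.
        return any(pattern in str(record[k]) for k in keys if k in record)
    if keys_only:
        # Like -K: search the record keys only.
        return any(pattern in k for k in record)
    if vals_only:
        # Like -V: search the record values only.
        return any(pattern in str(v) for v in record.values())
    # Default: search both keys and values.
    return any(pattern in k or pattern in str(v) for k, v in record.items())


# Hypothetical sample records.
records = [
    {"SEQ_NAME": "human_chr1", "SEQ": "ATCGATCG"},
    {"SEQ_NAME": "mouse_chr2", "PEPTIDE": "MKSEQLV"},
    {"NAME": "test", "COMMENT": "human sample"},
]

by_key = [r for r in records if matches(r, "SEQ", keys_only=True)]
by_val = [r for r in records if matches(r, "SEQ", vals_only=True)]
by_name = [r for r in records if matches(r, "human", keys=["SEQ_NAME"])]
```

Here `by_val` picks up only the record whose peptide value contains SEQ, while `by_name` ignores the third record because 'human' occurs in its COMMENT value rather than in SEQ_NAME.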
-It is also possible to invoke flexible matching using regex (regular expressions) instead of simple pattern matching. In *grab* the regex engine is Perl based, and allows use of different type of wild cards, alternatives, etc. If you want to *grab* records withs the sequence ATCG or GCTA you can do this:
+It is also possible to invoke flexible matching using regex (regular expressions) instead of
+simple pattern matching. In [grab] the regex engine is Perl based, and allows use of different
+types of wildcards, alternatives, etc. If you want to [grab] records with the sequence ATCG
+or GCTA you can do this:
{{{
... | grab -r 'ATCG|GCTA'
... | grab -r '^ATCG'
}}}
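The effect of the two regexes above can be tried out with a small sketch. It uses Python's `re` module rather than Perl's engine (the two agree for these simple patterns), assumes the regex is tested against both keys and values, and uses a hypothetical function name `regex_grab` and made-up records.

```python
import re

# Sketch only: Python's re stands in for the Perl regex engine used by grab.
# Assumption: the regex is tested against every key and every value.

def regex_grab(records, regex):
    """Keep records in which any key or value matches `regex`."""
    compiled = re.compile(regex)
    return [
        r for r in records
        if any(compiled.search(k) or compiled.search(str(v)) for k, v in r.items())
    ]


# Hypothetical sample records.
records = [
    {"SEQ": "ATCGGG"},
    {"SEQ": "TTGCTA"},
    {"SEQ": "TTTT"},
]

either = regex_grab(records, "ATCG|GCTA")   # alternatives: ATCG or GCTA anywhere
anchored = regex_grab(records, "^ATCG")     # anchor: must start with ATCG
```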
-You can also use *grab* to locate records that fulfill a numerical property using the `-e` switch witch takes an expression in three parts. The first part is the key that holds the value we want to evaluate, the second part holds one if these eight operators:
+You can also use [grab] to locate records that fulfill a numerical property using the `-e` switch,
+which takes an expression in three parts. The first part is the key that holds the value we want
+to evaluate, the second part holds one of these eight operators:
# Greater than: >
# Greater than or equal to: >=
# String wise equal to: eq
# String wise not equal to: ne
-And finally comes the number used in the evaluation. So to *grab* all records with a sequence length greater than 30:
+And finally comes the number used in the evaluation. So to [grab] all records with a sequence
+length greater than 30:
{{{
... | grab -e 'SEQ_LEN > 30'
}}}
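The three-part expression can be read as key, operator, operand. The following Python sketch shows one way such an expression could be evaluated; it covers only the four operators listed above, treats records without the key as non-matching (an assumption), and the name `evaluate` is hypothetical rather than part of [grab].

```python
import operator

# Sketch only: covers the four operators listed above, not all eight.
OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "eq": operator.eq,   # string wise comparison
    "ne": operator.ne,   # string wise comparison
}


def evaluate(record, expression):
    """Evaluate a 'KEY OPERATOR OPERAND' expression against one record."""
    key, op, operand = expression.split(None, 2)
    if key not in record:
        # Assumption: records lacking the key never match.
        return False
    value = record[key]
    if op in ("eq", "ne"):
        return OPS[op](str(value), operand)
    # Numerical operators compare both sides as numbers.
    return OPS[op](float(value), float(operand))


long_enough = evaluate({"SEQ_LEN": 45}, "SEQ_LEN > 30")
too_short = evaluate({"SEQ_LEN": 30}, "SEQ_LEN > 30")
```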
-If you want to locate all records containing the pattern 'human' and where the sequence length is greater that 30, you do this by running the stream through *grab* twice:
+If you want to locate all records containing the pattern 'human' and where the sequence length
+is greater than 30, you do this by running the stream through [grab] twice:
{{{
... | grab -p 'human' | grab -e 'SEQ_LEN > 30'
}}}
-Finally, it is possible to do fast matching of expressions from a file using the `-E` switch. Each of these expressions has to be matched exactly over the entrie length, which if useful if you e.g. have a file with accession numbers, that you want to locate in the stream:
+Finally, it is possible to do fast matching of expressions from a file using the `-E` switch.
+Each of these expressions has to be matched exactly over the entire length, which is useful if
+you e.g. have a file with accession numbers that you want to locate in the stream:
{{{
... | grab -E acc_no.txt
}}}
-Using `-E` is much faster than using `-P`, because with `-E` the expression has to be complete matches, where `-P` looks for subpatterns.
+Using `-E` is much faster than using `-P`, because with `-E` the expressions have to be complete
+matches, whereas `-P` looks for subpatterns.
-NB! To get the best speed performance, use the most restrictive *grab* first.
+NB! To get the best speed performance, use the most restrictive [grab] first.
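Why complete matches can be cheaper than subpattern matches is sketched below in Python. This is an illustration under assumptions, not a description of [grab] internals: it assumes exact expressions can be kept in a hash set and looked up once per value, whereas every subpattern has to be scanned for inside every value. Only values are searched here, and the function names are hypothetical.

```python
# Sketch only: contrasts exact matching (like -E) with subpattern matching (like -P).
# Assumption: exact expressions are held in a set, giving one lookup per value.

def grab_exact(records, expressions):
    """Keep records where some value equals one of the expressions exactly."""
    wanted = set(expressions)
    return [r for r in records if any(str(v) in wanted for v in r.values())]


def grab_sub(records, patterns):
    """Keep records where some value contains one of the patterns."""
    return [
        r for r in records
        if any(p in str(v) for v in r.values() for p in patterns)
    ]


# Hypothetical accession numbers.
records = [{"ACC": "AB0001"}, {"ACC": "AB00012"}, {"ACC": "XY9"}]

exact = grab_exact(records, ["AB0001"])   # only the complete match
partial = grab_sub(records, ["AB0001"])   # also the longer accession
```

The sketch also shows the behavioural difference: the exact variant skips AB00012, which the subpattern variant reports because it contains AB0001.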
==See also==
==Author==
Martin Asser Hansen - Copyright (C) - All rights reserved.
+
mail@maasha.dk
+
August 2007
==License==
==Help==
-*grab* is part of the Biopieces framework.
+[grab] is part of the Biopieces framework.
http://code.google.com/p/biopieces/
==Description==
-Tabular input can be read with *read_tab* which will read in chosen rows and chosen columns (separated by a given delimiter) from a table in ASCII text format.
+Tabular input can be read with [read_tab] which will read in chosen rows and
+chosen columns (separated by a given delimiter) from a table in ASCII text format.
==Usage==
---
}}}
-However, the first line is a comment line that can be skipped using the `-s` switch which will skip a specified number of lines before reading. So to get the rows with data do:
+However, the first line is a comment line that can be skipped using the `-s` switch which
+will skip a specified number of lines before reading. So to get the rows with data do:
{{{
read_tab -i test.tab -s 1
---
}}}
-It is possible to select a subset of columns to read by using the `-c` switch which takes a comma separated list of columns numbers (first column is designated 0) as argument. So to read in only the sequence and the count so that the count comes before the sequence do:
+It is possible to select a subset of columns to read by using the `-c` switch which takes
+a comma separated list of column numbers (first column is designated 0) as argument.
+So to read in only the sequence and the count so that the count comes before the sequence do:
{{{
read_tab -i test.tab -s 1 -c 2,1
==Author==
Martin Asser Hansen - Copyright (C) - All rights reserved.
+
mail@maasha.dk
+
August 2007
==License==
==Help==
-*read_tab* is part of the Biopieces framework.
+[read_tab] is part of the Biopieces framework.
http://code.google.com/p/biopieces/
==Description==
-Outputting the data stream as a table can be done with *write_tab*, which will write generate one row per record with the values as columns.
+Outputting the data stream as a table can be done with [write_tab],
+which will generate one row per record with the values as columns.
==Usage==
... | write_tab -d ','
}}}
-If you want the values output in a specific order you have to supply a comma separated list using the `-k` switch that will print only those keys in that order:
+If you want the values output in a specific order you have to supply a comma
+separated list using the `-k` switch that will print only those keys in that order:
{{{
... | write_tab -k SEQ_NAME,COUNT
Keys from e.g. [read_tab] V0, V1, V2 ... Vn, is automagically sorted numerically.
-Alternatively, if you have some keys that you don't want in the tabular output, use the `-K` switch. So to print all keys except SEQ and SEQ_TYPE do:
+Alternatively, if you have some keys that you don't want in the tabular output,
+use the `-K` switch. So to print all keys except SEQ and SEQ_TYPE do:
{{{
... | write_tab -K SEQ,SEQ_TYPE
}}}
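How `-k` and `-K` shape a row can be sketched in Python as follows. The sketch makes several assumptions that the text above does not state: records are dictionaries, keys missing from a record are skipped, and the default delimiter is a tab. The name `to_row` and the sample record are hypothetical.

```python
# Sketch only: models how -k (keep, in order) and -K (drop) select columns.
# Assumptions: missing keys are skipped; the default delimiter is a tab.

def to_row(record, keys=None, no_keys=None, delimiter="\t"):
    """Format one record as a single delimited row."""
    if keys is not None:
        # Like -k: print only these keys, in the given order.
        selected = [k for k in keys if k in record]
    else:
        # Like -K: print every key except the excluded ones.
        excluded = set(no_keys or ())
        selected = [k for k in record if k not in excluded]
    return delimiter.join(str(record[k]) for k in selected)


# Hypothetical sample record.
record = {"SEQ_NAME": "test1", "SEQ": "ATCG", "SEQ_TYPE": "DNA", "COUNT": 3}

ordered = to_row(record, keys=["COUNT", "SEQ_NAME"], delimiter=",")
trimmed = to_row(record, no_keys=["SEQ", "SEQ_TYPE"], delimiter=",")
```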
-Finally, if you have a stream containing a mix of different records types, e.g. records with sequences and records with matches, then you can use *write_tab* to output all the records in tabluar format, however, the `-c`, `-k`, and `-K` switches will only respond to records of the first type encountered. The reason is that outputting mixed records is probably not what you want anyway, and you should remove all the unwanted records from the stream before outputting the table: [grab] is your friend.
+Finally, if you have a stream containing a mix of different record types, e.g.
+records with sequences and records with matches, then you can use [write_tab] to
+output all the records in tabular format. However, the `-c`, `-k`, and `-K` switches
+will only respond to records of the first type encountered. The reason is that outputting
+mixed records is probably not what you want anyway, and you should remove all the unwanted
+records from the stream before outputting the table: [grab] is your friend.
==See also==
==Author==
Martin Asser Hansen - Copyright (C) - All rights reserved.
+
mail@maasha.dk
+
August 2007
==License==
==Help==
-*write_tab* is part of the Biopieces framework.
+[write_tab] is part of the Biopieces framework.
http://code.google.com/p/biopieces/