From: martinahansen Date: Thu, 24 Nov 2011 12:58:47 +0000 (+0000) Subject: removed bp_doc/ and content X-Git-Url: https://git.donarmstrong.com/?a=commitdiff_plain;h=67652c7efb076860e571e12c75fc1d07b203585e;p=biopieces.git removed bp_doc/ and content git-svn-id: http://biopieces.googlecode.com/svn/trunk@1684 74ccb610-7750-0410-82ae-013aeee3265d --- diff --git a/bp_doc/00README b/bp_doc/00README deleted file mode 100644 index cdbed61..0000000 --- a/bp_doc/00README +++ /dev/null @@ -1,2 +0,0 @@ -# Adding stuff to the wiki, like images and documents -svn co https://biopieces.googlecode.com/svn/wiki wiki --username martinahansen diff --git a/bp_doc/biopieces_cookbook.lyx b/bp_doc/biopieces_cookbook.lyx deleted file mode 100644 index 1dc2694..0000000 --- a/bp_doc/biopieces_cookbook.lyx +++ /dev/null @@ -1,7258 +0,0 @@ -#LyX 1.5.1 created this file. For more info see http://www.lyx.org/ -\lyxformat 276 -\begin_document -\begin_header -\textclass scrartcl -\begin_preamble -\usepackage[colorlinks=true, urlcolor=blue, linkcolor=black]{hyperref} -\end_preamble -\language english -\inputencoding auto -\font_roman default -\font_sans default -\font_typewriter default -\font_default_family default -\font_sc false -\font_osf false -\font_sf_scale 100 -\font_tt_scale 100 -\graphics default -\paperfontsize default -\spacing single -\papersize default -\use_geometry false -\use_amsmath 1 -\use_esint 1 -\cite_engine basic -\use_bibtopic false -\paperorientation portrait -\secnumdepth 3 -\tocdepth 3 -\paragraph_separation skip -\defskip medskip -\quotes_language english -\papercolumns 1 -\papersides 1 -\paperpagestyle default -\tracking_changes false -\output_changes false -\author "" -\author "" -\end_header - -\begin_body - -\begin_layout Title -Biopieces Cookbook -\end_layout - -\begin_layout Author -Martin Asser Hansen -\end_layout - -\begin_layout Publishers -John Mattick Group -\newline -Institute for Molecular Bioscience -\newline -University of Queensland -\newline -Aust 
-ralia -\newline -E-mail: mail@maasha.dk -\end_layout - -\begin_layout Standard -\begin_inset ERT -status open - -\begin_layout Standard - - -\backslash -thispagestyle{empty} -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Standard - -\newpage - -\end_layout - -\begin_layout Standard -\begin_inset LatexCommand tableofcontents - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset FloatList figure - -\end_inset - - -\end_layout - -\begin_layout Standard - -\newpage - -\end_layout - -\begin_layout Section -Introduction -\end_layout - -\begin_layout Standard -Biopieces is a collection of bioinformatic tools that can be linked together - (piped as we shall call it) in a very flexible manner to perform both simple - and complex tasks. - The fundamental idea is that biopieces work on a data stream that will - only terminate at the end of an analysis and that this data stream can - be passed through several different biopieces, each performing one specific - task. - The advantage of this approach is that a user can perform simple and complex - tasks without having to write advanced code. - Moreover, since the data format used to pass data between biopieces is - text based, biopieces can be written by different developers in their favorite - programming language --- and still the biopieces will be able to work together. 
-\end_layout - -\begin_layout Standard -In their simplest form biopieces can be piped together on the command line - like this (using the pipe character '|'): -\end_layout - -\begin_layout LyX-Code -read_data | calculate_something | write_result -\end_layout - -\begin_layout Standard -However, a more comprehensive analysis could be composed: -\end_layout - -\begin_layout LyX-Code -read_data | select_entries | convert_entries | search_database | -\end_layout - -\begin_layout LyX-Code -evaluate_results | plot_diagram | plot_another_diagram | -\end_layout - -\begin_layout LyX-Code -load_to_database -\end_layout - -\begin_layout Standard -The data stream that is piped through the biopieces consists of records - of key/value pairs, much like a hash, in order to keep the structure as - simple as possible. - An example record can be seen below: -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -REC_TYPE: PATSCAN -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -MATCH: AGATCAAGTG -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -S_BEG: 7 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -S_END: 16 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -ALIGN_LEN: 9 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -S_ID: piR-t6 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -STRAND: + -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -PATTERN: AGATCAAGTG -\end_layout - -\begin_layout LyX-Code - -\size scriptsize ---- -\end_layout - -\begin_layout Standard -The ' -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\backslash -/- -\end_layout - -\end_inset - -' denotes the delimiter of the records, and each key is a word followed - by a ':', a whitespace character, and then the value. 
 - By convention the biopieces only use upper case keys (a list of used keys - can be seen in Appendix\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sec:Keys" - -\end_inset - -). - Since the records basically are hash structures this means that the keys - in the stream are unordered, and in the above example it is - pure coincidence that S_BEG is displayed before S_END. - However, when the order of the keys is important, the biopieces will - automagically see to that. -\end_layout - -\begin_layout Standard -All of the biopieces are able to read and write a data stream to and from - file as long as the records are in the biopieces format. - This means that if you are undertaking a lengthy analysis where one of - the steps is time consuming, you may save the stream after this step, and - subsequently start one or more analyses from that last step -\begin_inset Foot -status collapsed - -\begin_layout Standard -It is a goal that the biopieces at some point will be able to dump the data - stream to file in case one of the tools fails critically. -\end_layout - -\end_inset - -. - If you are running a lengthy analysis it is highly recommended that you - create a small test sample of the data and run that through the pipeline - --- and once you are satisfied with the result proceed with the full data - set (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-select-a-few-records" - -\end_inset - -). -\end_layout - -\begin_layout Standard -All of the biopieces can be supplied with long arguments prefixed with -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - - switches or single character arguments prefixed with - switches that can - be grouped together (e.g. - -xok). - In this cookbook only the long switches are used to emphasize what these - switches do. 
-\end_layout - -\begin_layout Section -Setup -\end_layout - -\begin_layout Standard -In order to get the biopieces to work, you need to add environment settings - to include the code binaries, scripts, and modules that constitute the - biopieces package. - Assuming that you are using bash, add the following to your '~/.bashrc' - file using your favorite editor. - After the changes have been saved you need to either run 'source ~/.bashrc' - or log in again. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -if [ -f "/home/m.hansen/maasha/conf/bashrc" ]; then -\end_layout - -\begin_layout LyX-Code - -\size scriptsize - source "/home/m.hansen/maasha/conf/bashrc" -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -fi -\end_layout - -\begin_layout Section -Getting Started -\end_layout - -\begin_layout Standard -The biopiece -\series bold -list_biopieces -\series default - lists all the biopieces along with a description: -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -list_biopieces -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -align_seq Align sequences in stream using Muscle. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -analyze_seq Analysis the residue composition of each sequence - in stream. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -analyze_vals Determine type, count, min, max, sum and mean for - values in stream. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -blast_seq BLAST sequences in stream against a specified database. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -blat_seq BLAT sequences in stream against a specified genome. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -complement_seq Complement sequences in stream. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -count_records Count the number of records in stream. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -count_seq Count sequences in stream. 
-\end_layout - -\begin_layout LyX-Code - -\size scriptsize -count_vals Count the number of times values of given keys exists - in stream. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -create_blast_db Create a BLAST database from sequences in stream for - use with BLAST. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -... -\end_layout - -\begin_layout Standard -To list the biopieces for writing different formats, you can use unix's - grep like this: -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -list_biopieces | grep write -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -write_align Write aligned sequences in pretty alignment format. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -write_bed Write records from stream as BED lines. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -write_blast Write BLAST records from stream in BLAST tabular format - (-m8 and 9). -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -write_fasta Write sequences in FASTA format. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -write_psl Write records from stream in PSL format. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -write_tab Write records from stream as tab separated table. -\end_layout - -\begin_layout Standard -In order to find out how a specific biopiece works, you just type the program - name without any arguments and press return and the usage of the biopiece - will be displayed. - E.g. 
- -\series bold -read_fasta -\series default - : -\end_layout - -\begin_layout Standard -\begin_inset Box Frameless -position "t" -hor_pos "c" -has_inner_box 1 -inner_pos "t" -use_parbox 0 -width "100col%" -special "none" -height "1in" -height_special "totalheight" -status open - -\begin_layout LyX-Code - -\size scriptsize -Program name: read_fasta -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -Author: Martin Asser Hansen - Copyright (C) - All rights reserved -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -Contact: mail@maasha.dk -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -Date: August 2007 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -License: GNU General Public License version 2 (http://www.gnu.org/copyleft/ -gpl.html) -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -Description: Read FASTA entries. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -Usage: read_fasta [options] -i -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -Options: -\end_layout - -\begin_layout LyX-Code - -\size scriptsize - [-i | --data_in=] - Comma separated list of files - to read. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize - [-n | --num=] - Limit number of records to read. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize - [-I | --stream_in=] - Read input stream from file - - Default=STDIN -\end_layout - -\begin_layout LyX-Code - -\size scriptsize - [-O | --stream_out=] - Write output stream to file - - Default=STDOUT -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -Examples: -\end_layout - -\begin_layout LyX-Code - -\size scriptsize - read_fasta -i test.fna - Read FASTA entries from file. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize - read_fasta -i test1.fna,test2,fna - Read FASTA entries from files. 
-\end_layout - -\begin_layout LyX-Code - -\size scriptsize - read_fasta -i '*.fna' - Read FASTA entries from files. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize - read_fasta -i test.fna -n 10 - Read first 10 FASTA entries from - file. -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Section -The Data Stream -\end_layout - -\begin_layout Subsection -How to read the data stream from file? -\begin_inset LatexCommand label -name "sub:How-to-read-stream" - -\end_inset - - -\end_layout - -\begin_layout Standard -You want to read a data stream that you previously have saved to file in - biopieces format. - This can be done implicitly or explicitly. - The implicit way uses the 'stdout' stream of the Unix terminal: -\end_layout - -\begin_layout LyX-Code -cat <file> | <biopiece> -\end_layout - -\begin_layout Standard -cat is the Unix command that reads a file and outputs the result to 'stdout' - --- which in this case is piped to any biopiece represented by the <biopiece>. - It is also possible to read the data stream using '<' to redirect the file - to the 'stdin' of the biopiece like this: -\end_layout - -\begin_layout LyX-Code -<biopiece> < <file> -\end_layout - -\begin_layout Standard -However, that will not work if you pipe more biopieces together. - Then it is much safer to read the stream from a file explicitly like this: -\end_layout - -\begin_layout LyX-Code -<biopiece> --stream_in=<file> -\end_layout - -\begin_layout Standard -Here the filename is explicitly given to the biopiece - with the switch -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -stream_in. - This switch works with all biopieces. - It is also possible to read in data from multiple sources by repeating - the explicit read step: -\end_layout - -\begin_layout LyX-Code -<biopiece> --stream_in=<file1> | <biopiece> --stream_in=<file2> -\end_layout - -\begin_layout Subsection -How to write the data stream to file? 
-\begin_inset LatexCommand label -name "sub:How-to-write-stream" - -\end_inset - - -\end_layout - -\begin_layout Standard -In order to save the output stream from a biopiece to file, so you can read - in the stream again at a later time, you can do one of two things: -\end_layout - -\begin_layout LyX-Code -<biopiece> > <file> -\end_layout - -\begin_layout Standard -All the biopieces write the data stream to 'stdout' by default, which can - be written to a file by redirecting 'stdout' to file using '>'. - However, if one of the biopieces for writing other formats is used then both - the biopieces records and the result output will go to 'stdout' - in a mixture causing havoc! To avoid this you must use the switch -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -stream_out that explicitly tells the biopiece to write the output stream - to file: -\end_layout - -\begin_layout LyX-Code -<biopiece> --stream_out=<file> -\end_layout - -\begin_layout Standard -The -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -stream_out switch works with all biopieces. -\end_layout - -\begin_layout Subsection -How to terminate the data stream? -\end_layout - -\begin_layout Standard -The data stream never stops unless the user saves the stream to file or - supplies the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_stream switch that will terminate the stream: -\end_layout - -\begin_layout LyX-Code -<biopiece> --no_stream -\end_layout - -\begin_layout Standard -The -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_stream switch only works with those biopieces where it makes sense that - the user might want to terminate the data stream, -\emph on -i.e -\emph default -. 
 - after an analysis step where the user wants to output the result, but not - the data stream. -\end_layout - -\begin_layout Subsection -How to write my results to file? -\begin_inset LatexCommand label -name "sub:How-to-write-result" - -\end_inset - - -\end_layout - -\begin_layout Standard -Saving the result of an analysis to file can be done implicitly or explicitly. - The implicit way: -\end_layout - -\begin_layout LyX-Code -<biopiece> --no_stream > <file> -\end_layout - -\begin_layout Standard -If you use '>' to redirect 'stdout' to file then it is important to use - the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_stream switch to avoid writing a mix of biopieces records and result - to the same file causing havoc. - The safe way is to use the -\begin_inset ERT -status open - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -result_out switch which explicitly tells the biopiece to write the result - to a given file: -\end_layout - -\begin_layout LyX-Code -<biopiece> --result_out=<file> -\end_layout - -\begin_layout Standard -Using the above method will not terminate the stream, so it is possible - to pipe that into another biopiece generating different results: -\end_layout - -\begin_layout LyX-Code -<biopiece1> --result_out=<file1> | <biopiece2> --result_out=<file2> -\end_layout - -\begin_layout Standard -And still the data stream will continue unless terminated with -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_stream: -\end_layout - -\begin_layout LyX-Code -<biopiece> --result_out=<file> --no_stream -\end_layout - -\begin_layout Standard -Or written to file, either implicitly or explicitly -\begin_inset LatexCommand eqref -reference "sub:How-to-write-result" - -\end_inset - -. - The explicit way: -\end_layout - -\begin_layout LyX-Code -<biopiece> --result_out=<file1> --stream_out=<file2> -\end_layout - -\begin_layout Subsection -How to read data from multiple sources? 
-\end_layout - -\begin_layout Standard -To read multiple data sources, with the same type or different types of data, - do: -\end_layout - -\begin_layout LyX-Code -read_<type> --data_in=<file1> | read_<type> --data_in=<file2> -\end_layout - -\begin_layout Standard -where <type> is the data type a specific biopiece reads. -\end_layout - -\begin_layout Section -Reading input -\end_layout - -\begin_layout Subsection -How to read biopieces input? -\end_layout - -\begin_layout Standard -See -\begin_inset LatexCommand eqref -reference "sub:How-to-read-stream" - -\end_inset - -. -\end_layout - -\begin_layout Subsection -How to read in data? -\end_layout - -\begin_layout Standard -Data in different formats can be read with the appropriate biopiece for - that format. - The biopieces are typically named 'read_<type>' such as -\series bold -read_fasta -\series default -, -\series bold -read_bed -\series default -, -\series bold -read_tab -\series default -, etc., and all behave in a similar manner. - Data can be read by supplying the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -data_in switch and the name of the file containing the data: -\end_layout - -\begin_layout LyX-Code -read_<type> --data_in=<file> -\end_layout - -\begin_layout Standard -It is also possible to read in a saved biopieces stream (see -\begin_inset LatexCommand ref -reference "sub:How-to-read-stream" - -\end_inset - -) as well as reading data in one go: -\end_layout - -\begin_layout LyX-Code -read_<type> --stream_in=<file1> --data_in=<file2> -\end_layout - -\begin_layout Standard -If you want to read data from several files you can do this: -\end_layout - -\begin_layout LyX-Code -read_<type> --data_in=<file1> | read_<type> --data_in=<file2> -\end_layout - -\begin_layout Standard -If you have several data files you can read in all explicitly with a comma - separated list: -\end_layout - -\begin_layout LyX-Code -read_<type> --data_in=file1,file2,file3 -\end_layout - -\begin_layout Standard -And it is also possible to use file globbing -\begin_inset Foot -status open - 
-\begin_layout Standard -Using the short option will only work if you quote the argument -i '*.fna' -\end_layout - -\end_inset - -: -\end_layout - -\begin_layout LyX-Code -read_<type> --data_in=*.fna -\end_layout - -\begin_layout Standard -Or in a combination: -\end_layout - -\begin_layout LyX-Code -read_<type> --data_in=file1,/dir/*.fna -\end_layout - -\begin_layout Standard -Finally, it is possible to read in data in different formats using the - appropriate biopiece for each format: -\end_layout - -\begin_layout LyX-Code -read_<type1> --data_in=<file1> | read_<type2> --data_in=<file2> ... -\end_layout - -\begin_layout Subsection -How to read FASTA input? -\end_layout - -\begin_layout Standard -Sequences in FASTA format can be read explicitly using -\series bold -read_fasta -\series default -: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in=<file> -\end_layout - -\begin_layout Subsection -How to read alignment input? -\end_layout - -\begin_layout Standard -If your alignment is FASTA formatted then you can use -\series bold -read_align -\series default -. - It is also possible to use -\series bold -read_fasta -\series default - since the data is FASTA formatted; however, with -\series bold -read_fasta -\series default - the key ALIGN will be omitted. - The ALIGN key is used to determine which sequences belong to which alignment, - which is required for -\series bold -write_align -\series default -. -\end_layout - -\begin_layout LyX-Code -read_align --data_in=<file> -\end_layout - -\begin_layout Subsection -How to read tabular input? -\begin_inset LatexCommand label -name "sub:How-to-read-table" - -\end_inset - - -\end_layout - -\begin_layout Standard -Tabular input can be read with -\series bold -read_tab -\series default - which will read in all rows and chosen columns (separated by a given delimiter) - from a table in text format. 
-\end_layout - -\begin_layout Standard -The table below: -\end_layout - -\begin_layout Standard -\noindent -\align center -\begin_inset Tabular - - - - - - - -\begin_inset Text - -\begin_layout Standard -Human -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -ATACGTCAG -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -23524 -\end_layout - -\end_inset - - - - -\begin_inset Text - -\begin_layout Standard -Dog -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -AGCATGAC -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -2442 -\end_layout - -\end_inset - - - - -\begin_inset Text - -\begin_layout Standard -Mouse -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -GACTG -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -234 -\end_layout - -\end_inset - - - - -\begin_inset Text - -\begin_layout Standard -Cat -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -AAATGCA -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -2342 -\end_layout - -\end_inset - - - - -\end_inset - - -\end_layout - -\begin_layout Standard -Can be read using the command: -\end_layout - -\begin_layout LyX-Code -read_tab --data_in= -\end_layout - -\begin_layout Standard -Which will result in four records, one for each row, where the keys V0, - V1, V2 are the default keys for the organism, sequence, and count, respectively. - It is possible to select a subset of colums to read by using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -cols switch which takes a comma separated list of columns numbers (first - column is designated 0) as argument. 
- So to read in only the sequence and the count so that the count comes before - the sequence do: -\end_layout - -\begin_layout LyX-Code -read_tab --data_in= --cols=2,1 -\end_layout - -\begin_layout Standard -It is also possible to name the columns with the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -keys switch: -\end_layout - -\begin_layout LyX-Code -read_tab --data_in= --cols=2,1 --keys=COUNT,SEQ -\end_layout - -\begin_layout Subsection -How to read BED input? -\end_layout - -\begin_layout Standard -The BED (Browser Extensible Data -\begin_inset Foot -status open - -\begin_layout Standard -\begin_inset LatexCommand url -target "http://genome.ucsc.edu/FAQ/FAQformat" - -\end_inset - - -\end_layout - -\end_inset - -) format is a tabular format for data pertaining to one of the Eukaryotic - genomes in the UCSC genome brower -\begin_inset Foot -status collapsed - -\begin_layout Standard -\begin_inset LatexCommand url -target "http://genome.ucsc.edu/" - -\end_inset - - -\end_layout - -\end_inset - -. - The BED format consists of up to 12 columns, where the first three are - mandatory CHR, CHR_BEG, and CHR_END. - The mandatory columns and any of the optional columns can all be read in - easily with the -\series bold -read_bed -\series default - biopiece. -\end_layout - -\begin_layout LyX-Code -read_bed --data_in= -\end_layout - -\begin_layout Standard -It is also possible to read the BED file with -\series bold -read_tab -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-read-table" - -\end_inset - -), however, that will be more cumbersome because you need to specify the - keys: -\end_layout - -\begin_layout LyX-Code -read_tab --data_in= --keys=CHR,CHR_BEG,CHR_END ... -\end_layout - -\begin_layout Subsection -How to read PSL input? 
-\end_layout - -\begin_layout Standard -The PSL format is the output from BLAT and contains 21 mandatory fields - that can be read with -\series bold -read_psl -\series default -: -\end_layout - -\begin_layout LyX-Code -read_psl --data_in=<file> -\end_layout - -\begin_layout Section -Writing output -\end_layout - -\begin_layout Standard -All result output can be written explicitly to file using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -result_out switch which all result generating biopieces have. - It is also possible to write the result to file implicitly by directing - 'stdout' to file using '>'; however, that requires the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_stream switch to prevent a mixture of data stream and results in the file. - The explicit (and safe) way: -\end_layout - -\begin_layout LyX-Code -... - | <biopiece> --result_out=<file> -\end_layout - -\begin_layout Standard -The implicit way: -\end_layout - -\begin_layout LyX-Code -... - | <biopiece> --no_stream > <file> -\end_layout - -\begin_layout Subsection -How to write biopieces output? -\end_layout - -\begin_layout Standard -See -\begin_inset LatexCommand eqref -reference "sub:How-to-write-stream" - -\end_inset - -. -\end_layout - -\begin_layout Subsection -How to write FASTA output? -\begin_inset LatexCommand label -name "sub:How-to-write-fasta" - -\end_inset - - -\end_layout - -\begin_layout Standard -FASTA output can be written with -\series bold -write_fasta -\series default -. -\end_layout - -\begin_layout LyX-Code -... - | write_fasta --result_out=<file> -\end_layout - -\begin_layout Standard -It is also possible to wrap the sequences to a given width using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -wrap switch, although wrapping of sequences is generally an evil thing: -\end_layout - -\begin_layout LyX-Code -... 
 - | write_fasta --no_stream --wrap=80 -\end_layout - -\begin_layout Subsection -How to write alignment output? -\begin_inset LatexCommand label -name "sub:How-to-write-alignment" - -\end_inset - - -\end_layout - -\begin_layout Standard -Pretty alignments with ruler -\begin_inset Foot -status collapsed - -\begin_layout Standard -'.' for every 10 residues, ':' for every 50, and '|' for every 100 -\end_layout - -\end_inset - - and consensus sequence -\begin_inset Note Note -status collapsed - -\begin_layout Standard -which reminds me to make that an option. -\end_layout - -\end_inset - - can be created with -\series bold -write_align -\series default -, which also has the optional -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -wrap switch to break the alignment into blocks of a given width: -\end_layout - -\begin_layout LyX-Code -... - | write_align --result_out=<file> --wrap=80 -\end_layout - -\begin_layout Standard -If the number of sequences in the alignment is 2 then a pairwise alignment - will be output, otherwise a multiple alignment. - And if the sequence type, determined automagically, is protein, then residues - and symbols (+,\InsetSpace ~ -:,\InsetSpace ~ -.) will be used to show consensus according to the Blosum62 - matrix. -\end_layout - -\begin_layout Subsection -How to write tabular output? -\begin_inset LatexCommand label -name "sub:How-to-write-tab" - -\end_inset - - -\end_layout - -\begin_layout Standard -Outputting the data stream as a table can be done with -\series bold -write_tab -\series default -, which will generate one row per record with the values as columns. - If you supply the optional -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -comment switch, then the first row in the table will be a 'comment' line - prefixed with a '#': -\end_layout - -\begin_layout LyX-Code -... 
 - | write_tab --result_out=<file> --comment -\end_layout - -\begin_layout Standard -You can also change the delimiter from the default (tab) to -\emph on -e.g. - -\emph default - ',': -\end_layout - -\begin_layout LyX-Code -... - | write_tab --result_out=<file> --delimit=',' -\end_layout - -\begin_layout Standard -If you want the values output in a specific order you have to supply a comma - separated list using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -keys switch that will print only those keys in that order: -\end_layout - -\begin_layout LyX-Code -... - | write_tab --result_out=<file> --keys=SEQ_NAME,COUNT -\end_layout - -\begin_layout Standard -Alternatively, if you have some keys that you don't want in the tabular - output, use the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_keys switch. - So to print all keys except SEQ and SEQ_TYPE do: -\end_layout - -\begin_layout LyX-Code -... - | write_tab --result_out=<file> --no_keys=SEQ,SEQ_TYPE -\end_layout - -\begin_layout Standard -Finally, if you have a stream containing a mix of different record types, - -\emph on -e.g. - -\emph default - records with sequences and records with matches, then you can use -\series bold -write_tab -\series default - to output all the records in tabular format; however, the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -comment, -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -keys, and -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_keys switches will only respond to records of the first type encountered. 
- The reason is that outputting mixed records is probably not what you want - anyway, and you should remove all the unwanted records from the stream - before outputting the table: -\series bold -grab -\series default - is your friend (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-grab" - -\end_inset - -). -\end_layout - -\begin_layout Subsection -How to write a BED output? -\begin_inset LatexCommand label -name "sub:How-to-write-BED" - -\end_inset - - -\end_layout - -\begin_layout Standard -Data in BED format can be output if the records contain the mandatory keys - CHR, CHR_BEG, and CHR_END using -\series bold -write_bed -\series default -. - If the optional keys are also present, they will be output as well: -\end_layout - -\begin_layout LyX-Code -write_bed --result_out= -\end_layout - -\begin_layout Subsection -How to write PSL output? -\begin_inset LatexCommand label -name "sub:How-to-write-PSL" - -\end_inset - - -\end_layout - -\begin_layout Standard -Data in PSL format can be output using -\series bold -write_psl: -\end_layout - -\begin_layout LyX-Code -write_psl --result_out= -\end_layout - -\begin_layout Section -Manipulating Records -\end_layout - -\begin_layout Subsection -How to select a few records? -\begin_inset LatexCommand label -name "sub:How-to-select-a-few-records" - -\end_inset - - -\end_layout - -\begin_layout Standard -To quickly get an overview of your data you can limit the data stream to - show a few records. - This also very useful to test the pipeline with a few records if you are - setting up a complex analysis using several biopieces. - That way you can inspect that all goes well before analyzing and waiting - for the full data set. - All of the read_ biopieces have the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -num switch which will take a number as argument and only that number of - records will be read. 
- So to read in the first 10 FASTA entries from a file: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in=test.fna --num=10 -\end_layout - -\begin_layout Standard -Another way of doing this is to use -\series bold -head_records -\series default -, which will limit the stream to the first 10 records (default): -\end_layout - -\begin_layout LyX-Code -... - | head_records -\end_layout - -\begin_layout Standard -Using -\series bold -head_records -\series default - directly after one of the read_ biopieces will be a lot slower than - using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -num switch with the read_ biopieces; however, -\series bold -head_records -\series default - can also be used to limit the output from all the other biopieces. - It is also possible to give -\series bold -head_records -\series default - a number of records to show using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -num switch. - So to display the first 100 records do: -\end_layout - -\begin_layout LyX-Code -... - | head_records --num=100 -\end_layout - -\begin_layout Subsection -How to select random records? -\begin_inset LatexCommand label -name "sub:How-to-select-random-records" - -\end_inset - - -\end_layout - -\begin_layout Standard -If you want to inspect a number of random records from the stream this can - be done with the -\series bold -random_records -\series default - biopiece. - So if you have 1 million records in the stream and you want to select 1000 - random records do: -\end_layout - -\begin_layout LyX-Code -... - | random_records --num=1000 -\end_layout - -\begin_layout Subsection -How to count all records in the data stream? 
-\end_layout - -\begin_layout Standard -To count all the records in the data stream use -\series bold -count_records -\series default -, which adds one record (which is not included in the count) to the data - stream. - So to count the number of sequences in a FASTA file you can do this: -\end_layout - -\begin_layout LyX-Code -cat test.fna | read_fasta | count_records --no_stream -\end_layout - -\begin_layout Standard -This will write the last record containing the count to 'stdout': -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -count_records: 630 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize ---- -\end_layout - -\begin_layout Standard -It is also possible to write the count to file using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -result_out switch. -\end_layout - -\begin_layout Subsection -How to get the length of record values? -\begin_inset LatexCommand label -name "sub:How-to-get-value_length" - -\end_inset - - -\end_layout - -\begin_layout Standard -Use the -\series bold -length_vals -\series default - biopiece to get the length of each value for a comma separated list of - keys: -\end_layout - -\begin_layout LyX-Code -... - | length_vals --keys=HIT,PATTERN -\end_layout - -\begin_layout Subsection -How to grab specific records? -\begin_inset LatexCommand label -name "sub:How-to-grab" - -\end_inset - - -\end_layout - -\begin_layout Standard -The biopiece -\series bold -grab -\series default - is related to the Unix grep and locates records based on matching keys - and/or values using either a pattern, a Perl regex, or a numerical evaluation. - To easily -\series bold -grab -\series default - all records in the stream that have any mention of the pattern 'human' - just pipe the data stream through -\series bold -grab -\series default - like this: -\end_layout - -\begin_layout LyX-Code -... 
- | grab --pattern=human -\end_layout - -\begin_layout Standard -This will search for the pattern 'human' in all keys and all values. - The -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern switch takes a comma separated list of patterns, so in order to - match multiple patterns do: -\end_layout - -\begin_layout LyX-Code -... - | grab --pattern=human,mouse -\end_layout - -\begin_layout Standard -It is also possible to use the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern_in switch instead of -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern. - -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern_in is used to read a file with one pattern per line: -\end_layout - -\begin_layout LyX-Code -... - | grab --pattern_in=patterns.txt -\end_layout - -\begin_layout Standard -If you want the opposite result --- to find all records that do not match - the patterns, add the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -invert switch, which not only works with the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern switch, but also with -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -regex and -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -eval: -\end_layout - -\begin_layout LyX-Code -... - | grab --pattern=human --invert -\end_layout - -\begin_layout Standard -If you want to search the record keys only, -\emph on -e.g. 
- -\emph default - to find all records containing the key SEQ you can add the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -keys_only switch. - This will prevent matching of SEQ in any record value; since SEQ - is a not uncommon peptide sequence, you could otherwise get an unwanted record. - Also, this will give an increase in speed since only the keys are searched: -\end_layout - -\begin_layout LyX-Code -... - | grab --pattern=SEQ --keys_only -\end_layout - -\begin_layout Standard -However, if you are interested in finding the peptide sequence SEQ and not - the SEQ key, just add the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -vals_only switch instead: -\end_layout - -\begin_layout LyX-Code -... - | grab --pattern=SEQ --vals_only -\end_layout - -\begin_layout Standard -Also, if you want to grab for certain key/value pairs you can supply a comma - separated list of keys whose values will then be searched using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -keys switch. - This is handy if your records contain large genomic sequences and you don't - want to search the entire sequence for -\emph on -e.g. - -\emph default - the organism name --- it is much faster to tell -\series bold -grab -\series default - which keys to search the value for: -\end_layout - -\begin_layout LyX-Code -... - | grab --pattern=human --keys=SEQ_NAME -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout Standard -It is also possible to invoke flexible matching using regex (regular expressions -) instead of simple pattern matching. 
- In -\series bold -grab -\series default - the regex engine is Perl based and allows use of different types of - wildcards, alternatives, -\emph on -etc -\emph default - -\begin_inset Foot -status open - -\begin_layout Standard -\begin_inset LatexCommand url -target "http://perldoc.perl.org/perlreref.html" - -\end_inset - - -\end_layout - -\end_inset - -. - If you want to -\series bold -grab -\series default - records with the sequence ATCG or GCTA you can do this: -\end_layout - -\begin_layout LyX-Code -... - | grab --regex='ATCG|GCTA' -\end_layout - -\begin_layout Standard -Or if you want to find sequences beginning with ATCG: -\end_layout - -\begin_layout LyX-Code -... - | grab --regex='^ATCG' -\end_layout - -\begin_layout Standard -You can also use -\series bold -grab -\series default - to locate records that fulfill a numerical property using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -eval switch, which takes an expression in three parts. - The first part is the key that holds the value we want to evaluate, the - second part holds one of the eight operators: -\end_layout - -\begin_layout Enumerate -Greater than: > -\end_layout - -\begin_layout Enumerate -Greater than or equal to: >= -\end_layout - -\begin_layout Enumerate -Less than: < -\end_layout - -\begin_layout Enumerate -Less than or equal to: <= -\end_layout - -\begin_layout Enumerate -Equal to: = -\end_layout - -\begin_layout Enumerate -Not equal to: != -\end_layout - -\begin_layout Enumerate -String wise equal to: eq -\end_layout - -\begin_layout Enumerate -String wise not equal to: ne -\end_layout - -\begin_layout Standard -And finally comes the number used in the evaluation. - So to -\series bold -grab -\series default - all records with a sequence length greater than 30: -\end_layout - -\begin_layout LyX-Code -... 
- | length_seq | grab --eval='SEQ_LEN > 30' -\end_layout - -\begin_layout Standard -If you want to locate all records containing the pattern 'human' and where - the sequence length is greater than 30, you do this by running the stream - through -\series bold -grab -\series default - twice: -\end_layout - -\begin_layout LyX-Code -... - | grab --pattern='human' | length_seq | grab --eval='SEQ_LEN > 30' -\end_layout - -\begin_layout Standard -Finally, it is possible to do fast matching of expressions from a file using - the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -exact switch. - Each of these expressions has to be matched exactly over the entire length, - which is useful if you have a file with accession numbers that you want - to locate in the stream: -\end_layout - -\begin_layout LyX-Code -... - | grab --exact acc_no.txt | ... -\end_layout - -\begin_layout Standard -Using -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -exact is much faster than using -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern_in, because with -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -exact the expressions have to match completely, whereas -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern_in looks for subpatterns. -\end_layout - -\begin_layout Standard -NB! To get the best speed performance, use the most restrictive -\series bold -grab -\series default - first. -\end_layout - -\begin_layout Subsection -How to remove keys from records? -\end_layout - -\begin_layout Standard -To remove one or more specific keys from all records in the data stream - use -\series bold -remove_keys -\series default - like this: -\end_layout - -\begin_layout LyX-Code -... 
- | remove_keys --keys=SEQ,SEQ_NAME -\end_layout - -\begin_layout Standard -In the above example SEQ and SEQ_NAME will be removed from all records if - they are present. - If all keys are removed from a record, then the record will be removed. -\end_layout - -\begin_layout Subsection -How to rename keys in records? -\end_layout - -\begin_layout Standard -Sometimes you want to rename a record key, -\emph on -e.g. - -\emph default - if you have read in a two column table with sequence name and sequence - in each column (see -\begin_inset LatexCommand ref -reference "sub:How-to-read-table" - -\end_inset - -) without specifying the key names, then the sequence name will be called - V0 and the sequence V1 as default in the -\series bold -read_tab -\series default - biopiece. - To rename the V0 and V1 keys we need to run the stream through -\series bold -rename_keys -\series default - twice (once for each key to rename): -\end_layout - -\begin_layout LyX-Code -... - | rename_keys --keys=V0,SEQ_NAME | rename_keys --keys=V1,SEQ -\end_layout - -\begin_layout Standard -The first instance of -\series bold -rename_keys -\series default - replaces all the V0 keys with SEQ_NAME, and the second instance of -\series bold -rename_keys -\series default - replaces all the V1 keys with SEQ. - -\emph on -Et voilà -\emph default - the data can now be used in the biopieces that require these keys. -\end_layout - -\begin_layout Section -Manipulating Sequences -\end_layout - -\begin_layout Subsection -How to get sequence lengths? -\end_layout - -\begin_layout Standard -The length for sequences in records can be determined with -\series bold -length_seq -\series default -, which adds the key SEQ_LEN to each record with the sequence length as - the value. - It also generates an extra record that is emitted last with the key TOTAL_SEQ_L -EN showing the total length of all the sequences. 
-\end_layout - -\begin_layout LyX-Code -read_fasta --data_in= | length_seq -\end_layout - -\begin_layout Standard -It is also possible to determine the sequence length using the generic tool - -\series bold -length_vals -\series default - -\begin_inset LatexCommand eqref -reference "sub:How-to-get-value_length" - -\end_inset - -, which determines the length of the values for a given list of keys: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in= | length_vals --keys=SEQ -\end_layout - -\begin_layout Standard -To obtain the total length of all sequences use -\series bold -sum_vals -\series default - like this: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in= | length_vals --keys=SEQ -\end_layout - -\begin_layout LyX-Code -| sum_vals --keys=SEQ_LEN -\end_layout - -\begin_layout Standard -The biopiece -\series bold -analyze_seq -\series default - will also determine the length of each sequence (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-analyze" - -\end_inset - -). -\end_layout - -\begin_layout Subsection -How to analyze sequence composition? -\begin_inset LatexCommand label -name "sub:How-to-analyze" - -\end_inset - - -\end_layout - -\begin_layout Standard -If you want to find out the sequence type, composition, length, as well - as GC content, indel content and proportions of soft and hard masked sequence, - then use -\series bold -analyze_seq -\series default -. - This handy biopiece will determine all these things per sequence from which - it is easy to get an overview using the -\series bold -write_tab -\series default - biopiece to output a table (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-write-tab" - -\end_inset - -). 
- So in order to determine the sequence composition of a FASTA file with - just one entry containing the sequence 'ATCG' we just read the data with - -\series bold -read_fasta -\series default - and run the output through -\series bold -analyze_seq -\series default - which will add the analysis to the record like this: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in=test.fna | analyze_seq ... -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:D: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -MIX_INDEX: 0.55 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:W: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:G: 16 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -SOFT_MASK%: 63.75 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:B: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:V: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -HARD_MASK%: 0.00 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:H: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:S: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:N: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:.: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -GC%: 35.00 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:A: 8 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:Y: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:M: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:T: 44 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -SEQ_TYPE: DNA -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:K: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:~: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -SEQ: 
TTTCAGTTTGGGACGGAGTAAGGCCTTCCtttttttttttttttttttttttttttttgagaccgagtcttgctc -tgtcg -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -SEQ_LEN: 80 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:R: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:C: 12 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:-: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:U: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize ---- -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout Standard -Now to make a table of how many As, Ts, Cs, and Gs you can add the following: -\end_layout - -\begin_layout LyX-Code -... - | analyze_seq | write_tab --keys=RES:A,RES:T,RES:C,RES:G -\end_layout - -\begin_layout Standard -Or if you want to see the proportions of hard and soft masked sequence: -\end_layout - -\begin_layout LyX-Code -... - | analyze_seq | write_tab --keys=HARD_MASK%,SOFT_MASK% -\end_layout - -\begin_layout Standard -If you have a stack of sequences in one file and you want to determine the - mean GC content you can do it using the -\series bold -mean_vals -\series default - biopiece: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in=test.fna | analyze_seq | mean_vals --keys=GC% -\end_layout - -\begin_layout Standard -Or if you want the total count of Ns you can use -\series bold -sum_vals -\series default - like this: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in=test.fna | analyze_seq | sum_vals --keys=RES:N -\end_layout - -\begin_layout Standard -The MIX_INDEX key is calculated as the count of the most common residue - over the sequence length, and can be used as a cut-off for removing sequence - tags consisting of mostly one nucleotide: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in=test.fna | analyze_seq | grab --eval='MIX_INDEX<0.85' -\end_layout - -\begin_layout Subsection -How to extract subsequences? 
-\begin_inset LatexCommand label -name "sub:How-to-extract" - -\end_inset - - -\end_layout - -\begin_layout Standard -In order to extract a subsequence from a longer sequence use the biopiece - extract_seq, which will replace the sequence in the record with the subsequence - (this behaviour should probably be modified to be dependent on a -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -replace or a -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_replace switch -\begin_inset Note Note -status collapsed - -\begin_layout Standard -also in split_seq -\end_layout - -\end_inset - -). - So to extract the first 20 residues from all sequences do (first residue - is designated 1): -\end_layout - -\begin_layout LyX-Code -... - | extract_seq --beg=1 --len=20 -\end_layout - -\begin_layout Standard -You can also specify a begin and end coordinate set: -\end_layout - -\begin_layout LyX-Code -... - | extract_seq --beg=20 --end=40 -\end_layout - -\begin_layout Standard -If you want the subsequences from position 20 to the sequence end do: -\end_layout - -\begin_layout LyX-Code -... - | extract_seq --beg=20 -\end_layout - -\begin_layout Standard -If you want to extract subsequences a given distance from the sequence end - you can do this by reversing the sequence with the biopiece -\series bold -reverse_seq -\series default - -\begin_inset LatexCommand eqref -reference "sub:How-to-reverse-seq" - -\end_inset - -, followed by -\series bold -extract_seq -\series default - to get the subsequence, and then -\series bold -reverse_seq -\series default - again to get the subsequence back in the original orientation: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in=test.fna | reverse_seq -\end_layout - -\begin_layout LyX-Code -| extract_seq --beg=10 --len=10 | reverse_seq -\end_layout - -\begin_layout Subsection -How to get genomic sequence? 
-\begin_inset LatexCommand label -name "sub:How-to-get-genomic-sequence" - -\end_inset - - -\end_layout - -\begin_layout Standard -The biopiece -\series bold -get_genome_seq -\series default - can extract subsequences for a given genome specified with the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -genome switch explicitly using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -beg and -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -end/ -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -len switches: -\end_layout - -\begin_layout LyX-Code -get_genome_seq --genome= --beg=1 --len=100 -\end_layout - -\begin_layout Standard -Alternatively, -\series bold -get_genome_seq -\series default - can be used to append the corresponding sequence to BED, PSL, and BLAST - records: -\end_layout - -\begin_layout LyX-Code -read_bed --data_in= | get_genome_seq --genome= -\end_layout - -\begin_layout Standard -It is also possible to include flanking sequence using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -flank switch. - So to include 50 nucleotides upstream and 50 nucleotides downstream for - each BED entry do: -\end_layout - -\begin_layout LyX-Code -read_bed --data_in= | get_genome_seq --genome= --flank=50 -\end_layout - -\begin_layout Subsection -How to upper-case sequences? -\end_layout - -\begin_layout Standard -Sequences can be shifted from lower case to upper case using -\series bold -uppercase_seq -\series default -: -\end_layout - -\begin_layout LyX-Code -... - | uppercase_seq -\end_layout - -\begin_layout Subsection -How to reverse sequences? 
-\begin_inset LatexCommand label -name "sub:How-to-reverse-seq" - -\end_inset - - -\end_layout - -\begin_layout Standard -The order of residues in a sequence can be reversed using reverse_seq: -\end_layout - -\begin_layout LyX-Code -... - | reverse_seq -\end_layout - -\begin_layout Standard -Note that in order to reverse/complement a sequence you also need the -\series bold -complement_seq -\series default - biopiece (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-complement" - -\end_inset - -). -\end_layout - -\begin_layout Subsection -How to complement sequences? -\begin_inset LatexCommand label -name "sub:How-to-complement" - -\end_inset - - -\end_layout - -\begin_layout Standard -DNA and RNA sequences can be complemented with -\series bold -complement_seq -\series default -, which automagically determines the sequence type: -\end_layout - -\begin_layout LyX-Code -... - | complement_seq -\end_layout - -\begin_layout Standard -Note that in order to reverse/complement a sequence you also need the -\series bold -reverse_seq -\series default - biopiece (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-reverse-seq" - -\end_inset - -). -\end_layout - -\begin_layout Subsection -How to remove indels from sequences? -\end_layout - -\begin_layout Standard -Indels can be removed from sequences with the -\series bold -remove_indels -\series default - biopiece. - This is useful if you have aligned some sequences (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-align" - -\end_inset - -) and extracted (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-extract" - -\end_inset - -) a block of subsequences from the alignment and you want to use these sequences - in a search where you need to remove the indels first. - '-', '~', and '.' are considered indels: -\end_layout - -\begin_layout LyX-Code -... - | remove_indels -\end_layout - -\begin_layout Subsection -How to shuffle sequences? 
-\end_layout - -\begin_layout Standard -All residues in sequences in the stream can be shuffled to random positions - using the -\series bold -shuffle_seq -\series default - biopiece: -\end_layout - -\begin_layout LyX-Code -... - | shuffle_seq -\end_layout - -\begin_layout Subsection -How to split sequences into overlapping subsequences? -\end_layout - -\begin_layout Standard -Sequences can be split into overlapping subsequences with the -\series bold -split_seq -\series default - biopiece. -\end_layout - -\begin_layout LyX-Code -... - | split_seq --word_size=20 --uniq -\end_layout - -\begin_layout Subsection -How to determine the oligo frequency? -\end_layout - -\begin_layout Standard -In order to determine if any oligo usage is overrepresented in one or more - sequences you can determine the frequency of oligos of a given size with - -\series bold -oligo_freq -\series default -: -\end_layout - -\begin_layout LyX-Code -... - | oligo_freq --word_size=4 -\end_layout - -\begin_layout Standard -And if you have more than one sequence and want to accumulate the frequencies - you need the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -all switch: -\end_layout - -\begin_layout LyX-Code -... - | oligo_freq --word_size=4 --all -\end_layout - -\begin_layout Standard -To get a meaningful result you need to write the resulting frequencies as - a table with -\series bold -write_tab -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-write-tab" - -\end_inset - -), but first it is important to -\series bold -grab -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-grab" - -\end_inset - -) the records with the frequencies to avoid full length sequences in the - table: -\end_layout - -\begin_layout LyX-Code -... 
- | oligo_freq --word_size=4 --all | grab --pattern=OLIGO --keys_only -\end_layout - -\begin_layout LyX-Code -| write_tab --no_stream -\end_layout - -\begin_layout Standard -And the resulting frequency table can be sorted with Unix sort (man sort). -\end_layout - -\begin_layout Subsection -How to search for sequences in genomes? -\end_layout - -\begin_layout Standard -See the following biopieces: -\end_layout - -\begin_layout Itemize - -\series bold -patscan_seq -\series default - -\begin_inset LatexCommand eqref -reference "sub:How-to-use-patscan" - -\end_inset - - -\end_layout - -\begin_layout Itemize - -\series bold -blat_seq -\series default - -\begin_inset LatexCommand eqref -reference "sub:How-to-use-BLAT" - -\end_inset - - -\end_layout - -\begin_layout Itemize - -\series bold -blast_seq -\series default - -\begin_inset LatexCommand eqref -reference "sub:How-to-use-BLAST" - -\end_inset - - -\end_layout - -\begin_layout Itemize - -\series bold -vmatch_seq -\series default - -\begin_inset LatexCommand eqref -reference "sub:How-to-use-Vmatch" - -\end_inset - - -\end_layout - -\begin_layout Subsection -How to search sequences for a pattern? -\begin_inset LatexCommand label -name "sub:How-to-use-patscan" - -\end_inset - - -\end_layout - -\begin_layout Standard -It is possible to search sequences in the data stream for patterns using - the -\series bold -patscan_seq -\series default - biopiece which utilizes the powerful scan_for_matches engine. - Consult the documentation for scan_for_matches in order to learn how to - define patterns (the documentation is included in Appendix\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sec:scan_for_matches-README" - -\end_inset - -). 
-\end_layout - -\begin_layout Standard -To search all sequences for a simple pattern consisting of the sequence - ATCGATCG allowing for 3 mismatches, 2 insertions and 1 deletion: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in= | patscan_seq --pattern='ATCGATCG[3,2,1]' -\end_layout - -\begin_layout Standard -The -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern switch takes a comma separated list of patterns, so if you want - to search for more than one pattern do: -\end_layout - -\begin_layout LyX-Code -... - | patscan_seq --pattern='ATCGATCG[3,2,1],GCTAGCTA[3,2,1]' -\end_layout - -\begin_layout Standard -It is also possible to have a list of different patterns to search for in - a file with one pattern per line. - In order to get -\series bold -patscan_seq -\series default - to read these patterns use the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern_in switch: -\end_layout - -\begin_layout LyX-Code -... - | patscan_seq --pattern_in= -\end_layout - -\begin_layout Standard -To also scan the complementary strand in nucleotide sequences ( -\series bold -patscan_seq -\series default - automagically determines the sequence type) you need to add the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -comp switch: -\end_layout - -\begin_layout LyX-Code -... - | patscan_seq --pattern= --comp -\end_layout - -\begin_layout Standard -It is also possible to use -\series bold -patscan_seq -\series default - to output those records that do not contain a certain pattern by using - the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -invert switch: -\end_layout - -\begin_layout LyX-Code -... 
- | patscan_seq --pattern= --invert -\end_layout - -\begin_layout Standard -Finally, -\series bold -patscan_seq -\series default - can also scan for patterns in a given genome sequence, instead of sequences - in the stream, using the -\begin_inset ERT -status open - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -genome switch: -\end_layout - -\begin_layout LyX-Code -patscan_seq --pattern= --genome= -\end_layout - -\begin_layout Subsection -How to use BLAT for sequence search? -\begin_inset LatexCommand label -name "sub:How-to-use-BLAT" - -\end_inset - - -\end_layout - -\begin_layout Standard -Sequences in the data stream can be matched against supported genomes using - -\series bold -blat_seq -\series default - which is a biopiece using BLAT as the name might suggest. - Currently only Mouse and Human genomes are available and it is not possible - to use OOC files since there is still a need for a local repository for - genome files. - Otherwise it is just: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in= | blat_seq --genome= -\end_layout - -\begin_layout Standard -The search results can then be written to file with -\series bold -write_psl -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-write-PSL" - -\end_inset - -) or -\series bold -write_bed -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-write-BED" - -\end_inset - -) (although with -\series bold -write_bed -\series default - some information will be lost). 
- It is also possible to plot chromosome distribution of the search results - using -\series bold -plot_chrdist -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-plot-chrdist" - -\end_inset - -) or the distribution of the match lengths using -\series bold -plot_lendist -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-plot-lendist" - -\end_inset - -) or a karyogram with the hits using -\series bold -plot_karyogram -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-plot-karyogram" - -\end_inset - -). -\end_layout - -\begin_layout Subsection -How to use BLAST for sequence search? -\begin_inset LatexCommand label -name "sub:How-to-use-BLAST" - -\end_inset - - -\end_layout - -\begin_layout Standard -Two biopieces exist for blasting sequences: -\series bold -create_blast_db -\series default - is used to create the BLAST database required for BLAST which is queried - using the biopiece -\series bold -blast_seq -\series default -. - So in order to create a BLAST database from sequences in the data stream - you simply run: -\end_layout - -\begin_layout LyX-Code -... - | create_blast_db --database=my_database --no_stream -\end_layout - -\begin_layout Standard -The type of sequence to use for the database is automagically determined - by -\series bold -create_blast_db -\series default -, but don't have a mixture of peptide and nucleic acid sequences in the - stream. - The -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -database switch takes a path as argument, but will default to 'blastdb_ if not set. -\end_layout - -\begin_layout Standard -The resulting database can now be queried with sequences in another data - stream using -\series bold -blast_seq -\series default -: -\end_layout - -\begin_layout LyX-Code -... 
- | blast_seq --database=my_database -\end_layout - -\begin_layout Standard -Again, the sequence type is determined automagically and the appropriate - BLAST program is guessed (see the table below); however, the program name can - be overridden with the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -program switch. -\end_layout - -\begin_layout Standard -\noindent -\align center -\begin_inset Tabular - - - - - - - -\begin_inset Text - -\begin_layout Standard -Subject sequence -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -Query sequence -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -Program guess -\end_layout - -\end_inset - - - - -\begin_inset Text - -\begin_layout Standard -Nucleotide -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -Nucleotide -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -blastn -\end_layout - -\end_inset - - - - -\begin_inset Text - -\begin_layout Standard -Protein -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -Protein -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -blastp -\end_layout - -\end_inset - - - - -\begin_inset Text - -\begin_layout Standard -Protein -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -Nucleotide -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -blastx -\end_layout - -\end_inset - - - - -\begin_inset Text - -\begin_layout Standard -Nucleotide -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -Protein -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -tblastn -\end_layout - -\end_inset - - - - -\end_inset - - -\end_layout - -\begin_layout Standard -Finally, it is also possible to use -\series bold -blast_seq -\series default - for blasting sequences against a preformatted genome using 
the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -genome switch instead of the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -database switch: -\end_layout - -\begin_layout LyX-Code -... - | blast_seq --genome= -\end_layout - -\begin_layout Subsection -How to use Vmatch for sequence search? -\begin_inset LatexCommand label -name "sub:How-to-use-Vmatch" - -\end_inset - - -\end_layout - -\begin_layout Standard -The powerful suffix array software package Vmatch -\begin_inset Foot -status collapsed - -\begin_layout Standard -\begin_inset LatexCommand url -target "http://www.vmatch.de/" - -\end_inset - - -\end_layout - -\end_inset - - can be used for exact mapping of sequences against indexed genomes using - the biopiece -\series bold -vmatch_seq -\series default -, which can, for example, - map 700000 ESTs to the human genome, locating all 160 million hits in less than - an hour. - Only nucleotide sequences longer than 11 nucleotides will - be mapped. - It is recommended that sequences consisting of mostly one nucleotide type - are removed. - This can be done with the -\series bold -analyze_seq -\series default - biopiece -\begin_inset LatexCommand eqref -reference "sub:How-to-analyze" - -\end_inset - -. -\end_layout - -\begin_layout LyX-Code -... - | vmatch_seq --genome= -\end_layout - -\begin_layout Standard -It is also possible to allow for mismatches using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -hamming_dist switch. - So to allow for 2 mismatches: -\end_layout - -\begin_layout LyX-Code -... - | vmatch_seq --genome= --hamming_dist=2 -\end_layout - -\begin_layout Standard -Or to allow for 10% mismatching nucleotides: -\end_layout - -\begin_layout LyX-Code -... 
- | vmatch_seq --genome= --hamming_dist=10p -\end_layout - -\begin_layout Standard -To allow both indels and mismatches use the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -edit_dist switch. - So to allow for one mismatch or one indel: -\end_layout - -\begin_layout LyX-Code -... - | vmatch_seq --genome= --edit_dist=1 -\end_layout - -\begin_layout Standard -Or to allow for 5% indels or mismatches: -\end_layout - -\begin_layout LyX-Code -... - | vmatch_seq --genome= --edit_dist=5p -\end_layout - -\begin_layout Standard -Note that using -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -hamming_dist or -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -edit_dist slows down vmatch considerably, so use with care. -\end_layout - -\begin_layout Standard -The value of the SCORE key can be replaced with the number of genome matches - of a given sequence (multi-mappers) if the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -count switch is given. -\end_layout - -\begin_layout Subsection -How to find all matches between sequences? -\begin_inset LatexCommand label -name "sub:How-to-find-matches" - -\end_inset - - -\end_layout - -\begin_layout Standard -All matches between two sequences can be determined with the biopiece -\series bold -match_seq -\series default -. 
- The match finding engine under the hood of -\series bold -match_seq -\series default - is the super fast suffix tree program MUMmer -\begin_inset Foot -status collapsed - -\begin_layout Standard -\begin_inset LatexCommand url -target "http://mummer.sourceforge.net/" - -\end_inset - - -\end_layout - -\end_inset - -, which will locate all forward and reverse matches between huge sequences - in a matter of minutes (if the repeat count is not too high and if the - word size used is appropriate). - Matching two -\emph on -Helicobacter pylori -\emph default - genomes (1.7Mbp) takes around 10 seconds: -\end_layout - -\begin_layout LyX-Code -... - | match_seq --word_size=20 --direction=both -\end_layout - -\begin_layout Standard -The output from -\series bold -match_seq -\series default - can be used to generate a dot plot with -\series bold -plot_matches -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-generate-dotplot" - -\end_inset - -). -\end_layout - -\begin_layout Subsection -How to align sequences? -\begin_inset LatexCommand label -name "sub:How-to-align" - -\end_inset - - -\end_layout - -\begin_layout Standard -Sequences in the stream can be aligned with the -\series bold -align_seq -\series default - biopiece, which uses Muscle -\begin_inset Foot -status open - -\begin_layout Standard -\begin_inset LatexCommand url -target "http://www.drive5.com/muscle/muscle.html" - -\end_inset - - -\end_layout - -\end_inset - - as the alignment engine. - Currently you cannot change any of the Muscle alignment parameters and - -\series bold -align_seq -\series default - will create an alignment based on the defaults (which are really good!): -\end_layout - -\begin_layout LyX-Code -... 
- | align_seq -\end_layout - -\begin_layout Standard -The aligned output can be written to file in FASTA format using -\series bold -write_fasta -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-write-fasta" - -\end_inset - -) or in pretty text using -\series bold -write_align -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-write-alignment" - -\end_inset - -). -\end_layout - -\begin_layout Subsection -How to create a weight matrix? -\end_layout - -\begin_layout Standard -If you want a weight matrix to show the sequence composition of a stack - of sequences you can use the biopiece -\series bold -create_weight_matrix -\series default -: -\end_layout - -\begin_layout LyX-Code -... - | create_weight_matrix -\end_layout - -\begin_layout Standard -The result can be output in percent using the -\begin_inset ERT -status open - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -percent switch: -\end_layout - -\begin_layout LyX-Code -... - | create_weight_matrix --percent -\end_layout - -\begin_layout Standard -The weight matrix can be written as tabular output with -\series bold -write_tab -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-write-tab" - -\end_inset - -) after removing the records containing SEQ with -\series bold -grab -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-grab" - -\end_inset - -): -\end_layout - -\begin_layout LyX-Code -... - | create_weight_matrix | grab --invert --keys=SEQ --keys_only -\end_layout - -\begin_layout LyX-Code -| write_tab --no_stream -\end_layout - -\begin_layout Standard -The V0 column will hold the residue, while the rest of the columns will - hold the frequencies for each sequence position. -\end_layout - -\begin_layout Section -Plotting -\end_layout - -\begin_layout Standard -There exist several biopieces for plotting. 
- Some of these are based on GNUplot -\begin_inset Foot -status open - -\begin_layout Standard -\begin_inset LatexCommand url -target "http://www.gnuplot.info/" - -\end_inset - - -\end_layout - -\end_inset - -, which is an extremely powerful platform for generating all sorts of plots. - Even though GNUplot has quite a steep learning curve, the biopieces - utilizing GNUplot are simple to use. - GNUplot is able to output a lot of different formats (called terminals - in GNUplot), but the biopieces focus on three formats only: -\end_layout - -\begin_layout Enumerate -The 'dumb' terminal is the default for the GNUplot based biopieces and will output - a plot in crude ASCII text (Fig.\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "fig:Dumb-terminal" - -\end_inset - -). - This is quite nice for a quick and dirty plot to get an overview of your - data. -\end_layout - -\begin_layout Enumerate -The 'post' or 'postscript' terminal outputs postscript code, which gives publication - grade graphics that can be viewed with applications such as Ghostview, - Photoshop, and Preview. -\end_layout - -\begin_layout Enumerate -The 'svg' terminal outputs scalable vector graphics (SVG), which is a vector - based format. - SVG is great because you can edit the resulting plot using Photoshop or - Inkscape -\begin_inset Foot -status collapsed - -\begin_layout Standard -Inkscape is a really handy drawing program that is free and open source. - Available at -\begin_inset LatexCommand htmlurl -target "http://www.inkscape.org" - -\end_inset - - -\end_layout - -\end_inset - - if you want to add additional labels, captions, arrows, and so on and then - save the result in different formats, such as postscript, without losing - resolution. -\end_layout - -\begin_layout Standard -The biopieces for plotting that are not based on GNUplot only output SVG - (that may change in the future). 
-\end_layout - -\begin_layout Standard -\begin_inset Float figure -wide false -sideways false -status open - -\begin_layout Standard -\noindent -\align center -\begin_inset Graphics - filename lendist_ascii.png - lyxscale 70 - width 12cm - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset Caption - -\begin_layout Standard -\begin_inset LatexCommand label -name "fig:Dumb-terminal" - -\end_inset - -Dumb terminal -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Quote -The output of a length distribution plot in the default 'dumb terminal' - to the terminal window. - -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Subsection -How to plot a histogram? -\begin_inset LatexCommand label -name "How-to-plot-histogram" - -\end_inset - - -\end_layout - -\begin_layout Standard -A generic histogram for a given value can be plotted with the biopiece -\series bold -plot_histogram -\series default - (Fig.\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "fig:Histogram" - -\end_inset - -): -\end_layout - -\begin_layout LyX-Code -... - | plot_histogram --key=TISSUE --no_stream -\end_layout - -\begin_layout Standard -(Figure missing) -\end_layout - -\begin_layout Standard -\noindent -\align left -\begin_inset Float figure -wide false -sideways false -status open - -\begin_layout Standard -\noindent -\align center -\begin_inset Graphics - filename histogram.png - lyxscale 70 - width 12cm - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset Caption - -\begin_layout Standard -\begin_inset LatexCommand label -name "fig:Histogram" - -\end_inset - -Histogram -\end_layout - -\end_inset - - -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Subsection -How to plot a length distribution? -\begin_inset LatexCommand label -name "sub:How-to-plot-lendist" - -\end_inset - - -\end_layout - -\begin_layout Standard -Plotting of length distributions, whether sequence lengths, pattern lengths, - hit lengths, -\emph on -etc. 
- -\emph default - is a really handy thing and can be done with the biopiece -\series bold -plot_lendist -\series default -. - If you have a file with FASTA entries and want to plot the length distribution - you do it like this: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in= | length_seq -\end_layout - -\begin_layout LyX-Code -| plot_lendist --key=SEQ_LEN --no_stream -\end_layout - -\begin_layout Standard -The result will be written to the default dumb terminal and will look like - Fig.\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "fig:Dumb-terminal" - -\end_inset - -. -\end_layout - -\begin_layout Standard -If you instead want the result in postscript format you can do: -\end_layout - -\begin_layout LyX-Code -... - | plot_lendist --key=SEQ_LEN --terminal=post --result_out=file.ps -\end_layout - -\begin_layout Standard -That will generate the plot and save it to file, but not interrupt the data - stream, which can then be used in further analysis. - You can also save the plot by redirecting the output with '>'; however, it is then important - to terminate the stream with the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_stream switch: -\end_layout - -\begin_layout LyX-Code -... - | plot_lendist --key=SEQ_LEN --terminal=post --no_stream > file.ps -\end_layout - -\begin_layout Standard -The resulting plot can be seen in Fig.\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "fig:Length-distribution" - -\end_inset - -. 
-\end_layout - -\begin_layout Standard -\begin_inset Float figure -wide false -sideways false -status open - -\begin_layout Standard - -\end_layout - -\begin_layout Standard -\noindent -\align center -\begin_inset Graphics - filename lendist.ps - lyxscale 50 - width 12cm - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset Caption - -\begin_layout Standard -\begin_inset LatexCommand label -name "fig:Length-distribution" - -\end_inset - -Length distribution -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Quote -Length distribution of 630 piRNA-like RNAs. -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Subsection -How to plot a chromosome distribution? -\begin_inset LatexCommand label -name "sub:How-to-plot-chrdist" - -\end_inset - - -\end_layout - -\begin_layout Standard -If you have the result of a sequence search against a multi-chromosome genome, - it is very practical to be able to plot the distribution of search hits - on the different chromosomes. - This can be done with -\series bold -plot_chrdist -\series default -: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in= | blat_seq --genome= | plot_chrdist --no_stream -\end_layout - -\begin_layout Standard -The above example will result in a crude plot using the 'dumb' terminal, - and if you want to mess around with the results from the BLAT search you - probably want to save the result to file first (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-write-PSL" - -\end_inset - -). - To plot the chromosome distribution from the saved search result you can - do: -\end_layout - -\begin_layout LyX-Code -read_bed --data_in=file.bed | plot_chrdist --terminal=post --result_out=plot.ps -\end_layout - -\begin_layout Standard -That will result in the output shown in Fig.\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "fig:Chromosome-distribution" - -\end_inset - -. 
-\end_layout - -\begin_layout Standard -\begin_inset Float figure -wide false -sideways false -status open - -\begin_layout Standard - -\end_layout - -\begin_layout Standard -\noindent -\align center -\begin_inset Graphics - filename chrdist.ps - lyxscale 50 - width 12cm - rotateAngle 90 - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset Caption - -\begin_layout Standard -\begin_inset LatexCommand label -name "fig:Chromosome-distribution" - -\end_inset - -Chromosome distribution -\end_layout - -\end_inset - - -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Subsection -How to generate a dotplot? -\begin_inset LatexCommand label -name "sub:How-to-generate-dotplot" - -\end_inset - - -\end_layout - -\begin_layout Standard -A dotplot is a powerful way to get an overview of the size and location - of sequence insertions, deletions, and duplications between two sequences. - Generating a dotplot with biopieces is a two step process where you initially - find all matches between two sequences using the tool -\series bold -match_seq -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-find-matches" - -\end_inset - -) and plot the resulting matches with -\series bold -plot_matches -\series default -. - Matching and plotting two -\emph on -Helicobacter pylori -\emph default - genomes (1.7Mbp) takes around 10 seconds: -\end_layout - -\begin_layout LyX-Code -... - | match_seq | plot_matches --terminal=post --result_out=plot.ps -\end_layout - -\begin_layout Standard -The resulting dotplot is in Fig.\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "fig:Dotplot" - -\end_inset - -. 
-\end_layout - -\begin_layout Standard -\begin_inset Float figure -wide false -sideways false -status open - -\begin_layout Standard -\noindent -\align center -\begin_inset Graphics - filename dotplot.ps - lyxscale 50 - width 12cm - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset Caption - -\begin_layout Standard -\begin_inset LatexCommand label -name "fig:Dotplot" - -\end_inset - -Dotplot -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Quote -Forward matches are displayed in green while reverse matches are displayed - in red. -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Subsection -How to plot a sequence logo? -\end_layout - -\begin_layout Standard -Sequence logos can be generated with -\series bold -plot_seqlogo -\series default -. - The sequence type is determined automagically and an entropy scale of 2 - bits and 4 bits is used for nucleotide and peptide sequences, respectively -\begin_inset Foot -status collapsed - -\begin_layout Standard -\begin_inset LatexCommand htmlurl -target "http://www.ccrnp.ncifcrf.gov/~toms/paper/hawaii/latex/node5.html" - -\end_inset - - -\end_layout - -\end_inset - -. -\end_layout - -\begin_layout LyX-Code -... - | plot_seqlogo --no_stream --result_out=seqlogo.svg -\end_layout - -\begin_layout Standard -An example of a sequence logo can be seen in Fig.\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "fig:Sequence-logo" - -\end_inset - -. 
-\end_layout - -\begin_layout Standard -\begin_inset Float figure -wide false -sideways false -status open - -\begin_layout Standard -\noindent -\align center -\begin_inset Graphics - filename seqlogo.png - lyxscale 50 - width 12cm - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset Caption - -\begin_layout Standard -\begin_inset LatexCommand label -name "fig:Sequence-logo" - -\end_inset - -Sequence logo -\end_layout - -\end_inset - - -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Subsection -How to plot a karyogram? -\begin_inset LatexCommand label -name "sub:How-to-plot-karyogram" - -\end_inset - - -\end_layout - -\begin_layout Standard -To plot search hits on genomes use -\series bold -plot_karyogram -\series default -, which will output a nice karyogram in SVG graphics: -\end_layout - -\begin_layout LyX-Code -... - | plot_karyogram --result_out=karyogram.svg -\end_layout - -\begin_layout Standard -The banding data is taken from the UCSC genome browser database and currently - only Human and Mouse are supported. - Fig.\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "fig:Karyogram" - -\end_inset - - shows the distribution of piRNA-like RNAs matched to the Human genome. -\end_layout - -\begin_layout Standard -\begin_inset Float figure -wide false -sideways false -status open - -\begin_layout Standard -\noindent -\align center -\begin_inset Graphics - filename karyogram.png - lyxscale 35 - width 12cm - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset Caption - -\begin_layout Standard -\begin_inset LatexCommand label -name "fig:Karyogram" - -\end_inset - -Karyogram -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Quote -Hits from a search of piRNA-like RNAs in the Human genome are displayed as - short horizontal bars. 
-\end_layout - -\end_inset - - -\end_layout - -\begin_layout Section -Uploading Results -\end_layout - -\begin_layout Subsection -How do I display my results in the UCSC Genome Browser? -\end_layout - -\begin_layout Standard -Results from the list of biopieces below can be uploaded directly to a local - mirror of the UCSC Genome Browser using the biopiece -\series bold -upload_to_ucsc -\series default -: -\end_layout - -\begin_layout Itemize -patscan_seq -\begin_inset LatexCommand eqref -reference "sub:How-to-use-patscan" - -\end_inset - - -\end_layout - -\begin_layout Itemize -blat_seq -\begin_inset LatexCommand eqref -reference "sub:How-to-use-BLAT" - -\end_inset - - -\end_layout - -\begin_layout Itemize -blast_seq -\begin_inset LatexCommand eqref -reference "sub:How-to-use-BLAST" - -\end_inset - - -\end_layout - -\begin_layout Itemize -vmatch_seq -\begin_inset LatexCommand eqref -reference "sub:How-to-use-Vmatch" - -\end_inset - - -\end_layout - -\begin_layout Standard -The simplest way of uploading data requires two mandatory - switches: -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -database, which is the UCSC database name (such as hg18, mm9, etc.) and -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -table, which should be the user's initials followed by an underscore and a - short description of the data: -\end_layout - -\begin_layout LyX-Code -... 
- | upload_to_ucsc --database=hg18 --table=mah_snoRNAs -\end_layout - -\begin_layout Standard -The -\series bold -upload_to_ucsc -\series default - biopiece modifies the user's ~/ucsc/my_tracks.ra file automagically (a backup - is created with the name ~/ucsc/my_tracks.ra~) with default values that - can be overridden using the following switches: -\end_layout - -\begin_layout Itemize -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -short_label - Short label for track - Default=database->table -\end_layout - -\begin_layout Itemize -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -long_label - Long label for track - Default=database->table -\end_layout - -\begin_layout Itemize -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -group - Track group name - Default= -\end_layout - -\begin_layout Itemize -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -priority - Track display priority - Default=1 -\end_layout - -\begin_layout Itemize -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -color - Track color - Default=147,73,42 -\end_layout - -\begin_layout Itemize -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -chunk_size - Chunks for loading - Default=10000000 -\end_layout - -\begin_layout Itemize -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -visibility - Track visibility - Default=pack -\end_layout - -\begin_layout Standard -Also, data in BED or PSL format can be uploaded with -\series bold -upload_to_ucsc -\series default - as long as these refer to genomes and chromosomes existing in the UCSC - Genome Browser: -\end_layout - 
-\begin_layout LyX-Code -read_bed --data_in= | upload_to_ucsc ... -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code -read_psl --data_in= | upload_to_ucsc ... -\end_layout - -\begin_layout Section -Power Scripting -\end_layout - -\begin_layout Standard -It is possible to do commandline scripting of biopiece records using Perl. - Because a biopiece record essentially is a hash structure, you can pass - records to the -\series bold -bioscript -\series default - command, which is a wrapper around the Perl executable that allows direct - manipulation of the records using the power of Perl. -\end_layout - -\begin_layout Standard -In the example below, the value of the CHR key is replaced in all records - with a running number: -\end_layout - -\begin_layout LyX-Code -... - | bioscript 'while($r=get_record( -\backslash -*STDIN)){$r->{CHR}=$i++; put_record($r)}' -\end_layout - -\begin_layout Standard -Something more useful would probably be to create custom FASTA headers. - E.g. - if we read in a BED file, look up the genomic sequence, create a custom - FASTA header with -\series bold -bioscript -\series default - and output FASTA entries: -\end_layout - -\begin_layout LyX-Code -... 
- | bioscript 'while($r=get_record( -\backslash -*STDIN)){$r->{SEQ_NAME}= -\end_layout - -\begin_layout LyX-Code -join("_",$r->{CHR},$r->{CHR_BEG},$r->{CHR_END}); put_record($r)}' -\end_layout - -\begin_layout Standard -And the output: -\end_layout - -\begin_layout LyX-Code ->chr2L_21567527_21567550 -\end_layout - -\begin_layout LyX-Code -taccaaacggatgcctcagacatc -\end_layout - -\begin_layout LyX-Code ->chr2L_693380_693403 -\end_layout - -\begin_layout LyX-Code -taccaaacggatgcctcagacatc -\end_layout - -\begin_layout LyX-Code ->chr2L_13859534_13859557 -\end_layout - -\begin_layout LyX-Code -taccaaacggatgcctcagacatc -\end_layout - -\begin_layout LyX-Code ->chr2L_9005090_9005113 -\end_layout - -\begin_layout LyX-Code -taccaaacggatgcctcagacatc -\end_layout - -\begin_layout LyX-Code ->chr2L_2106825_2106848 -\end_layout - -\begin_layout LyX-Code -taccaaacggatgcctcagacatc -\end_layout - -\begin_layout LyX-Code ->chr2L_14649031_14649054 -\end_layout - -\begin_layout LyX-Code -taccaaacggatgcctcagacatc -\end_layout - -\begin_layout Section -Troubleshooting -\end_layout - -\begin_layout Standard -Shoot the messenger! 
-\end_layout - -\begin_layout Section -\start_of_appendix -Keys -\begin_inset LatexCommand label -name "sec:Keys" - -\end_inset - - -\end_layout - -\begin_layout Standard -HIT -\end_layout - -\begin_layout Standard -HIT_BEG -\end_layout - -\begin_layout Standard -HIT_END -\end_layout - -\begin_layout Standard -HIT_LEN -\end_layout - -\begin_layout Standard -HIT_NAME -\end_layout - -\begin_layout Standard -PATTERN -\end_layout - -\begin_layout Section -Switches -\begin_inset LatexCommand label -name "sec:Switches" - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -stream_in -\end_layout - -\begin_layout Standard -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -stream_out -\end_layout - -\begin_layout Standard -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_stream -\end_layout - -\begin_layout Standard -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -data_in -\end_layout - -\begin_layout Standard -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -result_out -\end_layout - -\begin_layout Standard -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -num -\end_layout - -\begin_layout Section -scan_for_matches README -\begin_inset LatexCommand label -name "sec:scan_for_matches-README" - -\end_inset - - -\end_layout - -\begin_layout LyX-Code - scan_for_matches: -\end_layout - -\begin_layout LyX-Code - A Program to Scan Nucleotide or Protein Sequences for Matching Patterns -\end_layout - -\begin_layout LyX-Code - Ross Overbeek -\end_layout - -\begin_layout LyX-Code - MCS -\end_layout - -\begin_layout LyX-Code - Argonne National Laboratory 
-\end_layout - -\begin_layout LyX-Code - Argonne, IL 60439 -\end_layout - -\begin_layout LyX-Code - USA -\end_layout - -\begin_layout LyX-Code -Scan_for_matches is a utility that we have written to search for -\end_layout - -\begin_layout LyX-Code -patterns in DNA and protein sequences. - I wrote most of the code, -\end_layout - -\begin_layout LyX-Code -although David Joerg and Morgan Price wrote sections of an -\end_layout - -\begin_layout LyX-Code -earlier version. - The whole notion of pattern matching has a rich -\end_layout - -\begin_layout LyX-Code -history, and we borrowed liberally from many sources. - However, it is -\end_layout - -\begin_layout LyX-Code -worth noting that we were strongly influenced by the elegant tools -\end_layout - -\begin_layout LyX-Code -developed and distributed by David Searls. - My intent is to make the -\end_layout - -\begin_layout LyX-Code -existing tool available to anyone in the research community that might -\end_layout - -\begin_layout LyX-Code -find it useful. - I will continue to try to fix bugs and make suggested -\end_layout - -\begin_layout LyX-Code -enhancements, at least until I feel that a superior tool exists. -\end_layout - -\begin_layout LyX-Code -Hence, I would appreciate it if all bug reports and suggestions are -\end_layout - -\begin_layout LyX-Code -directed to me at Overbeek@mcs.anl.gov. - -\end_layout - -\begin_layout LyX-Code -I will try to log all bug fixes and report them to users that send me -\end_layout - -\begin_layout LyX-Code -their email addresses. - I do not require that you give me your name -\end_layout - -\begin_layout LyX-Code -and address. - However, if you do give it to me, I will try to notify -\end_layout - -\begin_layout LyX-Code -you of serious problems as they are discovered. 
-\end_layout - -\begin_layout LyX-Code -Getting Started: -\end_layout - -\begin_layout LyX-Code - The distribution should contain at least the following programs: -\end_layout - -\begin_layout LyX-Code - README - This document -\end_layout - -\begin_layout LyX-Code - ggpunit.c - One of the two source files -\end_layout - -\begin_layout LyX-Code - scan_for_matches.c - The second source file -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - run_tests - A perl script to test things -\end_layout - -\begin_layout LyX-Code - show_hits - A handy perl script -\end_layout - -\begin_layout LyX-Code - test_dna_input - Test sequences for DNA -\end_layout - -\begin_layout LyX-Code - test_dna_patterns - Test patterns for DNA scan -\end_layout - -\begin_layout LyX-Code - test_output - Desired output from test -\end_layout - -\begin_layout LyX-Code - test_prot_input - Test protein sequences -\end_layout - -\begin_layout LyX-Code - test_prot_patterns - Test patterns for proteins -\end_layout - -\begin_layout LyX-Code - testit - a perl script used for test -\end_layout - -\begin_layout LyX-Code - Only the first three files are required. - The others are useful, -\end_layout - -\begin_layout LyX-Code - but only if you have Perl installed on your system. - If you do -\end_layout - -\begin_layout LyX-Code - have Perl, I suggest that you type -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - which perl -\end_layout - -\begin_layout LyX-Code - to find out where it installed. - On my system, I get the following -\end_layout - -\begin_layout LyX-Code - response: -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - clone% which perl -\end_layout - -\begin_layout LyX-Code - /usr/local/bin/perl -\end_layout - -\begin_layout LyX-Code - indicating that Perl is installed in /usr/local/bin. 
- Anyway, once -\end_layout - -\begin_layout LyX-Code - you know where it is installed, edit the first line of files -\end_layout - -\begin_layout LyX-Code - testit -\end_layout - -\begin_layout LyX-Code - show_hits -\end_layout - -\begin_layout LyX-Code - replacing /usr/local/bin/perl with the appropriate location. - I -\end_layout - -\begin_layout LyX-Code - will assume that you can do this, although it is not critical (it -\end_layout - -\begin_layout LyX-Code - is needed only to test the installation and to use the "show_hits" -\end_layout - -\begin_layout LyX-Code - utility). - Perl is not required to actually install and run -\end_layout - -\begin_layout LyX-Code - scan_for_matches. - -\end_layout - -\begin_layout LyX-Code - If you do not have Perl, I suggest you get it and install it (it -\end_layout - -\begin_layout LyX-Code - is a wonderful utility). - Information about Perl and how to get it -\end_layout - -\begin_layout LyX-Code - can be found in the book "Programming Perl" by Larry Wall and -\end_layout - -\begin_layout LyX-Code - Randall L. - Schwartz, published by O'Reilly & Associates, Inc. -\end_layout - -\begin_layout LyX-Code - To get started, you will need to compile the program. - I do this -\end_layout - -\begin_layout LyX-Code - using -\end_layout - -\begin_layout LyX-Code - gcc -O -o scan_for_matches ggpunit.c scan_for_matches.c -\end_layout - -\begin_layout LyX-Code - If you do not use GNU C, use -\end_layout - -\begin_layout LyX-Code - cc -O -DCC -o scan_for_matches ggpunit.c scan_for_matches.c -\end_layout - -\begin_layout LyX-Code - which works on my Sun. 
- -\end_layout - -\begin_layout LyX-Code - Once you have compiled scan_for_matches, you can verify that it -\end_layout - -\begin_layout LyX-Code - works with -\end_layout - -\begin_layout LyX-Code - clone% run_tests tmp -\end_layout - -\begin_layout LyX-Code - clone% diff tmp test_output -\end_layout - -\begin_layout LyX-Code - You may get a few strange lines of the sort -\end_layout - -\begin_layout LyX-Code - clone% run_tests tmp -\end_layout - -\begin_layout LyX-Code - rm: tmp: No such file or directory -\end_layout - -\begin_layout LyX-Code - clone% diff tmp test_output -\end_layout - -\begin_layout LyX-Code - These should cause no concern. - However, if the "diff" shows that -\end_layout - -\begin_layout LyX-Code - tmp and test_output are different, contact me (you have a -\end_layout - -\begin_layout LyX-Code - problem). - -\end_layout - -\begin_layout LyX-Code - You should now be able to use scan_for_matches by following the -\end_layout - -\begin_layout LyX-Code - instructions given below (which is all the normal user should have -\end_layout - -\begin_layout LyX-Code - to understand, once things are installed properly). -\end_layout - -\begin_layout LyX-Code - ============================================================== -\end_layout - -\begin_layout LyX-Code -How to run scan_for_matches: -\end_layout - -\begin_layout LyX-Code - To run the program, you need to create two files -\end_layout - -\begin_layout LyX-Code - 1. - the first file contains the pattern you wish to scan for; I'll -\end_layout - -\begin_layout LyX-Code - call this file pat_file in what follows (but any name is ok) -\end_layout - -\begin_layout LyX-Code - 2. - the second file contains a set of sequences to scan. - These -\end_layout - -\begin_layout LyX-Code - should be in "fasta format". - Just look at the contents of -\end_layout - -\begin_layout LyX-Code - test_dna_input to see examples of this format. 
- Basically, -\end_layout - -\begin_layout LyX-Code - each sequence begins with a line of the form -\end_layout - -\begin_layout LyX-Code - >sequence_id -\end_layout - -\begin_layout LyX-Code - and is followed by one or more lines containing the sequence. -\end_layout - -\begin_layout LyX-Code - Once these files have been created, you just use -\end_layout - -\begin_layout LyX-Code - scan_for_matches pat_file < input_file -\end_layout - -\begin_layout LyX-Code - to scan all of the input sequences for the given pattern. - As an -\end_layout - -\begin_layout LyX-Code - example, suppose that pat_file contains a single line of the form -\end_layout - -\begin_layout LyX-Code - p1=4...7 3...8 ~p1 -\end_layout - -\begin_layout LyX-Code - Then, -\end_layout - -\begin_layout LyX-Code - scan_for_matches pat_file < test_dna_input -\end_layout - -\begin_layout LyX-Code - should produce two "hits". - When I run this on my machine, I get -\end_layout - -\begin_layout LyX-Code - clone% scan_for_matches pat_file < test_dna_input -\end_layout - -\begin_layout LyX-Code - >tst1:[6,27] -\end_layout - -\begin_layout LyX-Code - cguaacc ggttaacc gguuacg -\end_layout - -\begin_layout LyX-Code - >tst2:[6,27] -\end_layout - -\begin_layout LyX-Code - CGUAACC GGTTAACC GGUUACG -\end_layout - -\begin_layout LyX-Code - clone% -\end_layout - -\begin_layout LyX-Code -Simple Patterns Built by Matching Ranges and Reverse Complements -\end_layout - -\begin_layout LyX-Code - Let me first explain this simple pattern: -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - p1=4...7 3...8 ~p1 -\end_layout - -\begin_layout LyX-Code - The pattern consists of three "pattern units" separated by spaces. -\end_layout - -\begin_layout LyX-Code - The first pattern unit is -\end_layout - -\begin_layout LyX-Code - p1=4...7 -\end_layout - -\begin_layout LyX-Code - which means "match 4 to 7 characters and call them p1". 
- The -\end_layout - -\begin_layout LyX-Code - second pattern unit is -\end_layout - -\begin_layout LyX-Code - 3...8 -\end_layout - -\begin_layout LyX-Code - which means "then match 3 to 8 characters". - The last pattern unit -\end_layout - -\begin_layout LyX-Code - is -\end_layout - -\begin_layout LyX-Code - ~p1 -\end_layout - -\begin_layout LyX-Code - which means "match the reverse complement of p1". - The first -\end_layout - -\begin_layout LyX-Code - reported hit is shown as -\end_layout - -\begin_layout LyX-Code - >tst1:[6,27] -\end_layout - -\begin_layout LyX-Code - cguaacc ggttaacc gguuacg -\end_layout - -\begin_layout LyX-Code - which states that characters 6 through 27 of sequence tst1 were -\end_layout - -\begin_layout LyX-Code - matched. - "cguaacc" matched the first pattern unit, "ggttaacc" the -\end_layout - -\begin_layout LyX-Code - second, and "gguuacg" the third. - This is an example of a common -\end_layout - -\begin_layout LyX-Code - type of pattern used to search for sections of DNA or RNA that -\end_layout - -\begin_layout LyX-Code - would fold into a hairpin loop. -\end_layout - -\begin_layout LyX-Code -Searching Both Strands -\end_layout - -\begin_layout LyX-Code - Now for a short aside: scan_for_matches only searched the -\end_layout - -\begin_layout LyX-Code - sequences in the input file; it did not search the opposite -\end_layout - -\begin_layout LyX-Code - strand. - With a pattern of the sort we just used, there is no -\end_layout - -\begin_layout LyX-Code - need to search the opposite strand. - However, it is normally the -\end_layout - -\begin_layout LyX-Code - case that you will wish to search both the sequence and the -\end_layout - -\begin_layout LyX-Code - opposite strand (i.e., the reverse complement of the sequence). -\end_layout - -\begin_layout LyX-Code - To do that, you would just use the "-c" command line option. 
- For example, -\end_layout - -\begin_layout LyX-Code - scan_for_matches -c pat_file < test_dna_input -\end_layout - -\begin_layout LyX-Code - Hits on the opposite strand will show a beginning location greater -\end_layout - -\begin_layout LyX-Code - than the end location of the match. -\end_layout - -\begin_layout LyX-Code -Defining Pairing Rules and Allowing Mismatches, Insertions, and Deletions -\end_layout - -\begin_layout LyX-Code - Let us stop now and ask "What additional features would one need to -\end_layout - -\begin_layout LyX-Code - really find the kinds of loop structures that characterize tRNAs, -\end_layout - -\begin_layout LyX-Code - rRNAs, and so forth?" I can immediately think of two: -\end_layout - -\begin_layout LyX-Code - a) you will need to be able to allow non-standard pairings -\end_layout - -\begin_layout LyX-Code - (those other than G-C and A-U), and -\end_layout - -\begin_layout LyX-Code - b) you will need to be able to tolerate some number of -\end_layout - -\begin_layout LyX-Code - mismatches and bulges. -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - Let me first show you how to handle non-standard "rules for -\end_layout - -\begin_layout LyX-Code - pairing in reverse complements". 
- Consider the following pattern, -\end_layout - -\begin_layout LyX-Code - which I show as two lines (you may use as many lines as you like in -\end_layout - -\begin_layout LyX-Code - forming a pattern, although you can only break a pattern at points -\end_layout - -\begin_layout LyX-Code - where space would be legal): -\end_layout - -\begin_layout LyX-Code - r1={au,ua,gc,cg,gu,ug,ga,ag} -\end_layout - -\begin_layout LyX-Code - p1=2...3 0...4 p2=2...5 1...5 r1~p2 0...4 ~p1 -\end_layout - -\begin_layout LyX-Code - The first "pattern unit" does not actually match anything; rather, -\end_layout - -\begin_layout LyX-Code - it defines a "pairing rule" in which standard pairings are -\end_layout - -\begin_layout LyX-Code - allowed, as well as G-A and A-G (in case you wondered, Us and Ts -\end_layout - -\begin_layout LyX-Code - and upper and lower case can be used interchangeably; for example -\end_layout - -\begin_layout LyX-Code - r1={AT,UA,gc,cg} could be used to define the "standard rule" for -\end_layout - -\begin_layout LyX-Code - pairings). - The second line consists of six pattern units which -\end_layout - -\begin_layout LyX-Code - may be interpreted as follows: -\end_layout - -\begin_layout LyX-Code - p1=2...3 match 2 or 3 characters (call it p1) -\end_layout - -\begin_layout LyX-Code - 0...4 match 0 to 4 characters -\end_layout - -\begin_layout LyX-Code - p2=2...5 match 2 to 5 characters (call it p2) -\end_layout - -\begin_layout LyX-Code - 1...5 match 1 to 5 characters -\end_layout - -\begin_layout LyX-Code - r1~p2 match the reverse complement of p2, -\end_layout - -\begin_layout LyX-Code - allowing G-A and A-G pairs -\end_layout - -\begin_layout LyX-Code - 0...4 match 0 to 4 characters -\end_layout - -\begin_layout LyX-Code - ~p1 match the reverse complement of p1 -\end_layout - -\begin_layout LyX-Code - allowing only G-C, C-G, A-T, and T-A pairs -\end_layout - -\begin_layout LyX-Code - Thus, r1~p2 means "match the reverse complement of p2 using rule r1". 
-\end_layout - -\begin_layout LyX-Code - Now let us consider the issue of tolerating mismatches and bulges. -\end_layout - -\begin_layout LyX-Code - You may add a "qualifier" to the pattern unit that gives the -\end_layout - -\begin_layout LyX-Code - tolerable number of "mismatches, deletions, and insertions". -\end_layout - -\begin_layout LyX-Code - Thus, -\end_layout - -\begin_layout LyX-Code - p1=10...10 3...8 ~p1[1,2,1] -\end_layout - -\begin_layout LyX-Code - means that the third pattern unit must match 10 characters, -\end_layout - -\begin_layout LyX-Code - allowing one "mismatch" (a pairing other than G-C, C-G, A-T, or -\end_layout - -\begin_layout LyX-Code - T-A), two deletions (a deletion is a character that occurs in p1, -\end_layout - -\begin_layout LyX-Code - but has been "deleted" from the string matched by ~p1), and one -\end_layout - -\begin_layout LyX-Code - insertion (an "insertion" is a character that occurs in the string -\end_layout - -\begin_layout LyX-Code - matched by ~p1, but for which no corresponding character -\end_layout - -\begin_layout LyX-Code - occurs in p1). - In this case, the pattern would match -\end_layout - -\begin_layout LyX-Code - ACGTACGTAC GGGGGGGG GCGTTACCT -\end_layout - -\begin_layout LyX-Code - which is, you must admit, a fairly weak loop. - It is common to -\end_layout - -\begin_layout LyX-Code - allow mismatches, but you will find yourself using insertions and -\end_layout - -\begin_layout LyX-Code - deletions much more rarely. - In any event, you should note that -\end_layout - -\begin_layout LyX-Code - allowing mismatches, insertions, and deletions does force the -\end_layout - -\begin_layout LyX-Code - program to try many additional possible pairings, so it does slow -\end_layout - -\begin_layout LyX-Code - things down a bit. 
-\end_layout - -\begin_layout LyX-Code -How Patterns Are Matched -\end_layout - -\begin_layout LyX-Code - Now is as good a time as any to discuss the basic flow of control -\end_layout - -\begin_layout LyX-Code - when matching patterns. - Recall that a "pattern" is a sequence of -\end_layout - -\begin_layout LyX-Code - "pattern units". - Suppose that the pattern units were -\end_layout - -\begin_layout LyX-Code - u1 u2 u3 u4 ... - un -\end_layout - -\begin_layout LyX-Code - The scan of a sequence S begins by setting the current position -\end_layout - -\begin_layout LyX-Code - to 1. - Then, an attempt is made to match u1 starting at the -\end_layout - -\begin_layout LyX-Code - current position. - Each attempt to match a pattern unit can -\end_layout - -\begin_layout LyX-Code - succeed or fail. - If it succeeds, then an attempt is made to match -\end_layout - -\begin_layout LyX-Code - the next unit. - If it fails, then an attempt is made to find an -\end_layout - -\begin_layout LyX-Code - alternative match for the immediately preceding pattern unit. - If -\end_layout - -\begin_layout LyX-Code - this succeeds, then we proceed forward again to the next unit. - If -\end_layout - -\begin_layout LyX-Code - it fails we go back to the preceding unit. - This process is called -\end_layout - -\begin_layout LyX-Code - "backtracking". - If there are no previous units, then the current -\end_layout - -\begin_layout LyX-Code - position is incremented by one, and everything starts again. - This -\end_layout - -\begin_layout LyX-Code - proceeds until either the current position goes past the end of -\end_layout - -\begin_layout LyX-Code - the sequence or all of the pattern units succeed. - On success, -\end_layout - -\begin_layout LyX-Code - scan_for_matches reports the "hit", the current position is set -\end_layout - -\begin_layout LyX-Code - just past the hit, and an attempt is made to find another hit. 
-\end_layout - -\begin_layout LyX-Code - If you wish to limit the scan to simply finding a maximum of, say, -\end_layout - -\begin_layout LyX-Code - 10 hits, you can use the -n option (-n 10 would set the limit to -\end_layout - -\begin_layout LyX-Code - 10 reported hits). - For example, -\end_layout - -\begin_layout LyX-Code - scan_for_matches -c -n 1 pat_file < test_dna_input -\end_layout - -\begin_layout LyX-Code - would search for just the first hit (and would stop searching the -\end_layout - -\begin_layout LyX-Code - current sequences or any that follow in the input file). -\end_layout - -\begin_layout LyX-Code -Searching for repeats: -\end_layout - -\begin_layout LyX-Code - In the last section, I discussed almost all of the details -\end_layout - -\begin_layout LyX-Code - required to allow you to look for repeats. - Consider the following -\end_layout - -\begin_layout LyX-Code - set of patterns: -\end_layout - -\begin_layout LyX-Code - p1=6...6 3...8 p1 (find exact 6 character repeat separated -\end_layout - -\begin_layout LyX-Code - by 3 to 8 characters) -\end_layout - -\begin_layout LyX-Code - p1=6...6 3...8 p1[1,0,0] (allow one mismatch) -\end_layout - -\begin_layout LyX-Code - p1=3...3 p1[1,0,0] p1[1,0,0] p1[1,0,0] -\end_layout - -\begin_layout LyX-Code - (match 12 characters that are the remains -\end_layout - -\begin_layout LyX-Code - of a 3-character sequence occurring 4 times) -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - p1=4...8 0...3 p2=6...8 p1 0...3 p2 -\end_layout - -\begin_layout LyX-Code - (This would match things like -\end_layout - -\begin_layout LyX-Code - ATCT G TCTTT ATCT TG TCTTT -\end_layout - -\begin_layout LyX-Code - ) -\end_layout - -\begin_layout LyX-Code -Searching for particular sequences: -\end_layout - -\begin_layout LyX-Code - Occasionally, one wishes to match a specific, known sequence. 
-\end_layout - -\begin_layout LyX-Code - In such a case, you can just give the sequence (along with an -\end_layout - -\begin_layout LyX-Code - optional statement of the allowable mismatches, insertions, and -\end_layout - -\begin_layout LyX-Code - deletions). - Thus, -\end_layout - -\begin_layout LyX-Code - p1=6...8 GAGA ~p1 (match a hairpin with GAGA as the loop) -\end_layout - -\begin_layout LyX-Code - RRRRYYYY (match 4 purines followed by 4 pyrimidines) -\end_layout - -\begin_layout LyX-Code - TATAA[1,0,0] (match TATAA, allowing 1 mismatch) -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code -Matches against a "weight matrix": -\end_layout - -\begin_layout LyX-Code - I will conclude my examples of the types of pattern units -\end_layout - -\begin_layout LyX-Code - available for matching against nucleotide sequences by discussing a -\end_layout - -\begin_layout LyX-Code - crude implementation of matching using a "weight matrix". - While I -\end_layout - -\begin_layout LyX-Code - am less than overwhelmed with the syntax that I chose, I think that -\end_layout - -\begin_layout LyX-Code - the reader should be aware that I was thinking of generating -\end_layout - -\begin_layout LyX-Code - patterns containing such pattern units automatically from -\end_layout - -\begin_layout LyX-Code - alignments (and did not really plan on typing such things in by -\end_layout - -\begin_layout LyX-Code - hand very often). - Anyway, suppose that you wanted to match a -\end_layout - -\begin_layout LyX-Code - sequence of eight characters. - The "consensus" of these eight -\end_layout - -\begin_layout LyX-Code - characters is GRCACCGS, but the actual "frequencies of occurrence" -\end_layout - -\begin_layout LyX-Code - are given in the matrix below. - Thus, the first character is an A -\end_layout - -\begin_layout LyX-Code - 16% of the time and a G 84% of the time. 
- The second is an A 57% of -\end_layout - -\begin_layout LyX-Code - the time, a C 10% of the time, a G 29% of the time, and a T 4% of -\end_layout - -\begin_layout LyX-Code - the time. - -\end_layout - -\begin_layout LyX-Code - C1 C2 C3 C4 C5 C6 C7 C8 -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - A 16 57 0 95 0 18 0 0 -\end_layout - -\begin_layout LyX-Code - C 0 10 80 0 100 60 0 50 -\end_layout - -\begin_layout LyX-Code - G 84 29 0 0 0 20 100 50 -\end_layout - -\begin_layout LyX-Code - T 0 4 20 5 0 2 0 0 -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - One could use the following pattern unit to search for inexact -\end_layout - -\begin_layout LyX-Code - matches related to such a "weight matrix": -\end_layout - -\begin_layout LyX-Code - {(16,0,84,0),(57,10,29,4),(0,80,0,20),(95,0,0,5), -\end_layout - -\begin_layout LyX-Code - (0,100,0,0),(18,60,20,2),(0,0,100,0),(0,50,50,0)} > 450 -\end_layout - -\begin_layout LyX-Code - This pattern unit will attempt to match exactly eight characters. -\end_layout - -\begin_layout LyX-Code - For each character in the sequence, the entry in the corresponding -\end_layout - -\begin_layout LyX-Code - tuple is added to an accumulated sum. - If the sum is greater than -\end_layout - -\begin_layout LyX-Code - 450, the match succeeds; else it fails. -\end_layout - -\begin_layout LyX-Code - Recently, this feature was upgraded to allow ranges. - Thus, -\end_layout - -\begin_layout LyX-Code - 600 > {(16,0,84,0),(57,10,29,4),(0,80,0,20),(95,0,0,5), -\end_layout - -\begin_layout LyX-Code - (0,100,0,0),(18,60,20,2),(0,0,100,0),(0,50,50,0)} > 450 -\end_layout - -\begin_layout LyX-Code - will work, as well. -\end_layout - -\begin_layout LyX-Code -Allowing Alternatives: -\end_layout - -\begin_layout LyX-Code - Very occasionally, you may wish to allow alternative pattern units -\end_layout - -\begin_layout LyX-Code - (i.e., "match either A or B"). 
- You can do this using something -\end_layout - -\begin_layout LyX-Code - like -\end_layout - -\begin_layout LyX-Code - ( GAGA | GCGCA) -\end_layout - -\begin_layout LyX-Code - which says "match either GAGA or GCGCA". - You may take -\end_layout - -\begin_layout LyX-Code - alternatives of a list of pattern units, for example -\end_layout - -\begin_layout LyX-Code - (p1=3...3 3...8 ~p1 | p1=5...5 4...4 ~p1 GGG) -\end_layout - -\begin_layout LyX-Code - would match one of two sequences of pattern units. - There is one -\end_layout - -\begin_layout LyX-Code - clumsy aspect of the syntax: to match a list of alternatives, you -\end_layout - -\begin_layout LyX-Code - need to fully parenthesize the request. - Thus, -\end_layout - -\begin_layout LyX-Code - (GAGA | (GCGCA | TTCGA)) -\end_layout - -\begin_layout LyX-Code - would be needed to try the three alternatives. -\end_layout - -\begin_layout LyX-Code -One Minor Extension -\end_layout - -\begin_layout LyX-Code - Sometimes a pattern will contain a sequence of distinct ranges, -\end_layout - -\begin_layout LyX-Code - and you might wish to limit the sum of the lengths of the matched -\end_layout - -\begin_layout LyX-Code - subsequences. - For example, suppose that you basically wanted to -\end_layout - -\begin_layout LyX-Code - match something like -\end_layout - -\begin_layout LyX-Code - ARRYYTT p1=0...5 GCA[1,0,0] p2=1...6 ~p1 4...8 ~p2 p3=4...10 CCT -\end_layout - -\begin_layout LyX-Code - but that the sum of the lengths of p1, p2, and p3 must not exceed -\end_layout - -\begin_layout LyX-Code - eight characters. - To do this, you could add -\end_layout - -\begin_layout LyX-Code - length(p1+p2+p3) < 9 -\end_layout - -\begin_layout LyX-Code - as the last pattern unit. - It will just succeed or fail (but does -\end_layout - -\begin_layout LyX-Code - not actually match any characters in the sequence). 
-\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code -Matching Protein Sequences -\end_layout - -\begin_layout LyX-Code - Suppose that the input file contains protein sequences. - In this -\end_layout - -\begin_layout LyX-Code - case, you must invoke scan_for_matches with the "-p" option. - You -\end_layout - -\begin_layout LyX-Code - cannot use aspects of the language that relate directly to -\end_layout - -\begin_layout LyX-Code - nucleotide sequences (e.g., the -c command line option or pattern -\end_layout - -\begin_layout LyX-Code - constructs referring to the reverse complement of a previously -\end_layout - -\begin_layout LyX-Code - matched unit). - -\end_layout - -\begin_layout LyX-Code - You also have two additional constructs that allow you to match -\end_layout - -\begin_layout LyX-Code - either "one of a set of amino acids" or "any amino acid other than -\end_layout - -\begin_layout LyX-Code - those in a given set". - For example, -\end_layout - -\begin_layout LyX-Code - p1=0...4 any(HQD) 1...3 notany(HK) p1 -\end_layout - -\begin_layout LyX-Code - would successfully match a string like -\end_layout - -\begin_layout LyX-Code - YWV D AA C YWV -\end_layout - -\begin_layout LyX-Code -Using the show_hits Utility -\end_layout - -\begin_layout LyX-Code - When viewing a large set of complex matches, you might find it -\end_layout - -\begin_layout LyX-Code - convenient to post-process the scan_for_matches output to get a -\end_layout - -\begin_layout LyX-Code - more readable version. - We provide a simple post-processor called -\end_layout - -\begin_layout LyX-Code - "show_hits". 
- To see its effect, just pipe the output of a -\end_layout - -\begin_layout LyX-Code - scan_for_matches into show_hits: -\end_layout - -\begin_layout LyX-Code - Normal Output: -\end_layout - -\begin_layout LyX-Code - clone% scan_for_matches -c pat_file < tmp -\end_layout - -\begin_layout LyX-Code - >tst1:[1,28] -\end_layout - -\begin_layout LyX-Code - gtacguaacc ggttaac cgguuacgtac -\end_layout - -\begin_layout LyX-Code - >tst1:[28,1] -\end_layout - -\begin_layout LyX-Code - gtacgtaacc ggttaac cggttacgtac -\end_layout - -\begin_layout LyX-Code - >tst2:[2,31] -\end_layout - -\begin_layout LyX-Code - CGTACGUAAC C GGTTAACC GGUUACGTACG -\end_layout - -\begin_layout LyX-Code - >tst2:[31,2] -\end_layout - -\begin_layout LyX-Code - CGTACGTAAC C GGTTAACC GGTTACGTACG -\end_layout - -\begin_layout LyX-Code - >tst3:[3,32] -\end_layout - -\begin_layout LyX-Code - gtacguaacc g gttaactt cgguuacgtac -\end_layout - -\begin_layout LyX-Code - >tst3:[32,3] -\end_layout - -\begin_layout LyX-Code - gtacgtaacc g aagttaac cggttacgtac -\end_layout - -\begin_layout LyX-Code - Piped Through show_hits: -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - clone% scan_for_matches -c pat_file < tmp | show_hits -\end_layout - -\begin_layout LyX-Code - tst1:[1,28]: gtacguaacc ggttaac cgguuacgtac -\end_layout - -\begin_layout LyX-Code - tst1:[28,1]: gtacgtaacc ggttaac cggttacgtac -\end_layout - -\begin_layout LyX-Code - tst2:[2,31]: CGTACGUAAC C GGTTAACC GGUUACGTACG -\end_layout - -\begin_layout LyX-Code - tst2:[31,2]: CGTACGTAAC C GGTTAACC GGTTACGTACG -\end_layout - -\begin_layout LyX-Code - tst3:[3,32]: gtacguaacc g gttaactt cgguuacgtac -\end_layout - -\begin_layout LyX-Code - tst3:[32,3]: gtacgtaacc g aagttaac cggttacgtac -\end_layout - -\begin_layout LyX-Code - clone% -\end_layout - -\begin_layout LyX-Code - Optionally, you can specify which of the "fields" in the matches -\end_layout - -\begin_layout LyX-Code - you wish to sort on, and show_hits will sort them. 
- The field -\end_layout - -\begin_layout LyX-Code - numbers start with 0. - So, you might get something like -\end_layout - -\begin_layout LyX-Code - clone% scan_for_matches -c pat_file < tmp | show_hits 2 1 -\end_layout - -\begin_layout LyX-Code - tst2:[2,31]: CGTACGUAAC C GGTTAACC GGUUACGTACG -\end_layout - -\begin_layout LyX-Code - tst2:[31,2]: CGTACGTAAC C GGTTAACC GGTTACGTACG -\end_layout - -\begin_layout LyX-Code - tst3:[32,3]: gtacgtaacc g aagttaac cggttacgtac -\end_layout - -\begin_layout LyX-Code - tst1:[1,28]: gtacguaacc ggttaac cgguuacgtac -\end_layout - -\begin_layout LyX-Code - tst1:[28,1]: gtacgtaacc ggttaac cggttacgtac -\end_layout - -\begin_layout LyX-Code - tst3:[3,32]: gtacguaacc g gttaactt cgguuacgtac -\end_layout - -\begin_layout LyX-Code - clone% -\end_layout - -\begin_layout LyX-Code - In this case, the hits have been sorted on fields 2 and 1 (that is, -\end_layout - -\begin_layout LyX-Code - the third and second matched subfields). -\end_layout - -\begin_layout LyX-Code - show_hits is just one possible little post-processor, and you -\end_layout - -\begin_layout LyX-Code - might well wish to write a customized one for yourself. -\end_layout - -\begin_layout LyX-Code -Reducing the Cost of a Search -\end_layout - -\begin_layout LyX-Code - The scan_for_matches utility uses a fairly simple search, and may -\end_layout - -\begin_layout LyX-Code - consume large amounts of CPU time for complex patterns. - Someday, -\end_layout - -\begin_layout LyX-Code - I may decide to optimize the code. - However, until then, let me -\end_layout - -\begin_layout LyX-Code - mention one useful technique. - -\end_layout - -\begin_layout LyX-Code - When you have a complex pattern that includes a number of varying -\end_layout - -\begin_layout LyX-Code - ranges, imprecise matches, and so forth, it is useful to -\end_layout - -\begin_layout LyX-Code - "pipeline" matches. 
- That is, form a simpler pattern that can be -\end_layout - -\begin_layout LyX-Code - used to scan through a large database extracting sections that -\end_layout - -\begin_layout LyX-Code - might be matched by the more complex pattern. - Let me illustrate -\end_layout - -\begin_layout LyX-Code - with a short example. - Suppose that you really wished to match the -\end_layout - -\begin_layout LyX-Code - pattern -\end_layout - -\begin_layout LyX-Code - p1=3...5 0...8 ~p1[1,1,0] p2=6...7 3...6 AGC 3...5 RYGC ~p2[1,0,0] -\end_layout - -\begin_layout LyX-Code - In this case, the pattern units AGC 3...5 RYGC can be used to rapidly -\end_layout - -\begin_layout LyX-Code - constrain the overall search. - You can preprocess the overall -\end_layout - -\begin_layout LyX-Code - database using the pattern: -\end_layout - -\begin_layout LyX-Code - 31...31 AGC 3...5 RYGC 7...7 -\end_layout - -\begin_layout LyX-Code - Put the complex pattern in pat_file1 and the simpler pattern in -\end_layout - -\begin_layout LyX-Code - pat_file2. - Then use, -\end_layout - -\begin_layout LyX-Code - scan_for_matches -c pat_file2 < nucleotide_database | -\end_layout - -\begin_layout LyX-Code - scan_for_matches pat_file1 -\end_layout - -\begin_layout LyX-Code - The output will show things like -\end_layout - -\begin_layout LyX-Code - >seqid:[232,280][2,47] -\end_layout - -\begin_layout LyX-Code - matches pieces -\end_layout - -\begin_layout LyX-Code - Then, the actual section of the sequence that was matched can be -\end_layout - -\begin_layout LyX-Code - easily computed as [233,278] (remember, the positions start from -\end_layout - -\begin_layout LyX-Code - 1, not 0). 
-\end_layout - -\begin_layout LyX-Code - Let me finally add, you should do a few short experiments to see -\end_layout - -\begin_layout LyX-Code - whether or not such pipelining actually improves performance -- it -\end_layout - -\begin_layout LyX-Code - is not always obvious where the time is going, and I have -\end_layout - -\begin_layout LyX-Code - sometimes found that the added complexity of pipelining actually -\end_layout - -\begin_layout LyX-Code - slowed things up. - It gets its best improvements when there are -\end_layout - -\begin_layout LyX-Code - exact matches of more than just a few characters that can be -\end_layout - -\begin_layout LyX-Code - rapidly used to eliminate large sections of the database. -\end_layout - -\begin_layout LyX-Code -============= -\end_layout - -\begin_layout LyX-Code -Additions: -\end_layout - -\begin_layout LyX-Code -Feb 9, 1995: the pattern units ^ and $ now work as in normal regular -\end_layout - -\begin_layout LyX-Code - expressions. - That is -\end_layout - -\begin_layout LyX-Code - TTF $ -\end_layout - -\begin_layout LyX-Code - matches only TTF at the end of the string and -\end_layout - -\begin_layout LyX-Code - ^ TTF -\end_layout - -\begin_layout LyX-Code - matches only an initial TTF -\end_layout - -\begin_layout LyX-Code - The pattern unit -\end_layout - -\begin_layout LyX-Code - : -\end_layout - -\begin_layout Standard -\begin_inset Box Frameless -position "t" -hor_pos "c" -has_inner_box 1 -inner_pos "t" -use_parbox 0 -width "100col%" -special "none" -height "1in" -height_special "totalheight" -status open - -\begin_layout LyX-Code - -\size scriptsize -Program name: read_fasta -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -Author: Martin Asser Hansen - Copyright (C) - All rights reserved -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -Contact: mail@maasha.dk -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -Date: August 2007 -\end_layout - -\begin_layout LyX-Code - -\size 
scriptsize -License: GNU General Public License version 2 (http://www.gnu.org/copyleft/ -gpl.html) -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -Description: Read FASTA entries. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -Usage: read_fasta [options] -i -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -Options: -\end_layout - -\begin_layout LyX-Code - -\size scriptsize - [-i | --data_in=] - Comma separated list of files - to read. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize - [-n | --num=] - Limit number of records to read. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize - [-I | --stream_in=] - Read input stream from file - - Default=STDIN -\end_layout - -\begin_layout LyX-Code - -\size scriptsize - [-O | --stream_out=] - Write output stream to file - - Default=STDOUT -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -Examples: -\end_layout - -\begin_layout LyX-Code - -\size scriptsize - read_fasta -i test.fna - Read FASTA entries from file. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize - read_fasta -i test1.fna,test2.fna - Read FASTA entries from files. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize - read_fasta -i '*.fna' - Read FASTA entries from files. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize - read_fasta -i test.fna -n 10 - Read first 10 FASTA entries from - file. -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Section -The Data Stream -\end_layout - -\begin_layout Subsection -How to read the data stream from file? -\begin_inset LatexCommand label -name "sub:How-to-read-stream" - -\end_inset - - -\end_layout - -\begin_layout Standard -You want to read a data stream that you previously have saved to file in - biopieces format. - This can be done implicitly or explicitly. 
- The implicit way uses the 'stdout' stream of the Unix terminal: -\end_layout - -\begin_layout LyX-Code -cat | -\end_layout - -\begin_layout Standard -cat is the Unix command that reads a file and outputs the result to 'stdout' - --- which in this case is piped to any biopiece represented by the . - It is also possible to read the data stream using '<' to redirect the file - into the 'stdin' stream of the biopiece like this: -\end_layout - -\begin_layout LyX-Code - < -\end_layout - -\begin_layout Standard -However, that will not work if you pipe more biopieces together. - Then it is much safer to read the stream from a file explicitly like this: -\end_layout - -\begin_layout LyX-Code - --stream_in= -\end_layout - -\begin_layout Standard -Here the filename is explicitly given to the biopiece - with the switch -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -stream_in. - This switch works with all biopieces. - It is also possible to read in data from multiple sources by repeating - the explicit read step: -\end_layout - -\begin_layout LyX-Code - --stream_in= | --stream_in= -\end_layout - -\begin_layout Subsection -How to write the data stream to file? -\begin_inset LatexCommand label -name "sub:How-to-write-stream" - -\end_inset - - -\end_layout - -\begin_layout Standard -In order to save the output stream from a biopiece to file, so you can read - in the stream again at a later time, you can do one of two things: -\end_layout - -\begin_layout LyX-Code - > -\end_layout - -\begin_layout Standard -All the biopieces write the data stream to 'stdout' by default, which can - be written to a file by redirecting 'stdout' to file using '>'. However, - if one of the biopieces for writing other formats is used, then both - the biopieces records and the result output will go to 'stdout' - in a mixture causing havoc!
- To avoid this you must use the switch -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -stream_out that explicitly tells the biopiece to write the output stream - to file: -\end_layout - -\begin_layout LyX-Code - --stream_out= -\end_layout - -\begin_layout Standard -The -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -stream_out switch works with all biopieces. -\end_layout - -\begin_layout Subsection -How to terminate the data stream? -\end_layout - -\begin_layout Standard -The data stream never stops unless the user saves the stream to file or - supplies the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_stream switch that will terminate the stream: -\end_layout - -\begin_layout LyX-Code - --no_stream -\end_layout - -\begin_layout Standard -The -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_stream switch only works with those biopieces where it makes sense that - the user might want to terminate the data stream, -\emph on -i.e -\emph default -. - after an analysis step where the user wants to output the result, but not - the data stream. -\end_layout - -\begin_layout Subsection -How to write my results to file? -\begin_inset LatexCommand label -name "sub:How-to-write-result" - -\end_inset - - -\end_layout - -\begin_layout Standard -Saving the result of an analysis to file can be done implicitly or explicitly.
- The implicit way: -\end_layout - -\begin_layout LyX-Code - --no_stream > -\end_layout - -\begin_layout Standard -If you use '>' to redirect 'stdout' to file then it is important to use - the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_stream switch to avoid writing a mix of biopieces records and result - to the same file causing havoc. - The safe way is to use the -\begin_inset ERT -status open - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -result_out switch which explicitly tells the biopiece to write the result - to a given file: -\end_layout - -\begin_layout LyX-Code - --result_out= -\end_layout - -\begin_layout Standard -Using the above method will not terminate the stream, so it is possible - to pipe that into another biopiece generating different results: -\end_layout - -\begin_layout LyX-Code - --result_out= | --result_out= -\end_layout - -\begin_layout Standard -And still the data stream will continue unless terminated with -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_stream: -\end_layout - -\begin_layout LyX-Code - --result_out= --no_stream -\end_layout - -\begin_layout Standard -Or written to file implicitly or explicitly -\begin_inset LatexCommand eqref -reference "sub:How-to-write-result" - -\end_inset - -. - The explicit way: -\end_layout - -\begin_layout LyX-Code - --result_out= --stream_out= -\end_layout - -\begin_layout Subsection -How to read data from multiple sources? -\end_layout - -\begin_layout Standard -To read multiple data sources, with the same type or different types of data, - do: -\end_layout - -\begin_layout LyX-Code - --data_in= | --data_in= -\end_layout - -\begin_layout Standard -where type is the data type a specific biopiece reads. -\end_layout - -\begin_layout Section -Reading input -\end_layout - -\begin_layout Subsection -How to read biopieces input?
-\end_layout - -\begin_layout Standard -See -\begin_inset LatexCommand eqref -reference "sub:How-to-read-stream" - -\end_inset - -. -\end_layout - -\begin_layout Subsection -How to read in data? -\end_layout - -\begin_layout Standard -Data in different formats can be read with the appropriate biopiece for - that format. - The biopieces are typically named 'read_' such as -\series bold -read_fasta -\series default -, -\series bold -read_bed -\series default -, -\series bold -read_tab -\series default -, etc., and all behave in a similar manner. - Data can be read by supplying the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -data_in switch and a file name to the file containing the data: -\end_layout - -\begin_layout LyX-Code - --data_in= -\end_layout - -\begin_layout Standard -It is also possible to read in a saved biopieces stream (see -\begin_inset LatexCommand ref -reference "sub:How-to-read-stream" - -\end_inset - -) as well as reading data in one go: -\end_layout - -\begin_layout LyX-Code - --stream_in= --data_in= -\end_layout - -\begin_layout Standard -If you want to read data from several files you can do this: -\end_layout - -\begin_layout LyX-Code - --data_in= | --data_in= -\end_layout - -\begin_layout Standard -If you have several data files you can read in all explicitly with a comma - separated list: -\end_layout - -\begin_layout LyX-Code - --data_in=file1,file2,file3 -\end_layout - -\begin_layout Standard -And it is also possible to use file globbing -\begin_inset Foot -status open - -\begin_layout Standard -using the short option will only work if you quote the argument -i '*.fna' -\end_layout - -\end_inset - -: -\end_layout - -\begin_layout LyX-Code - --data_in=*.fna -\end_layout - -\begin_layout Standard -Or in a combination: -\end_layout - -\begin_layout LyX-Code - --data_in=file1,/dir/*.fna -\end_layout - -\begin_layout Standard -Finally, it is possible to read in data in
different formats using the appropria -te biopiece for each format: -\end_layout - -\begin_layout LyX-Code - --data_in= | --data_in= ... -\end_layout - -\begin_layout Subsection -How to read FASTA input? -\end_layout - -\begin_layout Standard -Sequences in FASTA format can be read explicitly using -\series bold -read_fasta -\series default -: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in= -\end_layout - -\begin_layout Subsection -How to read alignment input? -\end_layout - -\begin_layout Standard -If your alignment is FASTA formatted then you can use -\series bold -read_align -\series default -. - It is also possible to use -\series bold -read_fasta -\series default - since the data is FASTA formatted, however, with -\series bold -read_fasta -\series default - the key ALIGN will be omitted. - The ALIGN key is used to determine which sequences belong to what alignment - which is required for -\series bold -write_align -\series default -. -\end_layout - -\begin_layout LyX-Code -read_align --data_in= -\end_layout - -\begin_layout Subsection -How to read tabular input? -\begin_inset LatexCommand label -name "sub:How-to-read-table" - -\end_inset - - -\end_layout - -\begin_layout Standard -Tabular input can be read with -\series bold -read_tab -\series default - which will read in all rows and chosen columns (separated by a given delimiter) - from a table in text format.
-\end_layout - -\begin_layout Standard -The table below: -\end_layout - -\begin_layout Standard -\noindent -\align center -\begin_inset Tabular - - - - - - - -\begin_inset Text - -\begin_layout Standard -Human -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -ATACGTCAG -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -23524 -\end_layout - -\end_inset - - - - -\begin_inset Text - -\begin_layout Standard -Dog -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -AGCATGAC -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -2442 -\end_layout - -\end_inset - - - - -\begin_inset Text - -\begin_layout Standard -Mouse -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -GACTG -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -234 -\end_layout - -\end_inset - - - - -\begin_inset Text - -\begin_layout Standard -Cat -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -AAATGCA -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -2342 -\end_layout - -\end_inset - - - - -\end_inset - - -\end_layout - -\begin_layout Standard -Can be read using the command: -\end_layout - -\begin_layout LyX-Code -read_tab --data_in= -\end_layout - -\begin_layout Standard -Which will result in four records, one for each row, where the keys V0, - V1, V2 are the default keys for the organism, sequence, and count, respectively. - It is possible to select a subset of columns to read by using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -cols switch which takes a comma separated list of column numbers (first - column is designated 0) as argument.
- So to read in only the sequence and the count so that the count comes before - the sequence do: -\end_layout - -\begin_layout LyX-Code -read_tab --data_in= --cols=2,1 -\end_layout - -\begin_layout Standard -It is also possible to name the columns with the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -keys switch: -\end_layout - -\begin_layout LyX-Code -read_tab --data_in= --cols=2,1 --keys=COUNT,SEQ -\end_layout - -\begin_layout Subsection -How to read BED input? -\end_layout - -\begin_layout Standard -The BED (Browser Extensible Data -\begin_inset Foot -status open - -\begin_layout Standard -\begin_inset LatexCommand url -target "http://genome.ucsc.edu/FAQ/FAQformat" - -\end_inset - - -\end_layout - -\end_inset - -) format is a tabular format for data pertaining to one of the Eukaryotic - genomes in the UCSC genome browser -\begin_inset Foot -status collapsed - -\begin_layout Standard -\begin_inset LatexCommand url -target "http://genome.ucsc.edu/" - -\end_inset - - -\end_layout - -\end_inset - -. - The BED format consists of up to 12 columns, where the first three are - mandatory CHR, CHR_BEG, and CHR_END. - The mandatory columns and any of the optional columns can all be read in - easily with the -\series bold -read_bed -\series default - biopiece. -\end_layout - -\begin_layout LyX-Code -read_bed --data_in= -\end_layout - -\begin_layout Standard -It is also possible to read the BED file with -\series bold -read_tab -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-read-table" - -\end_inset - -), however, that will be more cumbersome because you need to specify the - keys: -\end_layout - -\begin_layout LyX-Code -read_tab --data_in= --keys=CHR,CHR_BEG,CHR_END ... -\end_layout - -\begin_layout Subsection -How to read PSL input?
-\end_layout - -\begin_layout Standard -The PSL format is the output from BLAT and contains 21 mandatory fields - that can be read with -\series bold -read_psl -\series default -: -\end_layout - -\begin_layout LyX-Code -read_psl --data_in= -\end_layout - -\begin_layout Section -Writing output -\end_layout - -\begin_layout Standard -All result output can be written explicitly to file using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -result_out switch which all result generating biopieces have. - It is also possible to write the result to file implicitly by directing - 'stdout' to file using '>', however, that requires the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_stream switch to prevent a mixture of data stream and results in the file. - The explicit (and safe) way: -\end_layout - -\begin_layout LyX-Code -... - | --result_out= -\end_layout - -\begin_layout Standard -The implicit way: -\end_layout - -\begin_layout LyX-Code -... - | --no_stream > -\end_layout - -\begin_layout Subsection -How to write biopieces output? -\end_layout - -\begin_layout Standard -See -\begin_inset LatexCommand eqref -reference "sub:How-to-write-stream" - -\end_inset - -. -\end_layout - -\begin_layout Subsection -How to write FASTA output? -\begin_inset LatexCommand label -name "sub:How-to-write-fasta" - -\end_inset - - -\end_layout - -\begin_layout Standard -FASTA output can be written with -\series bold -write_fasta -\series default -. -\end_layout - -\begin_layout LyX-Code -... - | write_fasta --result_out= -\end_layout - -\begin_layout Standard -It is also possible to wrap the sequences to a given width using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -wrap switch although wrapping of sequence is generally an evil thing: -\end_layout - -\begin_layout LyX-Code -...
- | write_fasta --no_stream --wrap=80 -\end_layout - -\begin_layout Subsection -How to write alignment output? -\begin_inset LatexCommand label -name "sub:How-to-write-alignment" - -\end_inset - - -\end_layout - -\begin_layout Standard -Pretty alignments with ruler -\begin_inset Foot -status collapsed - -\begin_layout Standard -'.' for every 10 residues, ':' for every 50, and '|' for every 100 -\end_layout - -\end_inset - - and consensus sequence -\begin_inset Note Note -status collapsed - -\begin_layout Standard -which reminds me to make that an option. -\end_layout - -\end_inset - - can be created with -\series bold -write_align -\series default -, which also has the optional -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -wrap switch to break the alignment into blocks of a given width: -\end_layout - -\begin_layout LyX-Code -... - | write_align --result_out= --wrap=80 -\end_layout - -\begin_layout Standard -If the number of sequences in the alignment is 2 then a pairwise alignment - will be output otherwise a multiple alignment. - And if the sequence type, determined automagically, is protein, then residues - and symbols (+,\InsetSpace ~ -:,\InsetSpace ~ -.) will be used to show consensus according to the Blosum62 - matrix. -\end_layout - -\begin_layout Subsection -How to write tabular output? -\begin_inset LatexCommand label -name "sub:How-to-write-tab" - -\end_inset - - -\end_layout - -\begin_layout Standard -Outputting the data stream as a table can be done with -\series bold -write_tab -\series default -, which will generate one row per record with the values as columns. - If you supply the optional -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -comment switch, then the first row in the table will be a 'comment' line - prefixed with a '#': -\end_layout - -\begin_layout LyX-Code -...
- | write_tab --result_out= --comment -\end_layout - -\begin_layout Standard -You can also change the delimiter from the default (tab) to -\emph on -e.g. - -\emph default - ',': -\end_layout - -\begin_layout LyX-Code -... - | write_tab --result_out= --delimit=',' -\end_layout - -\begin_layout Standard -If you want the values output in a specific order you have to supply a comma - separated list using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -keys switch that will print only those keys in that order: -\end_layout - -\begin_layout LyX-Code -... - | write_tab --result_out= --keys=SEQ_NAME,COUNT -\end_layout - -\begin_layout Standard -Alternatively, if you have some keys that you don't want in the tabular - output, use the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_keys switch. - So to print all keys except SEQ and SEQ_TYPE do: -\end_layout - -\begin_layout LyX-Code -... - | write_tab --result_out= --no_keys=SEQ,SEQ_TYPE -\end_layout - -\begin_layout Standard -Finally, if you have a stream containing a mix of different record types, - -\emph on -e.g. - -\emph default - records with sequences and records with matches, then you can use -\series bold -write_tab -\series default - to output all the records in tabular format, however, the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -comment, -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -keys, and -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_keys switches will only respond to records of the first type encountered.
- The reason is that outputting mixed records is probably not what you want - anyway, and you should remove all the unwanted records from the stream - before outputting the table: -\series bold -grab -\series default - is your friend (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-grab" - -\end_inset - -). -\end_layout - -\begin_layout Subsection -How to write a BED output? -\begin_inset LatexCommand label -name "sub:How-to-write-BED" - -\end_inset - - -\end_layout - -\begin_layout Standard -Data in BED format can be output if the records contain the mandatory keys - CHR, CHR_BEG, and CHR_END using -\series bold -write_bed -\series default -. - If the optional keys are also present, they will be output as well: -\end_layout - -\begin_layout LyX-Code -write_bed --result_out= -\end_layout - -\begin_layout Subsection -How to write PSL output? -\begin_inset LatexCommand label -name "sub:How-to-write-PSL" - -\end_inset - - -\end_layout - -\begin_layout Standard -Data in PSL format can be output using -\series bold -write_psl: -\end_layout - -\begin_layout LyX-Code -write_psl --result_out= -\end_layout - -\begin_layout Section -Manipulating Records -\end_layout - -\begin_layout Subsection -How to select a few records? -\begin_inset LatexCommand label -name "sub:How-to-select-a-few-records" - -\end_inset - - -\end_layout - -\begin_layout Standard -To quickly get an overview of your data you can limit the data stream to - show a few records. - This is also very useful to test the pipeline with a few records if you are - setting up a complex analysis using several biopieces. - That way you can inspect that all goes well before analyzing and waiting - for the full data set. - All of the read_ biopieces have the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -num switch which will take a number as argument and only that number of - records will be read.
- So to read in the first 10 FASTA entries from a file: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in=test.fna --num=10 -\end_layout - -\begin_layout Standard -Another way of doing this is to use -\series bold -head_records -\series default - which will limit the stream to show the first 10 records (default): -\end_layout - -\begin_layout LyX-Code -... - | head_records -\end_layout - -\begin_layout Standard -Using -\series bold -head_records -\series default - directly after one of the read_ biopieces will be a lot slower than - using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -num switch with the read_ biopieces, however, -\series bold -head_records -\series default - can also be used to limit the output from all the other biopieces. - It is also possible to give -\series bold -head_records -\series default - a number of records to show using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -num switch. - So to display the first 100 records do: -\end_layout - -\begin_layout LyX-Code -... - | head_records --num=100 -\end_layout - -\begin_layout Subsection -How to select random records? -\begin_inset LatexCommand label -name "sub:How-to-select-random-records" - -\end_inset - - -\end_layout - -\begin_layout Standard -If you want to inspect a number of random records from the stream this can - be done with the -\series bold -random_records -\series default - biopiece. - So if you have 1 million records in the stream and you want to select 1000 - random records do: -\end_layout - -\begin_layout LyX-Code -... - | random_records --num=1000 -\end_layout - -\begin_layout Subsection -How to count all records in the data stream?
-\end_layout - -\begin_layout Standard -To count all the records in the data stream use -\series bold -count_records -\series default -, which adds one record (which is not included in the count) to the data - stream. - So to count the number of sequences in a FASTA file you can do this: -\end_layout - -\begin_layout LyX-Code -cat test.fna | read_fasta | count_records --no_stream -\end_layout - -\begin_layout Standard -Which will write the last record containing the count to 'stdout': -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -count_records: 630 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize ---- -\end_layout - -\begin_layout Standard -It is also possible to write the count to file using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -result_out switch. -\end_layout - -\begin_layout Subsection -How to get the length of record values? -\begin_inset LatexCommand label -name "sub:How-to-get-value_length" - -\end_inset - - -\end_layout - -\begin_layout Standard -Use the -\series bold -length_vals -\series default - biopiece to get the length of each value for a comma separated list of - keys: -\end_layout - -\begin_layout LyX-Code -... - | length_vals --keys=HIT,PATTERN -\end_layout - -\begin_layout Subsection -How to grab specific records? -\begin_inset LatexCommand label -name "sub:How-to-grab" - -\end_inset - - -\end_layout - -\begin_layout Standard -The biopiece -\series bold -grab -\series default - is related to the Unix grep and locates records based on matching keys - and/or values using either a pattern, a Perl regex, or a numerical evaluation. - To easily -\series bold -grab -\series default - all records in the stream that have any mention of the pattern 'human' - just pipe the data stream through -\series bold -grab -\series default - like this: -\end_layout - -\begin_layout LyX-Code -...
- | grab --pattern=human -\end_layout - -\begin_layout Standard -This will search for the pattern 'human' in all keys and all values. - The -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern switch takes a comma separated list of patterns, so in order to - match multiple patterns do: -\end_layout - -\begin_layout LyX-Code -... - | grab --pattern=human,mouse -\end_layout - -\begin_layout Standard -It is also possible to use the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern_in switch instead of -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern. - -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern_in is used to read a file with one pattern per line: -\end_layout - -\begin_layout LyX-Code -... - | grab --pattern_in=patterns.txt -\end_layout - -\begin_layout Standard -If you want the opposite result --- to find all records that do not match - the patterns, add the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -invert switch, which not only works with the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern switch, but also with -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -regex and -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -eval: -\end_layout - -\begin_layout LyX-Code -... - | grab --pattern=human --invert -\end_layout - -\begin_layout Standard -If you want to search the record keys only, -\emph on -e.g.
- -\emph default - to find all records containing the key SEQ you can add the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -keys_only switch. - This will prevent matching of SEQ in any record value; since SEQ - is a not uncommon peptide sequence, you could otherwise get an unwanted record. - Also, this will give an increase in speed since only the keys are searched: -\end_layout - -\begin_layout LyX-Code -... - | grab --pattern=SEQ --keys_only -\end_layout - -\begin_layout Standard -However, if you are interested in finding the peptide sequence SEQ and not - the SEQ key, just add the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -vals_only switch instead: -\end_layout - -\begin_layout LyX-Code -... - | grab --pattern=SEQ --vals_only -\end_layout - -\begin_layout Standard -Also, if you want to grab for certain key/value pairs you can supply a comma - separated list of keys whose values will then be searched using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -keys switch. - This is handy if your records contain large genomic sequences and you don't - want to search the entire sequence for -\emph on -e.g. - -\emph default - the organism name --- it is much faster to tell -\series bold -grab -\series default - which keys to search the value for: -\end_layout - -\begin_layout LyX-Code -... - | grab --pattern=human --keys=SEQ_NAME -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout Standard -It is also possible to invoke flexible matching using regex (regular expressions -) instead of simple pattern matching.
- In -\series bold -grab -\series default - the regex engine is Perl based and allows use of different types of wild - cards, alternatives, -\emph on -etc -\emph default - -\begin_inset Foot -status open - -\begin_layout Standard -\begin_inset LatexCommand url -target "http://perldoc.perl.org/perlreref.html" - -\end_inset - - -\end_layout - -\end_inset - -. - If you want to -\series bold -grab -\series default - records with the sequence ATCG or GCTA you can do this: -\end_layout - -\begin_layout LyX-Code -... - | grab --regex='ATCG|GCTA' -\end_layout - -\begin_layout Standard -Or if you want to find sequences beginning with ATCG: -\end_layout - -\begin_layout LyX-Code -... - | grab --regex='^ATCG' -\end_layout - -\begin_layout Standard -You can also use -\series bold -grab -\series default - to locate records that fulfill a numerical property using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -eval switch which takes an expression in three parts. - The first part is the key that holds the value we want to evaluate, the - second part holds one of the eight operators: -\end_layout - -\begin_layout Enumerate -Greater than: > -\end_layout - -\begin_layout Enumerate -Greater than or equal to: >= -\end_layout - -\begin_layout Enumerate -Less than: < -\end_layout - -\begin_layout Enumerate -Less than or equal to: <= -\end_layout - -\begin_layout Enumerate -Equal to: = -\end_layout - -\begin_layout Enumerate -Not equal to: != -\end_layout - -\begin_layout Enumerate -String wise equal to: eq -\end_layout - -\begin_layout Enumerate -String wise not equal to: ne -\end_layout - -\begin_layout Standard -And finally comes the number used in the evaluation. - So to -\series bold -grab -\series default - all records with a sequence length greater than 30: -\end_layout - -\begin_layout LyX-Code -...
- | length_seq | grab --eval='SEQ_LEN > 30' -\end_layout - -\begin_layout Standard -If you want to locate all records containing the pattern 'human' and where - the sequence length is greater than 30, you do this by running the stream - through -\series bold -grab -\series default - twice: -\end_layout - -\begin_layout LyX-Code -... - | grab --pattern='human' | length_seq | grab --eval='SEQ_LEN > 30' -\end_layout - -\begin_layout Standard -Finally, it is possible to do fast matching of expressions from a file using - the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -exact switch. - Each of these expressions has to be matched exactly over the entire length, - which is useful if you have a file with accession numbers, that you want - to locate in the stream: -\end_layout - -\begin_layout LyX-Code -... - | grab --exact acc_no.txt | ... -\end_layout - -\begin_layout Standard -Using -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -exact is much faster than using -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern_in, because with -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -exact the expressions have to be complete matches, where -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern_in looks for subpatterns. -\end_layout - -\begin_layout Standard -NB! To get the best speed performance, use the most restrictive -\series bold -grab -\series default - first. -\end_layout - -\begin_layout Subsection -How to remove keys from records? -\end_layout - -\begin_layout Standard -To remove one or more specific keys from all records in the data stream - use -\series bold -remove_keys -\series default - like this: -\end_layout - -\begin_layout LyX-Code -...
- | remove_keys --keys=SEQ,SEQ_NAME -\end_layout - -\begin_layout Standard -In the above example SEQ and SEQ_NAME will be removed from all records if - they exist in them. - If all keys are removed from a record, then the record will be removed. -\end_layout - -\begin_layout Subsection -How to rename keys in records? -\end_layout - -\begin_layout Standard -Sometimes you want to rename a record key, -\emph on -e.g. - -\emph default - if you have read in a two column table with sequence name and sequence - in each column (see -\begin_inset LatexCommand ref -reference "sub:How-to-read-table" - -\end_inset - -) without specifying the key names, then the sequence name will be called - V0 and the sequence V1 as default in the -\series bold -read_tab -\series default - biopiece. - To rename the V0 and V1 keys we need to run the stream through -\series bold -rename_keys -\series default - twice (one for each key to rename): -\end_layout - -\begin_layout LyX-Code -... - | rename_keys --keys=V0,SEQ_NAME | rename_keys --keys=V1,SEQ -\end_layout - -\begin_layout Standard -The first instance of -\series bold -rename_keys -\series default - replaces all the V0 keys with SEQ_NAME, and the second instance of -\series bold -rename_keys -\series default - replaces all the V1 keys with SEQ. - -\emph on -Et voilà -\emph default - the data can now be used in the biopieces that require these keys. -\end_layout - -\begin_layout Section -Manipulating Sequences -\end_layout - -\begin_layout Subsection -How to get sequence lengths? -\end_layout - -\begin_layout Standard -The length for sequences in records can be determined with -\series bold -length_seq -\series default -, which adds the key SEQ_LEN to each record with the sequence length as - the value. - It also generates an extra record that is emitted last with the key TOTAL_SEQ_L -EN showing the total length of all the sequences.
-\end_layout - -\begin_layout LyX-Code -read_fasta --data_in= | length_seq -\end_layout - -\begin_layout Standard -It is also possible to determine the sequence length using the generic tool - -\series bold -length_vals -\series default - -\begin_inset LatexCommand eqref -reference "sub:How-to-get-value_length" - -\end_inset - -, which determines the length of the values for a given list of keys: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in= | length_vals --keys=SEQ -\end_layout - -\begin_layout Standard -To obtain the total length of all sequences use -\series bold -sum_vals -\series default - like this: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in= | length_vals --keys=SEQ -\end_layout - -\begin_layout LyX-Code -| sum_vals --keys=SEQ_LEN -\end_layout - -\begin_layout Standard -The biopiece -\series bold -analyze_seq -\series default - will also determine the length of each sequence (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-analyze" - -\end_inset - -). -\end_layout - -\begin_layout Subsection -How to analyze sequence composition? -\begin_inset LatexCommand label -name "sub:How-to-analyze" - -\end_inset - - -\end_layout - -\begin_layout Standard -If you want to find out the sequence type, composition, length, as well - as GC content, indel content and proportions of soft and hard masked sequence, - then use -\series bold -analyze_seq -\series default -. - This handy biopiece will determine all these things per sequence from which - it is easy to get an overview using the -\series bold -write_tab -\series default - biopiece to output a table (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-write-tab" - -\end_inset - -). 
- So in order to determine the sequence composition of a FASTA file with - just one entry containing the sequence 'ATCG' we just read the data with - -\series bold -read_fasta -\series default - and run the output through -\series bold -analyze_seq -\series default - which will add the analysis to the record like this: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in=test.fna | analyze_seq ... -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:D: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -MIX_INDEX: 0.55 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:W: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:G: 16 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -SOFT_MASK%: 63.75 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:B: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:V: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -HARD_MASK%: 0.00 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:H: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:S: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:N: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:.: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -GC%: 35.00 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:A: 8 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:Y: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:M: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:T: 44 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -SEQ_TYPE: DNA -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:K: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:~: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -SEQ: 
TTTCAGTTTGGGACGGAGTAAGGCCTTCCtttttttttttttttttttttttttttttgagaccgagtcttgctc -tgtcg -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -SEQ_LEN: 80 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:R: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:C: 12 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:-: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:U: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize ---- -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout Standard -To make a table of how many As, Ts, Cs, and Gs there are, you can add the following: -\end_layout - -\begin_layout LyX-Code -... - | analyze_seq | write_tab --keys=RES:A,RES:T,RES:C,RES:G -\end_layout - -\begin_layout Standard -Or if you want to see the proportions of hard and soft masked sequence: -\end_layout - -\begin_layout LyX-Code -... - | analyze_seq | write_tab --keys=HARD_MASK%,SOFT_MASK% -\end_layout - -\begin_layout Standard -If you have a stack of sequences in one file and you want to determine the - mean GC content you can do it using the -\series bold -mean_vals -\series default - biopiece: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in=test.fna | analyze_seq | mean_vals --keys=GC% -\end_layout - -\begin_layout Standard -Or if you want the total count of Ns you can use -\series bold -sum_vals -\series default - like this: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in=test.fna | analyze_seq | sum_vals --keys=RES:N -\end_layout - -\begin_layout Standard -The MIX_INDEX key is calculated as the count of the most common residue - over the sequence length, and can be used as a cut-off for removing sequence - tags consisting of mostly one nucleotide: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in=test.fna | analyze_seq | grab --eval='MIX_INDEX<0.85' -\end_layout - -\begin_layout Subsection -How to extract subsequences? 
-\begin_inset LatexCommand label -name "sub:How-to-extract" - -\end_inset - - -\end_layout - -\begin_layout Standard -In order to extract a subsequence from a longer sequence use the biopiece - extract_seq, which will replace the sequence in the record with the subsequence - (this behaviour should probably be modified to be dependant of a -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -replace or a -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_replace switch -\begin_inset Note Note -status collapsed - -\begin_layout Standard -also in split_seq -\end_layout - -\end_inset - -). - So to extract the first 20 residues from all sequences do (first residue - is designated 1): -\end_layout - -\begin_layout LyX-Code -... - | extract_seq --beg=1 --len=20 -\end_layout - -\begin_layout Standard -You can also specify a begin and end coordinate set: -\end_layout - -\begin_layout LyX-Code -... - | extract_seq --beg=20 --end=40 -\end_layout - -\begin_layout Standard -If you want the subsequences from position 20 to the sequence end do: -\end_layout - -\begin_layout LyX-Code -... - | extract_seq --beg=20 -\end_layout - -\begin_layout Standard -If you want to extract subsequences a given distance from the sequence end - you can do this by reversing the sequence with the biopiece -\series bold -reverse_seq -\series default - -\begin_inset LatexCommand eqref -reference "sub:How-to-reverse-seq" - -\end_inset - -, followed by -\series bold -extract_seq -\series default - to get the subsequence, and then -\series bold -reverse_seq -\series default - again to get the subsequence back in the original orientation: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in=test.fna | reverse_seq -\end_layout - -\begin_layout LyX-Code -| extract_seq --beg=10 --len=10 | reverse_seq -\end_layout - -\begin_layout Subsection -How to get genomic sequence? 
-\begin_inset LatexCommand label -name "sub:How-to-get-genomic-sequence" - -\end_inset - - -\end_layout - -\begin_layout Standard -The biopiece -\series bold -get_genome_seq -\series default - can extract subsequences for a given genome specified with the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -genome switch explicitly using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -beg and -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -end/ -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -len switches: -\end_layout - -\begin_layout LyX-Code -get_genome_seq --genome= --beg=1 --len=100 -\end_layout - -\begin_layout Standard -Alternatively, -\series bold -get_genome_seq -\series default - can be used to append the corresponding sequence to BED, PSL, and BLAST - records: -\end_layout - -\begin_layout LyX-Code -read_bed --data_in= | get_genome_seq --genome= -\end_layout - -\begin_layout Standard -It is also possible to include flanking sequence using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -flank switch. - So to include 50 nucleotides upstream and 50 nucleotides downstream for - each BED entry do: -\end_layout - -\begin_layout LyX-Code -read_bed --data_in= | get_genome_seq --genome= --flank=50 -\end_layout - -\begin_layout Subsection -How to upper-case sequences? -\end_layout - -\begin_layout Standard -Sequences can be shifted from lower case to upper case using -\series bold -uppercase_seq -\series default -: -\end_layout - -\begin_layout LyX-Code -... - | uppercase_seq -\end_layout - -\begin_layout Subsection -How to reverse sequences? 
-\begin_inset LatexCommand label -name "sub:How-to-reverse-seq" - -\end_inset - - -\end_layout - -\begin_layout Standard -The order of residues in a sequence can be reversed using reverse_seq: -\end_layout - -\begin_layout LyX-Code -... - | reverse_seq -\end_layout - -\begin_layout Standard -Note that in order to reverse/complement a sequence you also need the -\series bold -complement_seq -\series default - biopiece (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-complement" - -\end_inset - -). -\end_layout - -\begin_layout Subsection -How to complement sequences? -\begin_inset LatexCommand label -name "sub:How-to-complement" - -\end_inset - - -\end_layout - -\begin_layout Standard -DNA and RNA sequences can be complemented with -\series bold -complement_seq -\series default -, which automagically determines the sequence type: -\end_layout - -\begin_layout LyX-Code -... - | complement_seq -\end_layout - -\begin_layout Standard -Note that in order to reverse/complement a sequence you also need the -\series bold -reverse_seq -\series default - biopiece (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-reverse-seq" - -\end_inset - -). -\end_layout - -\begin_layout Subsection -How to remove indels from sequences? -\end_layout - -\begin_layout Standard -Indels can be removed from sequences with the -\series bold -remove_indels -\series default - biopiece. - This is useful if you have aligned some sequences (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-align" - -\end_inset - -) and extracted (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-extract" - -\end_inset - -) a block of subsequences from the alignment and you want to use these sequences - in a search where you need to remove the indels first. - '-', '~', and '.' are considered indels: -\end_layout - -\begin_layout LyX-Code -... - | remove_indels -\end_layout - -\begin_layout Subsection -How to shuffle sequences? 
-\end_layout - -\begin_layout Standard -All residues in sequences in the stream can be shuffled to random positions - using the -\series bold -shuffle_seq -\series default - biopiece: -\end_layout - -\begin_layout LyX-Code -... - | shuffle_seq -\end_layout - -\begin_layout Subsection -How to split sequences into overlapping subsequences? -\end_layout - -\begin_layout Standard -Sequences can be split into overlapping subsequences with the -\series bold -split_seq -\series default - biopiece: -\end_layout - -\begin_layout LyX-Code -... - | split_seq --word_size=20 --uniq -\end_layout - -\begin_layout Subsection -How to determine the oligo frequency? -\end_layout - -\begin_layout Standard -In order to determine if any oligo usage is overrepresented in one or more - sequences you can determine the frequency of oligos of a given size with - -\series bold -oligo_freq -\series default -: -\end_layout - -\begin_layout LyX-Code -... - | oligo_freq --word_size=4 -\end_layout - -\begin_layout Standard -And if you have more than one sequence and want to accumulate the frequencies - you need the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -all switch: -\end_layout - -\begin_layout LyX-Code -... - | oligo_freq --word_size=4 --all -\end_layout - -\begin_layout Standard -To get a meaningful result you need to write the resulting frequencies as - a table with -\series bold -write_tab -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-write-tab" - -\end_inset - -), but first it is important to -\series bold -grab -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-grab" - -\end_inset - -) the records with the frequencies to avoid full length sequences in the - table: -\end_layout - -\begin_layout LyX-Code -... 
- | oligo_freq --word_size=4 --all | grab --pattern=OLIGO --keys_only -\end_layout - -\begin_layout LyX-Code -| write_tab --no_stream -\end_layout - -\begin_layout Standard -And the resulting frequency table can be sorted with Unix sort (man sort). -\end_layout - -\begin_layout Subsection -How to search for sequences in genomes? -\end_layout - -\begin_layout Standard -See the following biopiece: -\end_layout - -\begin_layout Itemize - -\series bold -patscan_seq -\series default - -\begin_inset LatexCommand eqref -reference "sub:How-to-use-patscan" - -\end_inset - - -\end_layout - -\begin_layout Itemize - -\series bold -blat_seq -\series default - -\begin_inset LatexCommand eqref -reference "sub:How-to-use-BLAT" - -\end_inset - - -\end_layout - -\begin_layout Itemize - -\series bold -blast_seq -\series default - -\begin_inset LatexCommand eqref -reference "sub:How-to-use-BLAST" - -\end_inset - - -\end_layout - -\begin_layout Itemize - -\series bold -vmatch_seq -\series default - -\begin_inset LatexCommand eqref -reference "sub:How-to-use-Vmatch" - -\end_inset - - -\end_layout - -\begin_layout Subsection -How to search sequences for a pattern? -\begin_inset LatexCommand label -name "sub:How-to-use-patscan" - -\end_inset - - -\end_layout - -\begin_layout Standard -It is possible to search sequences in the data stream for patterns using - the -\series bold -patscan_seq -\series default - biopiece which utilizes the powerful scan_for_matches engine. - Consult the documentation for scan_for_matches in order to learn how to - define patterns (the documentation is included in Appendix\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sec:scan_for_matches-README" - -\end_inset - -). 
-\end_layout - -\begin_layout Standard -To search all sequences for a simple pattern consisting of the sequence - ATCGATCG allowing for 3 mismatches, 2 insertions and 1 deletion: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in= | patscan_seq --pattern='ATCGATCG[3,2,1]' -\end_layout - -\begin_layout Standard -The -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern switch takes a comma separated list of patterns, so if you want - to search for more than one pattern do: -\end_layout - -\begin_layout LyX-Code -... - | patscan_seq --pattern='ATCGATCG[3,2,1],GCTAGCTA[3,2,1]' -\end_layout - -\begin_layout Standard -It is also possible to have a list of different patterns to search for in - a file with one pattern per line. - In order to get -\series bold -patscan_seq -\series default - to read these patterns use the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern_in switch: -\end_layout - -\begin_layout LyX-Code -... - | patscan_seq --pattern_in= -\end_layout - -\begin_layout Standard -To also scan the complementary strand in nucleotide sequences ( -\series bold -patscan_seq -\series default - automagically determines the sequence type) you need to add the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -comp switch: -\end_layout - -\begin_layout LyX-Code -... - | patscan_seq --pattern= --comp -\end_layout - -\begin_layout Standard -It is also possible to use -\series bold -patscan_seq -\series default - to output those records that do not contain a certain pattern by using - the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -invert switch: -\end_layout - -\begin_layout LyX-Code -... 
- | patscan_seq --pattern= --invert -\end_layout - -\begin_layout Standard -Finally, -\series bold -patscan_seq -\series default - can also scan for patterns in a given genome sequence, instead of sequences - in the stream, using the -\begin_inset ERT -status open - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -genome switch: -\end_layout - -\begin_layout LyX-Code -patscan_seq --pattern= --genome= -\end_layout - -\begin_layout Subsection -How to use BLAT for sequence search? -\begin_inset LatexCommand label -name "sub:How-to-use-BLAT" - -\end_inset - - -\end_layout - -\begin_layout Standard -Sequences in the data stream can be matched against supported genomes using - -\series bold -blat_seq -\series default - which is a biopiece using BLAT as the name might suggest. - Currently only Mouse and Human genomes are available and it is not possible - to use OOC files since there is still a need for a local repository for - genome files. - Otherwise it is just: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in= | blat_seq --genome= -\end_layout - -\begin_layout Standard -The search results can then be written to file with -\series bold -write_psl -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-write-PSL" - -\end_inset - -) or -\series bold -write_bed -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-write-BED" - -\end_inset - -), although with -\series bold -write_bed -\series default - some information will be lost. 
- It is also possible to plot chromosome distribution of the search results - using -\series bold -plot_chrdist -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-plot-chrdist" - -\end_inset - -) or the distribution of the match lengths using -\series bold -plot_lendist -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-plot-lendist" - -\end_inset - -) or a karyogram with the hits using -\series bold -plot_karyogram -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-plot-karyogram" - -\end_inset - -). -\end_layout - -\begin_layout Subsection -How to use BLAST for sequence search? -\begin_inset LatexCommand label -name "sub:How-to-use-BLAST" - -\end_inset - - -\end_layout - -\begin_layout Standard -Two biopieces exist for blasting sequences: -\series bold -create_blast_db -\series default - is used to create the BLAST database required for BLAST which is queried - using the biopiece -\series bold -blast_seq -\series default -. - So in order to create a BLAST database from sequences in the data stream - you simply run: -\end_layout - -\begin_layout LyX-Code -... - | create_blast_db --database=my_database --no_stream -\end_layout - -\begin_layout Standard -The type of sequence to use for the database is automagically determined - by -\series bold -create_blast_db -\series default -, but do not mix peptide and nucleic acid sequences in the - stream. - The -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -database switch takes a path as argument, but will default to 'blastdb_ if not set. -\end_layout - -\begin_layout Standard -The resulting database can now be queried with sequences in another data - stream using -\series bold -blast_seq -\series default -: -\end_layout - -\begin_layout LyX-Code -... 
- | blast_seq --database=my_database -\end_layout - -\begin_layout Standard -Again, the sequence type is determined automagically and the appropriate - BLAST program is guessed (see below table), however, the program name can - be overruled with the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -program switch. -\end_layout - -\begin_layout Standard -\noindent -\align center -\begin_inset Tabular - - - - - - - -\begin_inset Text - -\begin_layout Standard -Subject sequence -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -Query sequence -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -Program guess -\end_layout - -\end_inset - - - - -\begin_inset Text - -\begin_layout Standard -Nucleotide -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -Nucleotide -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -blastn -\end_layout - -\end_inset - - - - -\begin_inset Text - -\begin_layout Standard -Protein -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -Protein -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -blastp -\end_layout - -\end_inset - - - - -\begin_inset Text - -\begin_layout Standard -Protein -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -Nucleotide -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -blastx -\end_layout - -\end_inset - - - - -\begin_inset Text - -\begin_layout Standard -Nucleotide -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -Protein -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -tblastn -\end_layout - -\end_inset - - - - -\end_inset - - -\end_layout - -\begin_layout Standard -Finally, it is also possible to use -\series bold -blast_seq -\series default - for blasting sequences agains a preformatted genome using 
the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -genome switch instead of the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -database switch: -\end_layout - -\begin_layout LyX-Code -... - | blast_seq --genome= -\end_layout - -\begin_layout Subsection -How to use Vmatch for sequence search? -\begin_inset LatexCommand label -name "sub:How-to-use-Vmatch" - -\end_inset - - -\end_layout - -\begin_layout Standard -The powerful suffix array software package Vmatch -\begin_inset Foot -status collapsed - -\begin_layout Standard -\begin_inset LatexCommand url -target "http://www.vmatch.de/" - -\end_inset - - -\end_layout - -\end_inset - - can be used for exact mapping of sequences against indexed genomes using - the biopiece -\series bold -vmatch_seq -\series default -, which will e.g. - map 700000 ESTs to the human genome locating all 160 million hits in less than - an hour. - Only nucleotide sequences longer than 11 nucleotides will - be mapped. - It is recommended that sequences consisting of mostly one nucleotide type - are removed. - This can be done with the -\series bold -analyze_seq -\series default - biopiece -\begin_inset LatexCommand eqref -reference "sub:How-to-analyze" - -\end_inset - -. -\end_layout - -\begin_layout LyX-Code -... - | vmatch_seq --genome= -\end_layout - -\begin_layout Standard -It is also possible to allow for mismatches using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -hamming_dist switch. - So to allow for 2 mismatches: -\end_layout - -\begin_layout LyX-Code -... - | vmatch_seq --genome= --hamming_dist=2 -\end_layout - -\begin_layout Standard -Or to allow for 10% mismatching nucleotides: -\end_layout - -\begin_layout LyX-Code -... 
- | vmatch_seq --genome= --hamming_dist=10p -\end_layout - -\begin_layout Standard -To allow both indels and mismatches use the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -edit_dist switch. - So to allow for one mismatch or one indel: -\end_layout - -\begin_layout LyX-Code -... - | vmatch_seq --genome= --edit_dist=1 -\end_layout - -\begin_layout Standard -Or to allow for 5% indels or mismatches: -\end_layout - -\begin_layout LyX-Code -... - | vmatch_seq --genome= --edit_dist=5p -\end_layout - -\begin_layout Standard -Note that using -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -hamming_dist or -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -edit_dist slows down vmatch considerably, so use them with care. -\end_layout - -\begin_layout Standard -The resulting SCORE key can be replaced to hold the number of genome matches - of a given sequence (multi-mappers) if the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -count switch is given. -\end_layout - -\begin_layout Subsection -How to find all matches between sequences? -\begin_inset LatexCommand label -name "sub:How-to-find-matches" - -\end_inset - - -\end_layout - -\begin_layout Standard -All matches between two sequences can be determined with the biopiece -\series bold -match_seq -\series default -. 
- The match finding engine under the hood of -\series bold -match_seq -\series default - is the super fast suffix tree program MUMmer -\begin_inset Foot -status collapsed - -\begin_layout Standard -\begin_inset LatexCommand url -target "http://mummer.sourceforge.net/" - -\end_inset - - -\end_layout - -\end_inset - -, which will locate all forward and reverse matches between huge sequences - in a matter of minutes (if the repeat count is not too high and if the - word size used is appropriate). - Matching two -\emph on -Helicobacter pylori -\emph default - genomes (1.7 Mbp) takes around 10 seconds: -\end_layout - -\begin_layout LyX-Code -... - | match_seq --word_size=20 --direction=both -\end_layout - -\begin_layout Standard -The output from -\series bold -match_seq -\series default - can be used to generate a dot plot with -\series bold -plot_matches -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-generate-dotplot" - -\end_inset - -). -\end_layout - -\begin_layout Subsection -How to align sequences? -\begin_inset LatexCommand label -name "sub:How-to-align" - -\end_inset - - -\end_layout - -\begin_layout Standard -Sequences in the stream can be aligned with the -\series bold -align_seq -\series default - biopiece that uses Muscle -\begin_inset Foot -status open - -\begin_layout Standard -\begin_inset LatexCommand url -target "http://www.drive5.com/muscle/muscle.html" - -\end_inset - - -\end_layout - -\end_inset - - as the alignment engine. - Currently you cannot change any of the Muscle alignment parameters and - -\series bold -align_seq -\series default - will create an alignment based on the defaults (which are really good!): -\end_layout - -\begin_layout LyX-Code -... 
- | align_seq -\end_layout - -\begin_layout Standard -The aligned output can be written to file in FASTA format using -\series bold -write_fasta -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-write-fasta" - -\end_inset - -) or in pretty text using -\series bold -write_align -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-write-alignment" - -\end_inset - -). -\end_layout - -\begin_layout Subsection -How to create a weight matrix? -\end_layout - -\begin_layout Standard -If you want a weight matrix to show the sequence composition of a stack - of sequences you can use the biopiece create_weight_matrix: -\end_layout - -\begin_layout LyX-Code -... - | create_weight_matrix -\end_layout - -\begin_layout Standard -The result can be output in percent using the -\begin_inset ERT -status open - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -percent switch: -\end_layout - -\begin_layout LyX-Code -... - | create_weight_matrix --percent -\end_layout - -\begin_layout Standard -The weight matrix can be written as tabular output with -\series bold -write_tab -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-write-tab" - -\end_inset - -) after removing the records containing SEQ with -\series bold -grab -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-grab" - -\end_inset - -): -\end_layout - -\begin_layout LyX-Code -... - | create_weight_matrix | grab --invert --keys=SEQ --keys_only -\end_layout - -\begin_layout LyX-Code -| write_tab --no_stream -\end_layout - -\begin_layout Standard -The V0 column will hold the residue, while the rest of the columns will - hold the frequencies for each sequence position. -\end_layout - -\begin_layout Section -Plotting -\end_layout - -\begin_layout Standard -There exist several biopieces for plotting. 
- Some of these are based on GNUplot -\begin_inset Foot -status open - -\begin_layout Standard -\begin_inset LatexCommand url -target "http://www.gnuplot.info/" - -\end_inset - - -\end_layout - -\end_inset - -, which is an extremely powerful platform to generate all sorts of plots - and even though GNUplot has quite a steep learning curve, the biopieces - utilizing GNUplot are simple to use. - GNUplot is able to output a lot of different formats (called terminals - in GNUplot), but the biopieces focus on three formats only: -\end_layout - -\begin_layout Enumerate -The 'dumb' terminal is the default for the GNUplot based biopieces and will output - a plot in crude ASCII text (Fig.\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "fig:Dumb-terminal" - -\end_inset - -). - This is quite nice for a quick and dirty plot to get an overview of your - data. -\end_layout - -\begin_layout Enumerate -The 'post' or 'postscript' terminal outputs postscript code, which is publication - grade graphics that can be viewed with applications such as Ghostview, - Photoshop, and Preview. -\end_layout - -\begin_layout Enumerate -The 'svg' terminal outputs scalable vector graphics (SVG), which is a vector - based format. - SVG is great because you can edit the resulting plot using Photoshop or - Inkscape -\begin_inset Foot -status collapsed - -\begin_layout Standard -Inkscape is a really handy drawing program that is free and open source. - Available at -\begin_inset LatexCommand htmlurl -target "http://www.inkscape.org" - -\end_inset - - -\end_layout - -\end_inset - - if you want to add additional labels, captions, arrows, and so on and then - save the result in different formats, such as postscript, without losing - resolution. -\end_layout - -\begin_layout Standard -The biopieces for plotting that are not based on GNUplot only output SVG - (that may change in the future). 
-\end_layout - -\begin_layout Standard -\begin_inset Float figure -wide false -sideways false -status open - -\begin_layout Standard -\noindent -\align center -\begin_inset Graphics - filename lendist_ascii.png - lyxscale 70 - width 12cm - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset Caption - -\begin_layout Standard -\begin_inset LatexCommand label -name "fig:Dumb-terminal" - -\end_inset - -Dumb terminal -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Quote -The output of a length distribution plot in the default 'dumb terminal' - to the terminal window. - -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Subsection -How to plot a histogram? -\begin_inset LatexCommand label -name "How-to-plot-histogram" - -\end_inset - - -\end_layout - -\begin_layout Standard -A generic histogram for a given value can be plotted with the biopiece -\series bold -plot_histogram -\series default - (Fig.\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "fig:Histogram" - -\end_inset - -): -\end_layout - -\begin_layout LyX-Code -... - | plot_histogram --key=TISSUE --no_stream -\end_layout - -\begin_layout Standard -(Figure missing) -\end_layout - -\begin_layout Standard -\noindent -\align left -\begin_inset Float figure -wide false -sideways false -status open - -\begin_layout Standard -\noindent -\align center -\begin_inset Graphics - filename histogram.png - lyxscale 70 - width 12cm - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset Caption - -\begin_layout Standard -\begin_inset LatexCommand label -name "fig:Histogram" - -\end_inset - -Histogram -\end_layout - -\end_inset - - -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Subsection -How to plot a length distribution? -\begin_inset LatexCommand label -name "sub:How-to-plot-lendist" - -\end_inset - - -\end_layout - -\begin_layout Standard -Plotting of length distributions, whether sequence lengths, pattern lengths, - hit lengths, -\emph on -etc. 
- -\emph default - is a really handy thing and can be done with the biopiece -\series bold -plot_lendist -\series default -. - If you have a file with FASTA entries and want to plot the length distribution - you do it like this: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in= | length_seq -\end_layout - -\begin_layout LyX-Code -| plot_lendist --key=SEQ_LEN --no_stream -\end_layout - -\begin_layout Standard -The result will be written to the default dumb terminal and will look like - Fig.\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "fig:Dumb-terminal" - -\end_inset - -. -\end_layout - -\begin_layout Standard -If you instead want the result in PostScript format you can do: -\end_layout - -\begin_layout LyX-Code -... - | plot_lendist --key=SEQ_LEN --terminal=post --result_out=file.ps -\end_layout - -\begin_layout Standard -That will generate the plot and save it to a file, but not interrupt the data - stream which can then be used in further analysis. - You can also save the plot implicitly using '>'; however, it is then important - to terminate the stream with the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_stream switch: -\end_layout - -\begin_layout LyX-Code -... - | plot_lendist --key=SEQ_LEN --terminal=post --no_stream > file.ps -\end_layout - -\begin_layout Standard -The resulting plot can be seen in Fig.\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "fig:Length-distribution" - -\end_inset - -.
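To make concrete what a length distribution is, here is a minimal Python sketch that counts sequence lengths from FASTA-formatted lines. It is an illustration only: the function names read_fasta_lengths and length_distribution are hypothetical, and nothing here is taken from the implementation of length_seq or plot_lendist.

```python
from collections import Counter


def read_fasta_lengths(lines):
    """Return the length of each sequence in FASTA-formatted lines."""
    lengths = []
    current = None  # running length of the entry being read
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line closes the previous entry and opens a new one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def length_distribution(lengths):
    """Return sorted (length, count) pairs, i.e. the data behind the plot."""
    return sorted(Counter(lengths).items())


if __name__ == "__main__":
    fasta = [">a", "acgt", "ac", ">b", "acg", ">c", "acgtac"]
    for length, count in length_distribution(read_fasta_lengths(fasta)):
        print(length, "#" * count)
```

Each (length, count) pair corresponds to one bar in the plotted distribution.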
-\end_layout - -\begin_layout Standard -\begin_inset Float figure -wide false -sideways false -status open - -\begin_layout Standard - -\end_layout - -\begin_layout Standard -\noindent -\align center -\begin_inset Graphics - filename lendist.ps - lyxscale 50 - width 12cm - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset Caption - -\begin_layout Standard -\begin_inset LatexCommand label -name "fig:Length-distribution" - -\end_inset - -Length distribution -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Quote -Length distribution of 630 piRNA-like RNAs. -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Subsection -How to plot a chromosome distribution? -\begin_inset LatexCommand label -name "sub:How-to-plot-chrdist" - -\end_inset - - -\end_layout - -\begin_layout Standard -If you have the result of a sequence search against a multi-chromosome genome, - it is very practical to be able to plot the distribution of search hits - on the different chromosomes. - This can be done with -\series bold -plot_chrdist -\series default -: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in= | blat_genome | plot_chrdist --no_stream -\end_layout - -\begin_layout Standard -The above example will result in a crude plot using the 'dumb' terminal, - and if you want to mess around with the results from the BLAT search you - probably want to save the result to a file first (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-write-PSL" - -\end_inset - -). - To plot the chromosome distribution from the saved search result you can - do: -\end_layout - -\begin_layout LyX-Code -read_bed --data_in=file.bed | plot_chrdist --terminal=post --result_out=plot.ps -\end_layout - -\begin_layout Standard -That will result in the output shown in Fig.\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "fig:Chromosome-distribution" - -\end_inset - -.
-\end_layout - -\begin_layout Standard -\begin_inset Float figure -wide false -sideways false -status open - -\begin_layout Standard - -\end_layout - -\begin_layout Standard -\noindent -\align center -\begin_inset Graphics - filename chrdist.ps - lyxscale 50 - width 12cm - rotateAngle 90 - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset Caption - -\begin_layout Standard -\begin_inset LatexCommand label -name "fig:Chromosome-distribution" - -\end_inset - -Chromosome distribution -\end_layout - -\end_inset - - -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Subsection -How to generate a dotplot? -\begin_inset LatexCommand label -name "sub:How-to-generate-dotplot" - -\end_inset - - -\end_layout - -\begin_layout Standard -A dotplot is a powerful way to get an overview of the size and location - of sequence insertions, deletions, and duplications between two sequences. - Generating a dotplot with biopieces is a two step process where you initially - find all matches between two sequences using the tool -\series bold -match_seq -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-find-matches" - -\end_inset - -) and plot the resulting matches with -\series bold -plot_matches -\series default -. - Matching and plotting two -\emph on -Helicobacter pylori -\emph default - genomes (1.7Mbp) takes around 10 seconds: -\end_layout - -\begin_layout LyX-Code -... - | match_seq | plot_matches --terminal=post --result_out=plot.ps -\end_layout - -\begin_layout Standard -The resulting dotplot is in Fig.\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "fig:Dotplot" - -\end_inset - -. 
-\end_layout - -\begin_layout Standard -\begin_inset Float figure -wide false -sideways false -status open - -\begin_layout Standard -\noindent -\align center -\begin_inset Graphics - filename dotplot.ps - lyxscale 50 - width 12cm - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset Caption - -\begin_layout Standard -\begin_inset LatexCommand label -name "fig:Dotplot" - -\end_inset - -Dotplot -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Quote -Forward matches are displayed in green while reverse matches are displayed - in red. -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Subsection -How to plot a sequence logo? -\end_layout - -\begin_layout Standard -Sequence logos can be generated with -\series bold -plot_seqlogo -\series default -. - The sequence type is determined automagically and an entropy scale of 2 - bits and 4 bits is used for nucleotide and peptide sequences, respectively -\begin_inset Foot -status collapsed - -\begin_layout Standard -\begin_inset LatexCommand htmlurl -target "http://www.ccrnp.ncifcrf.gov/~toms/paper/hawaii/latex/node5.html" - -\end_inset - - -\end_layout - -\end_inset - -. -\end_layout - -\begin_layout LyX-Code -... - | plot_seqlogo --no_stream --result_out=seqlogo.svg -\end_layout - -\begin_layout Standard -An example of a sequence logo can be seen in Fig.\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "fig:Sequence-logo" - -\end_inset - -.
-\end_layout - -\begin_layout Standard -\begin_inset Float figure -wide false -sideways false -status open - -\begin_layout Standard -\noindent -\align center -\begin_inset Graphics - filename seqlogo.png - lyxscale 50 - width 12cm - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset Caption - -\begin_layout Standard -\begin_inset LatexCommand label -name "fig:Sequence-logo" - -\end_inset - -Sequence logo -\end_layout - -\end_inset - - -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Subsection -How to plot a karyogram? -\begin_inset LatexCommand label -name "sub:How-to-plot-karyogram" - -\end_inset - - -\end_layout - -\begin_layout Standard -To plot search hits on genomes use -\series bold -plot_karyogram -\series default -, which will output a nice karyogram in SVG graphics: -\end_layout - -\begin_layout LyX-Code -... - | plot_karyogram --result_out=karyogram.svg -\end_layout - -\begin_layout Standard -The banding data is taken from the UCSC genome browser database and currently - only Human and Mouse are supported. - Fig.\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "fig:Karyogram" - -\end_inset - - shows the distribution of piRNA-like RNAs matched to the Human genome. -\end_layout - -\begin_layout Standard -\begin_inset Float figure -wide false -sideways false -status open - -\begin_layout Standard -\noindent -\align center -\begin_inset Graphics - filename karyogram.png - lyxscale 35 - width 12cm - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset Caption - -\begin_layout Standard -\begin_inset LatexCommand label -name "fig:Karyogram" - -\end_inset - -Karyogram -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Quote -Hits from a search of piRNA-like RNAs in the Human genome are displayed as - short horizontal bars.
-\end_layout - -\end_inset - - -\end_layout - -\begin_layout Section -Uploading Results -\end_layout - -\begin_layout Subsection -How do I display my results in the UCSC Genome Browser? -\end_layout - -\begin_layout Standard -Results from the list of biopieces below can be uploaded directly to a local - mirror of the UCSC Genome Browser using the biopiece -\series bold -upload_to_ucsc -\series default -: -\end_layout - -\begin_layout Itemize -patscan_seq -\begin_inset LatexCommand eqref -reference "sub:How-to-use-patscan" - -\end_inset - - -\end_layout - -\begin_layout Itemize -blat_seq -\begin_inset LatexCommand eqref -reference "sub:How-to-use-BLAT" - -\end_inset - - -\end_layout - -\begin_layout Itemize -blast_seq -\begin_inset LatexCommand eqref -reference "sub:How-to-use-BLAST" - -\end_inset - - -\end_layout - -\begin_layout Itemize -vmatch_seq -\begin_inset LatexCommand eqref -reference "sub:How-to-use-Vmatch" - -\end_inset - - -\end_layout - -\begin_layout Standard -The simplest way to upload data requires two mandatory - switches: -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -database, which is the UCSC database name (such as hg18, mm9, etc.) and -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -table, which should be the user's initials followed by an underscore and a - short description of the data: -\end_layout - -\begin_layout LyX-Code -...
- | upload_to_ucsc --database=hg18 --table=mah_snoRNAs -\end_layout - -\begin_layout Standard -The -\series bold -upload_to_ucsc -\series default - biopiece modifies the user's ~/ucsc/my_tracks.ra file automagically (a backup - is created with the name ~/ucsc/my_tracks.ra~) with default values that - can be overridden using the following switches: -\end_layout - -\begin_layout Itemize -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -short_label - Short label for track - Default=database->table -\end_layout - -\begin_layout Itemize -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -long_label - Long label for track - Default=database->table -\end_layout - -\begin_layout Itemize -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -group - Track group name - Default= -\end_layout - -\begin_layout Itemize -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -priority - Track display priority - Default=1 -\end_layout - -\begin_layout Itemize -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -color - Track color - Default=147,73,42 -\end_layout - -\begin_layout Itemize -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -chunk_size - Chunks for loading - Default=10000000 -\end_layout - -\begin_layout Itemize -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -visibility - Track visibility - Default=pack -\end_layout - -\begin_layout Standard -Also, data in BED or PSL format can be uploaded with -\series bold -upload_to_ucsc -\series default - as long as these refer to genomes and chromosomes that exist in the UCSC - Genome Browser: -\end_layout - 
-\begin_layout LyX-Code -read_bed --data_in= | upload_to_ucsc ... -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code -read_psl --data_in= | upload_to_ucsc ... -\end_layout - -\begin_layout Section -Power Scripting -\end_layout - -\begin_layout Standard -It is possible to do command-line scripting of biopiece records using Perl. - Because a biopiece record is essentially a hash structure, you can pass - records to the -\series bold -bioscript -\series default - command, which is a wrapper around the Perl executable that allows direct - manipulation of the records using the power of Perl. -\end_layout - -\begin_layout Standard -In the example below we replace, in all records, the value of the CHR key - with a running number: -\end_layout - -\begin_layout LyX-Code -... - | bioscript 'while($r=get_record( -\backslash -*STDIN)){$r->{CHR}=$i++; put_record($r)}' -\end_layout - -\begin_layout Standard -Something more useful would probably be to create custom FASTA headers. - E.g. - if we read in a BED file, look up the genomic sequence, create a custom - FASTA header with -\series bold -bioscript -\series default - and output FASTA entries: -\end_layout - -\begin_layout LyX-Code -...
- | bioscript 'while($r=get_record( -\backslash -*STDIN)){$r->{SEQ_NAME}= // -\end_layout - -\begin_layout LyX-Code -join("_",$r->{CHR},$r->{CHR_BEG},$r->{CHR_END}); put_record($r)}' -\end_layout - -\begin_layout Standard -And the output: -\end_layout - -\begin_layout LyX-Code ->chr2L_21567527_21567550 -\end_layout - -\begin_layout LyX-Code -taccaaacggatgcctcagacatc -\end_layout - -\begin_layout LyX-Code ->chr2L_693380_693403 -\end_layout - -\begin_layout LyX-Code -taccaaacggatgcctcagacatc -\end_layout - -\begin_layout LyX-Code ->chr2L_13859534_13859557 -\end_layout - -\begin_layout LyX-Code -taccaaacggatgcctcagacatc -\end_layout - -\begin_layout LyX-Code ->chr2L_9005090_9005113 -\end_layout - -\begin_layout LyX-Code -taccaaacggatgcctcagacatc -\end_layout - -\begin_layout LyX-Code ->chr2L_2106825_2106848 -\end_layout - -\begin_layout LyX-Code -taccaaacggatgcctcagacatc -\end_layout - -\begin_layout LyX-Code ->chr2L_14649031_14649054 -\end_layout - -\begin_layout LyX-Code -taccaaacggatgcctcagacatc -\end_layout - -\begin_layout Section -Troubleshooting -\end_layout - -\begin_layout Standard -Shoot the messenger!
-\end_layout - -\begin_layout Section -\start_of_appendix -Keys -\begin_inset LatexCommand label -name "sec:Keys" - -\end_inset - - -\end_layout - -\begin_layout Standard -HIT -\end_layout - -\begin_layout Standard -HIT_BEG -\end_layout - -\begin_layout Standard -HIT_END -\end_layout - -\begin_layout Standard -HIT_LEN -\end_layout - -\begin_layout Standard -HIT_NAME -\end_layout - -\begin_layout Standard -PATTERN -\end_layout - -\begin_layout Section -Switches -\begin_inset LatexCommand label -name "sec:Switches" - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -stream_in -\end_layout - -\begin_layout Standard -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -stream_out -\end_layout - -\begin_layout Standard -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_stream -\end_layout - -\begin_layout Standard -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -data_in -\end_layout - -\begin_layout Standard -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -result_out -\end_layout - -\begin_layout Standard -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -num -\end_layout - -\begin_layout Section -scan_for_matches README -\begin_inset LatexCommand label -name "sec:scan_for_matches-README" - -\end_inset - - -\end_layout - -\begin_layout LyX-Code - scan_for_matches: -\end_layout - -\begin_layout LyX-Code - A Program to Scan Nucleotide or Protein Sequences for Matching Patterns -\end_layout - -\begin_layout LyX-Code - Ross Overbeek -\end_layout - -\begin_layout LyX-Code - MCS -\end_layout - -\begin_layout LyX-Code - Argonne National Laboratory 
-\end_layout - -\begin_layout LyX-Code - Argonne, IL 60439 -\end_layout - -\begin_layout LyX-Code - USA -\end_layout - -\begin_layout LyX-Code -Scan_for_matches is a utility that we have written to search for -\end_layout - -\begin_layout LyX-Code -patterns in DNA and protein sequences. - I wrote most of the code, -\end_layout - -\begin_layout LyX-Code -although David Joerg and Morgan Price wrote sections of an -\end_layout - -\begin_layout LyX-Code -earlier version. - The whole notion of pattern matching has a rich -\end_layout - -\begin_layout LyX-Code -history, and we borrowed liberally from many sources. - However, it is -\end_layout - -\begin_layout LyX-Code -worth noting that we were strongly influenced by the elegant tools -\end_layout - -\begin_layout LyX-Code -developed and distributed by David Searls. - My intent is to make the -\end_layout - -\begin_layout LyX-Code -existing tool available to anyone in the research community that might -\end_layout - -\begin_layout LyX-Code -find it useful. - I will continue to try to fix bugs and make suggested -\end_layout - -\begin_layout LyX-Code -enhancements, at least until I feel that a superior tool exists. -\end_layout - -\begin_layout LyX-Code -Hence, I would appreciate it if all bug reports and suggestions are -\end_layout - -\begin_layout LyX-Code -directed to me at Overbeek@mcs.anl.gov. - -\end_layout - -\begin_layout LyX-Code -I will try to log all bug fixes and report them to users that send me -\end_layout - -\begin_layout LyX-Code -their email addresses. - I do not require that you give me your name -\end_layout - -\begin_layout LyX-Code -and address. - However, if you do give it to me, I will try to notify -\end_layout - -\begin_layout LyX-Code -you of serious problems as they are discovered. 
-\end_layout - -\begin_layout LyX-Code -Getting Started: -\end_layout - -\begin_layout LyX-Code - The distribution should contain at least the following programs: -\end_layout - -\begin_layout LyX-Code - README - This document -\end_layout - -\begin_layout LyX-Code - ggpunit.c - One of the two source files -\end_layout - -\begin_layout LyX-Code - scan_for_matches.c - The second source file -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - run_tests - A perl script to test things -\end_layout - -\begin_layout LyX-Code - show_hits - A handy perl script -\end_layout - -\begin_layout LyX-Code - test_dna_input - Test sequences for DNA -\end_layout - -\begin_layout LyX-Code - test_dna_patterns - Test patterns for DNA scan -\end_layout - -\begin_layout LyX-Code - test_output - Desired output from test -\end_layout - -\begin_layout LyX-Code - test_prot_input - Test protein sequences -\end_layout - -\begin_layout LyX-Code - test_prot_patterns - Test patterns for proteins -\end_layout - -\begin_layout LyX-Code - testit - a perl script used for test -\end_layout - -\begin_layout LyX-Code - Only the first three files are required. - The others are useful, -\end_layout - -\begin_layout LyX-Code - but only if you have Perl installed on your system. - If you do -\end_layout - -\begin_layout LyX-Code - have Perl, I suggest that you type -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - which perl -\end_layout - -\begin_layout LyX-Code - to find out where it is installed. - On my system, I get the following -\end_layout - -\begin_layout LyX-Code - response: -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - clone% which perl -\end_layout - -\begin_layout LyX-Code - /usr/local/bin/perl -\end_layout - -\begin_layout LyX-Code - indicating that Perl is installed in /usr/local/bin.
- Anyway, once -\end_layout - -\begin_layout LyX-Code - you know where it is installed, edit the first line of files -\end_layout - -\begin_layout LyX-Code - testit -\end_layout - -\begin_layout LyX-Code - show_hits -\end_layout - -\begin_layout LyX-Code - replacing /usr/local/bin/perl with the appropriate location. - I -\end_layout - -\begin_layout LyX-Code - will assume that you can do this, although it is not critical (it -\end_layout - -\begin_layout LyX-Code - is needed only to test the installation and to use the "show_hits" -\end_layout - -\begin_layout LyX-Code - utility). - Perl is not required to actually install and run -\end_layout - -\begin_layout LyX-Code - scan_for_matches. - -\end_layout - -\begin_layout LyX-Code - If you do not have Perl, I suggest you get it and install it (it -\end_layout - -\begin_layout LyX-Code - is a wonderful utility). - Information about Perl and how to get it -\end_layout - -\begin_layout LyX-Code - can be found in the book "Programming Perl" by Larry Wall and -\end_layout - -\begin_layout LyX-Code - Randall L. - Schwartz, published by O'Reilly & Associates, Inc. -\end_layout - -\begin_layout LyX-Code - To get started, you will need to compile the program. - I do this -\end_layout - -\begin_layout LyX-Code - using -\end_layout - -\begin_layout LyX-Code - gcc -O -o scan_for_matches ggpunit.c scan_for_matches.c -\end_layout - -\begin_layout LyX-Code - If you do not use GNU C, use -\end_layout - -\begin_layout LyX-Code - cc -O -DCC -o scan_for_matches ggpunit.c scan_for_matches.c -\end_layout - -\begin_layout LyX-Code - which works on my Sun. 
- -\end_layout - -\begin_layout LyX-Code - Once you have compiled scan_for_matches, you can verify that it -\end_layout - -\begin_layout LyX-Code - works with -\end_layout - -\begin_layout LyX-Code - clone% run_tests tmp -\end_layout - -\begin_layout LyX-Code - clone% diff tmp test_output -\end_layout - -\begin_layout LyX-Code - You may get a few strange lines of the sort -\end_layout - -\begin_layout LyX-Code - clone% run_tests tmp -\end_layout - -\begin_layout LyX-Code - rm: tmp: No such file or directory -\end_layout - -\begin_layout LyX-Code - clone% diff tmp test_output -\end_layout - -\begin_layout LyX-Code - These should cause no concern. - However, if the "diff" shows that -\end_layout - -\begin_layout LyX-Code - tmp and test_output are different, contact me (you have a -\end_layout - -\begin_layout LyX-Code - problem). - -\end_layout - -\begin_layout LyX-Code - You should now be able to use scan_for_matches by following the -\end_layout - -\begin_layout LyX-Code - instructions given below (which is all the normal user should have -\end_layout - -\begin_layout LyX-Code - to understand, once things are installed properly). -\end_layout - -\begin_layout LyX-Code - ============================================================== -\end_layout - -\begin_layout LyX-Code -How to run scan_for_matches: -\end_layout - -\begin_layout LyX-Code - To run the program, you need to create two files -\end_layout - -\begin_layout LyX-Code - 1. - the first file contains the pattern you wish to scan for; I'll -\end_layout - -\begin_layout LyX-Code - call this file pat_file in what follows (but any name is ok) -\end_layout - -\begin_layout LyX-Code - 2. - the second file contains a set of sequences to scan. - These -\end_layout - -\begin_layout LyX-Code - should be in "fasta format". - Just look at the contents of -\end_layout - -\begin_layout LyX-Code - test_dna_input to see examples of this format.
- Basically, -\end_layout - -\begin_layout LyX-Code - each sequence begins with a line of the form -\end_layout - -\begin_layout LyX-Code - >sequence_id -\end_layout - -\begin_layout LyX-Code - and is followed by one or more lines containing the sequence. -\end_layout - -\begin_layout LyX-Code - Once these files have been created, you just use -\end_layout - -\begin_layout LyX-Code - scan_for_matches pat_file < input_file -\end_layout - -\begin_layout LyX-Code - to scan all of the input sequences for the given pattern. - As an -\end_layout - -\begin_layout LyX-Code - example, suppose that pat_file contains a single line of the form -\end_layout - -\begin_layout LyX-Code - p1=4...7 3...8 ~p1 -\end_layout - -\begin_layout LyX-Code - Then, -\end_layout - -\begin_layout LyX-Code - scan_for_matches pat_file < test_dna_input -\end_layout - -\begin_layout LyX-Code - should produce two "hits". - When I run this on my machine, I get -\end_layout - -\begin_layout LyX-Code - clone% scan_for_matches pat_file < test_dna_input -\end_layout - -\begin_layout LyX-Code - >tst1:[6,27] -\end_layout - -\begin_layout LyX-Code - cguaacc ggttaacc gguuacg -\end_layout - -\begin_layout LyX-Code - >tst2:[6,27] -\end_layout - -\begin_layout LyX-Code - CGUAACC GGTTAACC GGUUACG -\end_layout - -\begin_layout LyX-Code - clone% -\end_layout - -\begin_layout LyX-Code -Simple Patterns Built by Matching Ranges and Reverse Complements -\end_layout - -\begin_layout LyX-Code - Let me first explain this simple pattern: -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - p1=4...7 3...8 ~p1 -\end_layout - -\begin_layout LyX-Code - The pattern consists of three "pattern units" separated by spaces. -\end_layout - -\begin_layout LyX-Code - The first pattern unit is -\end_layout - -\begin_layout LyX-Code - p1=4...7 -\end_layout - -\begin_layout LyX-Code - which means "match 4 to 7 characters and call them p1". 
- The -\end_layout - -\begin_layout LyX-Code - second pattern unit is -\end_layout - -\begin_layout LyX-Code - 3...8 -\end_layout - -\begin_layout LyX-Code - which means "then match 3 to 8 characters". - The last pattern unit -\end_layout - -\begin_layout LyX-Code - is -\end_layout - -\begin_layout LyX-Code - ~p1 -\end_layout - -\begin_layout LyX-Code - which means "match the reverse complement of p1". - The first -\end_layout - -\begin_layout LyX-Code - reported hit is shown as -\end_layout - -\begin_layout LyX-Code - >tst1:[6,27] -\end_layout - -\begin_layout LyX-Code - cguaacc ggttaacc gguuacg -\end_layout - -\begin_layout LyX-Code - which states that characters 6 through 27 of sequence tst1 were -\end_layout - -\begin_layout LyX-Code - matched. - "cguaacc" matched the first pattern unit, "ggttaacc" the -\end_layout - -\begin_layout LyX-Code - second, and "gguuacg" the third. - This is an example of a common -\end_layout - -\begin_layout LyX-Code - type of pattern used to search for sections of DNA or RNA that -\end_layout - -\begin_layout LyX-Code - would fold into a hairpin loop. -\end_layout - -\begin_layout LyX-Code -Searching Both Strands -\end_layout - -\begin_layout LyX-Code - Now for a short aside: scan_for_matches only searched the -\end_layout - -\begin_layout LyX-Code - sequences in the input file; it did not search the opposite -\end_layout - -\begin_layout LyX-Code - strand. - With a pattern of the sort we just used, there is no -\end_layout - -\begin_layout LyX-Code - need to search the opposite strand. - However, it is normally the -\end_layout - -\begin_layout LyX-Code - case that you will wish to search both the sequence and the -\end_layout - -\begin_layout LyX-Code - opposite strand (i.e., the reverse complement of the sequence). -\end_layout - -\begin_layout LyX-Code - To do that, you would just use the "-c" command line option.
- For example, -\end_layout - -\begin_layout LyX-Code - scan_for_matches -c pat_file < test_dna_input -\end_layout - -\begin_layout LyX-Code - Hits on the opposite strand will show a beginning location greater -\end_layout - -\begin_layout LyX-Code - than the end location of the match. -\end_layout - -\begin_layout LyX-Code -Defining Pairing Rules and Allowing Mismatches, Insertions, and Deletions -\end_layout - -\begin_layout LyX-Code - Let us stop now and ask "What additional features would one need to -\end_layout - -\begin_layout LyX-Code - really find the kinds of loop structures that characterize tRNAs, -\end_layout - -\begin_layout LyX-Code - rRNAs, and so forth?" I can immediately think of two: -\end_layout - -\begin_layout LyX-Code - a) you will need to be able to allow non-standard pairings -\end_layout - -\begin_layout LyX-Code - (those other than G-C and A-U), and -\end_layout - -\begin_layout LyX-Code - b) you will need to be able to tolerate some number of -\end_layout - -\begin_layout LyX-Code - mismatches and bulges. -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - Let me first show you how to handle non-standard "rules for -\end_layout - -\begin_layout LyX-Code - pairing in reverse complements".
- Consider the following pattern, -\end_layout - -\begin_layout LyX-Code - which I show as two lines (you may use as many lines as you like in -\end_layout - -\begin_layout LyX-Code - forming a pattern, although you can only break a pattern at points -\end_layout - -\begin_layout LyX-Code - where space would be legal): -\end_layout - -\begin_layout LyX-Code - r1={au,ua,gc,cg,gu,ug,ga,ag} -\end_layout - -\begin_layout LyX-Code - p1=2...3 0...4 p2=2...5 1...5 r1~p2 0...4 ~p1 -\end_layout - -\begin_layout LyX-Code - The first "pattern unit" does not actually match anything; rather, -\end_layout - -\begin_layout LyX-Code - it defines a "pairing rule" in which standard pairings are -\end_layout - -\begin_layout LyX-Code - allowed, as well as G-A and A-G (in case you wondered, Us and Ts -\end_layout - -\begin_layout LyX-Code - and upper and lower case can be used interchangeably; for example -\end_layout - -\begin_layout LyX-Code - r1={AT,UA,gc,cg} could be used to define the "standard rule" for -\end_layout - -\begin_layout LyX-Code - pairings). - The second line consists of seven pattern units which -\end_layout - -\begin_layout LyX-Code - may be interpreted as follows: -\end_layout - -\begin_layout LyX-Code - p1=2...3 match 2 or 3 characters (call it p1) -\end_layout - -\begin_layout LyX-Code - 0...4 match 0 to 4 characters -\end_layout - -\begin_layout LyX-Code - p2=2...5 match 2 to 5 characters (call it p2) -\end_layout - -\begin_layout LyX-Code - 1...5 match 1 to 5 characters -\end_layout - -\begin_layout LyX-Code - r1~p2 match the reverse complement of p2, -\end_layout - -\begin_layout LyX-Code - allowing G-A and A-G pairs -\end_layout - -\begin_layout LyX-Code - 0...4 match 0 to 4 characters -\end_layout - -\begin_layout LyX-Code - ~p1 match the reverse complement of p1 -\end_layout - -\begin_layout LyX-Code - allowing only G-C, C-G, A-T, and T-A pairs -\end_layout - -\begin_layout LyX-Code - Thus, r1~p2 means "match the reverse complement of p2 using rule r1".
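The pairing-rule idea above can be sketched in a few lines of Python. This is only an illustration of the concept, not the code inside scan_for_matches: the helper names (normalize, make_rule, pairs_with) are hypothetical, and the sketch assumes that each two-letter entry in a rule lists a base of the named unit followed by the base allowed opposite it.

```python
def normalize(base):
    """Fold case and read T as U, since the two are used interchangeably."""
    base = base.lower()
    return "u" if base == "t" else base


def make_rule(pairs):
    """Turn two-letter strings such as 'au' into a set of allowed pairs."""
    return {(normalize(a), normalize(b)) for a, b in pairs}


# The standard rule, and the r1 rule from the example above.
STANDARD = make_rule(["au", "ua", "gc", "cg"])
R1 = make_rule(["au", "ua", "gc", "cg", "gu", "ug", "ga", "ag"])


def pairs_with(p, s, rule=STANDARD):
    """True if s pairs with p base by base when s is read in reverse."""
    if len(p) != len(s):
        return False
    return all(
        (normalize(x), normalize(y)) in rule
        for x, y in zip(p, reversed(s))
    )


if __name__ == "__main__":
    print(pairs_with("cguaacc", "gguuacg"))  # the two arms of the first hit
    print(pairs_with("gg", "uc"))            # needs a G-U pair
    print(pairs_with("gg", "uc", R1))        # allowed under r1
```

Passing a different rule set changes which pairs are accepted without touching the comparison itself.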
-\end_layout - -\begin_layout LyX-Code - Now let us consider the issue of tolerating mismatches and bulges. -\end_layout - -\begin_layout LyX-Code - You may add a "qualifier" to the pattern unit that gives the -\end_layout - -\begin_layout LyX-Code - tolerable number of "mismatches, deletions, and insertions". -\end_layout - -\begin_layout LyX-Code - Thus, -\end_layout - -\begin_layout LyX-Code - p1=10...10 3...8 ~p1[1,2,1] -\end_layout - -\begin_layout LyX-Code - means that the third pattern unit must match 10 characters, -\end_layout - -\begin_layout LyX-Code - allowing one "mismatch" (a pairing other than G-C, C-G, A-T, or -\end_layout - -\begin_layout LyX-Code - T-A), two deletions (a deletion is a character that occurs in p1, -\end_layout - -\begin_layout LyX-Code - but has been "deleted" from the string matched by ~p1), and one -\end_layout - -\begin_layout LyX-Code - insertion (an "insertion" is a character that occurs in the string -\end_layout - -\begin_layout LyX-Code - matched by ~p1, but for which no corresponding character -\end_layout - -\begin_layout LyX-Code - occurs in p1). - In this case, the pattern would match -\end_layout - -\begin_layout LyX-Code - ACGTACGTAC GGGGGGGG GCGTTACCT -\end_layout - -\begin_layout LyX-Code - which is, you must admit, a fairly weak loop. - It is common to -\end_layout - -\begin_layout LyX-Code - allow mismatches, but you will find yourself using insertions and -\end_layout - -\begin_layout LyX-Code - deletions much more rarely. - In any event, you should note that -\end_layout - -\begin_layout LyX-Code - allowing mismatches, insertions, and deletions does force the -\end_layout - -\begin_layout LyX-Code - program to try many additional possible pairings, so it does slow -\end_layout - -\begin_layout LyX-Code - things down a bit.
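To see why the example string is accepted under [1,2,1], the following Python sketch checks a candidate against the reverse complement of p1 with separate budgets for mismatches, deletions and insertions. It is a hedged illustration: revcomp and approx_match are hypothetical names, the search is a plain memoized recursion, and it makes no claim about how scan_for_matches implements the qualifier.

```python
from functools import lru_cache

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def approx_match(target, s, mismatches, deletions, insertions):
    """True if s equals target within the given edit budgets.

    A deletion drops a character of target; an insertion is an extra
    character in s with no counterpart in target.
    """

    @lru_cache(maxsize=None)
    def go(i, j, mm, de, ins):
        if i == len(target) and j == len(s):
            return True
        if i < len(target) and j < len(s):
            if target[i] == s[j]:
                if go(i + 1, j + 1, mm, de, ins):
                    return True
            elif mm > 0 and go(i + 1, j + 1, mm - 1, de, ins):
                return True
        if i < len(target) and de > 0 and go(i + 1, j, mm, de - 1, ins):
            return True
        if j < len(s) and ins > 0 and go(i, j + 1, mm, de, ins - 1):
            return True
        return False

    return go(0, 0, mismatches, deletions, insertions)


if __name__ == "__main__":
    p1 = "ACGTACGTAC"
    print(approx_match(revcomp(p1), "GCGTTACCT", 1, 2, 1))
    print(approx_match(revcomp(p1), "GCGTTACCT", 0, 0, 0))
```

One alignment that fits the budgets drops the T and A after the first G of GTACGTACGT, inserts a T before the second A, and pairs the final G against a C as the single mismatch.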
-\end_layout - -\begin_layout LyX-Code -How Patterns Are Matched -\end_layout - -\begin_layout LyX-Code - Now is as good a time as any to discuss the basic flow of control -\end_layout - -\begin_layout LyX-Code - when matching patterns. - Recall that a "pattern" is a sequence of -\end_layout - -\begin_layout LyX-Code - "pattern units". - Suppose that the pattern units were -\end_layout - -\begin_layout LyX-Code - u1 u2 u3 u4 ... - un -\end_layout - -\begin_layout LyX-Code - The scan of a sequence S begins by setting the current position -\end_layout - -\begin_layout LyX-Code - to 1. - Then, an attempt is made to match u1 starting at the -\end_layout - -\begin_layout LyX-Code - current position. - Each attempt to match a pattern unit can -\end_layout - -\begin_layout LyX-Code - succeed or fail. - If it succeeds, then an attempt is made to match -\end_layout - -\begin_layout LyX-Code - the next unit. - If it fails, then an attempt is made to find an -\end_layout - -\begin_layout LyX-Code - alternative match for the immediately preceding pattern unit. - If -\end_layout - -\begin_layout LyX-Code - this succeeds, then we proceed forward again to the next unit. - If -\end_layout - -\begin_layout LyX-Code - it fails we go back to the preceding unit. - This process is called -\end_layout - -\begin_layout LyX-Code - "backtracking". - If there are no previous units, then the current -\end_layout - -\begin_layout LyX-Code - position is incremented by one, and everything starts again. - This -\end_layout - -\begin_layout LyX-Code - proceeds until either the current position goes past the end of -\end_layout - -\begin_layout LyX-Code - the sequence or all of the pattern units succeed. - On success, -\end_layout - -\begin_layout LyX-Code - scan_for_matches reports the "hit", the current position is set -\end_layout - -\begin_layout LyX-Code - just past the hit, and an attempt is made to find another hit. 
-\end_layout - -\begin_layout LyX-Code - If you wish to limit the scan to simply finding a maximum of, say, -\end_layout - -\begin_layout LyX-Code - 10 hits, you can use the -n option (-n 10 would set the limit to -\end_layout - -\begin_layout LyX-Code - 10 reported hits). - For example, -\end_layout - -\begin_layout LyX-Code - scan_for_matches -c -n 1 pat_file < test_dna_input -\end_layout - -\begin_layout LyX-Code - would search for just the first hit (and would stop searching the -\end_layout - -\begin_layout LyX-Code - current sequences or any that follow in the input file). -\end_layout - -\begin_layout LyX-Code -Searching for repeats: -\end_layout - -\begin_layout LyX-Code - In the last section, I discussed almost all of the details -\end_layout - -\begin_layout LyX-Code - required to allow you to look for repeats. - Consider the following -\end_layout - -\begin_layout LyX-Code - set of patterns: -\end_layout - -\begin_layout LyX-Code - p1=6...6 3...8 p1 (find exact 6 character repeat separated -\end_layout - -\begin_layout LyX-Code - by to 8 characters) -\end_layout - -\begin_layout LyX-Code - p1=6...6 3..8 p1[1,0,0] (allow one mismatch) -\end_layout - -\begin_layout LyX-Code - p1=3...3 p1[1,0,0] p1[1,0,0] p1[1,0,0] -\end_layout - -\begin_layout LyX-Code - (match 12 characters that are the remains -\end_layout - -\begin_layout LyX-Code - of a 3-character sequence occurring 4 times) -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - p1=4...8 0...3 p2=6...8 p1 0...3 p2 -\end_layout - -\begin_layout LyX-Code - (This would match things like -\end_layout - -\begin_layout LyX-Code - ATCT G TCTTT ATCT TG TCTTT -\end_layout - -\begin_layout LyX-Code - ) -\end_layout - -\begin_layout LyX-Code -Searching for particular sequences: -\end_layout - -\begin_layout LyX-Code - Occasionally, one wishes to match a specific, known sequence. 
-\end_layout - -\begin_layout LyX-Code - In such a case, you can just give the sequence (along with an -\end_layout - -\begin_layout LyX-Code - optional statement of the allowable mismatches, insertions, and -\end_layout - -\begin_layout LyX-Code - deletions). - Thus, -\end_layout - -\begin_layout LyX-Code - p1=6...8 GAGA ~p1 (match a hairpin with GAGA as the loop) -\end_layout - -\begin_layout LyX-Code - RRRRYYYY (match 4 purines followed by 4 pyrimidines) -\end_layout - -\begin_layout LyX-Code - TATAA[1,0,0] (match TATAA, allowing 1 mismatch) -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code -Matches against a "weight matrix": -\end_layout - -\begin_layout LyX-Code - I will conclude my examples of the types of pattern units -\end_layout - -\begin_layout LyX-Code - available for matching against nucleotide sequences by discussing a -\end_layout - -\begin_layout LyX-Code - crude implemetation of matching using a "weight matrix". - While I -\end_layout - -\begin_layout LyX-Code - am less than overwhelmed with the syntax that I chose, I think that -\end_layout - -\begin_layout LyX-Code - the reader should be aware that I was thinking of generating -\end_layout - -\begin_layout LyX-Code - patterns containing such pattern units automatically from -\end_layout - -\begin_layout LyX-Code - alignments (and did not really plan on typing such things in by -\end_layout - -\begin_layout LyX-Code - hand very often). - Anyway, suppose that you wanted to match a -\end_layout - -\begin_layout LyX-Code - sequence of eight characters. - The "consensus" of these eight -\end_layout - -\begin_layout LyX-Code - characters is GRCACCGS, but the actual "frequencies of occurrence" -\end_layout - -\begin_layout LyX-Code - are given in the matrix below. - Thus, the first character is an A -\end_layout - -\begin_layout LyX-Code - 16% the time and a G 84% of the time. 
- The second is an A 57% of -\end_layout - -\begin_layout LyX-Code - the time, a C 10% of the time, a G 29% of the time, and a T 4% of -\end_layout - -\begin_layout LyX-Code - the time. - -\end_layout - -\begin_layout LyX-Code - C1 C2 C3 C4 C5 C6 C7 C8 -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - A 16 57 0 95 0 18 0 0 -\end_layout - -\begin_layout LyX-Code - C 0 10 80 0 100 60 0 50 -\end_layout - -\begin_layout LyX-Code - G 84 29 0 0 0 20 100 50 -\end_layout - -\begin_layout LyX-Code - T 0 4 20 5 0 2 0 0 -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - One could use the following pattern unit to search for inexact -\end_layout - -\begin_layout LyX-Code - matches related to such a "weight matrix": -\end_layout - -\begin_layout LyX-Code - {(16,0,84,0),(57,10,29,4),(0,80,0,20),(95,0,0,5), -\end_layout - -\begin_layout LyX-Code - (0,100,0,0),(18,60,20,2),(0,0,100,0),(0,50,50,0)} > 450 -\end_layout - -\begin_layout LyX-Code - This pattern unit will attempt to match exactly eight characters. -\end_layout - -\begin_layout LyX-Code - For each character in the sequence, the entry in the corresponding -\end_layout - -\begin_layout LyX-Code - tuple is added to an accumulated sum. - If the sum is greater than -\end_layout - -\begin_layout LyX-Code - 450, the match succeeds; else it fails. -\end_layout - -\begin_layout LyX-Code - Recently, this feature was upgraded to allow ranges. - Thus, -\end_layout - -\begin_layout LyX-Code - 600 > {(16,0,84,0),(57,10,29,4),(0,80,0,20),(95,0,0,5), -\end_layout - -\begin_layout LyX-Code - (0,100,0,0),(18,60,20,2),(0,0,100,0),(0,50,50,0)} > 450 -\end_layout - -\begin_layout LyX-Code - will work, as well. -\end_layout - -\begin_layout LyX-Code -Allowing Alternatives: -\end_layout - -\begin_layout LyX-Code - Very occasionally, you may wish to allow alternative pattern units -\end_layout - -\begin_layout LyX-Code - (i.e., "match either A or B"). 
- You can do this using something -\end_layout -\begin_layout LyX-Code - like -\end_layout -\begin_layout LyX-Code - ( GAGA | GCGCA) -\end_layout -\begin_layout LyX-Code - which says "match either GAGA or GCGCA". - You may take -\end_layout -\begin_layout LyX-Code - alternatives of a list of pattern units, for example -\end_layout -\begin_layout LyX-Code - (p1=3...3 3...8 ~p1 | p1=5...5 4...4 ~p1 GGG) -\end_layout -\begin_layout LyX-Code - would match one of two sequences of pattern units. - There is one -\end_layout -\begin_layout LyX-Code - clumsy aspect of the syntax: to match a list of alternatives, you -\end_layout -\begin_layout LyX-Code - need to fully parenthesize the request. - Thus, -\end_layout -\begin_layout LyX-Code - (GAGA | (GCGCA | TTCGA)) -\end_layout -\begin_layout LyX-Code - would be needed to try the three alternatives. -\end_layout -\begin_layout LyX-Code -One Minor Extension -\end_layout -\begin_layout LyX-Code - Sometimes a pattern will contain a sequence of distinct ranges, -\end_layout -\begin_layout LyX-Code - and you might wish to limit the sum of the lengths of the matched -\end_layout -\begin_layout LyX-Code - subsequences. - For example, suppose that you basically wanted to -\end_layout -\begin_layout LyX-Code - match something like -\end_layout -\begin_layout LyX-Code - ARRYYTT p1=0...5 GCA[1,0,0] p2=1...6 ~p1 4...8 ~p2 p3=4...10 CCT -\end_layout -\begin_layout LyX-Code - but that the sum of the lengths of p1, p2, and p3 must not exceed -\end_layout -\begin_layout LyX-Code - eight characters. - To do this, you could add -\end_layout -\begin_layout LyX-Code - length(p1+p2+p3) < 9 -\end_layout -\begin_layout LyX-Code - as the last pattern unit. - It will just succeed or fail (but does -\end_layout -\begin_layout LyX-Code - not actually match any characters in the sequence). 
-\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code -Matching Protein Sequences -\end_layout - -\begin_layout LyX-Code - Suppose that the input file contains protein sequences. - In this -\end_layout - -\begin_layout LyX-Code - case, you must invoke scan_for_matches with the "-p" option. - You -\end_layout - -\begin_layout LyX-Code - cannot use aspects of the language that relate directly to -\end_layout - -\begin_layout LyX-Code - nucleotide sequences (e.g., the -c command line option or pattern -\end_layout - -\begin_layout LyX-Code - constructs referring to the reverse complement of a previously -\end_layout - -\begin_layout LyX-Code - matched unit). - -\end_layout - -\begin_layout LyX-Code - You also have two additional constructs that allow you to match -\end_layout - -\begin_layout LyX-Code - either "one of a set of amino acids" or "any amino acid other than -\end_layout - -\begin_layout LyX-Code - those a given set". - For example, -\end_layout - -\begin_layout LyX-Code - p1=0...4 any(HQD) 1...3 notany(HK) p1 -\end_layout - -\begin_layout LyX-Code - would successfully match a string like -\end_layout - -\begin_layout LyX-Code - YWV D AA C YWV -\end_layout - -\begin_layout LyX-Code -Using the show_hits Utility -\end_layout - -\begin_layout LyX-Code - When viewing a large set of complex matches, you might find it -\end_layout - -\begin_layout LyX-Code - convenient to post-process the scan_for_matches output to get a -\end_layout - -\begin_layout LyX-Code - more readable version. - We provide a simple post-processor called -\end_layout - -\begin_layout LyX-Code - "show_hits". 
- To see its effect, just pipe the output of a -\end_layout - -\begin_layout LyX-Code - scan_for_matches into show_hits: -\end_layout - -\begin_layout LyX-Code - Normal Output: -\end_layout - -\begin_layout LyX-Code - clone% scan_for_matches -c pat_file < tmp -\end_layout - -\begin_layout LyX-Code - >tst1:[1,28] -\end_layout - -\begin_layout LyX-Code - gtacguaacc ggttaac cgguuacgtac -\end_layout - -\begin_layout LyX-Code - >tst1:[28,1] -\end_layout - -\begin_layout LyX-Code - gtacgtaacc ggttaac cggttacgtac -\end_layout - -\begin_layout LyX-Code - >tst2:[2,31] -\end_layout - -\begin_layout LyX-Code - CGTACGUAAC C GGTTAACC GGUUACGTACG -\end_layout - -\begin_layout LyX-Code - >tst2:[31,2] -\end_layout - -\begin_layout LyX-Code - CGTACGTAAC C GGTTAACC GGTTACGTACG -\end_layout - -\begin_layout LyX-Code - >tst3:[3,32] -\end_layout - -\begin_layout LyX-Code - gtacguaacc g gttaactt cgguuacgtac -\end_layout - -\begin_layout LyX-Code - >tst3:[32,3] -\end_layout - -\begin_layout LyX-Code - gtacgtaacc g aagttaac cggttacgtac -\end_layout - -\begin_layout LyX-Code - Piped Through show_hits: -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - clone% scan_for_matches -c pat_file < tmp | show_hits -\end_layout - -\begin_layout LyX-Code - tst1:[1,28]: gtacguaacc ggttaac cgguuacgtac -\end_layout - -\begin_layout LyX-Code - tst1:[28,1]: gtacgtaacc ggttaac cggttacgtac -\end_layout - -\begin_layout LyX-Code - tst2:[2,31]: CGTACGUAAC C GGTTAACC GGUUACGTACG -\end_layout - -\begin_layout LyX-Code - tst2:[31,2]: CGTACGTAAC C GGTTAACC GGTTACGTACG -\end_layout - -\begin_layout LyX-Code - tst3:[3,32]: gtacguaacc g gttaactt cgguuacgtac -\end_layout - -\begin_layout LyX-Code - tst3:[32,3]: gtacgtaacc g aagttaac cggttacgtac -\end_layout - -\begin_layout LyX-Code - clone% -\end_layout - -\begin_layout LyX-Code - Optionally, you can specify which of the "fields" in the matches -\end_layout - -\begin_layout LyX-Code - you wish to sort on, and show_hits will sort them. 
- The field -\end_layout - -\begin_layout LyX-Code - numbers start with 0. - So, you might get something like -\end_layout - -\begin_layout LyX-Code - clone% scan_for_matches -c pat_file < tmp | show_hits 2 1 -\end_layout - -\begin_layout LyX-Code - tst2:[2,31]: CGTACGUAAC C GGTTAACC GGUUACGTACG -\end_layout - -\begin_layout LyX-Code - tst2:[31,2]: CGTACGTAAC C GGTTAACC GGTTACGTACG -\end_layout - -\begin_layout LyX-Code - tst3:[32,3]: gtacgtaacc g aagttaac cggttacgtac -\end_layout - -\begin_layout LyX-Code - tst1:[1,28]: gtacguaacc ggttaac cgguuacgtac -\end_layout - -\begin_layout LyX-Code - tst1:[28,1]: gtacgtaacc ggttaac cggttacgtac -\end_layout - -\begin_layout LyX-Code - tst3:[3,32]: gtacguaacc g gttaactt cgguuacgtac -\end_layout - -\begin_layout LyX-Code - clone% -\end_layout - -\begin_layout LyX-Code - In this case, the hits have been sorted on fields 2 and 1 (that is, -\end_layout - -\begin_layout LyX-Code - the third and second matched subfields). -\end_layout - -\begin_layout LyX-Code - show_hits is just one possible little post-processor, and you -\end_layout - -\begin_layout LyX-Code - might well wish to write a customized one for yourself. -\end_layout - -\begin_layout LyX-Code -Reducing the Cost of a Search -\end_layout - -\begin_layout LyX-Code - The scan_for_matches utility uses a fairly simple search, and may -\end_layout - -\begin_layout LyX-Code - consume large amounts of CPU time for complex patterns. - Someday, -\end_layout - -\begin_layout LyX-Code - I may decide to optimize the code. - However, until then, let me -\end_layout - -\begin_layout LyX-Code - mention one useful technique. - -\end_layout - -\begin_layout LyX-Code - When you have a complex pattern that includes a number of varying -\end_layout - -\begin_layout LyX-Code - ranges, imprecise matches, and so forth, it is useful to -\end_layout - -\begin_layout LyX-Code - "pipeline" matches. 
- That is, form a simpler pattern that can be -\end_layout - -\begin_layout LyX-Code - used to scan through a large database extracting sections that -\end_layout - -\begin_layout LyX-Code - might be matched by the more complex pattern. - Let me illustrate -\end_layout - -\begin_layout LyX-Code - with a short example. - Suppose that you really wished to match the -\end_layout - -\begin_layout LyX-Code - pattern -\end_layout - -\begin_layout LyX-Code - p1=3...5 0...8 ~p1[1,1,0] p2=6...7 3...6 AGC 3...5 RYGC ~p2[1,0,0] -\end_layout - -\begin_layout LyX-Code - In this case, the pattern units AGC 3...5 RYGC can be used to rapidly -\end_layout - -\begin_layout LyX-Code - constrain the overall search. - You can preprocess the overall -\end_layout - -\begin_layout LyX-Code - database using the pattern: -\end_layout - -\begin_layout LyX-Code - 31...31 AGC 3...5 RYGC 7...7 -\end_layout - -\begin_layout LyX-Code - Put the complex pattern in pat_file1 and the simpler pattern in -\end_layout - -\begin_layout LyX-Code - pat_file2. - Then use, -\end_layout - -\begin_layout LyX-Code - scan_for_matches -c pat_file2 < nucleotide_database | -\end_layout - -\begin_layout LyX-Code - scan_for_matches pat_file1 -\end_layout - -\begin_layout LyX-Code - The output will show things like -\end_layout - -\begin_layout LyX-Code - >seqid:[232,280][2,47] -\end_layout - -\begin_layout LyX-Code - matches pieces -\end_layout - -\begin_layout LyX-Code - Then, the actual section of the sequence that was matched can be -\end_layout - -\begin_layout LyX-Code - easily computed as [233,278] (remember, the positions start from -\end_layout - -\begin_layout LyX-Code - 1, not 0). 
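The arithmetic behind that last step can be written out as a short Python sketch. The function name absolute_hit is hypothetical, and the sketch assumes a forward-strand hit, where the start is smaller than the end; hits reported on the complementary strand by -c would need separate handling.

```python
def absolute_hit(outer, inner):
    """Map a hit found inside an extracted section back to the full sequence.

    outer is the 1-based [start, end] of the section in the original
    sequence; inner is the 1-based [start, end] of the hit in that section.
    """
    outer_start, _outer_end = outer
    inner_start, inner_end = inner
    # Position 1 of the section is position outer_start of the sequence.
    offset = outer_start - 1
    return (offset + inner_start, offset + inner_end)


# The header >seqid:[232,280][2,47] from the example above:
print(absolute_hit((232, 280), (2, 47)))  # (233, 278)
```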
-\end_layout - -\begin_layout LyX-Code - Let me finally add, you should do a few short experiments to see -\end_layout - -\begin_layout LyX-Code - whether or not such pipelining actually improves performance -- it -\end_layout - -\begin_layout LyX-Code - is not always obvious where the time is going, and I have -\end_layout - -\begin_layout LyX-Code - sometimes found that the added complexity of pipelining actually -\end_layout - -\begin_layout LyX-Code - slowed things up. - It gets its best improvements when there are -\end_layout - -\begin_layout LyX-Code - exact matches of more than just a few characters that can be -\end_layout - -\begin_layout LyX-Code - rapidly used to eliminate large sections of the database. -\end_layout - -\begin_layout LyX-Code -============= -\end_layout - -\begin_layout LyX-Code -Additions: -\end_layout - -\begin_layout LyX-Code -Feb 9, 1995: the pattern units ^ and $ now work as in normal regular -\end_layout - -\begin_layout LyX-Code - expressions. - That is -\end_layout - -\begin_layout LyX-Code - TTF $ -\end_layout - -\begin_layout LyX-Code - matches only TTF at the end of the string and -\end_layout - -\begin_layout LyX-Code - ^ TTF -\end_layout - -\begin_layout LyX-Code - matches only an initial TTF -\end_layout - -\begin_layout LyX-Code - The pattern unit -\end_layout - -\begin_layout LyX-Code - : -\end_layout - -\begin_layout Standard -\begin_inset Box Frameless -position "t" -hor_pos "c" -has_inner_box 1 -inner_pos "t" -use_parbox 0 -width "100col%" -special "none" -height "1in" -height_special "totalheight" -status open - -\begin_layout LyX-Code - -\size scriptsize -Program name: read_fasta -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -Author: Martin Asser Hansen - Copyright (C) - All rights reserved -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -Contact: mail@maasha.dk -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -Date: August 2007 -\end_layout - -\begin_layout LyX-Code - -\size 
scriptsize -License: GNU General Public License version 2 (http://www.gnu.org/copyleft/ -gpl.html) -\end_layout -\begin_layout LyX-Code - -\size scriptsize -Description: Read FASTA entries. -\end_layout -\begin_layout LyX-Code - -\size scriptsize -Usage: read_fasta [options] -i -\end_layout -\begin_layout LyX-Code - -\size scriptsize -Options: -\end_layout -\begin_layout LyX-Code - -\size scriptsize - [-i | --data_in=] - Comma separated list of files - to read. -\end_layout -\begin_layout LyX-Code - -\size scriptsize - [-n | --num=] - Limit number of records to read. -\end_layout -\begin_layout LyX-Code - -\size scriptsize - [-I | --stream_in=] - Read input stream from file - - Default=STDIN -\end_layout -\begin_layout LyX-Code - -\size scriptsize - [-O | --stream_out=] - Write output stream to file - - Default=STDOUT -\end_layout -\begin_layout LyX-Code - -\end_layout -\begin_layout LyX-Code - -\size scriptsize -Examples: -\end_layout -\begin_layout LyX-Code - -\size scriptsize - read_fasta -i test.fna - Read FASTA entries from file. -\end_layout -\begin_layout LyX-Code - -\size scriptsize - read_fasta -i test1.fna,test2.fna - Read FASTA entries from files. -\end_layout -\begin_layout LyX-Code - -\size scriptsize - read_fasta -i '*.fna' - Read FASTA entries from files. -\end_layout -\begin_layout LyX-Code - -\size scriptsize - read_fasta -i test.fna -n 10 - Read first 10 FASTA entries from - file. -\end_layout -\end_inset - - -\end_layout -\begin_layout Section -The Data Stream -\end_layout -\begin_layout Subsection -How to read the data stream from file? -\begin_inset LatexCommand label -name "sub:How-to-read-stream" - -\end_inset - - -\end_layout -\begin_layout Standard -You want to read a data stream that you previously have saved to file in - biotools format. - This can be done implicitly or explicitly. 
- The implicit way uses the 'stdout' stream of the Unix terminal: -\end_layout -\begin_layout LyX-Code -cat | -\end_layout -\begin_layout Standard -cat is the Unix command that reads a file and outputs the result to 'stdout' - --- which in this case is piped to any biotool represented by the . - It is also possible to read the data stream using '<' to redirect the file - into the 'stdin' stream of the biotool like this: -\end_layout -\begin_layout LyX-Code - < -\end_layout -\begin_layout Standard -However, that will not work if you pipe more biotools together. - Then it is much safer to read the stream from a file explicitly like this: -\end_layout -\begin_layout LyX-Code - --stream_in= -\end_layout -\begin_layout Standard -Here the filename is explicitly given to the biotool with - the switch -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -stream_in. - This switch works with all biotools. - It is also possible to read in data from multiple sources by repeating - the explicit read step: -\end_layout - -\begin_layout LyX-Code - --stream_in= | --stream_in= -\end_layout - -\begin_layout Subsection -How to write the data stream to file? -\begin_inset LatexCommand label -name "sub:How-to-write-stream" - -\end_inset - - -\end_layout - -\begin_layout Standard -In order to save the output stream from a biotool to file, so you can read - in the stream again at a later time, you can do one of two things: -\end_layout - -\begin_layout LyX-Code - > -\end_layout - -\begin_layout Standard -All biotools write the data stream to 'stdout' by default, which can - be written to a file by redirecting 'stdout' to file using '>'. However, - if one of the biotools for writing other formats is used, then both - the biotools records and the result output will go to 'stdout' in - a mixture, causing havoc! 
To avoid this you must use the switch -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -stream_out that explicitly tells the biotool to write the output stream to - file: -\end_layout - -\begin_layout LyX-Code - --stream_out= -\end_layout - -\begin_layout Standard -The -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -stream_out switch works with all biotools. -\end_layout - -\begin_layout Subsection -How to terminate the data stream? -\end_layout - -\begin_layout Standard -The data stream never stops unless the user saves the stream to file or - supplies the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_stream switch that will terminate the stream: -\end_layout - -\begin_layout LyX-Code - --no_stream -\end_layout - -\begin_layout Standard -The -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_stream switch only works with those biotools where it makes sense that - the user might want to terminate the data stream, -\emph on -i.e -\emph default -. - after an analysis step where the user wants to output the result, but not - the data stream. -\end_layout - -\begin_layout Subsection -How to write my results to file? -\begin_inset LatexCommand label -name "sub:How-to-write-result" - -\end_inset - - -\end_layout - -\begin_layout Standard -Saving the result of an analysis to file can be done implicitly or explicitly. 
- The implicit way: -\end_layout - -\begin_layout LyX-Code - --no_stream > -\end_layout - -\begin_layout Standard -If you use '>' to redirect 'stdout' to file then it is important to use - the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_stream switch to avoid writing a mix of biotools records and result to - the same file, causing havoc. - The safe way is to use the -\begin_inset ERT -status open - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -result_out switch which explicitly tells the biotool to write the result - to a given file: -\end_layout - -\begin_layout LyX-Code - --result_out= -\end_layout - -\begin_layout Standard -Using the above method will not terminate the stream, so it is possible - to pipe that into another biotool generating different results: -\end_layout - -\begin_layout LyX-Code - --result_out= | --result_out= -\end_layout - -\begin_layout Standard -And still the data stream will continue unless terminated with -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_stream: -\end_layout - -\begin_layout LyX-Code - --result_out= --no_stream -\end_layout - -\begin_layout Standard -Or written to file implicitly or explicitly -\begin_inset LatexCommand eqref -reference "sub:How-to-write-result" - -\end_inset - -. - The explicit way: -\end_layout - -\begin_layout LyX-Code - --result_out= --stream_out= -\end_layout - -\begin_layout Subsection -How to read data from multiple sources? -\end_layout - -\begin_layout Standard -To read multiple data sources, with the same type or different type of data - do: -\end_layout - -\begin_layout LyX-Code - --data_in= | --data_in= -\end_layout - -\begin_layout Standard -where type is the data type a specific biotool reads. -\end_layout - -\begin_layout Section -Reading input -\end_layout - -\begin_layout Subsection -How to read biotools input? 
-\end_layout - -\begin_layout Standard -See -\begin_inset LatexCommand eqref -reference "sub:How-to-read-stream" - -\end_inset - -. -\end_layout - -\begin_layout Subsection -How to read in data? -\end_layout - -\begin_layout Standard -Data in different formats can be read with the appropriate biotool for that - format. - The biotools are typically named 'read_' such as -\series bold -read_fasta -\series default -, -\series bold -read_bed -\series default -, -\series bold -read_tab -\series default -, etc., and all behave in a similar manner. - Data can be read by supplying the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -data_in switch and the name of the file containing the data: -\end_layout - -\begin_layout LyX-Code - --data_in= -\end_layout - -\begin_layout Standard -It is also possible to read in a saved biotools stream (see -\begin_inset LatexCommand ref -reference "sub:How-to-read-stream" - -\end_inset - -) as well as reading data in one go: -\end_layout - -\begin_layout LyX-Code - --stream_in= --data_in= -\end_layout - -\begin_layout Standard -If you want to read data from several files you can do this: -\end_layout - -\begin_layout LyX-Code - --data_in= | --data_in= -\end_layout - -\begin_layout Standard -If you have several data files you can read in all explicitly with a comma - separated list: -\end_layout - -\begin_layout LyX-Code - --data_in=file1,file2,file3 -\end_layout - -\begin_layout Standard -And it is also possible to use file globbing -\begin_inset Foot -status open - -\begin_layout Standard -using the short option will only work if you quote the argument -i '*.fna' -\end_layout - -\end_inset - -: -\end_layout - -\begin_layout LyX-Code - --data_in=*.fna -\end_layout - -\begin_layout Standard -Or in a combination: -\end_layout - -\begin_layout LyX-Code - --data_in=file1,/dir/*.fna -\end_layout - -\begin_layout Standard -Finally, it is possible to read in data in different
formats using the appropria -te biotool for each format: -\end_layout - -\begin_layout LyX-Code - --data_in= | --data_in= ... -\end_layout - -\begin_layout Subsection -How to read FASTA input? -\end_layout - -\begin_layout Standard -Sequences in FASTA format can be read explicitly using -\series bold -read_fasta -\series default -: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in= -\end_layout - -\begin_layout Subsection -How to read alignment input? -\end_layout - -\begin_layout Standard -If your alignment is FASTA formatted then you can use -\series bold -read_align -\series default -. - It is also possible to use -\series bold -read_fasta -\series default - since the data is FASTA formatted, however, with -\series bold -read_fasta -\series default - the key ALIGN will be omitted. - The ALIGN key is used to determine which sequences belong to what alignment - which is required for -\series bold -write_align -\series default -. -\end_layout - -\begin_layout LyX-Code -read_align --data_in= -\end_layout - -\begin_layout Subsection -How to read tabular input? -\begin_inset LatexCommand label -name "sub:How-to-read-table" - -\end_inset - - -\end_layout - -\begin_layout Standard -Tabular input can be read with -\series bold -read_tab -\series default - which will read in all rows and chosen columns (separated by a given delimiter) - from a table in text format. 
-\end_layout - -\begin_layout Standard -The table below: -\end_layout - -\begin_layout Standard -\noindent -\align center -\begin_inset Tabular - - - - - - - -\begin_inset Text - -\begin_layout Standard -Human -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -ATACGTCAG -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -23524 -\end_layout - -\end_inset - - - - -\begin_inset Text - -\begin_layout Standard -Dog -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -AGCATGAC -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -2442 -\end_layout - -\end_inset - - - - -\begin_inset Text - -\begin_layout Standard -Mouse -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -GACTG -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -234 -\end_layout - -\end_inset - - - - -\begin_inset Text - -\begin_layout Standard -Cat -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -AAATGCA -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -2342 -\end_layout - -\end_inset - - - - -\end_inset - - -\end_layout - -\begin_layout Standard -Can be read using the command: -\end_layout - -\begin_layout LyX-Code -read_tab --data_in= -\end_layout - -\begin_layout Standard -Which will result in four records, one for each row, where the keys V0, - V1, V2 are the default keys for the organism, sequence, and count, respectively. - It is possible to select a subset of columns to read by using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -cols switch which takes a comma separated list of column numbers (first - column is designated 0) as argument. 
- So to read in only the sequence and the count so that the count comes before - the sequence do: -\end_layout - -\begin_layout LyX-Code -read_tab --data_in= --cols=2,1 -\end_layout - -\begin_layout Standard -It is also possible to name the columns with the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -keys switch: -\end_layout - -\begin_layout LyX-Code -read_tab --data_in= --cols=2,1 --keys=COUNT,SEQ -\end_layout - -\begin_layout Subsection -How to read BED input? -\end_layout - -\begin_layout Standard -The BED (Browser Extensible Data -\begin_inset Foot -status open - -\begin_layout Standard -\begin_inset LatexCommand url -target "http://genome.ucsc.edu/FAQ/FAQformat" - -\end_inset - - -\end_layout - -\end_inset - -) format is a tabular format for data pertaining to one of the Eukaryotic - genomes in the UCSC genome browser -\begin_inset Foot -status collapsed - -\begin_layout Standard -\begin_inset LatexCommand url -target "http://genome.ucsc.edu/" - -\end_inset - - -\end_layout - -\end_inset - -. - The BED format consists of up to 12 columns, where the first three are - mandatory CHR, CHR_BEG, and CHR_END. - The mandatory columns and any of the optional columns can all be read in - easily with the -\series bold -read_bed -\series default - biotool. -\end_layout - -\begin_layout LyX-Code -read_bed --data_in= -\end_layout - -\begin_layout Standard -It is also possible to read the BED file with -\series bold -read_tab -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-read-table" - -\end_inset - -), however, that will be more cumbersome because you need to specify the - keys: -\end_layout - -\begin_layout LyX-Code -read_tab --data_in= --keys=CHR,CHR_BEG,CHR_END ... -\end_layout - -\begin_layout Subsection -How to read PSL input? 
-\end_layout - -\begin_layout Standard -The PSL format is the output from BLAT and contains 21 mandatory fields - that can be read with -\series bold -read_psl -\series default -: -\end_layout - -\begin_layout LyX-Code -read_psl --data_in= -\end_layout - -\begin_layout Section -Writing output -\end_layout - -\begin_layout Standard -All result output can be written explicitly to file using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -result_out switch which all result generating biotools have. - It is also possible to write the result to file implicitly by directing - 'stdout' to file using '>', however, that requires the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_stream switch to prevent a mixture of data stream and results in the file. - The explicit (and safe) way: -\end_layout - -\begin_layout LyX-Code -... - | --result_out= -\end_layout - -\begin_layout Standard -The implicit way: -\end_layout - -\begin_layout LyX-Code -... - | --no_stream > -\end_layout - -\begin_layout Subsection -How to write biotools output? -\end_layout - -\begin_layout Standard -See -\begin_inset LatexCommand eqref -reference "sub:How-to-write-stream" - -\end_inset - -. -\end_layout - -\begin_layout Subsection -How to write FASTA output? -\begin_inset LatexCommand label -name "sub:How-to-write-fasta" - -\end_inset - - -\end_layout - -\begin_layout Standard -FASTA output can be written with -\series bold -write_fasta -\series default -. -\end_layout - -\begin_layout LyX-Code -... - | write_fasta --result_out= -\end_layout - -\begin_layout Standard -It is also possible to wrap the sequences to a given width using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -wrap switch, although wrapping of sequences is generally an evil thing: -\end_layout - -\begin_layout LyX-Code -... 
- | write_fasta --no_stream --wrap=80 -\end_layout - -\begin_layout Subsection -How to write alignment output? -\begin_inset LatexCommand label -name "sub:How-to-write-alignment" - -\end_inset - - -\end_layout - -\begin_layout Standard -Pretty alignments with ruler -\begin_inset Foot -status collapsed - -\begin_layout Standard -'.' for every 10 residues, ':' for every 50, and '|' for every 100 -\end_layout - -\end_inset - - and consensus sequence -\begin_inset Note Note -status collapsed - -\begin_layout Standard -which reminds me to make that an option. -\end_layout - -\end_inset - - can be created with -\series bold -write_align -\series default -, which also has the optional -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -wrap switch to break the alignment into blocks of a given width: -\end_layout - -\begin_layout LyX-Code -... - | write_align --result_out= --wrap=80 -\end_layout - -\begin_layout Standard -If the number of sequences in the alignment is 2 then a pairwise alignment - will be output, otherwise a multiple alignment. - And if the sequence type, determined automagically, is protein, then residues - and symbols (+,\InsetSpace ~ -:,\InsetSpace ~ -.) will be used to show consensus according to the Blosum62 - matrix. -\end_layout - -\begin_layout Subsection -How to write tabular output? -\begin_inset LatexCommand label -name "sub:How-to-write-tab" - -\end_inset - - -\end_layout - -\begin_layout Standard -Outputting the data stream as a table can be done with -\series bold -write_tab -\series default -, which will generate one row per record with the values as columns. - If you supply the optional -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -comment switch, then the first row in the table will be a 'comment' line - prefixed with a '#': -\end_layout - -\begin_layout LyX-Code -... 
- | write_tab --result_out= --comment -\end_layout - -\begin_layout Standard -You can also change the delimiter from the default (tab) to -\emph on -e.g. - -\emph default - ',': -\end_layout - -\begin_layout LyX-Code -... - | write_tab --result_out= --delimit=',' -\end_layout - -\begin_layout Standard -If you want the values output in a specific order you have to supply a comma - separated list using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -keys switch that will print only those keys in that order: -\end_layout - -\begin_layout LyX-Code -... - | write_tab --result_out= --keys=SEQ_NAME,COUNT -\end_layout - -\begin_layout Standard -Alternatively, if you have some keys that you don't want in the tabular - output, use the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_keys switch. - So to print all keys except SEQ and SEQ_TYPE do: -\end_layout - -\begin_layout LyX-Code -... - | write_tab --result_out= --no_keys=SEQ,SEQ_TYPE -\end_layout - -\begin_layout Standard -Finally, if you have a stream containing a mix of different record types, - -\emph on -e.g. - -\emph default - records with sequences and records with matches, then you can use -\series bold -write_tab -\series default - to output all the records in tabular format, however, the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -comment, -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -keys, and -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_keys switches will only respond to records of the first type encountered. 
- The reason is that outputting mixed records is probably not what you want - anyway, and you should remove all the unwanted records from the stream - before outputting the table: -\series bold -grab -\series default - is your friend (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-grab" - -\end_inset - -). -\end_layout - -\begin_layout Subsection -How to write BED output? -\begin_inset LatexCommand label -name "sub:How-to-write-BED" - -\end_inset - - -\end_layout - -\begin_layout Standard -Data in BED format can be output with -\series bold -write_bed -\series default - if the records contain the mandatory keys - CHR, CHR_BEG, and CHR_END. - If the optional keys are also present, they will be output as well: -\end_layout - -\begin_layout LyX-Code -write_bed --result_out= -\end_layout - -\begin_layout Subsection -How to write PSL output? -\begin_inset LatexCommand label -name "sub:How-to-write-PSL" - -\end_inset - - -\end_layout - -\begin_layout Standard -Data in PSL format can be output using -\series bold -write_psl -\series default -: -\end_layout - -\begin_layout LyX-Code -write_psl --result_out= -\end_layout - -\begin_layout Section -Manipulating Records -\end_layout - -\begin_layout Subsection -How to select a few records? -\begin_inset LatexCommand label -name "sub:How-to-select-a-few-records" - -\end_inset - - -\end_layout - -\begin_layout Standard -To quickly get an overview of your data you can limit the data stream to - show a few records. - This is also very useful to test the pipeline with a few records if you are - setting up a complex analysis using several biotools. - That way you can check that all goes well before analyzing and waiting - for the full data set. - All of the read_ biotools have the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -num switch which will take a number as argument and only that number of - records will be read. 
- So to read in the first 10 FASTA entries from a file: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in=test.fna --num=10 -\end_layout - -\begin_layout Standard -Another way of doing this is to use -\series bold -head_records -\series default -, which will limit the stream to show the first 10 records (default): -\end_layout - -\begin_layout LyX-Code -... - | head_records -\end_layout - -\begin_layout Standard -Using -\series bold -head_records -\series default - directly after one of the read_ biotools will be a lot slower than - using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -num switch with the read_ biotools, however, -\series bold -head_records -\series default - can also be used to limit the output from all the other biotools. - It is also possible to give -\series bold -head_records -\series default - a number of records to show using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -num switch. - So to display the first 100 records do: -\end_layout - -\begin_layout LyX-Code -... - | head_records --num=100 -\end_layout - -\begin_layout Subsection -How to select random records? -\begin_inset LatexCommand label -name "sub:How-to-select-random-records" - -\end_inset - - -\end_layout - -\begin_layout Standard -If you want to inspect a number of random records from the stream this can - be done with the -\series bold -random_records -\series default - biotool. - So if you have 1 million records in the stream and you want to select 1000 - random records do: -\end_layout - -\begin_layout LyX-Code -... - | random_records --num=1000 -\end_layout - -\begin_layout Subsection -How to count all records in the data stream? 
-\end_layout - -\begin_layout Standard -To count all the records in the data stream use -\series bold -count_records -\series default -, which adds one record (which is not included in the count) to the data - stream. - So to count the number of sequences in a FASTA file you can do this: -\end_layout - -\begin_layout LyX-Code -cat test.fna | read_fasta | count_records --no_stream -\end_layout - -\begin_layout Standard -Which will write the last record containing the count to 'stdout': -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -count_records: 630 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize ---- -\end_layout - -\begin_layout Standard -It is also possible to write the count to file using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -result_out switch. -\end_layout - -\begin_layout Subsection -How to get the length of record values? -\begin_inset LatexCommand label -name "sub:How-to-get-value_length" - -\end_inset - - -\end_layout - -\begin_layout Standard -Use the -\series bold -length_vals -\series default - biotool to get the length of each value for a comma separated list of keys: -\end_layout - -\begin_layout LyX-Code -... - | length_vals --keys=HIT,PATTERN -\end_layout - -\begin_layout Subsection -How to grab specific records? -\begin_inset LatexCommand label -name "sub:How-to-grab" - -\end_inset - - -\end_layout - -\begin_layout Standard -The biotool -\series bold -grab -\series default - is related to the Unix grep and locates records based on matching keys - and/or values using either a pattern, a Perl regex, or a numerical evaluation. - To easily -\series bold -grab -\series default - all records in the stream that have any mention of the pattern 'human' - just pipe the data stream through -\series bold -grab -\series default - like this: -\end_layout - -\begin_layout LyX-Code -... 
- | grab --pattern=human -\end_layout - -\begin_layout Standard -This will search for the pattern 'human' in all keys and all values. - The -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern switch takes a comma separated list of patterns, so in order to - match multiple patterns do: -\end_layout - -\begin_layout LyX-Code -... - | grab --pattern=human,mouse -\end_layout - -\begin_layout Standard -It is also possible to use the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern_in switch instead of -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern. - -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern_in is used to read a file with one pattern per line: -\end_layout - -\begin_layout LyX-Code -... - | grab --pattern_in=patterns.txt -\end_layout - -\begin_layout Standard -If you want the opposite result --- to find all records that do not match - the patterns, add the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -invert switch, which not only works with the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern switch, but also with -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -regex and -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -eval: -\end_layout - -\begin_layout LyX-Code -... - | grab --pattern=human --invert -\end_layout - -\begin_layout Standard -If you want to search the record keys only, -\emph on -e.g. 
- -\emph default - to find all records containing the key SEQ you can add the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -keys_only switch. - This will prevent matching of SEQ in any record value; since SEQ is a - not uncommon peptide sequence, you could otherwise get an unwanted record. - Also, this will give an increase in speed since only the keys are searched: -\end_layout - -\begin_layout LyX-Code -... - | grab --pattern=SEQ --keys_only -\end_layout - -\begin_layout Standard -However, if you are interested in finding the peptide sequence SEQ and not - the SEQ key, just add the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -vals_only switch instead: -\end_layout - -\begin_layout LyX-Code -... - | grab --pattern=SEQ --vals_only -\end_layout - -\begin_layout Standard -Also, if you want to grab for certain key/value pairs you can supply a comma - separated list of keys whose values will then be searched using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -keys switch. - This is handy if your records contain large genomic sequences and you don't - want to search the entire sequence for -\emph on -e.g. - -\emph default - the organism name --- it is much faster to tell -\series bold -grab -\series default - which keys to search the value for: -\end_layout - -\begin_layout LyX-Code -... - | grab --pattern=human --keys=SEQ_NAME -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout Standard -It is also possible to invoke flexible matching using regex (regular expressions -) instead of simple pattern matching. 
- In -\series bold -grab -\series default - the regex engine is Perl based and allows use of different types of - wildcards, alternatives, -\emph on -etc -\emph default - -\begin_inset Foot -status open - -\begin_layout Standard -\begin_inset LatexCommand url -target "http://perldoc.perl.org/perlreref.html" - -\end_inset - - -\end_layout - -\end_inset - -. - If you want to -\series bold -grab -\series default - records with the sequence ATCG or GCTA you can do this: -\end_layout - -\begin_layout LyX-Code -... - | grab --regex='ATCG|GCTA' -\end_layout - -\begin_layout Standard -Or if you want to find sequences beginning with ATCG: -\end_layout - -\begin_layout LyX-Code -... - | grab --regex='^ATCG' -\end_layout - -\begin_layout Standard -You can also use -\series bold -grab -\series default - to locate records that fulfill a numerical property using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -eval switch which takes an expression in three parts. - The first part is the key that holds the value we want to evaluate, the - second part holds one of the eight operators: -\end_layout - -\begin_layout Enumerate -Greater than: > -\end_layout - -\begin_layout Enumerate -Greater than or equal to: >= -\end_layout - -\begin_layout Enumerate -Less than: < -\end_layout - -\begin_layout Enumerate -Less than or equal to: <= -\end_layout - -\begin_layout Enumerate -Equal to: = -\end_layout - -\begin_layout Enumerate -Not equal to: != -\end_layout - -\begin_layout Enumerate -String wise equal to: eq -\end_layout - -\begin_layout Enumerate -String wise not equal to: ne -\end_layout - -\begin_layout Standard -And finally comes the value used in the evaluation (a number, or a string - for eq and ne). - So to -\series bold -grab -\series default - all records with a sequence length greater than 30: -\end_layout - -\begin_layout LyX-Code -... 
- | length_seq | grab --eval='SEQ_LEN > 30' -\end_layout - -\begin_layout Standard -If you want to locate all records containing the pattern 'human' and where - the sequence length is greater than 30, you do this by running the stream - through -\series bold -grab -\series default - twice: -\end_layout - -\begin_layout LyX-Code -... - | grab --pattern='human' | length_seq | grab --eval='SEQ_LEN > 30' -\end_layout - -\begin_layout Standard -Finally, it is possible to do fast matching of expressions from a file using - the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -exact switch. - Each of these expressions has to be matched exactly over the entire length, - which is useful if you have a file with accession numbers that you want - to locate in the stream: -\end_layout - -\begin_layout LyX-Code -... - | grab --exact acc_no.txt | ... -\end_layout - -\begin_layout Standard -Using -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -exact is much faster than using -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern_in, because with -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -exact the expressions have to match completely, whereas -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern_in looks for subpatterns. -\end_layout - -\begin_layout Standard -NB! To get the best speed performance, use the most restrictive -\series bold -grab -\series default - first. -\end_layout - -\begin_layout Subsection -How to remove keys from records? -\end_layout - -\begin_layout Standard -To remove one or more specific keys from all records in the data stream - use -\series bold -remove_keys -\series default - like this: -\end_layout - -\begin_layout LyX-Code -... 
- | remove_keys --keys=SEQ,SEQ_NAME -\end_layout - -\begin_layout Standard -In the above example SEQ and SEQ_NAME will be removed from all records in - which they exist. - If all keys are removed from a record, then the record will be removed. -\end_layout - -\begin_layout Subsection -How to rename keys in records? -\end_layout - -\begin_layout Standard -Sometimes you want to rename a record key, -\emph on -e.g. - -\emph default - if you have read in a two column table with sequence name and sequence - in separate columns (see -\begin_inset LatexCommand ref -reference "sub:How-to-read-table" - -\end_inset - -) without specifying the key names, then the sequence name will be called - V0 and the sequence V1 as default in the -\series bold -read_tab -\series default - biotool. - To rename the V0 and V1 keys we need to run the stream through -\series bold -rename_keys -\series default - twice (once for each key to rename): -\end_layout - -\begin_layout LyX-Code -... - | rename_keys --keys=V0,SEQ_NAME | rename_keys --keys=V1,SEQ -\end_layout - -\begin_layout Standard -The first instance of -\series bold -rename_keys -\series default - replaces all the V0 keys with SEQ_NAME, and the second instance of -\series bold -rename_keys -\series default - replaces all the V1 keys with SEQ. - -\emph on -Et voilà -\emph default -, the data can now be used in the biotools that require these keys. -\end_layout - -\begin_layout Section -Manipulating Sequences -\end_layout - -\begin_layout Subsection -How to get sequence lengths? -\end_layout - -\begin_layout Standard -The length of sequences in records can be determined with -\series bold -length_seq -\series default -, which adds the key SEQ_LEN to each record with the sequence length as - the value. - It also generates an extra record that is emitted last with the key TOTAL_SEQ_L -EN showing the total length of all the sequences. 
-\end_layout - -\begin_layout LyX-Code -read_fasta --data_in= | length_seq -\end_layout - -\begin_layout Standard -It is also possible to determine the sequence length using the generic tool - -\series bold -length_vals -\series default - -\begin_inset LatexCommand eqref -reference "sub:How-to-get-value_length" - -\end_inset - -, which determines the length of the values for a given list of keys: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in= | length_vals --keys=SEQ -\end_layout - -\begin_layout Standard -To obtain the total length of all sequences use -\series bold -sum_vals -\series default - like this: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in= | length_vals --keys=SEQ -\end_layout - -\begin_layout LyX-Code -| sum_vals --keys=SEQ_LEN -\end_layout - -\begin_layout Standard -The biotool -\series bold -analyze_seq -\series default - will also determine the length of each sequence (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-analyze" - -\end_inset - -). -\end_layout - -\begin_layout Subsection -How to analyze sequence composition? -\begin_inset LatexCommand label -name "sub:How-to-analyze" - -\end_inset - - -\end_layout - -\begin_layout Standard -If you want to find out the sequence type, composition, length, as well - as GC content, indel content and proportions of soft and hard masked sequence, - then use -\series bold -analyze_seq -\series default -. - This handy biotool will determine all these things per sequence from which - it is easy to get an overview using the -\series bold -write_tab -\series default - biotool to output a table (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-write-tab" - -\end_inset - -). 
- So in order to determine the sequence composition of a FASTA file with - just one entry containing the sequence 'ATCG' we just read the data with - -\series bold -read_fasta -\series default - and run the output through -\series bold -analyze_seq -\series default - which will add the analysis to the record like this: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in=test.fna | analyze_seq ... -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:D: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -MIX_INDEX: 0.55 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:W: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:G: 16 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -SOFT_MASK%: 63.75 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:B: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:V: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -HARD_MASK%: 0.00 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:H: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:S: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:N: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:.: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -GC%: 35.00 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:A: 8 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:Y: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:M: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:T: 44 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -SEQ_TYPE: DNA -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:K: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:~: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -SEQ: 
TTTCAGTTTGGGACGGAGTAAGGCCTTCCtttttttttttttttttttttttttttttgagaccgagtcttgctc -tgtcg -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -SEQ_LEN: 80 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:R: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:C: 12 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:-: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:U: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize ---- -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout Standard -Now to make a table of how many As, Ts, Cs, and Gs there are you can add the - following: -\end_layout - -\begin_layout LyX-Code -... - | analyze_seq | write_tab --keys=RES:A,RES:T,RES:C,RES:G -\end_layout - -\begin_layout Standard -Or if you want to see the proportions of hard and soft masked sequence: -\end_layout - -\begin_layout LyX-Code -... - | analyze_seq | write_tab --keys=HARD_MASK%,SOFT_MASK% -\end_layout - -\begin_layout Standard -If you have a stack of sequences in one file and you want to determine the - mean GC content you can do it using the -\series bold -mean_vals -\series default - biotool: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in=test.fna | analyze_seq | mean_vals --keys=GC% -\end_layout - -\begin_layout Standard -Or if you want the total count of Ns you can use -\series bold -sum_vals -\series default - like this: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in=test.fna | analyze_seq | sum_vals --keys=RES:N -\end_layout - -\begin_layout Standard -The MIX_INDEX key is calculated as the count of the most common residue - over the sequence length, and can be used as a cut-off for removing sequence - tags consisting of mostly one nucleotide: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in=test.fna | analyze_seq | grab --eval='MIX_INDEX<0.85' -\end_layout - -\begin_layout Subsection -How to extract subsequences? 
-\begin_inset LatexCommand label -name "sub:How-to-extract" - -\end_inset - - -\end_layout - -\begin_layout Standard -In order to extract a subsequence from a longer sequence use the biotool - -\series bold -extract_seq -\series default -, which will replace the sequence in the record with the subsequence - (this behaviour should probably be modified to be dependent on a -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -replace or a -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_replace switch -\begin_inset Note Note -status collapsed - -\begin_layout Standard -also in split_seq -\end_layout - -\end_inset - -). - So to extract the first 20 residues from all sequences do (first residue - is designated 1): -\end_layout - -\begin_layout LyX-Code -... - | extract_seq --beg=1 --len=20 -\end_layout - -\begin_layout Standard -You can also specify a begin and end coordinate set: -\end_layout - -\begin_layout LyX-Code -... - | extract_seq --beg=20 --end=40 -\end_layout - -\begin_layout Standard -If you want the subsequences from position 20 to the sequence end do: -\end_layout - -\begin_layout LyX-Code -... - | extract_seq --beg=20 -\end_layout - -\begin_layout Standard -If you want to extract subsequences a given distance from the sequence end - you can do this by reversing the sequence with the biotool -\series bold -reverse_seq -\series default - -\begin_inset LatexCommand eqref -reference "sub:How-to-reverse-seq" - -\end_inset - -, followed by -\series bold -extract_seq -\series default - to get the subsequence, and then -\series bold -reverse_seq -\series default - again to get the subsequence back in the original orientation: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in=test.fna | reverse_seq -\end_layout - -\begin_layout LyX-Code -| extract_seq --beg=10 --len=10 | reverse_seq -\end_layout - -\begin_layout Subsection -How to get genomic sequence? 
-\begin_inset LatexCommand label -name "sub:How-to-get-genomic-sequence" - -\end_inset - - -\end_layout - -\begin_layout Standard -The biotool -\series bold -get_genome_seq -\series default - can extract subsequences from a given genome, specified with the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -genome switch, explicitly using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -beg and -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -end/ -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -len switches: -\end_layout - -\begin_layout LyX-Code -get_genome_seq --genome= --beg=1 --len=100 -\end_layout - -\begin_layout Standard -Alternatively, -\series bold -get_genome_seq -\series default - can be used to append the corresponding sequence to BED, PSL, and BLAST - records: -\end_layout - -\begin_layout LyX-Code -read_bed --data_in= | get_genome_seq --genome= -\end_layout - -\begin_layout Standard -It is also possible to include flanking sequence using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -flank switch. - So to include 50 nucleotides upstream and 50 nucleotides downstream for - each BED entry do: -\end_layout - -\begin_layout LyX-Code -read_bed --data_in= | get_genome_seq --genome= --flank=50 -\end_layout - -\begin_layout Subsection -How to upper-case sequences? -\end_layout - -\begin_layout Standard -Sequences can be converted from lower case to upper case using -\series bold -uppercase_seq -\series default -: -\end_layout - -\begin_layout LyX-Code -... - | uppercase_seq -\end_layout - -\begin_layout Subsection -How to reverse sequences? 
-\begin_inset LatexCommand label -name "sub:How-to-reverse-seq" - -\end_inset - - -\end_layout - -\begin_layout Standard -The order of residues in a sequence can be reversed using -\series bold -reverse_seq -\series default -: -\end_layout - -\begin_layout LyX-Code -... - | reverse_seq -\end_layout - -\begin_layout Standard -Note that in order to reverse/complement a sequence you also need the -\series bold -complement_seq -\series default - biotool (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-complement" - -\end_inset - -). -\end_layout - -\begin_layout Subsection -How to complement sequences? -\begin_inset LatexCommand label -name "sub:How-to-complement" - -\end_inset - - -\end_layout - -\begin_layout Standard -DNA and RNA sequences can be complemented with -\series bold -complement_seq -\series default -, which automagically determines the sequence type: -\end_layout - -\begin_layout LyX-Code -... - | complement_seq -\end_layout - -\begin_layout Standard -Note that in order to reverse/complement a sequence you also need the -\series bold -reverse_seq -\series default - biotool (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-reverse-seq" - -\end_inset - -). -\end_layout - -\begin_layout Subsection -How to remove indels from sequences? -\end_layout - -\begin_layout Standard -Indels can be removed from sequences with the -\series bold -remove_indels -\series default - biotool. - This is useful if you have aligned some sequences (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-align" - -\end_inset - -) and extracted (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-extract" - -\end_inset - -) a block of subsequences from the alignment and you want to use these sequences - in a search where you need to remove the indels first. - '-', '~', and '.' are considered indels: -\end_layout - -\begin_layout LyX-Code -... - | remove_indels -\end_layout - -\begin_layout Subsection -How to shuffle sequences? 
-\end_layout - -\begin_layout Standard -All residues in sequences in the stream can be shuffled to random positions - using the -\series bold -shuffle_seq -\series default - biotool: -\end_layout - -\begin_layout LyX-Code -... - | shuffle_seq -\end_layout - -\begin_layout Subsection -How to split sequences into overlapping subsequences? -\end_layout - -\begin_layout Standard -Sequences can be split into overlapping subsequences with the -\series bold -split_seq -\series default - biotool: -\end_layout - -\begin_layout LyX-Code -... - | split_seq --word_size=20 --uniq -\end_layout - -\begin_layout Subsection -How to determine the oligo frequency? -\end_layout - -\begin_layout Standard -In order to determine if any oligo usage is overrepresented in one or more - sequences you can determine the frequency of oligos of a given size with - -\series bold -oligo_freq -\series default -: -\end_layout - -\begin_layout LyX-Code -... - | oligo_freq --word_size=4 -\end_layout - -\begin_layout Standard -And if you have more than one sequence and want to accumulate the frequencies - you need the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -all switch: -\end_layout - -\begin_layout LyX-Code -... - | oligo_freq --word_size=4 --all -\end_layout - -\begin_layout Standard -To get a meaningful result you need to write the resulting frequencies as - a table with -\series bold -write_tab -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-write-tab" - -\end_inset - -), but first it is important to -\series bold -grab -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-grab" - -\end_inset - -) the records with the frequencies to avoid full length sequences in the - table: -\end_layout - -\begin_layout LyX-Code -... 
- | oligo_freq --word_size=4 --all | grab --pattern=OLIGO --keys_only -\end_layout - -\begin_layout LyX-Code -| write_tab --no_stream -\end_layout - -\begin_layout Standard -And the resulting frequency table can be sorted with Unix sort (man sort). -\end_layout - -\begin_layout Subsection -How to search for sequences in genomes? -\end_layout - -\begin_layout Standard -See the following biotools: -\end_layout - -\begin_layout Itemize - -\series bold -patscan_seq -\series default - -\begin_inset LatexCommand eqref -reference "sub:How-to-use-patscan" - -\end_inset - - -\end_layout - -\begin_layout Itemize - -\series bold -blat_seq -\series default - -\begin_inset LatexCommand eqref -reference "sub:How-to-use-BLAT" - -\end_inset - - -\end_layout - -\begin_layout Itemize - -\series bold -blast_seq -\series default - -\begin_inset LatexCommand eqref -reference "sub:How-to-use-BLAST" - -\end_inset - - -\end_layout - -\begin_layout Itemize - -\series bold -vmatch_seq -\series default - -\begin_inset LatexCommand eqref -reference "sub:How-to-use-Vmatch" - -\end_inset - - -\end_layout - -\begin_layout Subsection -How to search sequences for a pattern? -\begin_inset LatexCommand label -name "sub:How-to-use-patscan" - -\end_inset - - -\end_layout - -\begin_layout Standard -It is possible to search sequences in the data stream for patterns using - the -\series bold -patscan_seq -\series default - biotool, which utilizes the powerful scan_for_matches engine. - Consult the documentation for scan_for_matches in order to learn how to - define patterns (the documentation is included in Appendix\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sec:scan_for_matches-README" - -\end_inset - -). 
-\end_layout - -\begin_layout Standard -To search all sequences for a simple pattern consisting of the sequence - ATCGATCG allowing for 3 mismatches, 2 insertions and 1 deletion: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in= | patscan_seq --pattern='ATCGATCG[3,2,1]' -\end_layout - -\begin_layout Standard -The -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern switch takes a comma separated list of patterns, so if you want - to search for more than one pattern do: -\end_layout - -\begin_layout LyX-Code -... - | patscan_seq --pattern='ATCGATCG[3,2,1],GCTAGCTA[3,2,1]' -\end_layout - -\begin_layout Standard -It is also possible to have a list of different patterns to search for in - a file with one pattern per line. - In order to get -\series bold -patscan_seq -\series default - to read these patterns use the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern_in switch: -\end_layout - -\begin_layout LyX-Code -... - | patscan_seq --pattern_in= -\end_layout - -\begin_layout Standard -To also scan the complementary strand in nucleotide sequences ( -\series bold -patscan_seq -\series default - automagically determines the sequence type) you need to add the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -comp switch: -\end_layout - -\begin_layout LyX-Code -... - | patscan_seq --pattern= --comp -\end_layout - -\begin_layout Standard -It is also possible to use -\series bold -patscan_seq -\series default - to output those records that do not contain a certain pattern by using - the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -invert switch: -\end_layout - -\begin_layout LyX-Code -... 
- | patscan_seq --pattern= --invert -\end_layout - -\begin_layout Standard -Finally, -\series bold -patscan_seq -\series default - can also scan for patterns in a given genome sequence, instead of sequences - in the stream, using the -\begin_inset ERT -status open - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -genome switch: -\end_layout - -\begin_layout LyX-Code -patscan_seq --pattern= --genome= -\end_layout - -\begin_layout Subsection -How to use BLAT for sequence search? -\begin_inset LatexCommand label -name "sub:How-to-use-BLAT" - -\end_inset - - -\end_layout - -\begin_layout Standard -Sequences in the data stream can be matched against supported genomes using - -\series bold -blat_seq -\series default -, a biotool that uses BLAT, as the name suggests. - Currently only Mouse and Human genomes are available and it is not possible - to use OOC files since there is still a need for a local repository for - genome files. - Otherwise it is just: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in= | blat_seq --genome= -\end_layout - -\begin_layout Standard -The search results can then be written to file with -\series bold -write_psl -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-write-PSL" - -\end_inset - -) or -\series bold -write_bed -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-write-BED" - -\end_inset - -), although with -\series bold -write_bed -\series default - some information will be lost. 
- It is also possible to plot the chromosome distribution of the search results - using -\series bold -plot_chrdist -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-plot-chrdist" - -\end_inset - -) or the distribution of the match lengths using -\series bold -plot_lendist -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-plot-lendist" - -\end_inset - -) or a karyogram with the hits using -\series bold -plot_karyogram -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-plot-karyogram" - -\end_inset - -). -\end_layout - -\begin_layout Subsection -How to use BLAST for sequence search? -\begin_inset LatexCommand label -name "sub:How-to-use-BLAST" - -\end_inset - - -\end_layout - -\begin_layout Standard -Two biotools exist for blasting sequences: -\series bold -create_blast_db -\series default - is used to create the BLAST database required for BLAST which is queried - using the biotool -\series bold -blast_seq -\series default -. - So in order to create a BLAST database from sequences in the data stream - you simply run: -\end_layout - -\begin_layout LyX-Code -... - | create_blast_db --database=my_database --no_stream -\end_layout - -\begin_layout Standard -The type of sequence to use for the database is automagically determined - by -\series bold -create_blast_db -\series default -, but do not mix peptide and nucleic acid sequences in the - stream. - The -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -database switch takes a path as argument, but will default to 'blastdb_ if not set. -\end_layout - -\begin_layout Standard -The resulting database can now be queried with sequences in another data - stream using -\series bold -blast_seq -\series default -: -\end_layout - -\begin_layout LyX-Code -... 
- | blast_seq --database=my_database -\end_layout - -\begin_layout Standard -Again, the sequence type is determined automagically and the appropriate - BLAST program is guessed (see the table below); however, the program name can - be overridden with the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -program switch. -\end_layout - -\begin_layout Standard -\noindent -\align center -\begin_inset Tabular - - - - - - - -\begin_inset Text - -\begin_layout Standard -Subject sequence -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -Query sequence -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -Program guess -\end_layout - -\end_inset - - - - -\begin_inset Text - -\begin_layout Standard -Nucleotide -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -Nucleotide -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -blastn -\end_layout - -\end_inset - - - - -\begin_inset Text - -\begin_layout Standard -Protein -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -Protein -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -blastp -\end_layout - -\end_inset - - - - -\begin_inset Text - -\begin_layout Standard -Protein -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -Nucleotide -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -blastx -\end_layout - -\end_inset - - - - -\begin_inset Text - -\begin_layout Standard -Nucleotide -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -Protein -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -tblastn -\end_layout - -\end_inset - - - - -\end_inset - - -\end_layout - -\begin_layout Standard -Finally, it is also possible to use -\series bold -blast_seq -\series default - for blasting sequences against a preformatted genome using 
the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -genome switch instead of the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -database switch: -\end_layout - -\begin_layout LyX-Code -... - | blast_seq --genome= -\end_layout - -\begin_layout Subsection -How to use Vmatch for sequence search? -\begin_inset LatexCommand label -name "sub:How-to-use-Vmatch" - -\end_inset - - -\end_layout - -\begin_layout Standard -The powerful suffix array software package Vmatch -\begin_inset Foot -status collapsed - -\begin_layout Standard -\begin_inset LatexCommand url -target "http://www.vmatch.de/" - -\end_inset - - -\end_layout - -\end_inset - - can be used for exact mapping of sequences against indexed genomes using - the biotool -\series bold -vmatch_seq -\series default -, which will e.g. - map 700000 ESTs to the human genome locating all 160 million hits in less than - an hour. - Only nucleotide sequences and sequences longer than 11 nucleotides will - be mapped. - It is recommended that sequences consisting of mostly one nucleotide type - are removed. - This can be done with the -\series bold -analyze_seq -\series default - biotool -\begin_inset LatexCommand eqref -reference "sub:How-to-analyze" - -\end_inset - -. -\end_layout - -\begin_layout LyX-Code -... - | vmatch_seq --genome= -\end_layout - -\begin_layout Standard -It is also possible to allow for mismatches using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -hamming_dist switch. - So to allow for 2 mismatches: -\end_layout - -\begin_layout LyX-Code -... - | vmatch_seq --genome= --hamming_dist=2 -\end_layout - -\begin_layout Standard -Or to allow for 10% mismatching nucleotides: -\end_layout - -\begin_layout LyX-Code -... 
- | vmatch_seq --genome= --hamming_dist=10p -\end_layout - -\begin_layout Standard -To allow both indels and mismatches use the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -edit_dist switch. - So to allow for one mismatch or one indel: -\end_layout - -\begin_layout LyX-Code -... - | vmatch_seq --genome= --edit_dist=1 -\end_layout - -\begin_layout Standard -Or to allow for 5% indels or mismatches: -\end_layout - -\begin_layout LyX-Code -... - | vmatch_seq --genome= --edit_dist=5p -\end_layout - -\begin_layout Standard -Note that using -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -hamming_dist or -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -edit_dist slows down vmatch considerably, so use with care. -\end_layout - -\begin_layout Standard -The resulting SCORE key can be replaced to hold the number of genome matches - of a given sequence (multi-mappers) if the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -count switch is given. -\end_layout - -\begin_layout Subsection -How to find all matches between sequences? -\begin_inset LatexCommand label -name "sub:How-to-find-matches" - -\end_inset - - -\end_layout - -\begin_layout Standard -All matches between two sequences can be determined with the biotool -\series bold -match_seq -\series default -. 
- The match finding engine underneath the hood of -\series bold -match_seq -\series default - is the super fast suffix tree program MUMmer -\begin_inset Foot -status collapsed - -\begin_layout Standard -\begin_inset LatexCommand url -target "http://mummer.sourceforge.net/" - -\end_inset - - -\end_layout - -\end_inset - -, which will locate all forward and reverse matches between huge sequences - in a matter of minutes (if the repeat count is not too high and if the - word size used is appropriate). - Matching two -\emph on -Helicobacter pylori -\emph default - genomes (1.7Mbp) takes around 10 seconds: -\end_layout - -\begin_layout LyX-Code -... - | match_seq --word_size=20 --direction=both -\end_layout - -\begin_layout Standard -The output from -\series bold -match_seq -\series default - can be used to generate a dot plot with -\series bold -plot_matches -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-generate-dotplot" - -\end_inset - -). -\end_layout - -\begin_layout Subsection -How to align sequences? -\begin_inset LatexCommand label -name "sub:How-to-align" - -\end_inset - - -\end_layout - -\begin_layout Standard -Sequences in the stream can be aligned with the -\series bold -align_seq -\series default - biotool that uses Muscle -\begin_inset Foot -status open - -\begin_layout Standard -\begin_inset LatexCommand url -target "http://www.drive5.com/muscle/muscle.html" - -\end_inset - - -\end_layout - -\end_inset - - as alignment engine. - Currently you cannot change any of the Muscle alignment parameters and - -\series bold -align_seq -\series default - will create an alignment based on the defaults (which are really good!): -\end_layout - -\begin_layout LyX-Code -... 
- | align_seq -\end_layout - -\begin_layout Standard -The aligned output can be written to file in FASTA format using -\series bold -write_fasta -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-write-fasta" - -\end_inset - -) or in pretty text using -\series bold -write_align -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-write-alignment" - -\end_inset - -). -\end_layout - -\begin_layout Subsection -How to create a weight matrix? -\end_layout - -\begin_layout Standard -If you want a weight matrix to show the sequence composition of a stack - of sequences you can use the biotool create_weight_matrix: -\end_layout - -\begin_layout LyX-Code -... - | create_weight_matrix -\end_layout - -\begin_layout Standard -The result can be output in percent using the -\begin_inset ERT -status open - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -percent switch: -\end_layout - -\begin_layout LyX-Code -... - | create_weight_matrix --percent -\end_layout - -\begin_layout Standard -The weight matrix can be written as tabular output with -\series bold -write_tab -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-write-tab" - -\end_inset - -) after removing the records containing SEQ with -\series bold -grab -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-grab" - -\end_inset - -): -\end_layout - -\begin_layout LyX-Code -... - | create_weight_matrix | grab --invert --keys=SEQ --keys_only -\end_layout - -\begin_layout LyX-Code -| write_tab --no_stream -\end_layout - -\begin_layout Standard -The V0 column will hold the residue, while the rest of the columns will - hold the frequencies for each sequence position. -\end_layout - -\begin_layout Section -Plotting -\end_layout - -\begin_layout Standard -There exist several biotools for plotting. 
- Some of these are based on GNUplot -\begin_inset Foot -status open - -\begin_layout Standard -\begin_inset LatexCommand url -target "http://www.gnuplot.info/" - -\end_inset - - -\end_layout - -\end_inset - -, which is an extremely powerful platform to generate all sorts of plots - and even though GNUplot has quite a steep learning curve, the biotools - utilizing GNUplot are simple to use. - GNUplot is able to output a lot of different formats (called terminals - in GNUplot), but the biotools focus on three formats only: -\end_layout - -\begin_layout Enumerate -The 'dumb' terminal is the default for the GNUplot based biotools and will output - a plot in crude ASCII text (Fig.\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "fig:Dumb-terminal" - -\end_inset - -). - This is quite nice for a quick and dirty plot to get an overview of your - data. -\end_layout - -\begin_layout Enumerate -The 'post' or 'postscript' terminal outputs postscript code which is publication - grade graphics that can be viewed with applications such as Ghostview, - Photoshop, and Preview. -\end_layout - -\begin_layout Enumerate -The 'svg' terminal outputs scalable vector graphics (SVG) which is a vector - based format. - SVG is great because you can edit the resulting plot using Photoshop or - Inkscape -\begin_inset Foot -status collapsed - -\begin_layout Standard -Inkscape is a really handy drawing program that is free and open source. - Available at -\begin_inset LatexCommand htmlurl -target "http://www.inkscape.org" - -\end_inset - - -\end_layout - -\end_inset - - if you want to add additional labels, captions, arrows, and so on and then - save the result in different formats, such as postscript, without losing - resolution. -\end_layout - -\begin_layout Standard -The biotools for plotting that are not based on GNUplot only output SVG - (that may change in the future). 
-\end_layout - -\begin_layout Standard -\begin_inset Float figure -wide false -sideways false -status open - -\begin_layout Standard -\noindent -\align center -\begin_inset Graphics - filename lendist_ascii.png - lyxscale 70 - width 12cm - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset Caption - -\begin_layout Standard -\begin_inset LatexCommand label -name "fig:Dumb-terminal" - -\end_inset - -Dumb terminal -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Quote -The output of a length distribution plot in the default 'dumb terminal' - to the terminal window. - -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Subsection -How to plot a histogram? -\begin_inset LatexCommand label -name "How-to-plot-histogram" - -\end_inset - - -\end_layout - -\begin_layout Standard -A generic histogram for a given value can be plotted with the biotool -\series bold -plot_histogram -\series default - (Fig.\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "fig:Histogram" - -\end_inset - -): -\end_layout - -\begin_layout LyX-Code -... - | plot_histogram --key=TISSUE --no_stream -\end_layout - -\begin_layout Standard -(Figure missing) -\end_layout - -\begin_layout Standard -\noindent -\align left -\begin_inset Float figure -wide false -sideways false -status open - -\begin_layout Standard -\noindent -\align center -\begin_inset Graphics - filename histogram.png - lyxscale 70 - width 12cm - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset Caption - -\begin_layout Standard -\begin_inset LatexCommand label -name "fig:Histogram" - -\end_inset - -Histogram -\end_layout - -\end_inset - - -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Subsection -How to plot a length distribution? -\begin_inset LatexCommand label -name "sub:How-to-plot-lendist" - -\end_inset - - -\end_layout - -\begin_layout Standard -Plotting of length distributions, whether sequence lengths, pattern lengths, - hit lengths, -\emph on -etc. 
- -\emph default - is a really handy thing and can be done with the biotool -\series bold -plot_lendist -\series default -. - If you have a file with FASTA entries and want to plot the length distribution - you do it like this: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in= | length_seq -\end_layout - -\begin_layout LyX-Code -| plot_lendist --key=SEQ_LEN --no_stream -\end_layout - -\begin_layout Standard -The result will be written to the default dumb terminal and will look like - Fig.\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "fig:Dumb-terminal" - -\end_inset - -. -\end_layout - -\begin_layout Standard -If you instead want the result in postscript format you can do: -\end_layout - -\begin_layout LyX-Code -... - | plot_lendist --key=SEQ_LEN --terminal=post --result_out=file.ps -\end_layout - -\begin_layout Standard -That will generate the plot and save it to file, but not interrupt the data - stream which can then be used in further analysis. - You can also save the plot by redirecting the output with '>'; however, it is then important - to terminate the stream with the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_stream switch: -\end_layout - -\begin_layout LyX-Code -... - | plot_lendist --key=SEQ_LEN --terminal=post --no_stream > file.ps -\end_layout - -\begin_layout Standard -The resulting plot can be seen in Fig.\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "fig:Length-distribution" - -\end_inset - -. 
-\end_layout - -\begin_layout Standard -\begin_inset Float figure -wide false -sideways false -status open - -\begin_layout Standard - -\end_layout - -\begin_layout Standard -\noindent -\align center -\begin_inset Graphics - filename lendist.ps - lyxscale 50 - width 12cm - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset Caption - -\begin_layout Standard -\begin_inset LatexCommand label -name "fig:Length-distribution" - -\end_inset - -Length distribution -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Quote -Length distribution of 630 piRNA like RNAs. -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Subsection -How to plot a chromosome distribution? -\begin_inset LatexCommand label -name "sub:How-to-plot-chrdist" - -\end_inset - - -\end_layout - -\begin_layout Standard -If you have the result of a sequence search against a multi-chromosome genome, - it is very practical to be able to plot the distribution of search hits - on the different chromosomes. - This can be done with -\series bold -plot_chrdist -\series default -: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in= | blat_seq --genome= | plot_chrdist --no_stream -\end_layout - -\begin_layout Standard -The above example will result in a crude plot using the 'dumb' terminal, - and if you want to mess around with the results from the BLAT search you - probably want to save the result to file first (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-write-PSL" - -\end_inset - -). - To plot the chromosome distribution from the saved search result you can - do: -\end_layout - -\begin_layout LyX-Code -read_bed --data_in=file.bed | plot_chrdist --terminal=post --result_out=plot.ps -\end_layout - -\begin_layout Standard -That will result in the output shown in Fig.\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "fig:Chromosome-distribution" - -\end_inset - -. 
-\end_layout - -\begin_layout Standard -\begin_inset Float figure -wide false -sideways false -status open - -\begin_layout Standard - -\end_layout - -\begin_layout Standard -\noindent -\align center -\begin_inset Graphics - filename chrdist.ps - lyxscale 50 - width 12cm - rotateAngle 90 - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset Caption - -\begin_layout Standard -\begin_inset LatexCommand label -name "fig:Chromosome-distribution" - -\end_inset - -Chromosome distribution -\end_layout - -\end_inset - - -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Subsection -How to generate a dotplot? -\begin_inset LatexCommand label -name "sub:How-to-generate-dotplot" - -\end_inset - - -\end_layout - -\begin_layout Standard -A dotplot is a powerful way to get an overview of the size and location - of sequence insertions, deletions, and duplications between two sequences. - Generating a dotplot with biotools is a two step process where you initially - find all matches between two sequences using the tool -\series bold -match_seq -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-find-matches" - -\end_inset - -) and plot the resulting matches with -\series bold -plot_matches -\series default -. - Matching and plotting two -\emph on -Helicobacter pylori -\emph default - genomes (1.7Mbp) takes around 10 seconds: -\end_layout - -\begin_layout LyX-Code -... - | match_seq | plot_matches --terminal=post --result_out=plot.ps -\end_layout - -\begin_layout Standard -The resulting dotplot is in Fig.\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "fig:Dotplot" - -\end_inset - -. 
-\end_layout - -\begin_layout Standard -\begin_inset Float figure -wide false -sideways false -status open - -\begin_layout Standard -\noindent -\align center -\begin_inset Graphics - filename dotplot.ps - lyxscale 50 - width 12cm - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset Caption - -\begin_layout Standard -\begin_inset LatexCommand label -name "fig:Dotplot" - -\end_inset - -Dotplot -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Quote -Forward matches are displayed in green while reverse matches are displayed - in red. -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Subsection -How to plot a sequence logo? -\end_layout - -\begin_layout Standard -Sequence logos can be generated with -\series bold -plot_seqlogo -\series default -. - The sequence type is determined automagically and an entropy scale of 2 - bits and 4 bits is used for nucleotide and peptide sequences, respectively -\begin_inset Foot -status collapsed - -\begin_layout Standard -\begin_inset LatexCommand htmlurl -target "http://www.ccrnp.ncifcrf.gov/~toms/paper/hawaii/latex/node5.html" - -\end_inset - - -\end_layout - -\end_inset - -. -\end_layout - -\begin_layout LyX-Code -... - | plot_seqlogo --no_stream --result_out=seqlogo.svg -\end_layout - -\begin_layout Standard -An example of a sequence logo can be seen in Fig.\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "fig:Sequence-logo" - -\end_inset - -. 
-\end_layout - -\begin_layout Standard -\begin_inset Float figure -wide false -sideways false -status open - -\begin_layout Standard -\noindent -\align center -\begin_inset Graphics - filename seqlogo.png - lyxscale 50 - width 12cm - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset Caption - -\begin_layout Standard -\begin_inset LatexCommand label -name "fig:Sequence-logo" - -\end_inset - -Sequence logo -\end_layout - -\end_inset - - -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Subsection -How to plot a karyogram? -\begin_inset LatexCommand label -name "sub:How-to-plot-karyogram" - -\end_inset - - -\end_layout - -\begin_layout Standard -To plot search hits on genomes use -\series bold -plot_karyogram -\series default -, which will output a nice karyogram in SVG graphics: -\end_layout - -\begin_layout LyX-Code -... - | plot_karyogram --result_out=karyogram.svg -\end_layout - -\begin_layout Standard -The banding data is taken from the UCSC genome browser database and currently - only Human and Mouse are supported. - Fig.\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "fig:Karyogram" - -\end_inset - - shows the distribution of piRNA like RNAs matched to the Human genome. -\end_layout - -\begin_layout Standard -\begin_inset Float figure -wide false -sideways false -status open - -\begin_layout Standard -\noindent -\align center -\begin_inset Graphics - filename karyogram.png - lyxscale 35 - width 12cm - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset Caption - -\begin_layout Standard -\begin_inset LatexCommand label -name "fig:Karyogram" - -\end_inset - -Karyogram -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Quote -Hits from a search of piRNA like RNAs in the Human genome are displayed as - short horizontal bars. 
-\end_layout - -\end_inset - - -\end_layout - -\begin_layout Section -Uploading Results -\end_layout - -\begin_layout Subsection -How do I display my results in the UCSC Genome Browser? -\end_layout - -\begin_layout Standard -Results from the list of biotools below can be uploaded directly to a local - mirror of the UCSC Genome Browser using the biotool -\series bold -upload_to_ucsc -\series default -: -\end_layout - -\begin_layout Itemize -patscan_seq -\begin_inset LatexCommand eqref -reference "sub:How-to-use-patscan" - -\end_inset - - -\end_layout - -\begin_layout Itemize -blat_seq -\begin_inset LatexCommand eqref -reference "sub:How-to-use-BLAT" - -\end_inset - - -\end_layout - -\begin_layout Itemize -blast_seq -\begin_inset LatexCommand eqref -reference "sub:How-to-use-BLAST" - -\end_inset - - -\end_layout - -\begin_layout Itemize -vmatch_seq -\begin_inset LatexCommand eqref -reference "sub:How-to-use-Vmatch" - -\end_inset - - -\end_layout - -\begin_layout Standard -The simplest way to upload data requires two mandatory - switches: -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -database, which is the UCSC database name (such as hg18, mm9, etc.) and -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -table, which should be the user's initials followed by an underscore and a - short description of the data: -\end_layout - -\begin_layout LyX-Code -... 
- | upload_to_ucsc --database=hg18 --table=mah_snoRNAs -\end_layout - -\begin_layout Standard -The -\series bold -upload_to_ucsc -\series default - biotool modifies the user's ~/ucsc/my_tracks.ra file automagically (a backup - is created with the name ~/ucsc/my_tracks.ra~) with default values that - can be overridden using the following switches: -\end_layout - -\begin_layout Itemize -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -short_label - Short label for track - Default=database->table -\end_layout - -\begin_layout Itemize -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -long_label - Long label for track - Default=database->table -\end_layout - -\begin_layout Itemize -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -group - Track group name - Default= -\end_layout - -\begin_layout Itemize -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -priority - Track display priority - Default=1 -\end_layout - -\begin_layout Itemize -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -color - Track color - Default=147,73,42 -\end_layout - -\begin_layout Itemize -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -chunk_size - Chunks for loading - Default=10000000 -\end_layout - -\begin_layout Itemize -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -visibility - Track visibility - Default=pack -\end_layout - -\begin_layout Standard -Also, data in BED or PSL format can be uploaded with -\series bold -upload_to_ucsc -\series default - as long as these refer to genomes and chromosomes existing in the UCSC - Genome Browser: -\end_layout - 
-\begin_layout LyX-Code -read_bed --data_in= | upload_to_ucsc ... -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code -read_psl --data_in= | upload_to_ucsc ... -\end_layout - -\begin_layout Section -Power Scripting -\end_layout - -\begin_layout Standard -It is possible to do command-line scripting of biotool records using Perl. - Because a biotool record is essentially a hash structure, you can pass - records to the -\series bold -bioscript -\series default - command, which is a wrapper around the Perl executable that allows direct - manipulation of the records using the power of Perl. -\end_layout - -\begin_layout Standard -In the example below we replace the value of the CHR key in all records - with a running number: -\end_layout - -\begin_layout LyX-Code -... - | bioscript 'while($r=get_record( -\backslash -*STDIN)){$r->{CHR}=$i++; put_record($r)}' -\end_layout - -\begin_layout Standard -Something more useful would probably be to create custom FASTA headers. - For example, - we can read in a BED file, look up the genomic sequence, create a custom - FASTA header with -\series bold -bioscript -\series default - and output FASTA entries: -\end_layout - -\begin_layout LyX-Code -... 
- | bioscript 'while($r=get_record( -\backslash -*STDIN)){$r->{SEQ_NAME}= // -\end_layout - -\begin_layout LyX-Code -join("_",$r->{CHR},$r->{CHR_BEG},$r->{CHR_END}); put_record($r)}' -\end_layout - -\begin_layout Standard -And the output: -\end_layout - -\begin_layout LyX-Code ->chr2L_21567527_21567550 -\end_layout - -\begin_layout LyX-Code -taccaaacggatgcctcagacatc -\end_layout - -\begin_layout LyX-Code ->chr2L_693380_693403 -\end_layout - -\begin_layout LyX-Code -taccaaacggatgcctcagacatc -\end_layout - -\begin_layout LyX-Code ->chr2L_13859534_13859557 -\end_layout - -\begin_layout LyX-Code -taccaaacggatgcctcagacatc -\end_layout - -\begin_layout LyX-Code ->chr2L_9005090_9005113 -\end_layout - -\begin_layout LyX-Code -taccaaacggatgcctcagacatc -\end_layout - -\begin_layout LyX-Code ->chr2L_2106825_2106848 -\end_layout - -\begin_layout LyX-Code -taccaaacggatgcctcagacatc -\end_layout - -\begin_layout LyX-Code ->chr2L_14649031_14649054 -\end_layout - -\begin_layout LyX-Code -taccaaacggatgcctcagacatc -\end_layout - -\begin_layout Section -Trouble shooting -\end_layout - -\begin_layout Standard -Shoot the messenger! 
-\end_layout - -\begin_layout Section -\start_of_appendix -Keys -\begin_inset LatexCommand label -name "sec:Keys" - -\end_inset - - -\end_layout - -\begin_layout Standard -HIT -\end_layout - -\begin_layout Standard -HIT_BEG -\end_layout - -\begin_layout Standard -HIT_END -\end_layout - -\begin_layout Standard -HIT_LEN -\end_layout - -\begin_layout Standard -HIT_NAME -\end_layout - -\begin_layout Standard -PATTERN -\end_layout - -\begin_layout Section -Switches -\begin_inset LatexCommand label -name "sec:Switches" - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -stream_in -\end_layout - -\begin_layout Standard -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -stream_out -\end_layout - -\begin_layout Standard -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_stream -\end_layout - -\begin_layout Standard -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -data_in -\end_layout - -\begin_layout Standard -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -result_out -\end_layout - -\begin_layout Standard -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -num -\end_layout - -\begin_layout Section -scan_for_matches README -\begin_inset LatexCommand label -name "sec:scan_for_matches-README" - -\end_inset - - -\end_layout - -\begin_layout LyX-Code - scan_for_matches: -\end_layout - -\begin_layout LyX-Code - A Program to Scan Nucleotide or Protein Sequences for Matching Patterns -\end_layout - -\begin_layout LyX-Code - Ross Overbeek -\end_layout - -\begin_layout LyX-Code - MCS -\end_layout - -\begin_layout LyX-Code - Argonne National Laboratory 
-\end_layout - -\begin_layout LyX-Code - Argonne, IL 60439 -\end_layout - -\begin_layout LyX-Code - USA -\end_layout - -\begin_layout LyX-Code -Scan_for_matches is a utility that we have written to search for -\end_layout - -\begin_layout LyX-Code -patterns in DNA and protein sequences. - I wrote most of the code, -\end_layout - -\begin_layout LyX-Code -although David Joerg and Morgan Price wrote sections of an -\end_layout - -\begin_layout LyX-Code -earlier version. - The whole notion of pattern matching has a rich -\end_layout - -\begin_layout LyX-Code -history, and we borrowed liberally from many sources. - However, it is -\end_layout - -\begin_layout LyX-Code -worth noting that we were strongly influenced by the elegant tools -\end_layout - -\begin_layout LyX-Code -developed and distributed by David Searls. - My intent is to make the -\end_layout - -\begin_layout LyX-Code -existing tool available to anyone in the research community that might -\end_layout - -\begin_layout LyX-Code -find it useful. - I will continue to try to fix bugs and make suggested -\end_layout - -\begin_layout LyX-Code -enhancements, at least until I feel that a superior tool exists. -\end_layout - -\begin_layout LyX-Code -Hence, I would appreciate it if all bug reports and suggestions are -\end_layout - -\begin_layout LyX-Code -directed to me at Overbeek@mcs.anl.gov. - -\end_layout - -\begin_layout LyX-Code -I will try to log all bug fixes and report them to users that send me -\end_layout - -\begin_layout LyX-Code -their email addresses. - I do not require that you give me your name -\end_layout - -\begin_layout LyX-Code -and address. - However, if you do give it to me, I will try to notify -\end_layout - -\begin_layout LyX-Code -you of serious problems as they are discovered. 
-\end_layout - -\begin_layout LyX-Code -Getting Started: -\end_layout - -\begin_layout LyX-Code - The distribution should contain at least the following programs: -\end_layout - -\begin_layout LyX-Code - README - This document -\end_layout - -\begin_layout LyX-Code - ggpunit.c - One of the two source files -\end_layout - -\begin_layout LyX-Code - scan_for_matches.c - The second source file -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - run_tests - A perl script to test things -\end_layout - -\begin_layout LyX-Code - show_hits - A handy perl script -\end_layout - -\begin_layout LyX-Code - test_dna_input - Test sequences for DNA -\end_layout - -\begin_layout LyX-Code - test_dna_patterns - Test patterns for DNA scan -\end_layout - -\begin_layout LyX-Code - test_output - Desired output from test -\end_layout - -\begin_layout LyX-Code - test_prot_input - Test protein sequences -\end_layout - -\begin_layout LyX-Code - test_prot_patterns - Test patterns for proteins -\end_layout - -\begin_layout LyX-Code - testit - a perl script used for test -\end_layout - -\begin_layout LyX-Code - Only the first three files are required. - The others are useful, -\end_layout - -\begin_layout LyX-Code - but only if you have Perl installed on your system. - If you do -\end_layout - -\begin_layout LyX-Code - have Perl, I suggest that you type -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - which perl -\end_layout - -\begin_layout LyX-Code - to find out where it is installed. - On my system, I get the following -\end_layout - -\begin_layout LyX-Code - response: -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - clone% which perl -\end_layout - -\begin_layout LyX-Code - /usr/local/bin/perl -\end_layout - -\begin_layout LyX-Code - indicating that Perl is installed in /usr/local/bin. 
- Anyway, once -\end_layout - -\begin_layout LyX-Code - you know where it is installed, edit the first line of files -\end_layout - -\begin_layout LyX-Code - testit -\end_layout - -\begin_layout LyX-Code - show_hits -\end_layout - -\begin_layout LyX-Code - replacing /usr/local/bin/perl with the appropriate location. - I -\end_layout - -\begin_layout LyX-Code - will assume that you can do this, although it is not critical (it -\end_layout - -\begin_layout LyX-Code - is needed only to test the installation and to use the "show_hits" -\end_layout - -\begin_layout LyX-Code - utility). - Perl is not required to actually install and run -\end_layout - -\begin_layout LyX-Code - scan_for_matches. - -\end_layout - -\begin_layout LyX-Code - If you do not have Perl, I suggest you get it and install it (it -\end_layout - -\begin_layout LyX-Code - is a wonderful utility). - Information about Perl and how to get it -\end_layout - -\begin_layout LyX-Code - can be found in the book "Programming Perl" by Larry Wall and -\end_layout - -\begin_layout LyX-Code - Randall L. - Schwartz, published by O'Reilly & Associates, Inc. -\end_layout - -\begin_layout LyX-Code - To get started, you will need to compile the program. - I do this -\end_layout - -\begin_layout LyX-Code - using -\end_layout - -\begin_layout LyX-Code - gcc -O -o scan_for_matches ggpunit.c scan_for_matches.c -\end_layout - -\begin_layout LyX-Code - If you do not use GNU C, use -\end_layout - -\begin_layout LyX-Code - cc -O -DCC -o scan_for_matches ggpunit.c scan_for_matches.c -\end_layout - -\begin_layout LyX-Code - which works on my Sun. 
- -\end_layout - -\begin_layout LyX-Code - Once you have compiled scan_for_matches, you can verify that it -\end_layout - -\begin_layout LyX-Code - works with -\end_layout - -\begin_layout LyX-Code - clone% run_tests tmp -\end_layout - -\begin_layout LyX-Code - clone% diff tmp test_output -\end_layout - -\begin_layout LyX-Code - You may get a few strange lines of the sort -\end_layout - -\begin_layout LyX-Code - clone% run_tests tmp -\end_layout - -\begin_layout LyX-Code - rm: tmp: No such file or directory -\end_layout - -\begin_layout LyX-Code - clone% diff tmp test_output -\end_layout - -\begin_layout LyX-Code - These should cause no concern. - However, if the "diff" shows that -\end_layout - -\begin_layout LyX-Code - tmp and test_output are different, contact me (you have a -\end_layout - -\begin_layout LyX-Code - problem). - -\end_layout - -\begin_layout LyX-Code - You should now be able to use scan_for_matches by following the -\end_layout - -\begin_layout LyX-Code - instructions given below (which is all the normal user should have -\end_layout - -\begin_layout LyX-Code - to understand, once things are installed properly). -\end_layout - -\begin_layout LyX-Code - ============================================================== -\end_layout - -\begin_layout LyX-Code -How to run scan_for_matches: -\end_layout - -\begin_layout LyX-Code - To run the program, you need to create two files -\end_layout - -\begin_layout LyX-Code - 1. - the first file contains the pattern you wish to scan for; I'll -\end_layout - -\begin_layout LyX-Code - call this file pat_file in what follows (but any name is ok) -\end_layout - -\begin_layout LyX-Code - 2. - the second file contains a set of sequences to scan. - These -\end_layout - -\begin_layout LyX-Code - should be in "fasta format". - Just look at the contents of -\end_layout - -\begin_layout LyX-Code - test_dna_input to see examples of this format. 
- Basically, -\end_layout - -\begin_layout LyX-Code - each sequence begins with a line of the form -\end_layout - -\begin_layout LyX-Code - >sequence_id -\end_layout - -\begin_layout LyX-Code - and is followed by one or more lines containing the sequence. -\end_layout - -\begin_layout LyX-Code - Once these files have been created, you just use -\end_layout - -\begin_layout LyX-Code - scan_for_matches pat_file < input_file -\end_layout - -\begin_layout LyX-Code - to scan all of the input sequences for the given pattern. - As an -\end_layout - -\begin_layout LyX-Code - example, suppose that pat_file contains a single line of the form -\end_layout - -\begin_layout LyX-Code - p1=4...7 3...8 ~p1 -\end_layout - -\begin_layout LyX-Code - Then, -\end_layout - -\begin_layout LyX-Code - scan_for_matches pat_file < test_dna_input -\end_layout - -\begin_layout LyX-Code - should produce two "hits". - When I run this on my machine, I get -\end_layout - -\begin_layout LyX-Code - clone% scan_for_matches pat_file < test_dna_input -\end_layout - -\begin_layout LyX-Code - >tst1:[6,27] -\end_layout - -\begin_layout LyX-Code - cguaacc ggttaacc gguuacg -\end_layout - -\begin_layout LyX-Code - >tst2:[6,27] -\end_layout - -\begin_layout LyX-Code - CGUAACC GGTTAACC GGUUACG -\end_layout - -\begin_layout LyX-Code - clone% -\end_layout - -\begin_layout LyX-Code -Simple Patterns Built by Matching Ranges and Reverse Complements -\end_layout - -\begin_layout LyX-Code - Let me first explain this simple pattern: -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - p1=4...7 3...8 ~p1 -\end_layout - -\begin_layout LyX-Code - The pattern consists of three "pattern units" separated by spaces. -\end_layout - -\begin_layout LyX-Code - The first pattern unit is -\end_layout - -\begin_layout LyX-Code - p1=4...7 -\end_layout - -\begin_layout LyX-Code - which means "match 4 to 7 characters and call them p1". 
- The -\end_layout - -\begin_layout LyX-Code - second pattern unit is -\end_layout - -\begin_layout LyX-Code - 3...8 -\end_layout - -\begin_layout LyX-Code - which means "then match 3 to 8 characters". - The last pattern unit -\end_layout - -\begin_layout LyX-Code - is -\end_layout - -\begin_layout LyX-Code - ~p1 -\end_layout - -\begin_layout LyX-Code - which means "match the reverse complement of p1". - The first -\end_layout - -\begin_layout LyX-Code - reported hit is shown as -\end_layout - -\begin_layout LyX-Code - >tst1:[6,27] -\end_layout - -\begin_layout LyX-Code - cguaacc ggttaacc gguuacg -\end_layout - -\begin_layout LyX-Code - which states that characters 6 through 27 of sequence tst1 were -\end_layout - -\begin_layout LyX-Code - matched. - "cguaacc" matched the first pattern unit, "ggttaacc" the -\end_layout - -\begin_layout LyX-Code - second, and "gguuacg" the third. - This is an example of a common -\end_layout - -\begin_layout LyX-Code - type of pattern used to search for sections of DNA or RNA that -\end_layout - -\begin_layout LyX-Code - would fold into a hairpin loop. -\end_layout - -\begin_layout LyX-Code -Searching Both Strands -\end_layout - -\begin_layout LyX-Code - Now for a short aside: scan_for_matches only searched the -\end_layout - -\begin_layout LyX-Code - sequences in the input file; it did not search the opposite -\end_layout - -\begin_layout LyX-Code - strand. - With a pattern of the sort we just used, there is no -\end_layout - -\begin_layout LyX-Code - need to search the opposite strand. - However, it is normally the -\end_layout - -\begin_layout LyX-Code - case that you will wish to search both the sequence and the -\end_layout - -\begin_layout LyX-Code - opposite strand (i.e., the reverse complement of the sequence). -\end_layout - -\begin_layout LyX-Code - To do that, you would just use the "-c" command line option. 
- For example, -\end_layout - -\begin_layout LyX-Code - scan_for_matches -c pat_file < test_dna_input -\end_layout - -\begin_layout LyX-Code - Hits on the opposite strand will show a beginning location greater -\end_layout - -\begin_layout LyX-Code - than the end location of the match. -\end_layout - -\begin_layout LyX-Code -Defining Pairing Rules and Allowing Mismatches, Insertions, and Deletions -\end_layout - -\begin_layout LyX-Code - Let us stop now and ask "What additional features would one need to -\end_layout - -\begin_layout LyX-Code - really find the kinds of loop structures that characterize tRNAs, -\end_layout - -\begin_layout LyX-Code - rRNAs, and so forth?" I can immediately think of two: -\end_layout - -\begin_layout LyX-Code - a) you will need to be able to allow non-standard pairings -\end_layout - -\begin_layout LyX-Code - (those other than G-C and A-U), and -\end_layout - -\begin_layout LyX-Code - b) you will need to be able to tolerate some number of -\end_layout - -\begin_layout LyX-Code - mismatches and bulges. -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - Let me first show you how to handle non-standard "rules for -\end_layout - -\begin_layout LyX-Code - pairing in reverse complements". 
- Consider the following pattern, -\end_layout - -\begin_layout LyX-Code - which I show as two lines (you may use as many lines as you like in -\end_layout - -\begin_layout LyX-Code - forming a pattern, although you can only break a pattern at points -\end_layout - -\begin_layout LyX-Code - where space would be legal): -\end_layout - -\begin_layout LyX-Code - r1={au,ua,gc,cg,gu,ug,ga,ag} -\end_layout - -\begin_layout LyX-Code - p1=2...3 0...4 p2=2...5 1...5 r1~p2 0...4 ~p1 -\end_layout - -\begin_layout LyX-Code - The first "pattern unit" does not actually match anything; rather, -\end_layout - -\begin_layout LyX-Code - it defines a "pairing rule" in which standard pairings are -\end_layout - -\begin_layout LyX-Code - allowed, as well as G-A and A-G (in case you wondered, Us and Ts -\end_layout - -\begin_layout LyX-Code - and upper and lower case can be used interchangeably; for example -\end_layout - -\begin_layout LyX-Code - r1={AT,UA,gc,cg} could be used to define the "standard rule" for -\end_layout - -\begin_layout LyX-Code - pairings). - The second line consists of seven pattern units which -\end_layout - -\begin_layout LyX-Code - may be interpreted as follows: -\end_layout - -\begin_layout LyX-Code - p1=2...3 match 2 or 3 characters (call it p1) -\end_layout - -\begin_layout LyX-Code - 0...4 match 0 to 4 characters -\end_layout - -\begin_layout LyX-Code - p2=2...5 match 2 to 5 characters (call it p2) -\end_layout - -\begin_layout LyX-Code - 1...5 match 1 to 5 characters -\end_layout - -\begin_layout LyX-Code - r1~p2 match the reverse complement of p2, -\end_layout - -\begin_layout LyX-Code - allowing G-A and A-G pairs -\end_layout - -\begin_layout LyX-Code - 0...4 match 0 to 4 characters -\end_layout - -\begin_layout LyX-Code - ~p1 match the reverse complement of p1 -\end_layout - -\begin_layout LyX-Code - allowing only G-C, C-G, A-T, and T-A pairs -\end_layout - -\begin_layout LyX-Code - Thus, r1~p2 means "match the reverse complement of p2 using rule r1". 
-\end_layout - -\begin_layout LyX-Code - Now let us consider the issue of tolerating mismatches and bulges. -\end_layout - -\begin_layout LyX-Code - You may add a "qualifier" to the pattern unit that gives the -\end_layout - -\begin_layout LyX-Code - tolerable number of "mismatches, deletions, and insertions". -\end_layout - -\begin_layout LyX-Code - Thus, -\end_layout - -\begin_layout LyX-Code - p1=10...10 3...8 ~p1[1,2,1] -\end_layout - -\begin_layout LyX-Code - means that the third pattern unit must match 10 characters, -\end_layout - -\begin_layout LyX-Code - allowing one "mismatch" (a pairing other than G-C, C-G, A-T, or -\end_layout - -\begin_layout LyX-Code - T-A), two deletions (a deletion is a character that occurs in p1, -\end_layout - -\begin_layout LyX-Code - but has been "deleted" from the string matched by ~p1), and one -\end_layout - -\begin_layout LyX-Code - insertion (an "insertion" is a character that occurs in the string -\end_layout - -\begin_layout LyX-Code - matched by ~p1, but for which no corresponding character -\end_layout - -\begin_layout LyX-Code - occurs in p1). - In this case, the pattern would match -\end_layout - -\begin_layout LyX-Code - ACGTACGTAC GGGGGGGG GCGTTACCT -\end_layout - -\begin_layout LyX-Code - which is, you must admit, a fairly weak loop. - It is common to -\end_layout - -\begin_layout LyX-Code - allow mismatches, but you will find yourself using insertions and -\end_layout - -\begin_layout LyX-Code - deletions much more rarely. - In any event, you should note that -\end_layout - -\begin_layout LyX-Code - allowing mismatches, insertions, and deletions does force the -\end_layout - -\begin_layout LyX-Code - program to try many additional possible pairings, so it does slow -\end_layout - -\begin_layout LyX-Code - things down a bit. 
-\end_layout - -\begin_layout LyX-Code -How Patterns Are Matched -\end_layout - -\begin_layout LyX-Code - Now is as good a time as any to discuss the basic flow of control -\end_layout - -\begin_layout LyX-Code - when matching patterns. - Recall that a "pattern" is a sequence of -\end_layout - -\begin_layout LyX-Code - "pattern units". - Suppose that the pattern units were -\end_layout - -\begin_layout LyX-Code - u1 u2 u3 u4 ... - un -\end_layout - -\begin_layout LyX-Code - The scan of a sequence S begins by setting the current position -\end_layout - -\begin_layout LyX-Code - to 1. - Then, an attempt is made to match u1 starting at the -\end_layout - -\begin_layout LyX-Code - current position. - Each attempt to match a pattern unit can -\end_layout - -\begin_layout LyX-Code - succeed or fail. - If it succeeds, then an attempt is made to match -\end_layout - -\begin_layout LyX-Code - the next unit. - If it fails, then an attempt is made to find an -\end_layout - -\begin_layout LyX-Code - alternative match for the immediately preceding pattern unit. - If -\end_layout - -\begin_layout LyX-Code - this succeeds, then we proceed forward again to the next unit. - If -\end_layout - -\begin_layout LyX-Code - it fails we go back to the preceding unit. - This process is called -\end_layout - -\begin_layout LyX-Code - "backtracking". - If there are no previous units, then the current -\end_layout - -\begin_layout LyX-Code - position is incremented by one, and everything starts again. - This -\end_layout - -\begin_layout LyX-Code - proceeds until either the current position goes past the end of -\end_layout - -\begin_layout LyX-Code - the sequence or all of the pattern units succeed. - On success, -\end_layout - -\begin_layout LyX-Code - scan_for_matches reports the "hit", the current position is set -\end_layout - -\begin_layout LyX-Code - just past the hit, and an attempt is made to find another hit. 
-\end_layout - -\begin_layout LyX-Code - If you wish to limit the scan to simply finding a maximum of, say, -\end_layout - -\begin_layout LyX-Code - 10 hits, you can use the -n option (-n 10 would set the limit to -\end_layout - -\begin_layout LyX-Code - 10 reported hits). - For example, -\end_layout - -\begin_layout LyX-Code - scan_for_matches -c -n 1 pat_file < test_dna_input -\end_layout - -\begin_layout LyX-Code - would search for just the first hit (and would stop searching the -\end_layout - -\begin_layout LyX-Code - current sequences or any that follow in the input file). -\end_layout - -\begin_layout LyX-Code -Searching for repeats: -\end_layout - -\begin_layout LyX-Code - In the last section, I discussed almost all of the details -\end_layout - -\begin_layout LyX-Code - required to allow you to look for repeats. - Consider the following -\end_layout - -\begin_layout LyX-Code - set of patterns: -\end_layout - -\begin_layout LyX-Code - p1=6...6 3...8 p1 (find exact 6 character repeat separated -\end_layout - -\begin_layout LyX-Code - by 3 to 8 characters) -\end_layout - -\begin_layout LyX-Code - p1=6...6 3...8 p1[1,0,0] (allow one mismatch) -\end_layout - -\begin_layout LyX-Code - p1=3...3 p1[1,0,0] p1[1,0,0] p1[1,0,0] -\end_layout - -\begin_layout LyX-Code - (match 12 characters that are the remains -\end_layout - -\begin_layout LyX-Code - of a 3-character sequence occurring 4 times) -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - p1=4...8 0...3 p2=6...8 p1 0...3 p2 -\end_layout - -\begin_layout LyX-Code - (This would match things like -\end_layout - -\begin_layout LyX-Code - ATCT G TCTTT ATCT TG TCTTT -\end_layout - -\begin_layout LyX-Code - ) -\end_layout - -\begin_layout LyX-Code -Searching for particular sequences: -\end_layout - -\begin_layout LyX-Code - Occasionally, one wishes to match a specific, known sequence. 
-\end_layout - -\begin_layout LyX-Code - In such a case, you can just give the sequence (along with an -\end_layout - -\begin_layout LyX-Code - optional statement of the allowable mismatches, insertions, and -\end_layout - -\begin_layout LyX-Code - deletions). - Thus, -\end_layout - -\begin_layout LyX-Code - p1=6...8 GAGA ~p1 (match a hairpin with GAGA as the loop) -\end_layout - -\begin_layout LyX-Code - RRRRYYYY (match 4 purines followed by 4 pyrimidines) -\end_layout - -\begin_layout LyX-Code - TATAA[1,0,0] (match TATAA, allowing 1 mismatch) -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code -Matches against a "weight matrix": -\end_layout - -\begin_layout LyX-Code - I will conclude my examples of the types of pattern units -\end_layout - -\begin_layout LyX-Code - available for matching against nucleotide sequences by discussing a -\end_layout - -\begin_layout LyX-Code - crude implementation of matching using a "weight matrix". - While I -\end_layout - -\begin_layout LyX-Code - am less than overwhelmed with the syntax that I chose, I think that -\end_layout - -\begin_layout LyX-Code - the reader should be aware that I was thinking of generating -\end_layout - -\begin_layout LyX-Code - patterns containing such pattern units automatically from -\end_layout - -\begin_layout LyX-Code - alignments (and did not really plan on typing such things in by -\end_layout - -\begin_layout LyX-Code - hand very often). - Anyway, suppose that you wanted to match a -\end_layout - -\begin_layout LyX-Code - sequence of eight characters. - The "consensus" of these eight -\end_layout - -\begin_layout LyX-Code - characters is GRCACCGS, but the actual "frequencies of occurrence" -\end_layout - -\begin_layout LyX-Code - are given in the matrix below. - Thus, the first character is an A -\end_layout - -\begin_layout LyX-Code - 16% of the time and a G 84% of the time. 
- The second is an A 57% of -\end_layout - -\begin_layout LyX-Code - the time, a C 10% of the time, a G 29% of the time, and a T 4% of -\end_layout - -\begin_layout LyX-Code - the time. - -\end_layout - -\begin_layout LyX-Code - C1 C2 C3 C4 C5 C6 C7 C8 -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - A 16 57 0 95 0 18 0 0 -\end_layout - -\begin_layout LyX-Code - C 0 10 80 0 100 60 0 50 -\end_layout - -\begin_layout LyX-Code - G 84 29 0 0 0 20 100 50 -\end_layout - -\begin_layout LyX-Code - T 0 4 20 5 0 2 0 0 -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - One could use the following pattern unit to search for inexact -\end_layout - -\begin_layout LyX-Code - matches related to such a "weight matrix": -\end_layout - -\begin_layout LyX-Code - {(16,0,84,0),(57,10,29,4),(0,80,0,20),(95,0,0,5), -\end_layout - -\begin_layout LyX-Code - (0,100,0,0),(18,60,20,2),(0,0,100,0),(0,50,50,0)} > 450 -\end_layout - -\begin_layout LyX-Code - This pattern unit will attempt to match exactly eight characters. -\end_layout - -\begin_layout LyX-Code - For each character in the sequence, the entry in the corresponding -\end_layout - -\begin_layout LyX-Code - tuple is added to an accumulated sum. - If the sum is greater than -\end_layout - -\begin_layout LyX-Code - 450, the match succeeds; else it fails. -\end_layout - -\begin_layout LyX-Code - Recently, this feature was upgraded to allow ranges. - Thus, -\end_layout - -\begin_layout LyX-Code - 600 > {(16,0,84,0),(57,10,29,4),(0,80,0,20),(95,0,0,5), -\end_layout - -\begin_layout LyX-Code - (0,100,0,0),(18,60,20,2),(0,0,100,0),(0,50,50,0)} > 450 -\end_layout - -\begin_layout LyX-Code - will work, as well. -\end_layout - -\begin_layout LyX-Code -Allowing Alternatives: -\end_layout - -\begin_layout LyX-Code - Very occasionally, you may wish to allow alternative pattern units -\end_layout - -\begin_layout LyX-Code - (i.e., "match either A or B"). 
- You can do this using something -\end_layout - -\begin_layout LyX-Code - like -\end_layout - -\begin_layout LyX-Code - ( GAGA | GCGCA) -\end_layout - -\begin_layout LyX-Code - which says "match either GAGA or GCGCA". - You may take -\end_layout - -\begin_layout LyX-Code - alternatives of a list of pattern units, for example -\end_layout - -\begin_layout LyX-Code - (p1=3...3 3...8 ~p1 | p1=5...5 4...4 ~p1 GGG) -\end_layout - -\begin_layout LyX-Code - would match one of two sequences of pattern units. - There is one -\end_layout - -\begin_layout LyX-Code - clumsy aspect of the syntax: to match a list of alternatives, you -\end_layout - -\begin_layout LyX-Code - need to fully parenthesize the request. - Thus, -\end_layout - -\begin_layout LyX-Code - (GAGA | (GCGCA | TTCGA)) -\end_layout - -\begin_layout LyX-Code - would be needed to try the three alternatives. -\end_layout - -\begin_layout LyX-Code -One Minor Extension -\end_layout - -\begin_layout LyX-Code - Sometimes a pattern will contain a sequence of distinct ranges, -\end_layout - -\begin_layout LyX-Code - and you might wish to limit the sum of the lengths of the matched -\end_layout - -\begin_layout LyX-Code - subsequences. - For example, suppose that you basically wanted to -\end_layout - -\begin_layout LyX-Code - match something like -\end_layout - -\begin_layout LyX-Code - ARRYYTT p1=0...5 GCA[1,0,0] p2=1...6 ~p1 4...8 ~p2 p3=4...10 CCT -\end_layout - -\begin_layout LyX-Code - but that the sum of the lengths of p1, p2, and p3 must not exceed -\end_layout - -\begin_layout LyX-Code - eight characters. - To do this, you could add -\end_layout - -\begin_layout LyX-Code - length(p1+p2+p3) < 9 -\end_layout - -\begin_layout LyX-Code - as the last pattern unit. - It will just succeed or fail (but does -\end_layout - -\begin_layout LyX-Code - not actually match any characters in the sequence). 
-\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code -Matching Protein Sequences -\end_layout - -\begin_layout LyX-Code - Suppose that the input file contains protein sequences. - In this -\end_layout - -\begin_layout LyX-Code - case, you must invoke scan_for_matches with the "-p" option. - You -\end_layout - -\begin_layout LyX-Code - cannot use aspects of the language that relate directly to -\end_layout - -\begin_layout LyX-Code - nucleotide sequences (e.g., the -c command line option or pattern -\end_layout - -\begin_layout LyX-Code - constructs referring to the reverse complement of a previously -\end_layout - -\begin_layout LyX-Code - matched unit). - -\end_layout - -\begin_layout LyX-Code - You also have two additional constructs that allow you to match -\end_layout - -\begin_layout LyX-Code - either "one of a set of amino acids" or "any amino acid other than -\end_layout - -\begin_layout LyX-Code - those in a given set". - For example, -\end_layout - -\begin_layout LyX-Code - p1=0...4 any(HQD) 1...3 notany(HK) p1 -\end_layout - -\begin_layout LyX-Code - would successfully match a string like -\end_layout - -\begin_layout LyX-Code - YWV D AA C YWV -\end_layout - -\begin_layout LyX-Code -Using the show_hits Utility -\end_layout - -\begin_layout LyX-Code - When viewing a large set of complex matches, you might find it -\end_layout - -\begin_layout LyX-Code - convenient to post-process the scan_for_matches output to get a -\end_layout - -\begin_layout LyX-Code - more readable version. - We provide a simple post-processor called -\end_layout - -\begin_layout LyX-Code - "show_hits". 
- To see its effect, just pipe the output of a -\end_layout - -\begin_layout LyX-Code - scan_for_matches into show_hits: -\end_layout - -\begin_layout LyX-Code - Normal Output: -\end_layout - -\begin_layout LyX-Code - clone% scan_for_matches -c pat_file < tmp -\end_layout - -\begin_layout LyX-Code - >tst1:[1,28] -\end_layout - -\begin_layout LyX-Code - gtacguaacc ggttaac cgguuacgtac -\end_layout - -\begin_layout LyX-Code - >tst1:[28,1] -\end_layout - -\begin_layout LyX-Code - gtacgtaacc ggttaac cggttacgtac -\end_layout - -\begin_layout LyX-Code - >tst2:[2,31] -\end_layout - -\begin_layout LyX-Code - CGTACGUAAC C GGTTAACC GGUUACGTACG -\end_layout - -\begin_layout LyX-Code - >tst2:[31,2] -\end_layout - -\begin_layout LyX-Code - CGTACGTAAC C GGTTAACC GGTTACGTACG -\end_layout - -\begin_layout LyX-Code - >tst3:[3,32] -\end_layout - -\begin_layout LyX-Code - gtacguaacc g gttaactt cgguuacgtac -\end_layout - -\begin_layout LyX-Code - >tst3:[32,3] -\end_layout - -\begin_layout LyX-Code - gtacgtaacc g aagttaac cggttacgtac -\end_layout - -\begin_layout LyX-Code - Piped Through show_hits: -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - clone% scan_for_matches -c pat_file < tmp | show_hits -\end_layout - -\begin_layout LyX-Code - tst1:[1,28]: gtacguaacc ggttaac cgguuacgtac -\end_layout - -\begin_layout LyX-Code - tst1:[28,1]: gtacgtaacc ggttaac cggttacgtac -\end_layout - -\begin_layout LyX-Code - tst2:[2,31]: CGTACGUAAC C GGTTAACC GGUUACGTACG -\end_layout - -\begin_layout LyX-Code - tst2:[31,2]: CGTACGTAAC C GGTTAACC GGTTACGTACG -\end_layout - -\begin_layout LyX-Code - tst3:[3,32]: gtacguaacc g gttaactt cgguuacgtac -\end_layout - -\begin_layout LyX-Code - tst3:[32,3]: gtacgtaacc g aagttaac cggttacgtac -\end_layout - -\begin_layout LyX-Code - clone% -\end_layout - -\begin_layout LyX-Code - Optionally, you can specify which of the "fields" in the matches -\end_layout - -\begin_layout LyX-Code - you wish to sort on, and show_hits will sort them. 
- The field -\end_layout - -\begin_layout LyX-Code - numbers start with 0. - So, you might get something like -\end_layout - -\begin_layout LyX-Code - clone% scan_for_matches -c pat_file < tmp | show_hits 2 1 -\end_layout - -\begin_layout LyX-Code - tst2:[2,31]: CGTACGUAAC C GGTTAACC GGUUACGTACG -\end_layout - -\begin_layout LyX-Code - tst2:[31,2]: CGTACGTAAC C GGTTAACC GGTTACGTACG -\end_layout - -\begin_layout LyX-Code - tst3:[32,3]: gtacgtaacc g aagttaac cggttacgtac -\end_layout - -\begin_layout LyX-Code - tst1:[1,28]: gtacguaacc ggttaac cgguuacgtac -\end_layout - -\begin_layout LyX-Code - tst1:[28,1]: gtacgtaacc ggttaac cggttacgtac -\end_layout - -\begin_layout LyX-Code - tst3:[3,32]: gtacguaacc g gttaactt cgguuacgtac -\end_layout - -\begin_layout LyX-Code - clone% -\end_layout - -\begin_layout LyX-Code - In this case, the hits have been sorted on fields 2 and 1 (that is, -\end_layout - -\begin_layout LyX-Code - the third and second matched subfields). -\end_layout - -\begin_layout LyX-Code - show_hits is just one possible little post-processor, and you -\end_layout - -\begin_layout LyX-Code - might well wish to write a customized one for yourself. -\end_layout - -\begin_layout LyX-Code -Reducing the Cost of a Search -\end_layout - -\begin_layout LyX-Code - The scan_for_matches utility uses a fairly simple search, and may -\end_layout - -\begin_layout LyX-Code - consume large amounts of CPU time for complex patterns. - Someday, -\end_layout - -\begin_layout LyX-Code - I may decide to optimize the code. - However, until then, let me -\end_layout - -\begin_layout LyX-Code - mention one useful technique. - -\end_layout - -\begin_layout LyX-Code - When you have a complex pattern that includes a number of varying -\end_layout - -\begin_layout LyX-Code - ranges, imprecise matches, and so forth, it is useful to -\end_layout - -\begin_layout LyX-Code - "pipeline" matches. 
- That is, form a simpler pattern that can be -\end_layout - -\begin_layout LyX-Code - used to scan through a large database extracting sections that -\end_layout - -\begin_layout LyX-Code - might be matched by the more complex pattern. - Let me illustrate -\end_layout - -\begin_layout LyX-Code - with a short example. - Suppose that you really wished to match the -\end_layout - -\begin_layout LyX-Code - pattern -\end_layout - -\begin_layout LyX-Code - p1=3...5 0...8 ~p1[1,1,0] p2=6...7 3...6 AGC 3...5 RYGC ~p2[1,0,0] -\end_layout - -\begin_layout LyX-Code - In this case, the pattern units AGC 3...5 RYGC can be used to rapidly -\end_layout - -\begin_layout LyX-Code - constrain the overall search. - You can preprocess the overall -\end_layout - -\begin_layout LyX-Code - database using the pattern: -\end_layout - -\begin_layout LyX-Code - 31...31 AGC 3...5 RYGC 7...7 -\end_layout - -\begin_layout LyX-Code - Put the complex pattern in pat_file1 and the simpler pattern in -\end_layout - -\begin_layout LyX-Code - pat_file2. - Then use, -\end_layout - -\begin_layout LyX-Code - scan_for_matches -c pat_file2 < nucleotide_database | -\end_layout - -\begin_layout LyX-Code - scan_for_matches pat_file1 -\end_layout - -\begin_layout LyX-Code - The output will show things like -\end_layout - -\begin_layout LyX-Code - >seqid:[232,280][2,47] -\end_layout - -\begin_layout LyX-Code - matches pieces -\end_layout - -\begin_layout LyX-Code - Then, the actual section of the sequence that was matched can be -\end_layout - -\begin_layout LyX-Code - easily computed as [233,278] (remember, the positions start from -\end_layout - -\begin_layout LyX-Code - 1, not 0). 
-\end_layout - -\begin_layout LyX-Code - Let me finally add, you should do a few short experiments to see -\end_layout - -\begin_layout LyX-Code - whether or not such pipelining actually improves performance -- it -\end_layout - -\begin_layout LyX-Code - is not always obvious where the time is going, and I have -\end_layout - -\begin_layout LyX-Code - sometimes found that the added complexity of pipelining actually -\end_layout - -\begin_layout LyX-Code - slowed things up. - It gets its best improvements when there are -\end_layout - -\begin_layout LyX-Code - exact matches of more than just a few characters that can be -\end_layout - -\begin_layout LyX-Code - rapidly used to eliminate large sections of the database. -\end_layout - -\begin_layout LyX-Code -============= -\end_layout - -\begin_layout LyX-Code -Additions: -\end_layout - -\begin_layout LyX-Code -Feb 9, 1995: the pattern units ^ and $ now work as in normal regular -\end_layout - -\begin_layout LyX-Code - expressions. - That is -\end_layout - -\begin_layout LyX-Code - TTF $ -\end_layout - -\begin_layout LyX-Code - matches only TTF at the end of the string and -\end_layout - -\begin_layout LyX-Code - ^ TTF -\end_layout - -\begin_layout LyX-Code - matches only an initial TTF -\end_layout - -\begin_layout LyX-Code - The pattern unit -\end_layout - -\begin_layout LyX-Code - >>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<< -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -# Stuff that enables biotools. -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -export TOOLS_DIR="/home/m.hansen/tools" # Contains binaries for BLAST - and Vmatch. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -export INST_DIR="/home/m.hansen/maasha" # Contains scripts and modules. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -export DATA_DIR="/home/m.hansen/DATA" # Contains genomic data etc. 
-\end_layout - -\begin_layout LyX-Code - -\size scriptsize -export TMP_DIR="/home/m.hansen/maasha/tmp" # Required temporary directory. -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -export PATH="$PATH:$TOOLS_DIR/blast-2.2.17/bin:$TOOLS_DIR/vmatch.distribution" -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -export PATH="$INST_DIR/bin/:$INST_DIR/perl_scripts/:$INST_DIR/perl_scripts/b -iotools:$PATH" -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -export PERL5LIB="$PERL5LIB:$INST_DIR" -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -# Alias allowing power scripting with biotools -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -alias bioscript="perl -I $INST_DIR/Maasha -MBiotools=read_stream,get_recor -d,put_record -e" -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -# >>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<< -\end_layout - -\begin_layout Section -Getting Started -\end_layout - -\begin_layout Standard -The biotool -\series bold -list_biotools -\series default - lists all the biotools along with a description: -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -list_biotools -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -align_seq Align sequences in stream using Muscle. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -analyze_seq Analysis the residue composition of each sequence - in stream. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -analyze_vals Determine type, count, min, max, sum and mean for - values in stream. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -blast_seq BLAST sequences in stream against a specified database. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -blat_seq BLAT sequences in stream against a specified genome. 
-\end_layout - -\begin_layout LyX-Code - -\size scriptsize -complement_seq Complement sequences in stream. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -count_records Count the number of records in stream. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -count_seq Count sequences in stream. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -count_vals Count the number of times values of given keys exists - in stream. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -create_blast_db Create a BLAST database from sequences in stream for - use with BLAST. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -... -\end_layout - -\begin_layout Standard -To list the biotools for writing different formats, you can use unix's grep - like this: -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -list_biotools | grep write -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -write_align Write aligned sequences in pretty alignment format. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -write_bed Write records from stream as BED lines. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -write_blast Write BLAST records from stream in BLAST tabular format - (-m8 and 9). -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -write_fasta Write sequences in FASTA format. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -write_psl Write records from stream in PSL format. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -write_tab Write records from stream as tab separated table. -\end_layout - -\begin_layout Standard -In order to find out how a specific biotool works, you just type the program - name without any arguments and press return and the usage of the biotool - will be displayed. - E.g. 
- -\series bold -read_fasta -\series default - : -\end_layout - -\begin_layout Standard -\begin_inset Box Frameless -position "t" -hor_pos "c" -has_inner_box 1 -inner_pos "t" -use_parbox 0 -width "100col%" -special "none" -height "1in" -height_special "totalheight" -status open - -\begin_layout LyX-Code - -\size scriptsize -Program name: read_fasta -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -Author: Martin Asser Hansen - Copyright (C) - All rights reserved -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -Contact: mail@maasha.dk -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -Date: August 2007 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -License: GNU General Public License version 2 (http://www.gnu.org/copyleft/ -gpl.html) -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -Description: Read FASTA entries. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -Usage: read_fasta [options] -i -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -Options: -\end_layout - -\begin_layout LyX-Code - -\size scriptsize - [-i | --data_in=] - Comma separated list of files - to read. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize - [-n | --num=] - Limit number of records to read. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize - [-I | --stream_in=] - Read input stream from file - - Default=STDIN -\end_layout - -\begin_layout LyX-Code - -\size scriptsize - [-O | --stream_out=] - Write output stream to file - - Default=STDOUT -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -Examples: -\end_layout - -\begin_layout LyX-Code - -\size scriptsize - read_fasta -i test.fna - Read FASTA entries from file. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize - read_fasta -i test1.fna,test2.fna - Read FASTA entries from files. 
-\end_layout - -\begin_layout LyX-Code - -\size scriptsize - read_fasta -i '*.fna' - Read FASTA entries from files. -\end_layout - -\begin_layout LyX-Code - -\size scriptsize - read_fasta -i test.fna -n 10 - Read first 10 FASTA entries from - file. -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Section -The Data Stream -\end_layout - -\begin_layout Subsection -How to read the data stream from file? -\begin_inset LatexCommand label -name "sub:How-to-read-stream" - -\end_inset - - -\end_layout - -\begin_layout Standard -You want to read a data stream that you have previously saved to file in - biotools format. - This can be done implicitly or explicitly. - The implicit way uses the 'stdout' stream of the Unix terminal: -\end_layout - -\begin_layout LyX-Code -cat | -\end_layout - -\begin_layout Standard -cat is the Unix command that reads a file and outputs the result to 'stdout' - --- which in this case is piped to any biotool represented by the . - It is also possible to read the data stream using '<' to redirect the file - into the 'stdin' of the biotool like this: -\end_layout - -\begin_layout LyX-Code - < -\end_layout - -\begin_layout Standard -However, that will not work if you pipe more biotools together. - Then it is much safer to read the stream from a file explicitly like this: -\end_layout - -\begin_layout LyX-Code - --stream_in= -\end_layout - -\begin_layout Standard -Here the filename is explicitly given to the biotool with - the switch -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -stream_in. - This switch works with all biotools. - It is also possible to read in data from multiple sources by repeating - the explicit read step: -\end_layout - -\begin_layout LyX-Code - --stream_in= | --stream_in= -\end_layout - -\begin_layout Subsection -How to write the data stream to file? 
-\begin_inset LatexCommand label -name "sub:How-to-write-stream" - -\end_inset - - -\end_layout - -\begin_layout Standard -In order to save the output stream from a biotool to file, so you can read - in the stream again at a later time, you can do one of two things: -\end_layout - -\begin_layout LyX-Code - > -\end_layout - -\begin_layout Standard -All the biotools write the data stream to 'stdout' by default, which can - be written to a file by redirecting 'stdout' to file using '>'. However, - if one of the biotools for writing other formats is used, then both - the biotools records and the result output will go to 'stdout' in - a mixture, causing havoc! To avoid this you must use the switch -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -stream_out that explicitly tells the biotool to write the output stream to - file: -\end_layout - -\begin_layout LyX-Code - --stream_out= -\end_layout - -\begin_layout Standard -The -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -stream_out switch works with all biotools. -\end_layout - -\begin_layout Subsection -How to terminate the data stream? -\end_layout - -\begin_layout Standard -The data stream never stops unless the user saves the stream to file or - supplies the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_stream switch that will terminate the stream: -\end_layout - -\begin_layout LyX-Code - --no_stream -\end_layout - -\begin_layout Standard -The -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_stream switch only works with those biotools where it makes sense that - the user might want to terminate the data stream, -\emph on -i.e -\emph default -. - after an analysis step where the user wants to output the result, but not - the data stream. 
-\end_layout - -\begin_layout Subsection -How to write my results to file? -\begin_inset LatexCommand label -name "sub:How-to-write-result" - -\end_inset - - -\end_layout - -\begin_layout Standard -Saving the result of an analysis to file can be done implicitly or explicitly. - The implicit way: -\end_layout - -\begin_layout LyX-Code - --no_stream > -\end_layout - -\begin_layout Standard -If you use '>' to redirect 'stdout' to file then it is important to use - the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_stream switch to avoid writing a mix of biotools records and result to - the same file, causing havoc. - The safe way is to use the -\begin_inset ERT -status open - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -result_out switch which explicitly tells the biotool to write the result - to a given file: -\end_layout - -\begin_layout LyX-Code - --result_out= -\end_layout - -\begin_layout Standard -Using the above method will not terminate the stream, so it is possible - to pipe that into another biotool generating different results: -\end_layout - -\begin_layout LyX-Code - --result_out= | --result_out= -\end_layout - -\begin_layout Standard -And still the data stream will continue unless terminated with -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_stream: -\end_layout - -\begin_layout LyX-Code - --result_out= --no_stream -\end_layout - -\begin_layout Standard -Or written to file implicitly or explicitly -\begin_inset LatexCommand eqref -reference "sub:How-to-write-result" - -\end_inset - -. - The explicit way: -\end_layout - -\begin_layout LyX-Code - --result_out= --stream_out= -\end_layout - -\begin_layout Subsection -How to read data from multiple sources? 
-\end_layout - -\begin_layout Standard -To read multiple data sources, with the same type or different types of data, - do: -\end_layout - -\begin_layout LyX-Code - --data_in= | --data_in= -\end_layout - -\begin_layout Standard -where type is the data type a specific biotool reads. -\end_layout - -\begin_layout Section -Reading input -\end_layout - -\begin_layout Subsection -How to read biotools input? -\end_layout - -\begin_layout Standard -See -\begin_inset LatexCommand eqref -reference "sub:How-to-read-stream" - -\end_inset - -. -\end_layout - -\begin_layout Subsection -How to read in data? -\end_layout - -\begin_layout Standard -Data in different formats can be read with the appropriate biotool for that - format. - The biotools are typically named 'read_' such as -\series bold -read_fasta -\series default -, -\series bold -read_bed -\series default -, -\series bold -read_tab -\series default -, etc., and all behave in a similar manner. - Data can be read by supplying the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -data_in switch and the name of the file containing the data: -\end_layout - -\begin_layout LyX-Code - --data_in= -\end_layout - -\begin_layout Standard -It is also possible to read in a saved biotools stream (see -\begin_inset LatexCommand ref -reference "sub:How-to-read-stream" - -\end_inset - -) as well as reading data in one go: -\end_layout - -\begin_layout LyX-Code - --stream_in= --data_in= -\end_layout - -\begin_layout Standard -If you want to read data from several files you can do this: -\end_layout - -\begin_layout LyX-Code - --data_in= | --data_in= -\end_layout - -\begin_layout Standard -If you have several data files you can read in all explicitly with a comma - separated list: -\end_layout - -\begin_layout LyX-Code - --data_in=file1,file2,file3 -\end_layout - -\begin_layout Standard -And it is also possible to use file globbing -\begin_inset Foot -status open - 
-\begin_layout Standard -using the short option will only work if you quote the argument -i '*.fna' -\end_layout - -\end_inset - -: -\end_layout - -\begin_layout LyX-Code - --data_in=*.fna -\end_layout - -\begin_layout Standard -Or in a combination: -\end_layout - -\begin_layout LyX-Code - --data_in=file1,/dir/*.fna -\end_layout - -\begin_layout Standard -Finally, it is possible to read in data in different formats using the appropria -te biotool for each format: -\end_layout - -\begin_layout LyX-Code - --data_in= | --data_in= ... -\end_layout - -\begin_layout Subsection -How to read FASTA input? -\end_layout - -\begin_layout Standard -Sequences in FASTA format can be read explicitly using -\series bold -read_fasta -\series default -: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in= -\end_layout - -\begin_layout Subsection -How to read alignment input? -\end_layout - -\begin_layout Standard -If your alignment is FASTA formatted then you can use -\series bold -read_align -\series default -. - It is also possible to use -\series bold -read_fasta -\series default - since the data is FASTA formatted, however, with -\series bold -read_fasta -\series default - the key ALIGN will be omitted. - The ALIGN key is used to determine which sequences belong to what alignment - which is required for -\series bold -write_align -\series default -. -\end_layout - -\begin_layout LyX-Code -read_align --data_in= -\end_layout - -\begin_layout Subsection -How to read tabular input? -\begin_inset LatexCommand label -name "sub:How-to-read-table" - -\end_inset - - -\end_layout - -\begin_layout Standard -Tabular input can be read with -\series bold -read_tab -\series default - which will read in all rows and chosen columns (separated by a given delimiter) - from a table in text format. 
-\end_layout - -\begin_layout Standard -The table below: -\end_layout - -\begin_layout Standard -\noindent -\align center -\begin_inset Tabular - - - - - - - -\begin_inset Text - -\begin_layout Standard -Human -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -ATACGTCAG -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -23524 -\end_layout - -\end_inset - - - - -\begin_inset Text - -\begin_layout Standard -Dog -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -AGCATGAC -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -2442 -\end_layout - -\end_inset - - - - -\begin_inset Text - -\begin_layout Standard -Mouse -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -GACTG -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -234 -\end_layout - -\end_inset - - - - -\begin_inset Text - -\begin_layout Standard -Cat -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -AAATGCA -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -2342 -\end_layout - -\end_inset - - - - -\end_inset - - -\end_layout - -\begin_layout Standard -Can be read using the command: -\end_layout - -\begin_layout LyX-Code -read_tab --data_in= -\end_layout - -\begin_layout Standard -Which will result in four records, one for each row, where the keys V0, - V1, V2 are the default keys for the organism, sequence, and count, respectively. - It is possible to select a subset of columns to read by using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -cols switch which takes a comma separated list of column numbers (first - column is designated 0) as argument. 
- So to read in only the sequence and the count so that the count comes before - the sequence do: -\end_layout - -\begin_layout LyX-Code -read_tab --data_in= --cols=2,1 -\end_layout - -\begin_layout Standard -It is also possible to name the columns with the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -keys switch: -\end_layout - -\begin_layout LyX-Code -read_tab --data_in= --cols=2,1 --keys=COUNT,SEQ -\end_layout - -\begin_layout Subsection -How to read BED input? -\end_layout - -\begin_layout Standard -The BED (Browser Extensible Data -\begin_inset Foot -status open - -\begin_layout Standard -\begin_inset LatexCommand url -target "http://genome.ucsc.edu/FAQ/FAQformat" - -\end_inset - - -\end_layout - -\end_inset - -) format is a tabular format for data pertaining to one of the Eukaryotic - genomes in the UCSC genome browser -\begin_inset Foot -status collapsed - -\begin_layout Standard -\begin_inset LatexCommand url -target "http://genome.ucsc.edu/" - -\end_inset - - -\end_layout - -\end_inset - -. - The BED format consists of up to 12 columns, where the first three are - mandatory: CHR, CHR_BEG, and CHR_END. - The mandatory columns and any of the optional columns can all be read in - easily with the -\series bold -read_bed -\series default - biotool. -\end_layout - -\begin_layout LyX-Code -read_bed --data_in= -\end_layout - -\begin_layout Standard -It is also possible to read the BED file with -\series bold -read_tab -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-read-table" - -\end_inset - -), however, that will be more cumbersome because you need to specify the - keys: -\end_layout - -\begin_layout LyX-Code -read_tab --data_in= --keys=CHR,CHR_BEG,CHR_END ... -\end_layout - -\begin_layout Subsection -How to read PSL input? 
-\end_layout - -\begin_layout Standard -The PSL format is the output from BLAT and contains 21 mandatory fields - that can be read with -\series bold -read_psl -\series default -: -\end_layout - -\begin_layout LyX-Code -read_psl --data_in= -\end_layout - -\begin_layout Section -Writing output -\end_layout - -\begin_layout Standard -All result output can be written explicitly to file using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -result_out switch which all result generating biotools have. - It is also possible to write the result to file implicitly by directing - 'stdout' to file using '>', however, that requires the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_stream switch to prevent a mixture of data stream and results in the file. - The explicit (and safe) way: -\end_layout - -\begin_layout LyX-Code -... - | --result_out= -\end_layout - -\begin_layout Standard -The implicit way: -\end_layout - -\begin_layout LyX-Code -... - | --no_stream > -\end_layout - -\begin_layout Subsection -How to write biotools output? -\end_layout - -\begin_layout Standard -See -\begin_inset LatexCommand eqref -reference "sub:How-to-write-stream" - -\end_inset - -. -\end_layout - -\begin_layout Subsection -How to write FASTA output? -\begin_inset LatexCommand label -name "sub:How-to-write-fasta" - -\end_inset - - -\end_layout - -\begin_layout Standard -FASTA output can be written with -\series bold -write_fasta -\series default -. -\end_layout - -\begin_layout LyX-Code -... - | write_fasta --result_out= -\end_layout - -\begin_layout Standard -It is also possible to wrap the sequences to a given width using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -wrap switch although wrapping of sequence is generally an evil thing: -\end_layout - -\begin_layout LyX-Code -... 
- | write_fasta --no_stream --wrap=80 -\end_layout - -\begin_layout Subsection -How to write alignment output? -\begin_inset LatexCommand label -name "sub:How-to-write-alignment" - -\end_inset - - -\end_layout - -\begin_layout Standard -Pretty alignments with ruler -\begin_inset Foot -status collapsed - -\begin_layout Standard -'.' for every 10 residues, ':' for every 50, and '|' for every 100 -\end_layout - -\end_inset - - and consensus sequence -\begin_inset Note Note -status collapsed - -\begin_layout Standard -which reminds me to make that an option. -\end_layout - -\end_inset - - can be created with -\series bold -write_align -\series default -, which also has the optional -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -wrap switch to break the alignment into blocks of a given width: -\end_layout - -\begin_layout LyX-Code -... - | write_align --result_out= --wrap=80 -\end_layout - -\begin_layout Standard -If the number of sequences in the alignment is 2 then a pairwise alignment - will be output, otherwise a multiple alignment. - And if the sequence type, determined automagically, is protein, then residues - and symbols (+,\InsetSpace ~ -:,\InsetSpace ~ -.) will be used to show consensus according to the Blosum62 - matrix. -\end_layout - -\begin_layout Subsection -How to write tabular output? -\begin_inset LatexCommand label -name "sub:How-to-write-tab" - -\end_inset - - -\end_layout - -\begin_layout Standard -Outputting the data stream as a table can be done with -\series bold -write_tab -\series default -, which will generate one row per record with the values as columns. - If you supply the optional -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -comment switch, then the first row in the table will be a 'comment' line - prefixed with a '#': -\end_layout - -\begin_layout LyX-Code -... 
- | write_tab --result_out= --comment -\end_layout - -\begin_layout Standard -You can also change the delimiter from the default (tab) to -\emph on -e.g. - -\emph default - ',': -\end_layout - -\begin_layout LyX-Code -... - | write_tab --result_out= --delimit=',' -\end_layout - -\begin_layout Standard -If you want the values output in a specific order you have to supply a comma - separated list using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -keys switch that will print only those keys in that order: -\end_layout - -\begin_layout LyX-Code -... - | write_tab --result_out= --keys=SEQ_NAME,COUNT -\end_layout - -\begin_layout Standard -Alternatively, if you have some keys that you don't want in the tabular - output, use the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_keys switch. - So to print all keys except SEQ and SEQ_TYPE do: -\end_layout - -\begin_layout LyX-Code -... - | write_tab --result_out= --no_keys=SEQ,SEQ_TYPE -\end_layout - -\begin_layout Standard -Finally, if you have a stream containing a mix of different record types, - -\emph on -e.g. - -\emph default - records with sequences and records with matches, then you can use -\series bold -write_tab -\series default - to output all the records in tabular format, however, the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -comment, -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -keys, and -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_keys switches will only respond to records of the first type encountered. 
- The reason is that outputting mixed records is probably not what you want - anyway, and you should remove all the unwanted records from the stream - before outputting the table: -\series bold -grab -\series default - is your friend (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-grab" - -\end_inset - -). -\end_layout - -\begin_layout Subsection -How to write a BED output? -\begin_inset LatexCommand label -name "sub:How-to-write-BED" - -\end_inset - - -\end_layout - -\begin_layout Standard -Data in BED format can be output if the records contain the mandatory keys - CHR, CHR_BEG, and CHR_END using -\series bold -write_bed -\series default -. - If the optional keys are also present, they will be output as well: -\end_layout - -\begin_layout LyX-Code -write_bed --result_out= -\end_layout - -\begin_layout Subsection -How to write PSL output? -\begin_inset LatexCommand label -name "sub:How-to-write-PSL" - -\end_inset - - -\end_layout - -\begin_layout Standard -Data in PSL format can be output using -\series bold -write_psl: -\end_layout - -\begin_layout LyX-Code -write_psl --result_out= -\end_layout - -\begin_layout Section -Manipulating Records -\end_layout - -\begin_layout Subsection -How to select a few records? -\begin_inset LatexCommand label -name "sub:How-to-select-a-few-records" - -\end_inset - - -\end_layout - -\begin_layout Standard -To quickly get an overview of your data you can limit the data stream to - show a few records. - This is also very useful to test the pipeline with a few records if you are - setting up a complex analysis using several biotools. - That way you can inspect that all goes well before analyzing and waiting - for the full data set. - All of the read_ biotools have the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -num switch which will take a number as argument and only that number of - records will be read. 
- So to read in the first 10 FASTA entries from a file: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in=test.fna --num=10 -\end_layout - -\begin_layout Standard -Another way of doing this is to use -\series bold -head_records -\series default -, which will limit the stream to show the first 10 records (default): -\end_layout - -\begin_layout LyX-Code -... - | head_records -\end_layout - -\begin_layout Standard -Using -\series bold -head_records -\series default - directly after one of the read_ biotools will be a lot slower than - using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -num switch with the read_ biotools; however, -\series bold -head_records -\series default - can also be used to limit the output from all the other biotools. - It is also possible to give -\series bold -head_records -\series default - a number of records to show using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -num switch. - So to display the first 100 records do: -\end_layout - -\begin_layout LyX-Code -... - | head_records --num=100 -\end_layout - -\begin_layout Subsection -How to select random records? -\begin_inset LatexCommand label -name "sub:How-to-select-random-records" - -\end_inset - - -\end_layout - -\begin_layout Standard -If you want to inspect a number of random records from the stream this can - be done with the -\series bold -random_records -\series default - biotool. - So if you have 1 million records in the stream and you want to select 1000 - random records do: -\end_layout - -\begin_layout LyX-Code -... - | random_records --num=1000 -\end_layout - -\begin_layout Subsection -How to count all records in the data stream? 
-\end_layout - -\begin_layout Standard -To count all the records in the data stream use -\series bold -count_records -\series default -, which adds one record (which is not included in the count) to the data - stream. - So to count the number of sequences in a FASTA file you can do this: -\end_layout - -\begin_layout LyX-Code -cat test.fna | read_fasta | count_records --no_stream -\end_layout - -\begin_layout Standard -Which will write the last record containing the count to 'stdout': -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -count_records: 630 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize ---- -\end_layout - -\begin_layout Standard -It is also possible to write the count to file using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -result_out switch. -\end_layout - -\begin_layout Subsection -How to get the length of record values? -\begin_inset LatexCommand label -name "sub:How-to-get-value_length" - -\end_inset - - -\end_layout - -\begin_layout Standard -Use the -\series bold -length_vals -\series default - biotool to get the length of each value for a comma separated list of keys: -\end_layout - -\begin_layout LyX-Code -... - | length_vals --keys=HIT,PATTERN -\end_layout - -\begin_layout Subsection -How to grab specific records? -\begin_inset LatexCommand label -name "sub:How-to-grab" - -\end_inset - - -\end_layout - -\begin_layout Standard -The biotool -\series bold -grab -\series default - is related to the Unix grep and locates records based on matching keys - and/or values using either a pattern, a Perl regex, or a numerical evaluation. - To easily -\series bold -grab -\series default - all records in the stream that have any mention of the pattern 'human', - just pipe the data stream through -\series bold -grab -\series default - like this: -\end_layout - -\begin_layout LyX-Code -... 
- | grab --pattern=human -\end_layout - -\begin_layout Standard -This will search for the pattern 'human' in all keys and all values. - The -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern switch takes a comma separated list of patterns, so in order to - match multiple patterns do: -\end_layout - -\begin_layout LyX-Code -... - | grab --pattern=human,mouse -\end_layout - -\begin_layout Standard -It is also possible to use the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern_in switch instead of -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern. - -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern_in is used to read a file with one pattern per line: -\end_layout - -\begin_layout LyX-Code -... - | grab --pattern_in=patterns.txt -\end_layout - -\begin_layout Standard -If you want the opposite result --- to find all records that do not match - the patterns, add the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -invert switch, which not only works with the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern switch, but also with -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -regex and -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -eval: -\end_layout - -\begin_layout LyX-Code -... - | grab --pattern=human --invert -\end_layout - -\begin_layout Standard -If you want to search the record keys only, -\emph on -e.g. 
- -\emph default - to find all records containing the key SEQ you can add the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -keys_only switch. - This will prevent matching of SEQ in any record value; since SEQ - is a not uncommon peptide sequence, you could otherwise get an unwanted record. - Also, this will give an increase in speed since only the keys are searched: -\end_layout - -\begin_layout LyX-Code -... - | grab --pattern=SEQ --keys_only -\end_layout - -\begin_layout Standard -However, if you are interested in finding the peptide sequence SEQ and not - the SEQ key, just add the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -vals_only switch instead: -\end_layout - -\begin_layout LyX-Code -... - | grab --pattern=SEQ --vals_only -\end_layout - -\begin_layout Standard -Also, if you want to grab for certain key/value pairs you can supply a comma - separated list of keys whose values will then be searched using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -keys switch. - This is handy if your records contain large genomic sequences and you don't - want to search the entire sequence for -\emph on -e.g. - -\emph default - the organism name --- it is much faster to tell -\series bold -grab -\series default - which keys to search the value for: -\end_layout - -\begin_layout LyX-Code -... - | grab --pattern=human --keys=SEQ_NAME -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout Standard -It is also possible to invoke flexible matching using regex (regular expressions -) instead of simple pattern matching. 
- In -\series bold -grab -\series default - the regex engine is Perl based and allows use of different types of - wildcards, alternatives, -\emph on -etc -\emph default - -\begin_inset Foot -status open - -\begin_layout Standard -\begin_inset LatexCommand url -target "http://perldoc.perl.org/perlreref.html" - -\end_inset - - -\end_layout - -\end_inset - -. - If you want to -\series bold -grab -\series default - records with the sequence ATCG or GCTA you can do this: -\end_layout - -\begin_layout LyX-Code -... - | grab --regex='ATCG|GCTA' -\end_layout - -\begin_layout Standard -Or if you want to find sequences beginning with ATCG: -\end_layout - -\begin_layout LyX-Code -... - | grab --regex='^ATCG' -\end_layout - -\begin_layout Standard -You can also use -\series bold -grab -\series default - to locate records that fulfill a numerical property using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -eval switch which takes an expression in three parts. - The first part is the key that holds the value we want to evaluate, the - second part holds one of the eight operators: -\end_layout - -\begin_layout Enumerate -Greater than: > -\end_layout - -\begin_layout Enumerate -Greater than or equal to: >= -\end_layout - -\begin_layout Enumerate -Less than: < -\end_layout - -\begin_layout Enumerate -Less than or equal to: <= -\end_layout - -\begin_layout Enumerate -Equal to: = -\end_layout - -\begin_layout Enumerate -Not equal to: != -\end_layout - -\begin_layout Enumerate -String wise equal to: eq -\end_layout - -\begin_layout Enumerate -String wise not equal to: ne -\end_layout - -\begin_layout Standard -And finally comes the value used in the evaluation. - So to -\series bold -grab -\series default - all records with a sequence length greater than 30: -\end_layout - -\begin_layout LyX-Code -... 
- | length_seq | grab --eval='SEQ_LEN > 30' -\end_layout - -\begin_layout Standard -If you want to locate all records containing the pattern 'human' and where - the sequence length is greater than 30, you do this by running the stream - through -\series bold -grab -\series default - twice: -\end_layout - -\begin_layout LyX-Code -... - | grab --pattern='human' | length_seq | grab --eval='SEQ_LEN > 30' -\end_layout - -\begin_layout Standard -Finally, it is possible to do fast matching of expressions from a file using - the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -exact switch. - Each of these expressions has to be matched exactly over the entire length, - which is useful if you have a file with accession numbers that you want - to locate in the stream: -\end_layout - -\begin_layout LyX-Code -... - | grab --exact acc_no.txt | ... -\end_layout - -\begin_layout Standard -Using -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -exact is much faster than using -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern_in, because with -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -exact the expressions have to be complete matches, whereas -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern_in looks for subpatterns. -\end_layout - -\begin_layout Standard -NB! To get the best speed performance, use the most restrictive -\series bold -grab -\series default - first. -\end_layout - -\begin_layout Subsection -How to remove keys from records? -\end_layout - -\begin_layout Standard -To remove one or more specific keys from all records in the data stream - use -\series bold -remove_keys -\series default - like this: -\end_layout - -\begin_layout LyX-Code -... 
- | remove_keys --keys=SEQ,SEQ_NAME -\end_layout - -\begin_layout Standard -In the above example SEQ and SEQ_NAME will be removed from all records if - they exist in them. - If all keys are removed from a record, then the record will be removed. -\end_layout - -\begin_layout Subsection -How to rename keys in records? -\end_layout - -\begin_layout Standard -Sometimes you want to rename a record key, -\emph on -e.g. - -\emph default - if you have read in a two column table with sequence name and sequence - in each column (see -\begin_inset LatexCommand ref -reference "sub:How-to-read-table" - -\end_inset - -) without specifying the key names, then the sequence name will be called - V0 and the sequence V1 as default in the -\series bold -read_tab -\series default - biotool. - To rename the V0 and V1 keys we need to run the stream through -\series bold -rename_keys -\series default - twice (once for each key to rename): -\end_layout - -\begin_layout LyX-Code -... - | rename_keys --keys=V0,SEQ_NAME | rename_keys --keys=V1,SEQ -\end_layout - -\begin_layout Standard -The first instance of -\series bold -rename_keys -\series default - replaces all the V0 keys with SEQ_NAME, and the second instance of -\series bold -rename_keys -\series default - replaces all the V1 keys with SEQ. - -\emph on -Et voilà -\emph default - the data can now be used in the biotools that require these keys. -\end_layout - -\begin_layout Section -Manipulating Sequences -\end_layout - -\begin_layout Subsection -How to get sequence lengths? -\end_layout - -\begin_layout Standard -The length for sequences in records can be determined with -\series bold -length_seq -\series default -, which adds the key SEQ_LEN to each record with the sequence length as - the value. - It also generates an extra record that is emitted last with the key TOTAL_SEQ_L -EN showing the total length of all the sequences. 
-\end_layout - -\begin_layout LyX-Code -read_fasta --data_in= | length_seq -\end_layout - -\begin_layout Standard -It is also possible to determine the sequence length using the generic tool - -\series bold -length_vals -\series default - -\begin_inset LatexCommand eqref -reference "sub:How-to-get-value_length" - -\end_inset - -, which determines the length of the values for a given list of keys: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in= | length_vals --keys=SEQ -\end_layout - -\begin_layout Standard -To obtain the total length of all sequences use -\series bold -sum_vals -\series default - like this: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in= | length_vals --keys=SEQ -\end_layout - -\begin_layout LyX-Code -| sum_vals --keys=SEQ_LEN -\end_layout - -\begin_layout Standard -The biotool -\series bold -analyze_seq -\series default - will also determine the length of each sequence (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-analyze" - -\end_inset - -). -\end_layout - -\begin_layout Subsection -How to analyze sequence composition? -\begin_inset LatexCommand label -name "sub:How-to-analyze" - -\end_inset - - -\end_layout - -\begin_layout Standard -If you want to find out the sequence type, composition, length, as well - as GC content, indel content and proportions of soft and hard masked sequence, - then use -\series bold -analyze_seq -\series default -. - This handy biotool will determine all these things per sequence from which - it is easy to get an overview using the -\series bold -write_tab -\series default - biotool to output a table (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-write-tab" - -\end_inset - -). 
- So in order to determine the sequence composition of a FASTA file with - just one entry containing the sequence 'ATCG' we just read the data with - -\series bold -read_fasta -\series default - and run the output through -\series bold -analyze_seq -\series default - which will add the analysis to the record like this: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in=test.fna | analyze_seq ... -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:D: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -MIX_INDEX: 0.55 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:W: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:G: 16 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -SOFT_MASK%: 63.75 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:B: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:V: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -HARD_MASK%: 0.00 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:H: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:S: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:N: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:.: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -GC%: 35.00 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:A: 8 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:Y: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:M: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:T: 44 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -SEQ_TYPE: DNA -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:K: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:~: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -SEQ: 
TTTCAGTTTGGGACGGAGTAAGGCCTTCCtttttttttttttttttttttttttttttgagaccgagtcttgctc -tgtcg -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -SEQ_LEN: 80 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:R: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:C: 12 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:-: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize -RES:U: 0 -\end_layout - -\begin_layout LyX-Code - -\size scriptsize ---- -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout Standard -Now to make a table of how many As, Ts, Cs, and Gs you can add the following: -\end_layout - -\begin_layout LyX-Code -... - | analyze_seq | write_tab --keys=RES:A,RES:T,RES:C,RES:G -\end_layout - -\begin_layout Standard -Or if you want to see the proportions of hard and soft masked sequence: -\end_layout - -\begin_layout LyX-Code -... - | analyze_seq | write_tab --keys=HARD_MASK%,SOFT_MASK% -\end_layout - -\begin_layout Standard -If you have a stack of sequences in one file and you want to determine the - mean GC content you can do it using the -\series bold -mean_vals -\series default - biotool: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in=test.fna | analyze_seq | mean_vals --keys=GC% -\end_layout - -\begin_layout Standard -Or if you want the total count of Ns you can use -\series bold -sum_vals -\series default - like this: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in=test.fna | analyze_seq | sum_vals --keys=RES:N -\end_layout - -\begin_layout Standard -The MIX_INDEX key is calculated as the count of the most common residue - over the sequence length, and can be used as a cut-off for removing sequence - tags consisting of mostly one nucleotide: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in=test.fna | analyze_seq | grab --eval='MIX_INDEX<0.85' -\end_layout - -\begin_layout Subsection -How to extract subsequences? 
-\begin_inset LatexCommand label -name "sub:How-to-extract" - -\end_inset - - -\end_layout - -\begin_layout Standard -In order to extract a subsequence from a longer sequence use the biotool - extract_seq, which will replace the sequence in the record with the subsequence - (this behaviour should probably be modified to be dependent on a -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -replace or a -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_replace switch -\begin_inset Note Note -status collapsed - -\begin_layout Standard -also in split_seq -\end_layout - -\end_inset - -). - So to extract the first 20 residues from all sequences do (first residue - is designated 1): -\end_layout - -\begin_layout LyX-Code -... - | extract_seq --beg=1 --len=20 -\end_layout - -\begin_layout Standard -You can also specify a begin and end coordinate set: -\end_layout - -\begin_layout LyX-Code -... - | extract_seq --beg=20 --end=40 -\end_layout - -\begin_layout Standard -If you want the subsequences from position 20 to the sequence end do: -\end_layout - -\begin_layout LyX-Code -... - | extract_seq --beg=20 -\end_layout - -\begin_layout Standard -If you want to extract subsequences a given distance from the sequence end - you can do this by reversing the sequence with the biotool -\series bold -reverse_seq -\series default - -\begin_inset LatexCommand eqref -reference "sub:How-to-reverse-seq" - -\end_inset - -, followed by -\series bold -extract_seq -\series default - to get the subsequence, and then -\series bold -reverse_seq -\series default - again to get the subsequence back in the original orientation: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in=test.fna | reverse_seq -\end_layout - -\begin_layout LyX-Code -| extract_seq --beg=10 --len=10 | reverse_seq -\end_layout - -\begin_layout Subsection -How to get genomic sequence? 
-\begin_inset LatexCommand label -name "sub:How-to-get-genomic-sequence" - -\end_inset - - -\end_layout - -\begin_layout Standard -The biotool -\series bold -get_genome_seq -\series default - can extract subsequences for a given genome specified with the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -genome switch explicitly using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -beg and -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -end/ -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -len switches: -\end_layout - -\begin_layout LyX-Code -get_genome_seq --genome= --beg=1 --len=100 -\end_layout - -\begin_layout Standard -Alternatively, -\series bold -get_genome_seq -\series default - can be used to append the corresponding sequence to BED, PSL, and BLAST - records: -\end_layout - -\begin_layout LyX-Code -read_bed --data_in= | get_genome_seq --genome= -\end_layout - -\begin_layout Standard -It is also possible to include flanking sequence using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -flank switch. - So to include 50 nucleotides upstream and 50 nucleotides downstream for - each BED entry do: -\end_layout - -\begin_layout LyX-Code -read_bed --data_in= | get_genome_seq --genome= --flank=50 -\end_layout - -\begin_layout Subsection -How to upper-case sequences? -\end_layout - -\begin_layout Standard -Sequences can be shifted from lower case to upper case using -\series bold -uppercase_seq -\series default -: -\end_layout - -\begin_layout LyX-Code -... - | uppercase_seq -\end_layout - -\begin_layout Subsection -How to reverse sequences? 
-\begin_inset LatexCommand label -name "sub:How-to-reverse-seq" - -\end_inset - - -\end_layout - -\begin_layout Standard -The order of residues in a sequence can be reversed using reverse_seq: -\end_layout - -\begin_layout LyX-Code -... - | reverse_seq -\end_layout - -\begin_layout Standard -Note that in order to reverse/complement a sequence you also need the -\series bold -complement_seq -\series default - biotool (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-complement" - -\end_inset - -). -\end_layout - -\begin_layout Subsection -How to complement sequences? -\begin_inset LatexCommand label -name "sub:How-to-complement" - -\end_inset - - -\end_layout - -\begin_layout Standard -DNA and RNA sequences can be complemented with -\series bold -complement_seq -\series default -, which automagically determines the sequence type: -\end_layout - -\begin_layout LyX-Code -... - | complement_seq -\end_layout - -\begin_layout Standard -Note that in order to reverse/complement a sequence you also need the -\series bold -reverse_seq -\series default - biotool (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-reverse-seq" - -\end_inset - -). -\end_layout - -\begin_layout Subsection -How to remove indels from sequences? -\end_layout - -\begin_layout Standard -Indels can be removed from sequences with the -\series bold -remove_indels -\series default - biotool. - This is useful if you have aligned some sequences (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-align" - -\end_inset - -) and extracted (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-extract" - -\end_inset - -) a block of subsequences from the alignment and you want to use these sequences - in a search where you need to remove the indels first. - '-', '~', and '.' are considered indels: -\end_layout - -\begin_layout LyX-Code -... - | remove_indels -\end_layout - -\begin_layout Subsection -How to shuffle sequences? 
-\end_layout - -\begin_layout Standard -All residues in sequences in the stream can be shuffled to random positions - using the -\series bold -shuffle_seq -\series default - biotool: -\end_layout - -\begin_layout LyX-Code -... - | shuffle_seq -\end_layout - -\begin_layout Subsection -How to split sequences into overlapping subsequences? -\end_layout - -\begin_layout Standard -Sequences can be split into overlapping subsequences with the -\series bold -split_seq -\series default - biotool. -\end_layout - -\begin_layout LyX-Code -... - | split_seq --word_size=20 --uniq -\end_layout - -\begin_layout Subsection -How to determine the oligo frequency? -\end_layout - -\begin_layout Standard -In order to determine if any oligo usage is overrepresented in one or more - sequences you can determine the frequency of oligos of a given size with - -\series bold -oligo_freq -\series default -: -\end_layout - -\begin_layout LyX-Code -... - | oligo_freq --word_size=4 -\end_layout - -\begin_layout Standard -And if you have more than one sequence and want to accumulate the frequencies - you need the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -all switch: -\end_layout - -\begin_layout LyX-Code -... - | oligo_freq --word_size=4 --all -\end_layout - -\begin_layout Standard -To get a meaningful result you need to write the resulting frequencies as - a table with -\series bold -write_tab -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-write-tab" - -\end_inset - -), but first it is important to -\series bold -grab -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-grab" - -\end_inset - -) the records with the frequencies to avoid full length sequences in the - table: -\end_layout - -\begin_layout LyX-Code -... 
- | oligo_freq --word_size=4 --all | grab --pattern=OLIGO --keys_only -\end_layout - -\begin_layout LyX-Code -| write_tab --no_stream -\end_layout - -\begin_layout Standard -And the resulting frequency table can be sorted with Unix sort (man sort). -\end_layout - -\begin_layout Subsection -How to search for sequences in genomes? -\end_layout - -\begin_layout Standard -See the following biotools: -\end_layout - -\begin_layout Itemize - -\series bold -patscan_seq -\series default - -\begin_inset LatexCommand eqref -reference "sub:How-to-use-patscan" - -\end_inset - - -\end_layout - -\begin_layout Itemize - -\series bold -blat_seq -\series default - -\begin_inset LatexCommand eqref -reference "sub:How-to-use-BLAT" - -\end_inset - - -\end_layout - -\begin_layout Itemize - -\series bold -blast_seq -\series default - -\begin_inset LatexCommand eqref -reference "sub:How-to-use-BLAST" - -\end_inset - - -\end_layout - -\begin_layout Itemize - -\series bold -vmatch_seq -\series default - -\begin_inset LatexCommand eqref -reference "sub:How-to-use-Vmatch" - -\end_inset - - -\end_layout - -\begin_layout Subsection -How to search sequences for a pattern? -\begin_inset LatexCommand label -name "sub:How-to-use-patscan" - -\end_inset - - -\end_layout - -\begin_layout Standard -It is possible to search sequences in the data stream for patterns using - the -\series bold -patscan_seq -\series default - biotool which utilizes the powerful scan_for_matches engine. - Consult the documentation for scan_for_matches in order to learn how to - define patterns (the documentation is included in Appendix\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sec:scan_for_matches-README" - -\end_inset - -). 
-\end_layout - -\begin_layout Standard -To search all sequences for a simple pattern consisting of the sequence - ATCGATCG allowing for 3 mismatches, 2 insertions and 1 deletion: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in= | patscan_seq --pattern='ATCGATCG[3,2,1]' -\end_layout - -\begin_layout Standard -The -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern switch takes a comma separated list of patterns, so if you want - to search for more than one pattern do: -\end_layout - -\begin_layout LyX-Code -... - | patscan_seq --pattern='ATCGATCG[3,2,1],GCTAGCTA[3,2,1]' -\end_layout - -\begin_layout Standard -It is also possible to have a list of different patterns to search for in - a file with one pattern per line. - In order to get -\series bold -patscan_seq -\series default - to read these patterns use the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -pattern_in switch: -\end_layout - -\begin_layout LyX-Code -... - | patscan_seq --pattern_in= -\end_layout - -\begin_layout Standard -To also scan the complementary strand in nucleotide sequences ( -\series bold -patscan_seq -\series default - automagically determines the sequence type) you need to add the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -comp switch: -\end_layout - -\begin_layout LyX-Code -... - | patscan_seq --pattern= --comp -\end_layout - -\begin_layout Standard -It is also possible to use -\series bold -patscan_seq -\series default - to output those records that do not contain a certain pattern by using - the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -invert switch: -\end_layout - -\begin_layout LyX-Code -... 
- | patscan_seq --pattern= --invert -\end_layout - -\begin_layout Standard -Finally, -\series bold -patscan_seq -\series default - can also scan for patterns in a given genome sequence, instead of sequences - in the stream, using the -\begin_inset ERT -status open - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -genome switch: -\end_layout - -\begin_layout LyX-Code -patscan_seq --pattern= --genome= -\end_layout - -\begin_layout Subsection -How to use BLAT for sequence search? -\begin_inset LatexCommand label -name "sub:How-to-use-BLAT" - -\end_inset - - -\end_layout - -\begin_layout Standard -Sequences in the data stream can be matched against supported genomes using - -\series bold -blat_seq -\series default - which is a biotool using BLAT as the name might suggest. - Currently only Mouse and Human genomes are available and it is not possible - to use OOC files since there is still a need for a local repository for - genome files. - Otherwise it is just: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in= | blat_seq --genome= -\end_layout - -\begin_layout Standard -The search results can then be written to file with -\series bold -write_psl -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-write-PSL" - -\end_inset - -) or -\series bold -write_bed -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-write-BED" - -\end_inset - -) (although with -\series bold -write_bed -\series default - some information will be lost). 
- It is also possible to plot chromosome distribution of the search results - using -\series bold -plot_chrdist -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-plot-chrdist" - -\end_inset - -) or the distribution of the match lengths using -\series bold -plot_lendist -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-plot-lendist" - -\end_inset - -) or a karyogram with the hits using -\series bold -plot_karyogram -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-plot-karyogram" - -\end_inset - -). -\end_layout - -\begin_layout Subsection -How to use BLAST for sequence search? -\begin_inset LatexCommand label -name "sub:How-to-use-BLAST" - -\end_inset - - -\end_layout - -\begin_layout Standard -Two biotools exist for blasting sequences: -\series bold -create_blast_db -\series default - is used to create the BLAST database required for BLAST which is queried - using the biotool -\series bold -blast_seq -\series default -. - So in order to create a BLAST database from sequences in the data stream - you simply run: -\end_layout - -\begin_layout LyX-Code -... - | create_blast_db --database=my_database --no_stream -\end_layout - -\begin_layout Standard -The type of sequence to use for the database is automagically determined - by -\series bold -create_blast_db -\series default -, but don't have a mixture of peptide and nucleic acid sequences in the - stream. - The -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -database switch takes a path as argument, but will default to 'blastdb_ if not set. -\end_layout - -\begin_layout Standard -The resulting database can now be queried with sequences in another data - stream using -\series bold -blast_seq -\series default -: -\end_layout - -\begin_layout LyX-Code -... 
- | blast_seq --database=my_database -\end_layout - -\begin_layout Standard -Again, the sequence type is determined automagically and the appropriate - BLAST program is guessed (see the table below); however, the program name can - be overridden with the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -program switch. -\end_layout - -\begin_layout Standard -\noindent -\align center -\begin_inset Tabular - - - - - - - -\begin_inset Text - -\begin_layout Standard -Subject sequence -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -Query sequence -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -Program guess -\end_layout - -\end_inset - - - - -\begin_inset Text - -\begin_layout Standard -Nucleotide -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -Nucleotide -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -blastn -\end_layout - -\end_inset - - - - -\begin_inset Text - -\begin_layout Standard -Protein -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -Protein -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -blastp -\end_layout - -\end_inset - - - - -\begin_inset Text - -\begin_layout Standard -Protein -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -Nucleotide -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -blastx -\end_layout - -\end_inset - - - - -\begin_inset Text - -\begin_layout Standard -Nucleotide -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -Protein -\end_layout - -\end_inset - - -\begin_inset Text - -\begin_layout Standard -tblastn -\end_layout - -\end_inset - - - - -\end_inset - - -\end_layout - -\begin_layout Standard -Finally, it is also possible to use -\series bold -blast_seq -\series default - for blasting sequences against a preformatted genome using
the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -genome switch instead of the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -database switch: -\end_layout - -\begin_layout LyX-Code -... - | blast_seq --genome= -\end_layout - -\begin_layout Subsection -How to use Vmatch for sequence search? -\begin_inset LatexCommand label -name "sub:How-to-use-Vmatch" - -\end_inset - - -\end_layout - -\begin_layout Standard -The powerful suffix array software package Vmatch -\begin_inset Foot -status collapsed - -\begin_layout Standard -\begin_inset LatexCommand url -target "http://www.vmatch.de/" - -\end_inset - - -\end_layout - -\end_inset - - can be used for exact mapping of sequences against indexed genomes using - the biotool -\series bold -vmatch_seq -\series default -, which will e.g. - map 700000 ESTs to the human genome, locating all 160 million hits, in less than - an hour. - Only nucleotide sequences longer than 11 nucleotides will - be mapped. - It is recommended that sequences consisting of mostly one nucleotide type - are removed. - This can be done with the -\series bold -analyze_seq -\series default - biotool -\begin_inset LatexCommand eqref -reference "sub:How-to-analyze" - -\end_inset - -. -\end_layout - -\begin_layout LyX-Code -... - | vmatch_seq --genome= -\end_layout - -\begin_layout Standard -It is also possible to allow for mismatches using the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -hamming_dist switch. - So to allow for 2 mismatches: -\end_layout - -\begin_layout LyX-Code -... - | vmatch_seq --genome= --hamming_dist=2 -\end_layout - -\begin_layout Standard -Or to allow for 10% mismatching nucleotides: -\end_layout - -\begin_layout LyX-Code -...
- | vmatch_seq --genome= --hamming_dist=10p -\end_layout - -\begin_layout Standard -To allow both indels and mismatches use the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -edit_dist switch. - So to allow for one mismatch or one indel: -\end_layout - -\begin_layout LyX-Code -... - | vmatch_seq --genome= --edit_dist=1 -\end_layout - -\begin_layout Standard -Or to allow for 5% indels or mismatches: -\end_layout - -\begin_layout LyX-Code -... - | vmatch_seq --genome= --edit_dist=5p -\end_layout - -\begin_layout Standard -Note that using -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -hamming_dist or -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -edit_dist slows down vmatch considerably, so use with care. -\end_layout - -\begin_layout Standard -The resulting SCORE key can instead hold the number of genome matches - of a given sequence (multi-mappers) if the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -count switch is given. -\end_layout - -\begin_layout Subsection -How to find all matches between sequences? -\begin_inset LatexCommand label -name "sub:How-to-find-matches" - -\end_inset - - -\end_layout - -\begin_layout Standard -All matches between two sequences can be determined with the biotool -\series bold -match_seq -\series default -.
- The match finding engine under the hood of -\series bold -match_seq -\series default - is the super fast suffix tree program MUMmer -\begin_inset Foot -status collapsed - -\begin_layout Standard -\begin_inset LatexCommand url -target "http://mummer.sourceforge.net/" - -\end_inset - - -\end_layout - -\end_inset - -, which will locate all forward and reverse matches between huge sequences - in a matter of minutes (if the repeat count is not too high and if the - word size used is appropriate). - Matching two -\emph on -Helicobacter pylori -\emph default - genomes (1.7Mbp) takes around 10 seconds: -\end_layout - -\begin_layout LyX-Code -... - | match_seq --word_size=20 --direction=both -\end_layout - -\begin_layout Standard -The output from -\series bold -match_seq -\series default - can be used to generate a dotplot with -\series bold -plot_matches -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-generate-dotplot" - -\end_inset - -). -\end_layout - -\begin_layout Subsection -How to align sequences? -\begin_inset LatexCommand label -name "sub:How-to-align" - -\end_inset - - -\end_layout - -\begin_layout Standard -Sequences in the stream can be aligned with the -\series bold -align_seq -\series default - biotool that uses Muscle -\begin_inset Foot -status open - -\begin_layout Standard -\begin_inset LatexCommand url -target "http://www.drive5.com/muscle/muscle.html" - -\end_inset - - -\end_layout - -\end_inset - - as the alignment engine. - Currently you cannot change any of the Muscle alignment parameters and - -\series bold -align_seq -\series default - will create an alignment based on the defaults (which are really good!): -\end_layout - -\begin_layout LyX-Code -...
- | align_seq -\end_layout - -\begin_layout Standard -The aligned output can be written to file in FASTA format using -\series bold -write_fasta -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-write-fasta" - -\end_inset - -) or in pretty text using -\series bold -write_align -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-write-alignment" - -\end_inset - -). -\end_layout - -\begin_layout Subsection -How to create a weight matrix? -\end_layout - -\begin_layout Standard -If you want a weight matrix to show the sequence composition of a stack - of sequences you can use the biotool create_weight_matrix: -\end_layout - -\begin_layout LyX-Code -... - | create_weight_matrix -\end_layout - -\begin_layout Standard -The result can be output in percent using the -\begin_inset ERT -status open - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -percent switch: -\end_layout - -\begin_layout LyX-Code -... - | create_weight_matrix --percent -\end_layout - -\begin_layout Standard -The weight matrix can be written as tabular output with -\series bold -write_tab -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-write-tab" - -\end_inset - -) after removing the records containing SEQ with -\series bold -grab -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-grab" - -\end_inset - -): -\end_layout - -\begin_layout LyX-Code -... - | create_weight_matrix | grab --invert --keys=SEQ --keys_only -\end_layout - -\begin_layout LyX-Code -| write_tab --no_stream -\end_layout - -\begin_layout Standard -The V0 column will hold the residue, while the rest of the columns will - hold the frequencies for each sequence position. -\end_layout - -\begin_layout Section -Plotting -\end_layout - -\begin_layout Standard -There exist several biotools for plotting.
- Some of these are based on GNUplot -\begin_inset Foot -status open - -\begin_layout Standard -\begin_inset LatexCommand url -target "http://www.gnuplot.info/" - -\end_inset - - -\end_layout - -\end_inset - -, which is an extremely powerful platform to generate all sorts of plots - and even though GNUplot has quite a steep learning curve, the biotools - utilizing GNUplot are simple to use. - GNUplot is able to output a lot of different formats (called terminals - in GNUplot), but the biotools focus on only three formats: -\end_layout - -\begin_layout Enumerate -The 'dumb' terminal is the default for the GNUplot based biotools and will output - a plot in crude ASCII text (Fig.\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "fig:Dumb-terminal" - -\end_inset - -). - This is quite nice for a quick and dirty plot to get an overview of your - data. -\end_layout - -\begin_layout Enumerate -The 'post' or 'postscript' terminal outputs postscript code, which gives publication - grade graphics that can be viewed with applications such as Ghostview, - Photoshop, and Preview. -\end_layout - -\begin_layout Enumerate -The 'svg' terminal outputs scalable vector graphics (SVG), which is a vector - based format. - SVG is great because you can edit the resulting plot using Photoshop or - Inkscape -\begin_inset Foot -status collapsed - -\begin_layout Standard -Inkscape is a really handy drawing program that is free and open source. - Available at -\begin_inset LatexCommand htmlurl -target "http://www.inkscape.org" - -\end_inset - - -\end_layout - -\end_inset - - if you want to add additional labels, captions, arrows, and so on and then - save the result in different formats, such as postscript, without losing - resolution. -\end_layout - -\begin_layout Standard -The biotools for plotting that are not based on GNUplot only output SVG - (that may change in the future).
-\end_layout - -\begin_layout Standard -\begin_inset Float figure -wide false -sideways false -status open - -\begin_layout Standard -\noindent -\align center -\begin_inset Graphics - filename lendist_ascii.png - lyxscale 70 - width 12cm - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset Caption - -\begin_layout Standard -\begin_inset LatexCommand label -name "fig:Dumb-terminal" - -\end_inset - -Dumb terminal -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Quote -The output of a length distribution plot in the default 'dumb terminal' - to the terminal window. - -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Subsection -How to plot a histogram? -\begin_inset LatexCommand label -name "How-to-plot-histogram" - -\end_inset - - -\end_layout - -\begin_layout Standard -A generic histogram for a given value can be plotted with the biotool -\series bold -plot_histogram -\series default - (Fig.\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "fig:Histogram" - -\end_inset - -): -\end_layout - -\begin_layout LyX-Code -... - | plot_histogram --key=TISSUE --no_stream -\end_layout - -\begin_layout Standard -(Figure missing) -\end_layout - -\begin_layout Standard -\noindent -\align left -\begin_inset Float figure -wide false -sideways false -status open - -\begin_layout Standard -\noindent -\align center -\begin_inset Graphics - filename histogram.png - lyxscale 70 - width 12cm - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset Caption - -\begin_layout Standard -\begin_inset LatexCommand label -name "fig:Histogram" - -\end_inset - -Histogram -\end_layout - -\end_inset - - -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Subsection -How to plot a length distribution? -\begin_inset LatexCommand label -name "sub:How-to-plot-lendist" - -\end_inset - - -\end_layout - -\begin_layout Standard -Plotting of length distributions, whether sequence lengths, pattern lengths, - hit lengths, -\emph on -etc.
- -\emph default - is a really handy thing and can be done with the biotool -\series bold -plot_lendist -\series default -. - If you have a file with FASTA entries and want to plot the length distribution - you do it like this: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in= | length_seq -\end_layout - -\begin_layout LyX-Code -| plot_lendist --key=SEQ_LEN --no_stream -\end_layout - -\begin_layout Standard -The result will be written to the default dumb terminal and will look like - Fig.\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "fig:Dumb-terminal" - -\end_inset - -. -\end_layout - -\begin_layout Standard -If you instead want the result in postscript format you can do: -\end_layout - -\begin_layout LyX-Code -... - | plot_lendist --key=SEQ_LEN --terminal=post --result_out=file.ps -\end_layout - -\begin_layout Standard -That will generate the plot and save it to file, but not interrupt the data - stream which can then be used in further analysis. - You can also save the plot implicitly using '>', however, it is then important - to terminate the stream with the -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_stream switch: -\end_layout - -\begin_layout LyX-Code -... - | plot_lendist --key=SEQ_LEN --terminal=post --no_stream > file.ps -\end_layout - -\begin_layout Standard -The resulting plot can be seen in Fig.\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "fig:Length-distribution" - -\end_inset - -.
-\end_layout - -\begin_layout Standard -\begin_inset Float figure -wide false -sideways false -status open - -\begin_layout Standard - -\end_layout - -\begin_layout Standard -\noindent -\align center -\begin_inset Graphics - filename lendist.ps - lyxscale 50 - width 12cm - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset Caption - -\begin_layout Standard -\begin_inset LatexCommand label -name "fig:Length-distribution" - -\end_inset - -Length distribution -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Quote -Length distribution of 630 piRNA like RNAs. -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Subsection -How to plot a chromosome distribution? -\begin_inset LatexCommand label -name "sub:How-to-plot-chrdist" - -\end_inset - - -\end_layout - -\begin_layout Standard -If you have the result of a sequence search against a multi-chromosome genome, - it is very practical to be able to plot the distribution of search hits - on the different chromosomes. - This can be done with -\series bold -plot_chrdist -\series default -: -\end_layout - -\begin_layout LyX-Code -read_fasta --data_in= | blat_genome | plot_chrdist --no_stream -\end_layout - -\begin_layout Standard -The above example will result in a crude plot using the 'dumb' terminal, - and if you want to mess around with the results from the BLAT search you - probably want to save the result to file first (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-write-PSL" - -\end_inset - -). - To plot the chromosome distribution from the saved search result you can - do: -\end_layout - -\begin_layout LyX-Code -read_bed --data_in=file.bed | plot_chrdist --terminal=post --result_out=plot.ps -\end_layout - -\begin_layout Standard -That will result in the output shown in Fig.\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "fig:Chromosome-distribution" - -\end_inset - -.
-\end_layout - -\begin_layout Standard -\begin_inset Float figure -wide false -sideways false -status open - -\begin_layout Standard - -\end_layout - -\begin_layout Standard -\noindent -\align center -\begin_inset Graphics - filename chrdist.ps - lyxscale 50 - width 12cm - rotateAngle 90 - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset Caption - -\begin_layout Standard -\begin_inset LatexCommand label -name "fig:Chromosome-distribution" - -\end_inset - -Chromosome distribution -\end_layout - -\end_inset - - -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Subsection -How to generate a dotplot? -\begin_inset LatexCommand label -name "sub:How-to-generate-dotplot" - -\end_inset - - -\end_layout - -\begin_layout Standard -A dotplot is a powerful way to get an overview of the size and location - of sequence insertions, deletions, and duplications between two sequences. - Generating a dotplot with biotools is a two step process where you initially - find all matches between two sequences using the tool -\series bold -match_seq -\series default - (see\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "sub:How-to-find-matches" - -\end_inset - -) and plot the resulting matches with -\series bold -plot_matches -\series default -. - Matching and plotting two -\emph on -Helicobacter pylori -\emph default - genomes (1.7Mbp) takes around 10 seconds: -\end_layout - -\begin_layout LyX-Code -... - | match_seq | plot_matches --terminal=post --result_out=plot.ps -\end_layout - -\begin_layout Standard -The resulting dotplot is in Fig.\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "fig:Dotplot" - -\end_inset - -. 
-\end_layout - -\begin_layout Standard -\begin_inset Float figure -wide false -sideways false -status open - -\begin_layout Standard -\noindent -\align center -\begin_inset Graphics - filename dotplot.ps - lyxscale 50 - width 12cm - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset Caption - -\begin_layout Standard -\begin_inset LatexCommand label -name "fig:Dotplot" - -\end_inset - -Dotplot -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Quote -Forward matches are displayed in green while reverse matches are displayed - in red. -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Subsection -How to plot a sequence logo? -\end_layout - -\begin_layout Standard -Sequence logos can be generated with -\series bold -plot_seqlogo -\series default -. - The sequence type is determined automagically and an entropy scale of 2 - bits and 4 bits is used for nucleotide and peptide sequences, respectively -\begin_inset Foot -status collapsed - -\begin_layout Standard -\begin_inset LatexCommand htmlurl -target "http://www.ccrnp.ncifcrf.gov/~toms/paper/hawaii/latex/node5.html" - -\end_inset - - -\end_layout - -\end_inset - -. -\end_layout - -\begin_layout LyX-Code -... - | plot_seqlogo --no_stream --result_out=seqlogo.svg -\end_layout - -\begin_layout Standard -An example of a sequence logo can be seen in Fig.\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "fig:Sequence-logo" - -\end_inset - -.
-\end_layout - -\begin_layout Standard -\begin_inset Float figure -wide false -sideways false -status open - -\begin_layout Standard -\noindent -\align center -\begin_inset Graphics - filename seqlogo.png - lyxscale 50 - width 12cm - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset Caption - -\begin_layout Standard -\begin_inset LatexCommand label -name "fig:Sequence-logo" - -\end_inset - -Sequence logo -\end_layout - -\end_inset - - -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Subsection -How to plot a karyogram? -\begin_inset LatexCommand label -name "sub:How-to-plot-karyogram" - -\end_inset - - -\end_layout - -\begin_layout Standard -To plot search hits on genomes use -\series bold -plot_karyogram -\series default -, which will output a nice karyogram in SVG graphics: -\end_layout - -\begin_layout LyX-Code -... - | plot_karyogram --result_out=karyogram.svg -\end_layout - -\begin_layout Standard -The banding data is taken from the UCSC genome browser database and currently - only Human and Mouse are supported. - Fig.\InsetSpace ~ - -\begin_inset LatexCommand ref -reference "fig:Karyogram" - -\end_inset - - shows the distribution of piRNA like RNAs matched to the Human genome. -\end_layout - -\begin_layout Standard -\begin_inset Float figure -wide false -sideways false -status open - -\begin_layout Standard -\noindent -\align center -\begin_inset Graphics - filename karyogram.png - lyxscale 35 - width 12cm - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset Caption - -\begin_layout Standard -\begin_inset LatexCommand label -name "fig:Karyogram" - -\end_inset - -Karyogram -\end_layout - -\end_inset - - -\end_layout - -\begin_layout Quote -Hits from a search of piRNA like RNAs in the Human genome are displayed as - short horizontal bars.
-\end_layout - -\end_inset - - -\end_layout - -\begin_layout Section -Uploading Results -\end_layout - -\begin_layout Subsection -How do I display my results in the UCSC Genome Browser? -\end_layout - -\begin_layout Standard -Results from the list of biotools below can be uploaded directly to a local - mirror of the UCSC Genome Browser using the biotool -\series bold -upload_to_ucsc -\series default -: -\end_layout - -\begin_layout Itemize -patscan_seq -\begin_inset LatexCommand eqref -reference "sub:How-to-use-patscan" - -\end_inset - - -\end_layout - -\begin_layout Itemize -blat_seq -\begin_inset LatexCommand eqref -reference "sub:How-to-use-BLAT" - -\end_inset - - -\end_layout - -\begin_layout Itemize -blast_seq -\begin_inset LatexCommand eqref -reference "sub:How-to-use-BLAST" - -\end_inset - - -\end_layout - -\begin_layout Itemize -vmatch_seq -\begin_inset LatexCommand eqref -reference "sub:How-to-use-Vmatch" - -\end_inset - - -\end_layout - -\begin_layout Standard -In its simplest form, uploading data requires two mandatory - switches: -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -database, which is the UCSC database name (such as hg18, mm9, etc.) and -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -table, which should be the user's initials followed by an underscore and a - short description of the data: -\end_layout - -\begin_layout LyX-Code -...
- | upload_to_ucsc --database=hg18 --table=mah_snoRNAs -\end_layout - -\begin_layout Standard -The -\series bold -upload_to_ucsc -\series default - biotool modifies the user's ~/ucsc/my_tracks.ra file automagically (a backup - is created with the name ~/ucsc/my_tracks.ra~) with default values that - can be overridden using the following switches: -\end_layout - -\begin_layout Itemize -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -short_label - Short label for track - Default=database->table -\end_layout - -\begin_layout Itemize -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -long_label - Long label for track - Default=database->table -\end_layout - -\begin_layout Itemize -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -group - Track group name - Default= -\end_layout - -\begin_layout Itemize -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -priority - Track display priority - Default=1 -\end_layout - -\begin_layout Itemize -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -color - Track color - Default=147,73,42 -\end_layout - -\begin_layout Itemize -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -chunk_size - Chunks for loading - Default=10000000 -\end_layout - -\begin_layout Itemize -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -visibility - Track visibility - Default=pack -\end_layout - -\begin_layout Standard -Also, data in BED or PSL format can be uploaded with -\series bold -upload_to_ucsc -\series default - as long as these refer to genomes and chromosomes existing in the UCSC - Genome Browser: -\end_layout -
-\begin_layout LyX-Code -read_bed --data_in= | upload_to_ucsc ... -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code -read_psl --data_in= | upload_to_ucsc ... -\end_layout - -\begin_layout Section -Power Scripting -\end_layout - -\begin_layout Standard -It is possible to do command-line scripting of biotool records using Perl. - Because a biotool record essentially is a hash structure, you can pass - records to the -\series bold -bioscript -\series default - command, which is a wrapper around the Perl executable that allows direct - manipulations of the records using the power of Perl. -\end_layout - -\begin_layout Standard -In the example below, we replace the value of the CHR key in all records - with a running number: -\end_layout - -\begin_layout LyX-Code -... - | bioscript 'while($r=get_record( -\backslash -*STDIN)){$r->{CHR}=$i++; put_record($r)}' -\end_layout - -\begin_layout Standard -Something more useful would probably be to create custom FASTA headers. - E.g. - if we read in a BED file, look up the genomic sequence, create a custom - FASTA header with -\series bold -bioscript -\series default - and output FASTA entries: -\end_layout - -\begin_layout LyX-Code -...
- | bioscript 'while($r=get_record( -\backslash -*STDIN)){$r->{SEQ_NAME}= // -\end_layout - -\begin_layout LyX-Code -join("_",$r->{CHR},$r->{CHR_BEG},$r->{CHR_END}); put_record($r)}' -\end_layout - -\begin_layout Standard -And the output: -\end_layout - -\begin_layout LyX-Code ->chr2L_21567527_21567550 -\end_layout - -\begin_layout LyX-Code -taccaaacggatgcctcagacatc -\end_layout - -\begin_layout LyX-Code ->chr2L_693380_693403 -\end_layout - -\begin_layout LyX-Code -taccaaacggatgcctcagacatc -\end_layout - -\begin_layout LyX-Code ->chr2L_13859534_13859557 -\end_layout - -\begin_layout LyX-Code -taccaaacggatgcctcagacatc -\end_layout - -\begin_layout LyX-Code ->chr2L_9005090_9005113 -\end_layout - -\begin_layout LyX-Code -taccaaacggatgcctcagacatc -\end_layout - -\begin_layout LyX-Code ->chr2L_2106825_2106848 -\end_layout - -\begin_layout LyX-Code -taccaaacggatgcctcagacatc -\end_layout - -\begin_layout LyX-Code ->chr2L_14649031_14649054 -\end_layout - -\begin_layout LyX-Code -taccaaacggatgcctcagacatc -\end_layout - -\begin_layout Section -Trouble shooting -\end_layout - -\begin_layout Standard -Shoot the messenger! 
-\end_layout - -\begin_layout Section -\start_of_appendix -Keys -\begin_inset LatexCommand label -name "sec:Keys" - -\end_inset - - -\end_layout - -\begin_layout Standard -HIT -\end_layout - -\begin_layout Standard -HIT_BEG -\end_layout - -\begin_layout Standard -HIT_END -\end_layout - -\begin_layout Standard -HIT_LEN -\end_layout - -\begin_layout Standard -HIT_NAME -\end_layout - -\begin_layout Standard -PATTERN -\end_layout - -\begin_layout Section -Switches -\begin_inset LatexCommand label -name "sec:Switches" - -\end_inset - - -\end_layout - -\begin_layout Standard -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -stream_in -\end_layout - -\begin_layout Standard -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -stream_out -\end_layout - -\begin_layout Standard -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -no_stream -\end_layout - -\begin_layout Standard -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -data_in -\end_layout - -\begin_layout Standard -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -result_out -\end_layout - -\begin_layout Standard -\begin_inset ERT -status collapsed - -\begin_layout Standard - -- -\backslash -/- -\end_layout - -\end_inset - -num -\end_layout - -\begin_layout Section -scan_for_matches README -\begin_inset LatexCommand label -name "sec:scan_for_matches-README" - -\end_inset - - -\end_layout - -\begin_layout LyX-Code - scan_for_matches: -\end_layout - -\begin_layout LyX-Code - A Program to Scan Nucleotide or Protein Sequences for Matching Patterns -\end_layout - -\begin_layout LyX-Code - Ross Overbeek -\end_layout - -\begin_layout LyX-Code - MCS -\end_layout - -\begin_layout LyX-Code - Argonne National Laboratory 
-\end_layout - -\begin_layout LyX-Code - Argonne, IL 60439 -\end_layout - -\begin_layout LyX-Code - USA -\end_layout - -\begin_layout LyX-Code -Scan_for_matches is a utility that we have written to search for -\end_layout - -\begin_layout LyX-Code -patterns in DNA and protein sequences. - I wrote most of the code, -\end_layout - -\begin_layout LyX-Code -although David Joerg and Morgan Price wrote sections of an -\end_layout - -\begin_layout LyX-Code -earlier version. - The whole notion of pattern matching has a rich -\end_layout - -\begin_layout LyX-Code -history, and we borrowed liberally from many sources. - However, it is -\end_layout - -\begin_layout LyX-Code -worth noting that we were strongly influenced by the elegant tools -\end_layout - -\begin_layout LyX-Code -developed and distributed by David Searls. - My intent is to make the -\end_layout - -\begin_layout LyX-Code -existing tool available to anyone in the research community that might -\end_layout - -\begin_layout LyX-Code -find it useful. - I will continue to try to fix bugs and make suggested -\end_layout - -\begin_layout LyX-Code -enhancements, at least until I feel that a superior tool exists. -\end_layout - -\begin_layout LyX-Code -Hence, I would appreciate it if all bug reports and suggestions are -\end_layout - -\begin_layout LyX-Code -directed to me at Overbeek@mcs.anl.gov. - -\end_layout - -\begin_layout LyX-Code -I will try to log all bug fixes and report them to users that send me -\end_layout - -\begin_layout LyX-Code -their email addresses. - I do not require that you give me your name -\end_layout - -\begin_layout LyX-Code -and address. - However, if you do give it to me, I will try to notify -\end_layout - -\begin_layout LyX-Code -you of serious problems as they are discovered. 
-\end_layout - -\begin_layout LyX-Code -Getting Started: -\end_layout - -\begin_layout LyX-Code - The distribution should contain at least the following programs: -\end_layout - -\begin_layout LyX-Code - README - This document -\end_layout - -\begin_layout LyX-Code - ggpunit.c - One of the two source files -\end_layout - -\begin_layout LyX-Code - scan_for_matches.c - The second source file -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - run_tests - A perl script to test things -\end_layout - -\begin_layout LyX-Code - show_hits - A handy perl script -\end_layout - -\begin_layout LyX-Code - test_dna_input - Test sequences for DNA -\end_layout - -\begin_layout LyX-Code - test_dna_patterns - Test patterns for DNA scan -\end_layout - -\begin_layout LyX-Code - test_output - Desired output from test -\end_layout - -\begin_layout LyX-Code - test_prot_input - Test protein sequences -\end_layout - -\begin_layout LyX-Code - test_prot_patterns - Test patterns for proteins -\end_layout - -\begin_layout LyX-Code - testit - a perl script used for test -\end_layout - -\begin_layout LyX-Code - Only the first three files are required. - The others are useful, -\end_layout - -\begin_layout LyX-Code - but only if you have Perl installed on your system. - If you do -\end_layout - -\begin_layout LyX-Code - have Perl, I suggest that you type -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - which perl -\end_layout - -\begin_layout LyX-Code - to find out where it installed. - On my system, I get the following -\end_layout - -\begin_layout LyX-Code - response: -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - clone% which perl -\end_layout - -\begin_layout LyX-Code - /usr/local/bin/perl -\end_layout - -\begin_layout LyX-Code - indicating that Perl is installed in /usr/local/bin. 
- Anyway, once -\end_layout - -\begin_layout LyX-Code - you know where it is installed, edit the first line of files -\end_layout - -\begin_layout LyX-Code - testit -\end_layout - -\begin_layout LyX-Code - show_hits -\end_layout - -\begin_layout LyX-Code - replacing /usr/local/bin/perl with the appropriate location. - I -\end_layout - -\begin_layout LyX-Code - will assume that you can do this, although it is not critical (it -\end_layout - -\begin_layout LyX-Code - is needed only to test the installation and to use the "show_hits" -\end_layout - -\begin_layout LyX-Code - utility). - Perl is not required to actually install and run -\end_layout - -\begin_layout LyX-Code - scan_for_matches. - -\end_layout - -\begin_layout LyX-Code - If you do not have Perl, I suggest you get it and install it (it -\end_layout - -\begin_layout LyX-Code - is a wonderful utility). - Information about Perl and how to get it -\end_layout - -\begin_layout LyX-Code - can be found in the book "Programming Perl" by Larry Wall and -\end_layout - -\begin_layout LyX-Code - Randall L. - Schwartz, published by O'Reilly & Associates, Inc. -\end_layout - -\begin_layout LyX-Code - To get started, you will need to compile the program. - I do this -\end_layout - -\begin_layout LyX-Code - using -\end_layout - -\begin_layout LyX-Code - gcc -O -o scan_for_matches ggpunit.c scan_for_matches.c -\end_layout - -\begin_layout LyX-Code - If you do not use GNU C, use -\end_layout - -\begin_layout LyX-Code - cc -O -DCC -o scan_for_matches ggpunit.c scan_for_matches.c -\end_layout - -\begin_layout LyX-Code - which works on my Sun. 
- -\end_layout - -\begin_layout LyX-Code - Once you have compiled scan_for_matches, you can verify that it -\end_layout - -\begin_layout LyX-Code - works with -\end_layout - -\begin_layout LyX-Code - clone% run_tests tmp -\end_layout - -\begin_layout LyX-Code - clone% diff tmp test_output -\end_layout - -\begin_layout LyX-Code - You may get a few strange lines of the sort -\end_layout - -\begin_layout LyX-Code - clone% run_tests tmp -\end_layout - -\begin_layout LyX-Code - rm: tmp: No such file or directory -\end_layout - -\begin_layout LyX-Code - clone% diff tmp test_output -\end_layout - -\begin_layout LyX-Code - These should cause no concern. - However, if the "diff" shows that -\end_layout - -\begin_layout LyX-Code - tmp and test_output are different, contact me (you have a -\end_layout - -\begin_layout LyX-Code - problem). - -\end_layout - -\begin_layout LyX-Code - You should now be able to use scan_for_matches by following the -\end_layout - -\begin_layout LyX-Code - instructions given below (which is all the normal user should have -\end_layout - -\begin_layout LyX-Code - to understand, once things are installed properly). -\end_layout - -\begin_layout LyX-Code - ============================================================== -\end_layout - -\begin_layout LyX-Code -How to run scan_for_matches: -\end_layout - -\begin_layout LyX-Code - To run the program, you need to create two files -\end_layout - -\begin_layout LyX-Code - 1. - the first file contains the pattern you wish to scan for; I'll -\end_layout - -\begin_layout LyX-Code - call this file pat_file in what follows (but any name is ok) -\end_layout - -\begin_layout LyX-Code - 2. - the second file contains a set of sequences to scan. - These -\end_layout - -\begin_layout LyX-Code - should be in "fasta format". - Just look at the contents of -\end_layout - -\begin_layout LyX-Code - test_dna_input to see examples of this format.
- Basically, -\end_layout - -\begin_layout LyX-Code - each sequence begins with a line of the form -\end_layout - -\begin_layout LyX-Code - >sequence_id -\end_layout - -\begin_layout LyX-Code - and is followed by one or more lines containing the sequence. -\end_layout - -\begin_layout LyX-Code - Once these files have been created, you just use -\end_layout - -\begin_layout LyX-Code - scan_for_matches pat_file < input_file -\end_layout - -\begin_layout LyX-Code - to scan all of the input sequences for the given pattern. - As an -\end_layout - -\begin_layout LyX-Code - example, suppose that pat_file contains a single line of the form -\end_layout - -\begin_layout LyX-Code - p1=4...7 3...8 ~p1 -\end_layout - -\begin_layout LyX-Code - Then, -\end_layout - -\begin_layout LyX-Code - scan_for_matches pat_file < test_dna_input -\end_layout - -\begin_layout LyX-Code - should produce two "hits". - When I run this on my machine, I get -\end_layout - -\begin_layout LyX-Code - clone% scan_for_matches pat_file < test_dna_input -\end_layout - -\begin_layout LyX-Code - >tst1:[6,27] -\end_layout - -\begin_layout LyX-Code - cguaacc ggttaacc gguuacg -\end_layout - -\begin_layout LyX-Code - >tst2:[6,27] -\end_layout - -\begin_layout LyX-Code - CGUAACC GGTTAACC GGUUACG -\end_layout - -\begin_layout LyX-Code - clone% -\end_layout - -\begin_layout LyX-Code -Simple Patterns Built by Matching Ranges and Reverse Complements -\end_layout - -\begin_layout LyX-Code - Let me first explain this simple pattern: -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - p1=4...7 3...8 ~p1 -\end_layout - -\begin_layout LyX-Code - The pattern consists of three "pattern units" separated by spaces. -\end_layout - -\begin_layout LyX-Code - The first pattern unit is -\end_layout - -\begin_layout LyX-Code - p1=4...7 -\end_layout - -\begin_layout LyX-Code - which means "match 4 to 7 characters and call them p1". 
- The -\end_layout - -\begin_layout LyX-Code - second pattern unit is -\end_layout - -\begin_layout LyX-Code - 3...8 -\end_layout - -\begin_layout LyX-Code - which means "then match 3 to 8 characters". - The last pattern unit -\end_layout - -\begin_layout LyX-Code - is -\end_layout - -\begin_layout LyX-Code - ~p1 -\end_layout - -\begin_layout LyX-Code - which means "match the reverse complement of p1". - The first -\end_layout - -\begin_layout LyX-Code - reported hit is shown as -\end_layout - -\begin_layout LyX-Code - >tst1:[6,27] -\end_layout - -\begin_layout LyX-Code - cguaacc ggttaacc gguuacg -\end_layout - -\begin_layout LyX-Code - which states that characters 6 through 27 of sequence tst1 were -\end_layout - -\begin_layout LyX-Code - matched. - "cguaacc" matched the first pattern unit, "ggttaacc" the -\end_layout - -\begin_layout LyX-Code - second, and "gguuacg" the third. - This is an example of a common -\end_layout - -\begin_layout LyX-Code - type of pattern used to search for sections of DNA or RNA that -\end_layout - -\begin_layout LyX-Code - would fold into a hairpin loop. -\end_layout - -\begin_layout LyX-Code -Searching Both Strands -\end_layout - -\begin_layout LyX-Code - Now for a short aside: scan_for_matches only searched the -\end_layout - -\begin_layout LyX-Code - sequences in the input file; it did not search the opposite -\end_layout - -\begin_layout LyX-Code - strand. - With a pattern of the sort we just used, there is no -\end_layout - -\begin_layout LyX-Code - need to search the opposite strand. - However, it is normally the -\end_layout - -\begin_layout LyX-Code - case that you will wish to search both the sequence and the -\end_layout - -\begin_layout LyX-Code - opposite strand (i.e., the reverse complement of the sequence). -\end_layout - -\begin_layout LyX-Code - To do that, you would just use the "-c" command line option.
- For example, -\end_layout - -\begin_layout LyX-Code - scan_for_matches -c pat_file < test_dna_input -\end_layout - -\begin_layout LyX-Code - Hits on the opposite strand will show a beginning location greater -\end_layout - -\begin_layout LyX-Code - than the end location of the match. -\end_layout - -\begin_layout LyX-Code -Defining Pairing Rules and Allowing Mismatches, Insertions, and Deletions -\end_layout - -\begin_layout LyX-Code - Let us stop now and ask "What additional features would one need to -\end_layout - -\begin_layout LyX-Code - really find the kinds of loop structures that characterize tRNAs, -\end_layout - -\begin_layout LyX-Code - rRNAs, and so forth?" I can immediately think of two: -\end_layout - -\begin_layout LyX-Code - a) you will need to be able to allow non-standard pairings -\end_layout - -\begin_layout LyX-Code - (those other than G-C and A-U), and -\end_layout - -\begin_layout LyX-Code - b) you will need to be able to tolerate some number of -\end_layout - -\begin_layout LyX-Code - mismatches and bulges. -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - Let me first show you how to handle non-standard "rules for -\end_layout - -\begin_layout LyX-Code - pairing in reverse complements".
- Consider the following pattern, -\end_layout - -\begin_layout LyX-Code - which I show as two lines (you may use as many lines as you like in -\end_layout - -\begin_layout LyX-Code - forming a pattern, although you can only break a pattern at points -\end_layout - -\begin_layout LyX-Code - where space would be legal): -\end_layout - -\begin_layout LyX-Code - r1={au,ua,gc,cg,gu,ug,ga,ag} -\end_layout - -\begin_layout LyX-Code - p1=2...3 0...4 p2=2...5 1...5 r1~p2 0...4 ~p1 -\end_layout - -\begin_layout LyX-Code - The first "pattern unit" does not actually match anything; rather, -\end_layout - -\begin_layout LyX-Code - it defines a "pairing rule" in which standard pairings are -\end_layout - -\begin_layout LyX-Code - allowed, as well as G-A and A-G (in case you wondered, Us and Ts -\end_layout - -\begin_layout LyX-Code - and upper and lower case can be used interchangeably; for example -\end_layout - -\begin_layout LyX-Code - r1={AT,UA,gc,cg} could be used to define the "standard rule" for -\end_layout - -\begin_layout LyX-Code - pairings). - The second line consists of seven pattern units which -\end_layout - -\begin_layout LyX-Code - may be interpreted as follows: -\end_layout - -\begin_layout LyX-Code - p1=2...3 match 2 or 3 characters (call it p1) -\end_layout - -\begin_layout LyX-Code - 0...4 match 0 to 4 characters -\end_layout - -\begin_layout LyX-Code - p2=2...5 match 2 to 5 characters (call it p2) -\end_layout - -\begin_layout LyX-Code - 1...5 match 1 to 5 characters -\end_layout - -\begin_layout LyX-Code - r1~p2 match the reverse complement of p2, -\end_layout - -\begin_layout LyX-Code - allowing G-A and A-G pairs -\end_layout - -\begin_layout LyX-Code - 0...4 match 0 to 4 characters -\end_layout - -\begin_layout LyX-Code - ~p1 match the reverse complement of p1 -\end_layout - -\begin_layout LyX-Code - allowing only G-C, C-G, A-T, and T-A pairs -\end_layout - -\begin_layout LyX-Code - Thus, r1~p2 means "match the reverse complement of p2 using rule r1".
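The effect of a pairing rule can be sketched in a few lines of Python. This is only an illustration of the idea, not the code scan_for_matches uses: the helper names normalize, parse_rule and pairs_with are hypothetical, and the order of the two letters within a pair is an assumption (the rules shown above are symmetric, so it makes no difference here).

```python
def normalize(seq):
    """Fold case and treat T as U, since the text says they are interchangeable."""
    return seq.lower().replace("t", "u")


def parse_rule(text):
    """Turn 'au,ua,gc,cg' into a set of allowed (base, partner) pairs."""
    return {tuple(normalize(pair.strip())) for pair in text.split(",")}


# The default rule: only G-C, C-G, A-U(T) and U(T)-A pairs.
STANDARD = parse_rule("au,ua,gc,cg")
# The r1 rule from the pattern above, which also allows G-U, U-G, G-A and A-G.
R1 = parse_rule("au,ua,gc,cg,gu,ug,ga,ag")


def pairs_with(first, second, rule=STANDARD):
    """True if `second` is a reverse complement of `first` under `rule`."""
    left = normalize(first)
    right = normalize(second)[::-1]  # read the second strand backwards
    return len(left) == len(right) and all(
        pair in rule for pair in zip(left, right)
    )


# The first and third pieces of the tst1 hit pair up under the standard rule.
print(pairs_with("cguaacc", "gguuacg"))
# A G-U pair is rejected by the standard rule but accepted by r1.
print(pairs_with("GGA", "TCT"), pairs_with("GGA", "TCT", R1))
```

Passing a different rule to pairs_with corresponds to writing r1~p2 instead of ~p2 in a pattern.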
-\end_layout - -\begin_layout LyX-Code - Now let us consider the issue of tolerating mismatches and bulges. -\end_layout - -\begin_layout LyX-Code - You may add a "qualifier" to the pattern unit that gives the -\end_layout - -\begin_layout LyX-Code - tolerable number of "mismatches, deletions, and insertions". -\end_layout - -\begin_layout LyX-Code - Thus, -\end_layout - -\begin_layout LyX-Code - p1=10...10 3...8 ~p1[1,2,1] -\end_layout - -\begin_layout LyX-Code - means that the third pattern unit must match 10 characters, -\end_layout - -\begin_layout LyX-Code - allowing one "mismatch" (a pairing other than G-C, C-G, A-T, or -\end_layout - -\begin_layout LyX-Code - T-A), two deletions (a deletion is a character that occurs in p1, -\end_layout - -\begin_layout LyX-Code - but has been "deleted" from the string matched by ~p1), and one -\end_layout - -\begin_layout LyX-Code - insertion (an "insertion" is a character that occurs in the string -\end_layout - -\begin_layout LyX-Code - matched by ~p1, but for which no corresponding character -\end_layout - -\begin_layout LyX-Code - occurs in p1). - In this case, the pattern would match -\end_layout - -\begin_layout LyX-Code - ACGTACGTAC GGGGGGGG GCGTTACCT -\end_layout - -\begin_layout LyX-Code - which is, you must admit, a fairly weak loop. - It is common to -\end_layout - -\begin_layout LyX-Code - allow mismatches, but you will find yourself using insertions and -\end_layout - -\begin_layout LyX-Code - deletions much more rarely. - In any event, you should note that -\end_layout - -\begin_layout LyX-Code - allowing mismatches, insertions, and deletions does force the -\end_layout - -\begin_layout LyX-Code - program to try many additional possible pairings, so it does slow -\end_layout - -\begin_layout LyX-Code - things down a bit.
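To see why the example above fits within [1,2,1], the qualifier can be modelled as three separate budgets that are spent while walking along the two strings. The Python below is a sketch under that assumption; reverse_complement and matches_within are hypothetical names, it handles upper-case DNA only, and it is not how scan_for_matches itself is implemented.

```python
from functools import lru_cache

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def matches_within(target, observed, mismatches, deletions, insertions):
    """True if `observed` equals `target` within the three given budgets.

    A deletion is a character of `target` missing from `observed`;
    an insertion is an extra character in `observed`.
    """

    @lru_cache(maxsize=None)
    def walk(i, j, mis, dele, ins):
        if i == len(target) and j == len(observed):
            return True
        if i < len(target) and j < len(observed):
            if target[i] == observed[j]:
                if walk(i + 1, j + 1, mis, dele, ins):
                    return True
            elif mis > 0 and walk(i + 1, j + 1, mis - 1, dele, ins):
                return True
        # Skip a character of the target (a deletion).
        if i < len(target) and dele > 0 and walk(i + 1, j, mis, dele - 1, ins):
            return True
        # Skip a character of the observed string (an insertion).
        if j < len(observed) and ins > 0 and walk(i, j + 1, mis, dele, ins - 1):
            return True
        return False

    return walk(0, 0, mismatches, deletions, insertions)


wanted = reverse_complement("ACGTACGTAC")  # what ~p1 would match exactly
print(wanted)
print(matches_within(wanted, "GCGTTACCT", 1, 2, 1))
print(matches_within(wanted, "GCGTTACCT", 1, 2, 0))
```

Every extra unit of budget adds branches to the search, which is the slowdown the text warns about.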
-\end_layout - -\begin_layout LyX-Code -How Patterns Are Matched -\end_layout - -\begin_layout LyX-Code - Now is as good a time as any to discuss the basic flow of control -\end_layout - -\begin_layout LyX-Code - when matching patterns. - Recall that a "pattern" is a sequence of -\end_layout - -\begin_layout LyX-Code - "pattern units". - Suppose that the pattern units were -\end_layout - -\begin_layout LyX-Code - u1 u2 u3 u4 ... - un -\end_layout - -\begin_layout LyX-Code - The scan of a sequence S begins by setting the current position -\end_layout - -\begin_layout LyX-Code - to 1. - Then, an attempt is made to match u1 starting at the -\end_layout - -\begin_layout LyX-Code - current position. - Each attempt to match a pattern unit can -\end_layout - -\begin_layout LyX-Code - succeed or fail. - If it succeeds, then an attempt is made to match -\end_layout - -\begin_layout LyX-Code - the next unit. - If it fails, then an attempt is made to find an -\end_layout - -\begin_layout LyX-Code - alternative match for the immediately preceding pattern unit. - If -\end_layout - -\begin_layout LyX-Code - this succeeds, then we proceed forward again to the next unit. - If -\end_layout - -\begin_layout LyX-Code - it fails we go back to the preceding unit. - This process is called -\end_layout - -\begin_layout LyX-Code - "backtracking". - If there are no previous units, then the current -\end_layout - -\begin_layout LyX-Code - position is incremented by one, and everything starts again. - This -\end_layout - -\begin_layout LyX-Code - proceeds until either the current position goes past the end of -\end_layout - -\begin_layout LyX-Code - the sequence or all of the pattern units succeed. - On success, -\end_layout - -\begin_layout LyX-Code - scan_for_matches reports the "hit", the current position is set -\end_layout - -\begin_layout LyX-Code - just past the hit, and an attempt is made to find another hit. 
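The flow of control described above can be sketched with Python generators, where each pattern unit yields its possible end positions and the recursion provides the backtracking. Everything here is an illustrative assumption: the function names are hypothetical, ranges are tried shortest first (the text does not say which order the real program uses), only lower-case DNA is handled, and the test sequence is made up.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def named_range(name, low, high):
    """Pattern unit like p1=4...7: match low..high characters and remember them."""
    def unit(seq, pos, bound):
        for length in range(low, high + 1):
            if pos + length > len(seq):
                break
            yield pos + length, {**bound, name: seq[pos:pos + length]}
    return unit


def any_range(low, high):
    """Pattern unit like 3...8: match low..high arbitrary characters."""
    def unit(seq, pos, bound):
        for length in range(low, high + 1):
            if pos + length > len(seq):
                break
            yield pos + length, bound
    return unit


def reverse_complement_of(name):
    """Pattern unit like ~p1: match the exact reverse complement of a named unit."""
    def unit(seq, pos, bound):
        wanted = bound[name].translate(COMPLEMENT)[::-1]
        if seq.startswith(wanted, pos):
            yield pos + len(wanted), bound
    return unit


def match_from(units, seq, pos, bound=None):
    """End position of each unit for the first full match at pos, else None."""
    bound = {} if bound is None else bound
    if not units:
        return []
    for end, new_bound in units[0](seq, pos, bound):
        tail = match_from(units[1:], seq, end, new_bound)
        if tail is not None:
            return [end] + tail
    return None  # every alternative failed, so the caller backtracks


def scan(units, seq):
    """Report hits as (start, end, pieces) using 1-based inclusive positions."""
    hits, pos = [], 0
    while pos < len(seq):
        ends = match_from(units, seq, pos)
        if ends is None:
            pos += 1  # no match here: advance the current position by one
            continue
        starts = [pos] + ends[:-1]
        hits.append((pos + 1, ends[-1], [seq[s:e] for s, e in zip(starts, ends)]))
        pos = max(ends[-1], pos + 1)  # continue just past the hit
    return hits


pattern = [named_range("p1", 4, 7), any_range(3, 8), reverse_complement_of("p1")]
print(scan(pattern, "aaaaaccccaaaggggaa"))
```

The made-up sequence contains no t, so the only possible hit is cccc aaa gggg at positions 6 to 16.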
-\end_layout - -\begin_layout LyX-Code - If you wish to limit the scan to simply finding a maximum of, say, -\end_layout - -\begin_layout LyX-Code - 10 hits, you can use the -n option (-n 10 would set the limit to -\end_layout - -\begin_layout LyX-Code - 10 reported hits). - For example, -\end_layout - -\begin_layout LyX-Code - scan_for_matches -c -n 1 pat_file < test_dna_input -\end_layout - -\begin_layout LyX-Code - would search for just the first hit (and would stop searching the -\end_layout - -\begin_layout LyX-Code - current sequences or any that follow in the input file). -\end_layout - -\begin_layout LyX-Code -Searching for repeats: -\end_layout - -\begin_layout LyX-Code - In the last section, I discussed almost all of the details -\end_layout - -\begin_layout LyX-Code - required to allow you to look for repeats. - Consider the following -\end_layout - -\begin_layout LyX-Code - set of patterns: -\end_layout - -\begin_layout LyX-Code - p1=6...6 3...8 p1 (find exact 6 character repeat separated -\end_layout - -\begin_layout LyX-Code - by 3 to 8 characters) -\end_layout - -\begin_layout LyX-Code - p1=6...6 3...8 p1[1,0,0] (allow one mismatch) -\end_layout - -\begin_layout LyX-Code - p1=3...3 p1[1,0,0] p1[1,0,0] p1[1,0,0] -\end_layout - -\begin_layout LyX-Code - (match 12 characters that are the remains -\end_layout - -\begin_layout LyX-Code - of a 3-character sequence occurring 4 times) -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - p1=4...8 0...3 p2=6...8 p1 0...3 p2 -\end_layout - -\begin_layout LyX-Code - (This would match things like -\end_layout - -\begin_layout LyX-Code - ATCT G TCTTT ATCT TG TCTTT -\end_layout - -\begin_layout LyX-Code - ) -\end_layout - -\begin_layout LyX-Code -Searching for particular sequences: -\end_layout - -\begin_layout LyX-Code - Occasionally, one wishes to match a specific, known sequence.
-\end_layout - -\begin_layout LyX-Code - In such a case, you can just give the sequence (along with an -\end_layout - -\begin_layout LyX-Code - optional statement of the allowable mismatches, insertions, and -\end_layout - -\begin_layout LyX-Code - deletions). - Thus, -\end_layout - -\begin_layout LyX-Code - p1=6...8 GAGA ~p1 (match a hairpin with GAGA as the loop) -\end_layout - -\begin_layout LyX-Code - RRRRYYYY (match 4 purines followed by 4 pyrimidines) -\end_layout - -\begin_layout LyX-Code - TATAA[1,0,0] (match TATAA, allowing 1 mismatch) -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code -Matches against a "weight matrix": -\end_layout - -\begin_layout LyX-Code - I will conclude my examples of the types of pattern units -\end_layout - -\begin_layout LyX-Code - available for matching against nucleotide sequences by discussing a -\end_layout - -\begin_layout LyX-Code - crude implementation of matching using a "weight matrix". - While I -\end_layout - -\begin_layout LyX-Code - am less than overwhelmed with the syntax that I chose, I think that -\end_layout - -\begin_layout LyX-Code - the reader should be aware that I was thinking of generating -\end_layout - -\begin_layout LyX-Code - patterns containing such pattern units automatically from -\end_layout - -\begin_layout LyX-Code - alignments (and did not really plan on typing such things in by -\end_layout - -\begin_layout LyX-Code - hand very often). - Anyway, suppose that you wanted to match a -\end_layout - -\begin_layout LyX-Code - sequence of eight characters. - The "consensus" of these eight -\end_layout - -\begin_layout LyX-Code - characters is GRCACCGS, but the actual "frequencies of occurrence" -\end_layout - -\begin_layout LyX-Code - are given in the matrix below. - Thus, the first character is an A -\end_layout - -\begin_layout LyX-Code - 16% of the time and a G 84% of the time.
- The second is an A 57% of -\end_layout - -\begin_layout LyX-Code - the time, a C 10% of the time, a G 29% of the time, and a T 4% of -\end_layout - -\begin_layout LyX-Code - the time. - -\end_layout - -\begin_layout LyX-Code - C1 C2 C3 C4 C5 C6 C7 C8 -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - A 16 57 0 95 0 18 0 0 -\end_layout - -\begin_layout LyX-Code - C 0 10 80 0 100 60 0 50 -\end_layout - -\begin_layout LyX-Code - G 84 29 0 0 0 20 100 50 -\end_layout - -\begin_layout LyX-Code - T 0 4 20 5 0 2 0 0 -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - One could use the following pattern unit to search for inexact -\end_layout - -\begin_layout LyX-Code - matches related to such a "weight matrix": -\end_layout - -\begin_layout LyX-Code - {(16,0,84,0),(57,10,29,4),(0,80,0,20),(95,0,0,5), -\end_layout - -\begin_layout LyX-Code - (0,100,0,0),(18,60,20,2),(0,0,100,0),(0,50,50,0)} > 450 -\end_layout - -\begin_layout LyX-Code - This pattern unit will attempt to match exactly eight characters. -\end_layout - -\begin_layout LyX-Code - For each character in the sequence, the entry in the corresponding -\end_layout - -\begin_layout LyX-Code - tuple is added to an accumulated sum. - If the sum is greater than -\end_layout - -\begin_layout LyX-Code - 450, the match succeeds; else it fails. -\end_layout - -\begin_layout LyX-Code - Recently, this feature was upgraded to allow ranges. - Thus, -\end_layout - -\begin_layout LyX-Code - 600 > {(16,0,84,0),(57,10,29,4),(0,80,0,20),(95,0,0,5), -\end_layout - -\begin_layout LyX-Code - (0,100,0,0),(18,60,20,2),(0,0,100,0),(0,50,50,0)} > 450 -\end_layout - -\begin_layout LyX-Code - will work, as well. -\end_layout - -\begin_layout LyX-Code -Allowing Alternatives: -\end_layout - -\begin_layout LyX-Code - Very occasionally, you may wish to allow alternative pattern units -\end_layout - -\begin_layout LyX-Code - (i.e., "match either A or B"). 
- You can do this using something -\end_layout - -\begin_layout LyX-Code - like -\end_layout - -\begin_layout LyX-Code - ( GAGA | GCGCA) -\end_layout - -\begin_layout LyX-Code - which says "match either GAGA or GCGCA". - You may take -\end_layout - -\begin_layout LyX-Code - alternatives of a list of pattern units, for example -\end_layout - -\begin_layout LyX-Code - (p1=3...3 3...8 ~p1 | p1=5...5 4...4 ~p1 GGG) -\end_layout - -\begin_layout LyX-Code - would match one of two sequences of pattern units. - There is one -\end_layout - -\begin_layout LyX-Code - clumsy aspect of the syntax: to match a list of alternatives, you -\end_layout - -\begin_layout LyX-Code - need to fully parenthesize the request. - Thus, -\end_layout - -\begin_layout LyX-Code - (GAGA | (GCGCA | TTCGA)) -\end_layout - -\begin_layout LyX-Code - would be needed to try the three alternatives. -\end_layout - -\begin_layout LyX-Code -One Minor Extension -\end_layout - -\begin_layout LyX-Code - Sometimes a pattern will contain a sequence of distinct ranges, -\end_layout - -\begin_layout LyX-Code - and you might wish to limit the sum of the lengths of the matched -\end_layout - -\begin_layout LyX-Code - subsequences. - For example, suppose that you basically wanted to -\end_layout - -\begin_layout LyX-Code - match something like -\end_layout - -\begin_layout LyX-Code - ARRYYTT p1=0...5 GCA[1,0,0] p2=1...6 ~p1 4...8 ~p2 p3=4...10 CCT -\end_layout - -\begin_layout LyX-Code - but that the sum of the lengths of p1, p2, and p3 must not exceed -\end_layout - -\begin_layout LyX-Code - eight characters. - To do this, you could add -\end_layout - -\begin_layout LyX-Code - length(p1+p2+p3) < 9 -\end_layout - -\begin_layout LyX-Code - as the last pattern unit. - It will just succeed or fail (but does -\end_layout - -\begin_layout LyX-Code - not actually match any characters in the sequence).
-\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code -Matching Protein Sequences -\end_layout - -\begin_layout LyX-Code - Suppose that the input file contains protein sequences. - In this -\end_layout - -\begin_layout LyX-Code - case, you must invoke scan_for_matches with the "-p" option. - You -\end_layout - -\begin_layout LyX-Code - cannot use aspects of the language that relate directly to -\end_layout - -\begin_layout LyX-Code - nucleotide sequences (e.g., the -c command line option or pattern -\end_layout - -\begin_layout LyX-Code - constructs referring to the reverse complement of a previously -\end_layout - -\begin_layout LyX-Code - matched unit). - -\end_layout - -\begin_layout LyX-Code - You also have two additional constructs that allow you to match -\end_layout - -\begin_layout LyX-Code - either "one of a set of amino acids" or "any amino acid other than -\end_layout - -\begin_layout LyX-Code - those in a given set". - For example, -\end_layout - -\begin_layout LyX-Code - p1=0...4 any(HQD) 1...3 notany(HK) p1 -\end_layout - -\begin_layout LyX-Code - would successfully match a string like -\end_layout - -\begin_layout LyX-Code - YWV D AA C YWV -\end_layout - -\begin_layout LyX-Code -Using the show_hits Utility -\end_layout - -\begin_layout LyX-Code - When viewing a large set of complex matches, you might find it -\end_layout - -\begin_layout LyX-Code - convenient to post-process the scan_for_matches output to get a -\end_layout - -\begin_layout LyX-Code - more readable version. - We provide a simple post-processor called -\end_layout - -\begin_layout LyX-Code - "show_hits".
- To see its effect, just pipe the output of a -\end_layout - -\begin_layout LyX-Code - scan_for_matches into show_hits: -\end_layout - -\begin_layout LyX-Code - Normal Output: -\end_layout - -\begin_layout LyX-Code - clone% scan_for_matches -c pat_file < tmp -\end_layout - -\begin_layout LyX-Code - >tst1:[1,28] -\end_layout - -\begin_layout LyX-Code - gtacguaacc ggttaac cgguuacgtac -\end_layout - -\begin_layout LyX-Code - >tst1:[28,1] -\end_layout - -\begin_layout LyX-Code - gtacgtaacc ggttaac cggttacgtac -\end_layout - -\begin_layout LyX-Code - >tst2:[2,31] -\end_layout - -\begin_layout LyX-Code - CGTACGUAAC C GGTTAACC GGUUACGTACG -\end_layout - -\begin_layout LyX-Code - >tst2:[31,2] -\end_layout - -\begin_layout LyX-Code - CGTACGTAAC C GGTTAACC GGTTACGTACG -\end_layout - -\begin_layout LyX-Code - >tst3:[3,32] -\end_layout - -\begin_layout LyX-Code - gtacguaacc g gttaactt cgguuacgtac -\end_layout - -\begin_layout LyX-Code - >tst3:[32,3] -\end_layout - -\begin_layout LyX-Code - gtacgtaacc g aagttaac cggttacgtac -\end_layout - -\begin_layout LyX-Code - Piped Through show_hits: -\end_layout - -\begin_layout LyX-Code - -\end_layout - -\begin_layout LyX-Code - clone% scan_for_matches -c pat_file < tmp | show_hits -\end_layout - -\begin_layout LyX-Code - tst1:[1,28]: gtacguaacc ggttaac cgguuacgtac -\end_layout - -\begin_layout LyX-Code - tst1:[28,1]: gtacgtaacc ggttaac cggttacgtac -\end_layout - -\begin_layout LyX-Code - tst2:[2,31]: CGTACGUAAC C GGTTAACC GGUUACGTACG -\end_layout - -\begin_layout LyX-Code - tst2:[31,2]: CGTACGTAAC C GGTTAACC GGTTACGTACG -\end_layout - -\begin_layout LyX-Code - tst3:[3,32]: gtacguaacc g gttaactt cgguuacgtac -\end_layout - -\begin_layout LyX-Code - tst3:[32,3]: gtacgtaacc g aagttaac cggttacgtac -\end_layout - -\begin_layout LyX-Code - clone% -\end_layout - -\begin_layout LyX-Code - Optionally, you can specify which of the "fields" in the matches -\end_layout - -\begin_layout LyX-Code - you wish to sort on, and show_hits will sort them. 
- The field -\end_layout - -\begin_layout LyX-Code - numbers start with 0. - So, you might get something like -\end_layout - -\begin_layout LyX-Code - clone% scan_for_matches -c pat_file < tmp | show_hits 2 1 -\end_layout - -\begin_layout LyX-Code - tst2:[2,31]: CGTACGUAAC C GGTTAACC GGUUACGTACG -\end_layout - -\begin_layout LyX-Code - tst2:[31,2]: CGTACGTAAC C GGTTAACC GGTTACGTACG -\end_layout - -\begin_layout LyX-Code - tst3:[32,3]: gtacgtaacc g aagttaac cggttacgtac -\end_layout - -\begin_layout LyX-Code - tst1:[1,28]: gtacguaacc ggttaac cgguuacgtac -\end_layout - -\begin_layout LyX-Code - tst1:[28,1]: gtacgtaacc ggttaac cggttacgtac -\end_layout - -\begin_layout LyX-Code - tst3:[3,32]: gtacguaacc g gttaactt cgguuacgtac -\end_layout - -\begin_layout LyX-Code - clone% -\end_layout - -\begin_layout LyX-Code - In this case, the hits have been sorted on fields 2 and 1 (that is, -\end_layout - -\begin_layout LyX-Code - the third and second matched subfields). -\end_layout - -\begin_layout LyX-Code - show_hits is just one possible little post-processor, and you -\end_layout - -\begin_layout LyX-Code - might well wish to write a customized one for yourself. -\end_layout - -\begin_layout LyX-Code -Reducing the Cost of a Search -\end_layout - -\begin_layout LyX-Code - The scan_for_matches utility uses a fairly simple search, and may -\end_layout - -\begin_layout LyX-Code - consume large amounts of CPU time for complex patterns. - Someday, -\end_layout - -\begin_layout LyX-Code - I may decide to optimize the code. - However, until then, let me -\end_layout - -\begin_layout LyX-Code - mention one useful technique. - -\end_layout - -\begin_layout LyX-Code - When you have a complex pattern that includes a number of varying -\end_layout - -\begin_layout LyX-Code - ranges, imprecise matches, and so forth, it is useful to -\end_layout - -\begin_layout LyX-Code - "pipeline" matches. 
- That is, form a simpler pattern that can be -\end_layout - -\begin_layout LyX-Code - used to scan through a large database extracting sections that -\end_layout - -\begin_layout LyX-Code - might be matched by the more complex pattern. - Let me illustrate -\end_layout - -\begin_layout LyX-Code - with a short example. - Suppose that you really wished to match the -\end_layout - -\begin_layout LyX-Code - pattern -\end_layout - -\begin_layout LyX-Code - p1=3...5 0...8 ~p1[1,1,0] p2=6...7 3...6 AGC 3...5 RYGC ~p2[1,0,0] -\end_layout - -\begin_layout LyX-Code - In this case, the pattern units AGC 3...5 RYGC can be used to rapidly -\end_layout - -\begin_layout LyX-Code - constrain the overall search. - You can preprocess the overall -\end_layout - -\begin_layout LyX-Code - database using the pattern: -\end_layout - -\begin_layout LyX-Code - 31...31 AGC 3...5 RYGC 7...7 -\end_layout - -\begin_layout LyX-Code - Put the complex pattern in pat_file1 and the simpler pattern in -\end_layout - -\begin_layout LyX-Code - pat_file2. - Then use, -\end_layout - -\begin_layout LyX-Code - scan_for_matches -c pat_file2 < nucleotide_database | -\end_layout - -\begin_layout LyX-Code - scan_for_matches pat_file1 -\end_layout - -\begin_layout LyX-Code - The output will show things like -\end_layout - -\begin_layout LyX-Code - >seqid:[232,280][2,47] -\end_layout - -\begin_layout LyX-Code - matches pieces -\end_layout - -\begin_layout LyX-Code - Then, the actual section of the sequence that was matched can be -\end_layout - -\begin_layout LyX-Code - easily computed as [233,278] (remember, the positions start from -\end_layout - -\begin_layout LyX-Code - 1, not 0). 
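The coordinate arithmetic in the example above is small enough to write out. The Python helper below is a sketch with a hypothetical name; the forward-strand branch reproduces the [233,278] result from the text, while the opposite-strand branch is an assumption based on the convention that such hits are reported with a start greater than their end.

```python
def to_database_coords(outer, inner):
    """Map a hit inside an extracted section back onto the original sequence.

    `outer` is the 1-based [start, end] of the section found by the simple
    pattern; `inner` is the 1-based [start, end] of the complex pattern's hit
    within that section.
    """
    outer_start, outer_end = outer
    inner_start, inner_end = inner
    if outer_start <= outer_end:
        # Forward strand: position 1 of the section is outer_start.
        return outer_start + inner_start - 1, outer_start + inner_end - 1
    # Opposite strand (assumed): positions count downwards from outer_start.
    return outer_start - inner_start + 1, outer_start - inner_end + 1


# >seqid:[232,280][2,47] from the text above.
print(to_database_coords((232, 280), (2, 47)))
```

The subtraction of 1 is needed only because both sets of positions start from 1.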
-\end_layout - -\begin_layout LyX-Code - Let me finally add, you should do a few short experiments to see -\end_layout - -\begin_layout LyX-Code - whether or not such pipelining actually improves performance -- it -\end_layout - -\begin_layout LyX-Code - is not always obvious where the time is going, and I have -\end_layout - -\begin_layout LyX-Code - sometimes found that the added complexity of pipelining actually -\end_layout - -\begin_layout LyX-Code - slowed things up. - It gets its best improvements when there are -\end_layout - -\begin_layout LyX-Code - exact matches of more than just a few characters that can be -\end_layout - -\begin_layout LyX-Code - rapidly used to eliminate large sections of the database. -\end_layout - -\begin_layout LyX-Code -============= -\end_layout - -\begin_layout LyX-Code -Additions: -\end_layout - -\begin_layout LyX-Code -Feb 9, 1995: the pattern units ^ and $ now work as in normal regular -\end_layout - -\begin_layout LyX-Code - expressions. - That is -\end_layout - -\begin_layout LyX-Code - TTF $ -\end_layout - -\begin_layout LyX-Code - matches only TTF at the end of the string and -\end_layout - -\begin_layout LyX-Code - ^ TTF -\end_layout - -\begin_layout LyX-Code - matches only an initial TTF -\end_layout - -\begin_layout LyX-Code - The pattern unit -\end_layout - -\begin_layout LyX-Code - -\end{lyxcode} -cat is the Unix command that reads a file and outputs the result to -'stdout', which in this case is piped to any biotool represented -by the . It is also possible to read the data stream using -'<' to direct the file into the 'stdin' of the biotool like this: - -\begin{lyxcode} -~<~ -\end{lyxcode} -However, that will not work if you pipe more biotools together. Then -it is much safer to read the stream from a file explicitly like this: - -\begin{lyxcode} -~-{}-stream\_in= -\end{lyxcode} -Here the filename is explicitly given to the biotool -with the switch -\/-stream\_in. This switch works with all biotools.
-It is also possible to read in data from multiple sources by repeating -the explicit read step: - -\begin{lyxcode} -~-{}-stream\_in=~|~~-{}-stream\_in= -\end{lyxcode} - -\subsection{How to write the data stream to file?\label{sub:How-to-write-stream}} - -In order to save the output stream from a biotool to file, so you -can read in the stream again at a later time, you can do one of two -things: - -\begin{lyxcode} -~>~ -\end{lyxcode} -All the biotools write the data stream to 'stdout' by default, which -can be written to a file by redirecting 'stdout' using '>'. -However, if one of the biotools for writing other formats is used, -then both the biotools records and the result output will -go to 'stdout' in a mixture, causing havoc! To avoid this you must -use the switch -\/-stream\_out that explicitly tells the biotool to -write the output stream to file: - -\begin{lyxcode} -~-{}-stream\_out= -\end{lyxcode} -The -\/-stream\_out switch works with all biotools. - - -\subsection{How to terminate the data stream?} - -The data stream never stops unless the user saves the stream to file -or supplies the -\/-no\_stream switch, which terminates the -stream: - -\begin{lyxcode} -~-{}-no\_stream -\end{lyxcode} -The -\/-no\_stream switch only works with those biotools where it -makes sense that the user might want to terminate the data stream, -\emph{i.e}. after an analysis step where the user wants to output -the result, but not the data stream. - - -\subsection{How to write my results to file?\label{sub:How-to-write-result}} - -Saving the result of an analysis to file can be done implicitly or -explicitly. The implicit way: - -\begin{lyxcode} -~-{}-no\_stream~>~ -\end{lyxcode} -If you use '>' to redirect 'stdout' to file then it is important to -use the -\/-no\_stream switch to avoid writing a mix of biotools -records and result to the same file, causing havoc.
The safe way is -to use the -\/-result\_out switch which explicetly tells the biotool -to write the result to a given file: - -\begin{lyxcode} -~-{}-result\_out= -\end{lyxcode} -Using the above method will not terminate the stream, so it is possible -to pipe that into another biotool generating different results: - -\begin{lyxcode} -~-{}-result\_out=~|~~-{}-result\_out= -\end{lyxcode} -And still the data stream will continue unless terminated with -\/-no\_stream: - -\begin{lyxcode} -~-{}-result\_out=~-{}-no\_stream -\end{lyxcode} -Or written to file using implicitly or explicity \eqref{sub:How-to-write-result}. -The explicit way: - -\begin{lyxcode} -~-{}-result\_out=~-{}-stream\_out= -\end{lyxcode} - -\subsection{How to read data from multiple sources?} - -To read multiple data sources, with the same type or different type -of data do: - -\begin{lyxcode} -~-{}-data\_in=~|~~-{}-data\_in= -\end{lyxcode} -where type is the data type a specific biotool reads. - - -\section{Reading input} - - -\subsection{How to read biotools input?} - -See \eqref{sub:How-to-read-stream}. - - -\subsection{How to read in data?} - -Data in different formats can be read with the appropriate biotool -for that format. The biotools are typicalled named 'read\_' -such as \textbf{read\_fasta}, \textbf{read\_bed}, \textbf{read\_tab}, -etc., and all behave in a similar manner. 
Data can be read by supplying -the -\/-data\_in switch and a file name to the file containing the -data: - -\begin{lyxcode} -~-{}-data\_in= -\end{lyxcode} -It is also possible to read in a saved biotools stream (see \ref{sub:How-to-read-stream}) -as well as reading data in one go: - -\begin{lyxcode} -~-{}-stream\_in=~-{}-data\_in= -\end{lyxcode} -If you want to read data from several files you can do this: - -\begin{lyxcode} -~-{}-data\_in=~|~~-{}-data\_in= -\end{lyxcode} -If you have several data files you can read in all explicitly with -a comma separated list: - -\begin{lyxcode} -~-{}-data\_in=file1,file2,file3 -\end{lyxcode} -And it is also possible to use file globbing: - -\begin{lyxcode} -~-{}-data\_in={*}.fna -\end{lyxcode} -Or in a combination: - -\begin{lyxcode} -~-{}-data\_in=file1,/dir/{*}.fna -\end{lyxcode} -Finally, it is possible to read in data in different formats using -the appropriate biotool for each format: - -\begin{lyxcode} -~-{}-data\_in=~|~~-{}-data\_in=~... -\end{lyxcode} - -\subsection{How to read FASTA input?} - -Sequences in FASTA format can be read explicitly using \textbf{read\_fasta}: - -\begin{lyxcode} -read\_fasta~-{}-data\_in= -\end{lyxcode} - -\subsection{How to read alignment input?} - -If your alignment if FASTA formatted then you can \textbf{read\_align}. -It is also possible to use \textbf{read\_fasta} since the data is -FASTA formatted, however, with \textbf{read\_fasta} the key ALIGN -will be omitted. The ALIGN key is used to determine which sequences -belong to what alignment which is required for \textbf{write\_align}. - -\begin{lyxcode} -read\_align~-{}-data\_in= -\end{lyxcode} - -\subsection{How to read tabular input?\label{sub:How-to-read-table}} - -Tabular input can be read with \textbf{read\_tab} which will read -in all rows and chosen columns (separated by a given delimter) from -a table in text format. 
- -The table below: - -\noindent \begin{center} -\begin{tabular}{lll} -Human & ATACGTCAG & 23524\tabularnewline -Dog & AGCATGAC & 2442\tabularnewline -Mouse & GACTG & 234\tabularnewline -Cat & AAATGCA & 2342\tabularnewline -\end{tabular} -\par\end{center} - -Can be read using the command: - -\begin{lyxcode} -read\_tab~-{}-data\_in= -\end{lyxcode} -Which will result in four records, one for each row, where the keys -V0, V1, V2 are the default keys for the organism, sequence, and count, -respectively. It is possible to select a subset of colums to read -by using the -\/-cols switch which takes a comma separated list of -columns numbers (first column is designated 0) as argument. So to -read in only the sequence and the count so that the count comes before -the sequence do: - -\begin{lyxcode} -read\_tab~-{}-data\_in=~-{}-cols=2,1 -\end{lyxcode} -It is also possible to name the columns with the -\/-keys switch: - -\begin{lyxcode} -read\_tab~-{}-data\_in=~-{}-cols=2,1~-{}-keys=COUNT,SEQ -\end{lyxcode} - -\subsection{How to read BED input?} - -The BED (Browser Extensible Data% -\footnote{\url{http://genome.ucsc.edu/FAQ/FAQformat}% -}) format is a tabular format for data pertaining to one of the Eukaryotic -genomes in the UCSC genome brower% -\footnote{\url{http://genome.ucsc.edu/}% -}. The BED format consists of up to 12 columns, where the first three -are mandatory CHR, CHR\_BEG, and CHR\_END. The mandatory columns and -any of the optional columns can all be read in easily with the \textbf{read\_bed} -biotool. - -\begin{lyxcode} -read\_bed~-{}-data\_in= -\end{lyxcode} -It is also possible to read the BED file with \textbf{read\_tab} (see~\ref{sub:How-to-read-table}), -however, that will be more cumbersome because you need to specify -the keys: - -\begin{lyxcode} -read\_tab~-{}-data\_in=~-{}-keys=CHR,CHR\_BEG,CHR\_END~... 
-\end{lyxcode} - -\subsection{How to read PSL input?} - -The PSL format is the output from BLAT and contains 21 mandatory fields -that can be read with \textbf{read\_psl}: - -\begin{lyxcode} -read\_psl~-{}-data\_in= -\end{lyxcode} - -\section{Writing output} - -All result output can be written explicitly to file using the -\/-result\_out -switch which all result generating biotools have. It is also possible -to write the result to file implicitly by directing 'stdout' to file -using '>', however, that requires the -\/-no\_stream switch to prevent -a mixture of data stream and results in the file. The explicit (and -safe) way: - -\begin{lyxcode} -...~|~~-{}-result\_out= -\end{lyxcode} -The implicit way: - -\begin{lyxcode} -...~|~~-{}-no\_stream~>~ -\end{lyxcode} - -\subsection{How to write biotools output?} - -See \eqref{sub:How-to-write-stream}. - - -\subsection{How to write FASTA output?\label{sub:How-to-write-fasta}} - -FASTA output can be written with \textbf{write\_fasta}. - -\begin{lyxcode} -...~|~write\_fasta~-{}-result\_out= -\end{lyxcode} -It is also possible to wrap the sequences to a given width using the --\/-wrap switch, although wrapping of sequences is generally an evil -thing: - -\begin{lyxcode} -...~|~write\_fasta~-{}-no\_stream~-{}-wrap=80 -\end{lyxcode} - -\subsection{How to write alignment output?\label{sub:How-to-write-alignment}} - -Pretty alignments with ruler% -\footnote{'.' for every 10 residues, ':' for every 50, and '|' for every 100% -} and consensus sequence can be created with \textbf{write\_align}, -which also has the optional -\/-wrap switch to break the alignment -into blocks of a given width: - -\begin{lyxcode} -...~|~write\_align~-{}-result\_out=~-{}-wrap=80 -\end{lyxcode} -If the number of sequences in the alignment is 2 then a pairwise alignment -will be output, otherwise a multiple alignment. And if the sequence -type, determined automagically, is protein, then residues and symbols -(+,~:,~.)
will be used to show consensus according to the Blosum62 -matrix. - - -\subsection{How to write tabular output?\label{sub:How-to-write-tab}} - -Outputting the data stream as a table can be done with \textbf{write\_tab}, -which will generate one row per record with the values as columns. -If you supply the optional -\/-comment switch, then the first row -in the table will be a 'comment' line prefixed with a '\#': - -\begin{lyxcode} -...~|~write\_tab~-{}-result\_out=~-{}-comment -\end{lyxcode} -You can also change the delimiter from the default (tab) to \emph{e.g.} -',': - -\begin{lyxcode} -...~|~write\_tab~-{}-result\_out=~-{}-delimit=',' -\end{lyxcode} -If you want the values output in a specific order you have to supply -a comma separated list using the -\/-keys switch that will print -only those keys in that order: - -\begin{lyxcode} -...~|~write\_tab~-{}-result\_out=~-{}-keys=SEQ\_NAME,COUNT -\end{lyxcode} -Alternatively, if you have some keys that you don't want in the tabular -output, use the -\/-no\_keys switch. So to print all keys except -SEQ and SEQ\_TYPE do: - -\begin{lyxcode} -...~|~write\_tab~-{}-result\_out=~-{}-no\_keys=SEQ,SEQ\_TYPE -\end{lyxcode} -Finally, if you have a stream containing a mix of different record -types, \emph{e.g.} records with sequences and records with matches, -then you can use \textbf{write\_tab} to output all the records in -tabular format, however, the -\/-comment, -\/-keys, and -\/-no\_keys -switches will only respond to records of the first type encountered. -The reason is that outputting mixed records is probably not what you -want anyway, and you should remove all the unwanted records from the -stream before outputting the table: \textbf{grab} is your friend (see~\ref{sub:How-to-grab}). - - -\subsection{How to write BED output?\label{sub:How-to-write-BED}} - -Data in BED format can be output using \textbf{write\_bed} if the -records contain the mandatory keys CHR, CHR\_BEG, and CHR\_END.
If the -optional keys are also present, they will be output as well: - -\begin{lyxcode} -write\_bed~-{}-result\_out= -\end{lyxcode} - -\subsection{How to write PSL output?\label{sub:How-to-write-PSL}} - -Data in PSL format can be output using \textbf{write\_psl}: - -\begin{lyxcode} -write\_psl~-{}-result\_out= -\end{lyxcode} - -\section{Manipulating Records} - - -\subsection{How to select a few records?\label{sub:How-to-select-a-few-records}} - -To quickly get an overview of your data you can limit the data stream -to show a few records. This is also very useful to test the pipeline -with a few records if you are setting up a complex analysis using -several biotools. That way you can check that all goes well before -analyzing and waiting for the full data set. All of the read\_ -biotools have the -\/-num switch which will take a number as argument -and only that number of records will be read. So to read in the first -10 FASTA entries from a file: - -\begin{lyxcode} -read\_fasta~-{}-data\_in=test.fna~-{}-num=10 -\end{lyxcode} -Another way of doing this is to use \textbf{head\_records}, which will limit -the stream to show the first 10 records (default): - -\begin{lyxcode} -...~|~head\_records -\end{lyxcode} -Using \textbf{head\_records} directly after one of the read\_ -biotools will be a lot slower than using the -\/-num switch with -the read\_ biotools, however, \textbf{head\_records} can also -be used to limit the output from all the other biotools. It is also -possible to give \textbf{head\_records} a number of records to show -using the -\/-num switch. So to display the first 100 records do: - -\begin{lyxcode} -...~|~head\_records~-{}-num=100 -\end{lyxcode} - -\subsection{How to count all records in the data stream?} - -To count all the records in the data stream use \textbf{count\_records}, -which adds one record (which is not included in the count) to the -data stream.
So to count the number of sequences in a FASTA file you -can do this: - -\begin{lyxcode} -read\_fasta~-{}-data\_in=test.fna~|~count\_records~-{}-no\_stream -\end{lyxcode} -Which will write the last record containing the count to 'stdout': - -\begin{lyxcode} --{}-{}- - -count\_records:~630 -\end{lyxcode} -It is also possible to write the count to file using the -\/-result\_out -switch. - - -\subsection{How to grab specific records?\label{sub:How-to-grab}} - -The biotool \textbf{grab} is related to the Unix grep and locates -records based on matching keys and/or values using either a pattern, -a Perl regex, or a numerical evaluation. To easily \textbf{grab} all -records in the stream that have any mention of the pattern 'human' -just pipe the data stream through \textbf{grab} like this: - -\begin{lyxcode} -...~|~grab~-{}-pattern=human -\end{lyxcode} -This will search for the pattern 'human' in all keys and all values. -The -\/-pattern switch takes a comma separated list of patterns, -so in order to match multiple patterns do: - -\begin{lyxcode} -...~|~grab~-{}-pattern=human,mouse -\end{lyxcode} -It is also possible to use the -\/-pattern\_in switch instead of --\/-pattern. -\/-pattern\_in is used to read a file with one pattern -per line: - -\begin{lyxcode} -...~|~grab~-{}-pattern\_in=patterns.txt -\end{lyxcode} -If you want the opposite result --- to find all records that do -not match the patterns, add the -\/-invert switch, which not only -works with the -\/-pattern switch, but also with -\/-regex and -\/-eval: - -\begin{lyxcode} -...~|~grab~-{}-pattern=human~-{}-invert -\end{lyxcode} -If you want to search the record keys only, \emph{e.g.} to find all -records containing the key SEQ you can add the -\/-keys\_only switch. -This will prevent matching of SEQ in any record value, and since -SEQ is in fact a not uncommon peptide sequence, you could otherwise get an unwanted record.
-Also, this will give an increase in speed since only the keys are -searched: - -\begin{lyxcode} -...~|~grab~-{}-pattern=SEQ~-{}-keys\_only -\end{lyxcode} -However, if you are interested in finding the peptide sequence SEQ -and not the SEQ key, just add the -\/-vals\_only switch instead: - -\begin{lyxcode} -...~|~grab~-{}-pattern=SEQ~-{}-vals\_only -\end{lyxcode} -Also, if you want to grab for certain key/value pairs you can supply -a comma separated list of keys whose values will then be searched using -the -\/-keys switch. This is handy if your records contain large -genomic sequences and you don't want to search the entire sequence -for \emph{e.g.} the organism name --- it is much faster to tell \textbf{grab} -which keys to search the value for: - -\begin{lyxcode} -...~|~grab~-{}-pattern=human~-{}-keys=SEQ\_NAME - - -\end{lyxcode} -It is also possible to invoke flexible matching using regex (regular -expressions) instead of simple pattern matching. In \textbf{grab} -the regex engine is Perl based and allows use of different types of -wildcards, alternatives, \emph{etc}% -\footnote{\url{http://perldoc.perl.org/perlreref.html}% -}. If you want to \textbf{grab} records with the sequence ATCG or -GCTA you can do this: - -\begin{lyxcode} -...~|~grab~-{}-regex='ATCG|GCTA' -\end{lyxcode} -Or if you want to find sequences beginning with ATCG: - -\begin{lyxcode} -...~|~grab~-{}-regex='\textasciicircum{}ATCG' -\end{lyxcode} -You can also use \textbf{grab} to locate records that fulfill a numerical -property using the -\/-eval switch which takes an expression in three -parts. The first part is the key that holds the number we want to -evaluate, the second part holds one of the six operators: - -\begin{enumerate} -\item Greater than: > -\item Greater than or equal to: >= -\item Less than: < -\item Less than or equal to: <= -\item Equal to: = -\item Not equal to: != -\end{enumerate} -And finally comes the number used in the evaluation.
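The three-part expression (key, operator, number) can be illustrated with a small Python sketch. This is not how grab is implemented; the function name `eval_matches`, the use of dictionaries for records, and the choice to skip records lacking the key are all assumptions made for illustration only.

```python
import operator

# The six operators listed above; note that equality is written '='.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    "!=": operator.ne,
}


def eval_matches(record, expression):
    """Return True if the record satisfies an expression such as 'SEQ_LEN > 30'."""
    key, op, number = expression.split()
    if key not in record:
        return False  # assumption: records lacking the key never match
    return OPERATORS[op](float(record[key]), float(number))


# Two hypothetical records, as they might look after length_seq has run.
records = [
    {"SEQ_NAME": "short", "SEQ_LEN": "12"},
    {"SEQ_NAME": "long", "SEQ_LEN": "45"},
]
kept = [r["SEQ_NAME"] for r in records if eval_matches(r, "SEQ_LEN > 30")]
print(kept)
```

Only the record whose SEQ\_LEN exceeds 30 is kept, which mirrors the behaviour described for the -\/-eval switch.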
So to \textbf{grab} -all records with a sequence length greater than 30: - -\begin{lyxcode} -...~|~length\_seq~|~grab~-{}-eval='SEQ\_LEN~>~30' -\end{lyxcode} -If you want to locate all records containing the pattern 'human' and -where the sequence length is greater than 30, you do this by running -the stream through \textbf{grab} twice: - -\begin{lyxcode} -...~|~grab~-{}-pattern='human'~|~length\_seq~|~grab~-{}-eval='SEQ\_LEN~>~30' -\end{lyxcode} -To get the best speed performance, use the most restrictive \textbf{grab} -first. - - -\subsection{How to remove keys from records?} - -To remove one or more specific keys from all records in the data stream -use \textbf{remove\_keys} like this: - -\begin{lyxcode} -...~|~remove\_keys~-{}-keys=SEQ,SEQ\_NAME -\end{lyxcode} -In the above example SEQ and SEQ\_NAME will be removed from all records -in which they exist. If all keys are removed from a record, then -the record will be removed. - - -\subsection{How to rename keys in records?} - -Sometimes you want to rename a record key, \emph{e.g.} if you have -read in a two column table with sequence name and sequence in each -column (see \ref{sub:How-to-read-table}) without specifying the key -names, then the sequence name will be called V0 and the sequence V1 -by default in the \textbf{read\_tab} biotool. To rename the V0 and -V1 keys we need to run the stream through \textbf{rename\_keys} twice -(once for each key to rename): - -\begin{lyxcode} -...~|~rename\_keys~-{}-keys=V0,SEQ\_NAME~|~rename\_keys~-{}-keys=V1,SEQ -\end{lyxcode} -The first instance of \textbf{rename\_keys} replaces all the V0 keys -with SEQ\_NAME, and the second instance of \textbf{rename\_keys} replaces -all the V1 keys with SEQ. \emph{Et voil\`a}, the data can now be used -in the biotools that require these keys.
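The effect of removing and renaming keys can be pictured with a short Python sketch that treats each record as a dictionary. This is a hedged illustration of the behaviour described above, not the biotools' own code; the function names `remove_keys` and `rename_key` and the sample records are hypothetical.

```python
def remove_keys(records, keys):
    """Drop the given keys; a record left without any keys is dropped as well."""
    for record in records:
        remaining = {k: v for k, v in record.items() if k not in keys}
        if remaining:
            yield remaining


def rename_key(records, old, new):
    """Rename a single key per pass, mirroring one rename_keys call per key."""
    for record in records:
        yield {(new if k == old else k): v for k, v in record.items()}


# Records as read from a two column table without naming the columns.
stream = [{"V0": "Human", "V1": "ATACGTCAG"}, {"V0": "Dog", "V1": "AGCATGAC"}]

# Two passes, one per key, just like piping through rename_keys twice.
renamed = list(rename_key(rename_key(stream, "V0", "SEQ_NAME"), "V1", "SEQ"))
print(renamed)

# Removing every key of a record removes the record itself.
stripped = list(remove_keys(renamed, {"SEQ", "SEQ_NAME"}))
print(stripped)
```

Chaining the two generator calls plays the same role as the pipe between the two \textbf{rename\_keys} instances.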
- - -\section{Manipulating Sequences} - - -\subsection{How to get sequence lengths?} - -The length for sequences in records can be determined with \textbf{length\_seq}, -which adds the key SEQ\_LEN to each record with the sequence length -as the value. It also generates an extra record that is emitted last -with the key TOTAL\_SEQ\_LEN showing the total length of all the sequences. - -\begin{lyxcode} -read\_fasta~-{}-data\_in=~|~length\_seq -\end{lyxcode} -It is also possible to determine the sequence length using the generic -tool \textbf{length\_vals} (see \#\#\#), which determines the length -of the values for a given list of keys: - -\begin{lyxcode} -read\_fasta~-{}-data\_in=~|~length\_vals~-{}-keys=SEQ -\end{lyxcode} -To obtain the total length of all sequences use \textbf{sum\_vals} -like this: - -\begin{lyxcode} -read\_fasta~-{}-data\_in=~|~length\_vals~-{}-keys=SEQ - -|~sum\_vals~-{}-keys=SEQ\_LEN -\end{lyxcode} -The biotool \textbf{analyze\_seq} will also determine the length of -each sequence (see~\ref{sub:How-to-analyze}). - - -\subsection{How to analyze sequence composition?\label{sub:How-to-analyze}} - -If you want to find out the sequence type, composition, length, as -well as GC content, indel content and proportions of soft and hard -masked sequence, then use \textbf{analyze\_seq}. This handy biotool -will determine all these things per sequence from which it is easy -to get an overview using the \textbf{write\_tab} biotool to output -a table (see~\ref{sub:How-to-write-tab}). So in order to determine -the sequence composition of a FASTA file with just one entry containing -the sequence 'ATCG' we just read the data with \textbf{read\_fasta} -and run the output through \textbf{analyze\_seq} which will add the -analysis to the record like this: - -\begin{lyxcode} -read\_fasta~-{}-data\_in=test.fna~|~analyze\_seq~... 
- - - --{}-{}- - -GC\%:~50.00 - -HARD\_MASK\%:~0.00 - -RES:-:~0 - -RES:.:~0 - -RES:A:~1 - -RES:B:~0 - -RES:C:~1 - -RES:D:~0 - -RES:G:~1 - -RES:H:~0 - -RES:K:~0 - -RES:M:~0 - -RES:N:~0 - -RES:R:~0 - -RES:S:~0 - -RES:T:~1 - -RES:U:~0 - -RES:V:~0 - -RES:W:~0 - -RES:Y:~0 - -RES:\textasciitilde{}:~0 - -SEQ:~ATCG - -SEQ\_LEN:~4 - -SEQ\_NAME:~test - -SEQ\_TYPE:~DNA - -SOFT\_MASK\%:~0.00 -\end{lyxcode} -Now to make a table of how many As, Ts, Cs, and Gs there are you can add the -following: - -\begin{lyxcode} -...~|~analyze\_seq~|~write\_tab~-{}-keys=RES:A,RES:T,RES:C,RES:G -\end{lyxcode} -Or if you want to see the proportions of hard and soft masked sequence: - -\begin{lyxcode} -...~|~analyze\_seq~|~write\_tab~-{}-keys=HARD\_MASK\%,SOFT\_MASK\% -\end{lyxcode} -If you have a stack of sequences in one file and you want to determine -the mean GC content you can do it using the \textbf{mean\_vals} biotool: - -\begin{lyxcode} -read\_fasta~-{}-data\_in=test.fna~|~analyze\_seq~|~mean\_vals~-{}-keys=GC\% -\end{lyxcode} -Or if you want the total count of Ns you can use \textbf{sum\_vals} -like this: - -\begin{lyxcode} -read\_fasta~-{}-data\_in=test.fna~|~analyze\_seq~|~sum\_vals~-{}-keys=RES:N -\end{lyxcode} - -\subsection{How to extract subsequences?\label{sub:How-to-extract}} - -In order to extract a subsequence from a longer sequence use the biotool -\textbf{extract\_seq}, which will replace the sequence in the record with the -subsequence (this behaviour should probably be modified to be dependent -on a -\/-replace or a -\/-no\_replace switch).
So to extract the -first 20 residues from all sequences do (first residue is designated -1): - -\begin{lyxcode} -...~|~extract\_seq~-{}-beg=1~-{}-len=20 -\end{lyxcode} -You can also specify a begin and end coordinate set: - -\begin{lyxcode} -...~|~extract\_seq~-{}-beg=20~-{}-end=40 -\end{lyxcode} -If you want the subsequences from position 20 to the sequence end -do: - -\begin{lyxcode} -...~|~extract\_seq~-{}-beg=20 -\end{lyxcode} -If you want to extract subsequences a given distance from the sequence -end you can do this by reversing the sequence with the biotool \textbf{reverse\_seq} -\eqref{sub:How-to-reverse-seq}, followed by \textbf{extract\_seq} -to get the subsequence, and then \textbf{reverse\_seq} again to get -the subsequence back in the original orientation: - -\begin{lyxcode} -read\_fasta~-{}-data\_in=test.fna~|~reverse\_seq - -|~extract\_seq~-{}-beg=10~-{}-len=10~|~reverse\_seq -\end{lyxcode} - -\subsection{How to get genomic sequence?\label{sub:How-to-get-genomic-sequence}} - -The biotool \textbf{get\_genome\_seq} can extract subsequences for -a given genome specified with the -\/-genome switch explicitly using -the -\/-beg and -\/-end/-\/-len switches: - -\begin{lyxcode} -get\_genome\_seq~-{}-genome=~-{}-beg=1~-{}-len=100 -\end{lyxcode} -Alternatively, \textbf{get\_genome\_seq} can be used to append the -corresponding sequence to BED, PSL, and BLAST records: - -\begin{lyxcode} -read\_bed~-{}-data\_in=~|~get\_genome\_seq~-{}-genome= -\end{lyxcode} - -\subsection{How to upper-case sequences?} - -Sequences can be converted from lower case to upper case using \textbf{uppercase\_seq}: - -\begin{lyxcode} -...~|~uppercase\_seq -\end{lyxcode} - -\subsection{How to reverse sequences?\label{sub:How-to-reverse-seq}} - -The order of residues in a sequence can be reversed using \textbf{reverse\_seq}: - -\begin{lyxcode} -...~|~reverse\_seq -\end{lyxcode} -Note that in order to reverse/complement a sequence you also need -the \textbf{complement\_seq} biotool
(see~\ref{sub:How-to-complement}). - - -\subsection{How to complement sequences?\label{sub:How-to-complement}} - -DNA and RNA sequences can be complemented with \textbf{complement\_seq}, -which automagically determines the sequence type: - -\begin{lyxcode} -...~|~complement\_seq -\end{lyxcode} -Note that in order to reverse/complement a sequence you also need -the \textbf{reverse\_seq} biotool (see~\ref{sub:How-to-reverse-seq}). - - -\subsection{How to remove indels from sequences?} - -Indels can be removed from sequences with the \textbf{remove\_indels} -biotool. This is useful if you have aligned some sequences (see~\ref{sub:How-to-align}) -and extracted (see~\ref{sub:How-to-extract}) a block of subsequences -from the alignment and you want to use these sequences in a search -where you need to remove the indels first. '-', '\textasciitilde{}', -and '.' are considered indels: - -\begin{lyxcode} -...~|~remove\_indels -\end{lyxcode} - -\subsection{How to split sequences into overlapping subsequences?} - -Sequences can be split into overlapping subsequences with the \textbf{split\_seq} -biotool.
- -\begin{lyxcode} -...~|~split\_seq~-{}-word\_size=20~-{}-uniq -\end{lyxcode} - -\subsection{How to determine the oligo frequency?} - -In order to determine if any oligo usage is overrepresented in one -or more sequences you can determine the frequency of oligos of a given -size with \textbf{oligo\_freq}: - -\begin{lyxcode} -...~|~oligo\_freq~-{}-word\_size=4 -\end{lyxcode} -And if you have more than one sequence and want to accumulate the -frequencies you need the -\/-all switch: - -\begin{lyxcode} -...~|~oligo\_freq~-{}-word\_size=4~-{}-all -\end{lyxcode} -To get a meaningful result you need to write the resulting frequencies -as a table with \textbf{write\_tab} (see~\ref{sub:How-to-write-tab}), -but first it is important to \textbf{grab} (see~\ref{sub:How-to-grab}) -the records with the frequencies to avoid full length sequences in -the table: - -\begin{lyxcode} -...~|~oligo\_freq~-{}-word\_size=4~-{}-all~|~grab~-{}-pattern=OLIGO~-{}-keys\_only - -|~write\_tab~-{}-no\_stream -\end{lyxcode} -And the resulting frequency table can be sorted with Unix sort (man -sort). - - -\subsection{How to search for sequences in genomes?} - -See the following biotools: - -\begin{itemize} -\item \textbf{patscan\_seq} \eqref{sub:How-to-use-patscan} -\item \textbf{blat\_seq} \eqref{sub:How-to-use-BLAT} -\item \textbf{blast\_seq} \eqref{sub:How-to-use-BLAST} -\item \textbf{vmatch\_seq} \eqref{sub:How-to-use-Vmatch} -\end{itemize} - -\subsection{How to search sequences for a pattern?\label{sub:How-to-use-patscan}} - -It is possible to search sequences in the data stream for patterns -using the \textbf{patscan\_seq} biotool which utilizes the powerful -scan\_for\_matches engine. Consult the documentation for scan\_for\_matches -in order to learn how to define patterns (the documentation is included -in Appendix~\ref{sec:scan_for_matches-README}).
- -To search all sequences for a simple pattern consisting of the sequence -ATCGATCG allowing for 3 mismatches, 2 insertions and 1 deletion: - -\begin{lyxcode} -read\_fasta~-{}-data\_in=~|~patscan\_seq~-{}-pattern='ATCGATCG{[}3,2,1]' -\end{lyxcode} -The -\/-pattern switch takes a comma separated list of patterns, -so if you want to search for more than one pattern do: - -\begin{lyxcode} -...~|~patscan\_seq~-{}-pattern='ATCGATCG{[}3,2,1],GCTAGCTA{[}3,2,1]' -\end{lyxcode} -It is also possible to have a list of different patterns to search -for in a file with one pattern per line. In order to get \textbf{patscan\_seq} -to read these patterns use the -\/-pattern\_in switch: - -\begin{lyxcode} -...~|~patscan\_seq~-{}-pattern\_in= -\end{lyxcode} -To also scan the complementary strand in nucleotide sequences (\textbf{patscan\_seq} -automagically determines the sequence type) you need to add the -\/-comp -switch: - -\begin{lyxcode} -...~|~patscan\_seq~-{}-pattern=~-{}-comp -\end{lyxcode} -It is also possible to use \textbf{patscan\_seq} to output those records -that do not contain a certain pattern by using the -\/-invert switch: - -\begin{lyxcode} -...~|~patscan\_seq~-{}-pattern=~-{}-invert -\end{lyxcode} -Finally, \textbf{patscan\_seq} can also scan for patterns in a given -genome sequence, instead of sequences in the stream, using the -\/-genome -switch: - -\begin{lyxcode} -patscan\_seq~-{}-pattern=~-{}-genome= -\end{lyxcode} - -\subsection{How to use BLAT for sequence search?\label{sub:How-to-use-BLAT}} - -Sequences in the data stream can be matched against supported genomes -using \textbf{blat\_seq} which is a biotool using BLAT as the name -might suggest. Currently only Mouse and Human genomes are available -and it is not possible to use OOC files since there is still a need -for a local repository for genome files.
Otherwise it is just: - -\begin{lyxcode} -read\_fasta~-{}-data\_in=~|~blat\_seq~-{}-genome= -\end{lyxcode} -The search results can then be written to file with \textbf{write\_psl} -(see~\ref{sub:How-to-write-PSL}) or \textbf{write\_bed} (see~\ref{sub:How-to-write-BED}), -although with \textbf{write\_bed} some information will be lost. -It is also possible to plot chromosome distribution of the search -results using \textbf{plot\_chrdist} (see~\ref{sub:How-to-plot-chrdist}) -or the distribution of the match lengths using \textbf{plot\_lendist} -(see~\ref{sub:How-to-plot-lendist}) or a karyogram with the hits -using \textbf{plot\_karyogram} (see~\ref{sub:How-to-plot-karyogram}). - - -\subsection{How to use BLAST for sequence search?\label{sub:How-to-use-BLAST}} - -Two biotools exist for blasting sequences: \textbf{create\_blast\_db} -is used to create the BLAST database required for BLAST which is queried -using the biotool \textbf{blast\_seq}. So in order to create a BLAST -database from sequences in the data stream you simply run: - -\begin{lyxcode} -...~|~create\_blast\_db~-{}-database=my\_database~-{}-no\_stream -\end{lyxcode} -The type of sequence to use for the database is automagically determined -by \textbf{create\_blast\_db}, but don't have a mixture of peptide -and nucleic acid sequences in the stream. The -\/-database switch -takes a path as argument, but will default to 'blastdb\_ -if not set. - -The resulting database can now be queried with sequences in another -data stream using \textbf{blast\_seq}: - -\begin{lyxcode} -...~|~blast\_seq~-{}-database=my\_database -\end{lyxcode} -Again, the sequence type is determined automagically and the appropriate -BLAST program is guessed (see the table below), however, the program name -can be overridden with the -\/-program switch.
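The program guess amounts to a lookup on the subject and query sequence types, with an explicit program name taking precedence. The Python sketch below follows the table in this subsection; the function name `guess_program` and the lower-case type labels are assumptions for illustration and say nothing about how \textbf{blast\_seq} is written.

```python
# (subject sequence type, query sequence type) -> guessed BLAST program
PROGRAM_GUESS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("protein", "nucleotide"): "blastx",
    ("nucleotide", "protein"): "tblastn",
}


def guess_program(subject_type, query_type, program=None):
    """Return the BLAST program to run; an explicit program overrides the guess."""
    if program is not None:
        return program
    return PROGRAM_GUESS[(subject_type, query_type)]


# A protein database queried with nucleotide sequences.
print(guess_program("protein", "nucleotide"))
```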
- -\noindent \begin{center} -\begin{tabular}{ccc} -Subject sequence & Query sequence & Program guess\tabularnewline -\hline -Nucleotide & Nucleotide & blastn\tabularnewline -Protein & Protein & blastp\tabularnewline -Protein & Nucleotide & blastx\tabularnewline -Nucleotide & Protein & tblastn\tabularnewline -\end{tabular} -\par\end{center} - -Finally, it is also possible to use \textbf{blast\_seq} for blasting -sequences against a preformatted genome using the -\/-genome switch -instead of the -\/-database switch: - -\begin{lyxcode} -...~|~blast\_seq~-{}-genome= -\end{lyxcode} - -\subsection{How to use Vmatch for sequence search?\label{sub:How-to-use-Vmatch}} - -The powerful suffix array software package Vmatch% -\footnote{\url{http://www.vmatch.de/}% -} can be used for exact mapping of sequences against indexed genomes -using the biotool \textbf{vmatch\_seq}, which will \emph{e.g.} map 700,000 -ESTs to the human genome locating all 160 million hits in less than an -hour. - -\begin{lyxcode} -...~|~vmatch\_seq~-{}-genome= -\end{lyxcode} -Only nucleotide sequences longer than 11 nucleotides -will be mapped. The resulting SCORE key will hold the number of genome -matches of a given sequence (multi-mappers). - - -\subsection{How to find all matches between sequences?\label{sub:How-to-find-matches}} - -All matches between two sequences can be determined with the biotool -\textbf{match\_seq}. The match finding engine under the hood -of \textbf{match\_seq} is the super fast suffix tree program MUMmer% -\footnote{\url{http://mummer.sourceforge.net/}% -}, which will locate all forward and reverse matches between huge sequences -in a matter of minutes (if the repeat count is not too high and if -the word size used is appropriate).
Matching two \emph{Helicobacter -pylori} genomes (1.7~Mbp) takes around 10 seconds: - -\begin{lyxcode} -...~|~match\_seq~-{}-word\_size=20~-{}-direction=both -\end{lyxcode} -The output from \textbf{match\_seq} can be used to generate a dot -plot with \textbf{plot\_matches} (see~\ref{sub:How-to-generate-dotplot}). - - -\subsection{How to align sequences?\label{sub:How-to-align}} - -Sequences in the stream can be aligned with the \textbf{align\_seq} -biotool that uses Muscle% -\footnote{\url{http://www.drive5.com/muscle/muscle.html}% -} as the alignment engine. Currently you cannot change any of the Muscle -alignment parameters and \textbf{align\_seq} will create an alignment -based on the defaults (which are really good!): - -\begin{lyxcode} -...~|~align\_seq -\end{lyxcode} -The aligned output can be written to file in FASTA format using \textbf{write\_fasta} -(see~\ref{sub:How-to-write-fasta}) or in pretty text using \textbf{write\_align} -(see~\ref{sub:How-to-write-alignment}). - - -\subsection{How to create a weight matrix?} - -If you want a weight matrix to show the sequence composition of a -stack of sequences you can use the biotool \textbf{create\_weight\_matrix}: - -\begin{lyxcode} -...~|~create\_weight\_matrix -\end{lyxcode} -The result can be output in percent using the -\/-percent switch: - -\begin{lyxcode} -...~|~create\_weight\_matrix~-{}-percent -\end{lyxcode} -The weight matrix can be written as tabular output with \textbf{write\_tab} -(see~\ref{sub:How-to-write-tab}) after removing the records containing -SEQ with \textbf{grab} (see~\ref{sub:How-to-grab}): - -\begin{lyxcode} -...~|~create\_weight\_matrix~|~grab~-{}-invert~-{}-keys=SEQ~-{}-keys\_only - -|~write\_tab~-{}-no\_stream -\end{lyxcode} -The V0 column will hold the residue, while the rest of the columns -will hold the frequencies for each sequence position. - - -\section{Plotting} - -There exist several biotools for plotting.
Some of these are based -on GNUplot% -\footnote{\url{http://www.gnuplot.info/}% -}, which is an extremely powerful platform to generate all sorts of -plots and even though GNUplot has quite a steep learning curve, the -biotools utilizing GNUplot are simple to use. GNUplot is able to output -a lot of different formats (called terminals in GNUplot), but the -biotools focus on three formats only: - -\begin{enumerate} -\item The 'dumb' terminal is the default for the GNUplot based biotools and will -output a plot in crude ASCII text (Fig.~\ref{fig:Dumb-terminal}). -This is quite nice for a quick and dirty plot to get an overview of -your data. -\item The 'post' or 'postscript' terminal outputs postscript code which is -publication grade graphics that can be viewed with applications such -as Ghostview, Photoshop, and Preview. -\item The 'svg' terminal outputs scalable vector graphics (SVG) which is -a vector based format. SVG is great because you can edit the resulting -plot using Photoshop or Inkscape% -\footnote{Inkscape is a really handy drawing program that is free and open source. -Available at \url{http://www.inkscape.org}% -} if you want to add additional labels, captions, arrows, and so on -and then save the result in different formats, such as postscript -without losing resolution. -\end{enumerate} -The biotools for plotting that are not based on GNUplot only output -SVG (that may change in the future). - -% -\begin{figure} -\noindent \begin{centering} -\includegraphics[width=12cm]{lendist_ascii} -\par\end{centering} - -\caption{\label{fig:Dumb-terminal}Dumb terminal} - - -\begin{quote} -The output of a length distribution plot in the default 'dumb terminal' -to the terminal window. 
-\end{quote} - -\end{figure} - - - -\subsection{How to plot a histogram?\label{How-to-plot-histogram}} - -A generic histogram for a given value can be plotted with the biotool -\textbf{plot\_histogram} (Fig.~\ref{fig:Histogram}): - -\begin{lyxcode} -...~|~plot\_histogram~-{}-key=TISSUE~-{}-no\_stream -\end{lyxcode} -(Figure missing) - -\noindent \begin{flushleft} -% -\begin{figure} -\noindent \begin{centering} -\includegraphics[width=12cm]{histogram} -\par\end{centering} - -\caption{\label{fig:Histogram}Histogram} - -\end{figure} - -\par\end{flushleft} - - -\subsection{How to plot a length distribution?\label{sub:How-to-plot-lendist}} - -Plotting of length distributions, whether sequence lengths, pattern -lengths, hit lengths, \emph{etc.} is a really handy thing and can -be done with the biotool \textbf{plot\_lendist}. If you have a -file with FASTA entries and want to plot the length distribution you -do it like this: - -\begin{lyxcode} -read\_fasta~-{}-data\_in=~|~length\_seq - -|~plot\_lendist~-{}-key=SEQ\_LEN~-{}-no\_stream -\end{lyxcode} -The result will be written to the default dumb terminal and will look -like Fig.~\ref{fig:Dumb-terminal}. - -If you instead want the result in postscript format you can do: - -\begin{lyxcode} -...~|~plot\_lendist~-{}-key=SEQ\_LEN~-{}-terminal=post~-{}-result\_out=file.ps -\end{lyxcode} -That will generate the plot and save it to file, but not interrupt -the data stream which can then be used in further analysis. You can -also save the plot implicitly using '>', however, it is then important -to terminate the stream with the -\/-no\_stream switch: - -\begin{lyxcode} -...~|~plot\_lendist~-{}-key=SEQ\_LEN~-{}-terminal=post~-{}-no\_stream~>~file.ps -\end{lyxcode} -The resulting plot can be seen in Fig.~\ref{fig:Length-distribution}. 
- -% -\begin{figure} - - -\noindent \begin{centering} -\includegraphics[width=12cm]{lendist} -\par\end{centering} - -\caption{\label{fig:Length-distribution}Length distribution} - - -\begin{quote} -Length distribution of 630 piRNA-like RNAs. -\end{quote} - -\end{figure} - - - -\subsection{How to plot a chromosome distribution?\label{sub:How-to-plot-chrdist}} - -If you have the result of a sequence search against a multi-chromosome -genome, it is very practical to be able to plot the distribution of -search hits on the different chromosomes. This can be done with \textbf{plot\_chrdist}: - -\begin{lyxcode} -read\_fasta~-{}-data\_in=~|~blat\_genome~|~plot\_chrdist~-{}-no\_stream -\end{lyxcode} -The above example will result in a crude plot using the 'dumb' terminal, -and if you want to mess around with the results from the BLAT search -you probably want to save the result to file first (see~\ref{sub:How-to-write-PSL}). -To plot the chromosome distribution from the saved search result you -can do: - -\begin{lyxcode} -read\_bed~-{}-data\_in=file.bed~|~plot\_chrdist~-{}-terminal=post~-{}-result\_out=plot.ps -\end{lyxcode} -That will result in the output shown in Fig.~\ref{fig:Chromosome-distribution}. - -% -\begin{figure} - - -\noindent \begin{centering} -\includegraphics[angle=90,width=12cm]{chrdist} -\par\end{centering} - -\caption{\label{fig:Chromosome-distribution}Chromosome distribution} - -\end{figure} - - - -\subsection{How to generate a dotplot?\label{sub:How-to-generate-dotplot}} - -A dotplot is a powerful way to get an overview of the size and location -of sequence insertions, deletions, and duplications between two sequences. -Generating a dotplot with biotools is a two-step process where you -initially find all matches between two sequences using the tool \textbf{match\_seq} -(see~\ref{sub:How-to-find-matches}) and plot the resulting matches -with \textbf{plot\_matches}. 
Matching and plotting two \emph{Helicobacter -pylori} genomes (1.7Mbp) takes around 10 seconds: - -\begin{lyxcode} -...~|~match\_seq~|~plot\_matches~-{}-terminal=post~-{}-result\_out=plot.ps -\end{lyxcode} -The resulting dotplot is in Fig.~\ref{fig:Dotplot}. - -% -\begin{figure} -\noindent \begin{centering} -\includegraphics[width=12cm]{dotplot} -\par\end{centering} - -\caption{\label{fig:Dotplot}Dotplot} - - -\begin{quote} -Forward matches are displayed in green while reverse matches are displayed -in red. -\end{quote} - -\end{figure} - - - -\subsection{How to plot a sequence logo?} - -Sequence logos can be generated with \textbf{plot\_seqlogo}. The sequence -type is determined automagically and an entropy scale of 2 bits and -4 bits is used for nucleotide and peptide sequences, respectively% -\footnote{\url{http://www.ccrnp.ncifcrf.gov/~toms/paper/hawaii/latex/node5.html}% -}. - -\begin{lyxcode} -...~|~plot\_seqlogo~-{}-no\_stream~-{}-result\_out=seqlogo.svg -\end{lyxcode} -An example of a sequence logo can be seen in Fig.~\ref{fig:Sequence-logo}. - -% -\begin{figure} -\noindent \begin{centering} -\includegraphics[width=12cm]{seqlogo} -\par\end{centering} - -\caption{\label{fig:Sequence-logo}Sequence logo} - -\end{figure} - - - -\subsection{How to plot a karyogram?\label{sub:How-to-plot-karyogram}} - -To plot search hits on genomes use \textbf{plot\_karyogram}, which -will output a nice karyogram in SVG graphics: - -\begin{lyxcode} -...~|~plot\_karyogram~-{}-result\_out=karyogram.svg -\end{lyxcode} -The banding data is taken from the UCSC genome browser database and -currently only Human and Mouse are supported. Fig.~\ref{fig:Karyogram} -shows the distribution of piRNA-like RNAs matched to the Human genome. 
- -% -\begin{figure} -\noindent \begin{centering} -\includegraphics[width=12cm]{karyogram} -\par\end{centering} - -\caption{\label{fig:Karyogram}Karyogram} - - -\begin{quote} -Hits from a search of piRNA-like RNAs in the Human genome are displayed -as short horizontal bars. -\end{quote} - -\end{figure} - - - -\section{Uploading Results} - - -\subsection{How do I display my results in the UCSC Genome Browser?} - -Results from the list of biotools below can be uploaded directly to -a local mirror of the UCSC Genome Browser using the biotool \textbf{upload\_to\_ucsc}: - -\begin{itemize} -\item patscan\_seq \eqref{sub:How-to-use-patscan} -\item blat\_seq \eqref{sub:How-to-use-BLAT} -\item blast\_seq \eqref{sub:How-to-use-BLAST} -\item vmatch\_seq \eqref{sub:How-to-use-Vmatch} -\end{itemize} -The simplest way of uploading data requires two mandatory -switches: -\/-database, which is the UCSC database name (such as -hg18, mm9, etc.) and -\/-table which should be the user's initials -followed by an underscore and a short description of the data: - -\begin{lyxcode} -...~|~upload\_to\_ucsc~-{}-database=hg18~-{}-table=mah\_snoRNAs -\end{lyxcode} -The \textbf{upload\_to\_ucsc} biotool modifies the user's \textasciitilde{}/ucsc/my\_tracks.ra -file automagically (a backup is created with the name \textasciitilde{}/ucsc/my\_tracks.ra\textasciitilde{}) -with default values that can be overridden using the following switches: - -\begin{itemize} -\item -\/-short\_label - Short label for track - Default=database->table -\item -\/-long\_label - Long label for track - Default=database->table -\item -\/-group - Track group name - Default= -\item -\/-priority - Track display priority - Default=1 -\item -\/-color - Track color - Default=147,73,42 -\item -\/-chunk\_size - Chunks for loading - Default=10000000 -\item -\/-visibility - Track visibility - Default=pack -\end{itemize} -Also, data in BED or PSL format can be uploaded with \textbf{upload\_to\_ucsc} -as long as these 
refer to genomes and chromosomes existing in -the UCSC Genome Browser: - -\begin{lyxcode} -read\_bed~-{}-data\_in=~|~upload\_to\_ucsc~... - - - -read\_psl~-{}-data\_in=~|~upload\_to\_ucsc~... -\end{lyxcode} - -\section{Troubleshooting} - -Shoot the messenger! - -\appendix - -\section{Keys\label{sec:Keys}} - -HIT - -HIT\_BEG - -HIT\_END - -HIT\_LEN - -HIT\_NAME - -PATTERN - - -\section{Switches\label{sec:Switches}} - --\/-stream\_in - --\/-stream\_out - --\/-no\_stream - --\/-data\_in - --\/-result\_out - --\/-num - - -\section{scan\_for\_matches README\label{sec:scan_for_matches-README}} - -\begin{lyxcode} -~~~~~~~~~~~~~~~~~~~~~~~~~~scan\_for\_matches: - -~~~~A~Program~to~Scan~Nucleotide~or~Protein~Sequences~for~Matching~Patterns - -~~~~~~~~~~~~~~~~~~~~~~~~Ross~Overbeek - -~~~~~~~~~~~~~~~~~~~~~~~~MCS - -~~~~~~~~~~~~~~~~~~~~~~~~Argonne~National~Laboratory - -~~~~~~~~~~~~~~~~~~~~~~~~Argonne,~IL~60439 - -~~~~~~~~~~~~~~~~~~~~~~~~USA - -Scan\_for\_matches~is~a~utility~that~we~have~written~to~search~for - -patterns~in~DNA~and~protein~sequences.~~I~wrote~most~of~the~code, - -although~David~Joerg~and~Morgan~Price~wrote~sections~of~an - -earlier~version.~~The~whole~notion~of~pattern~matching~has~a~rich - -history,~and~we~borrowed~liberally~from~many~sources.~~However,~it~is - -worth~noting~that~we~were~strongly~influenced~by~the~elegant~tools - -developed~and~distributed~by~David~Searls.~~My~intent~is~to~make~the - -existing~tool~available~to~anyone~in~the~research~community~that~might - -find~it~useful.~~I~will~continue~to~try~to~fix~bugs~and~make~suggested - -enhancements,~at~least~until~I~feel~that~a~superior~tool~exists. 
- -Hence,~I~would~appreciate~it~if~all~bug~reports~and~suggestions~are - -directed~to~me~at~Overbeek@mcs.anl.gov.~~ - -I~will~try~to~log~all~bug~fixes~and~report~them~to~users~that~send~me - -their~email~addresses.~~I~do~not~require~that~you~give~me~your~name - -and~address.~~However,~if~you~do~give~it~to~me,~I~will~try~to~notify - -you~of~serious~problems~as~they~are~discovered. - -Getting~Started: - -~~~~The~distribution~should~contain~at~least~the~following~programs: - -~~~~~~~~~~~~~~~~README~~~~~~~~~~~~~~~~~~-~~~~~This~document - -~~~~~~~~~~~~~~~~ggpunit.c~~~~~~~~~~~~~~~-~~~~~One~of~the~two~source~files - -~~~~~~~~~~~~~~~~scan\_for\_matches.c~~~~~~-~~~~~The~second~source~file - -~~~~~~~~~~~~~~~~ - -~~~~~~~~~~~~~~~~run\_tests~~~~~~~~~~~~~~~-~~~~~A~perl~script~to~test~things - -~~~~~~~~~~~~~~~~show\_hits~~~~~~~~~~~~~~~-~~~~~A~handy~perl~script - -~~~~~~~~~~~~~~~~test\_dna\_input~~~~~~~~~~-~~~~~Test~sequences~for~DNA - -~~~~~~~~~~~~~~~~test\_dna\_patterns~~~~~~~-~~~~~Test~patterns~for~DNA~scan - -~~~~~~~~~~~~~~~~test\_output~~~~~~~~~~~~~-~~~~~Desired~output~from~test - -~~~~~~~~~~~~~~~~test\_prot\_input~~~~~~~~~-~~~~~Test~protein~sequences - -~~~~~~~~~~~~~~~~test\_prot\_patterns~~~~~~-~~~~~Test~patterns~for~proteins - -~~~~~~~~~~~~~~~~testit~~~~~~~~~~~~~~~~~~-~~~~~a~perl~script~used~for~test - -~~~~Only~the~first~three~files~are~required.~~The~others~are~useful, - -~~~~but~only~if~you~have~Perl~installed~on~your~system.~~If~you~do - -~~~~have~Perl,~I~suggest~that~you~type - -~~~~~~~~ - -~~~~~~~~~~~~~~~~which~perl - -~~~~to~find~out~where~it~installed.~~On~my~system,~I~get~the~following - -~~~~response: - -~~~~~~~~ - -~~~~~~~~~~~~~~~~clone\%~which~perl - -~~~~~~~~~~~~~~~~/usr/local/bin/perl - -~~~~indicating~that~Perl~is~installed~in~/usr/local/bin.~~Anyway,~once - -~~~~you~know~where~it~is~installed,~edit~the~first~line~of~files~ - -~~~~~~~~testit - -~~~~~~~~show\_hits - -~~~~replacing~/usr/local/bin/perl~with~the~appropriate~location.~~I - 
-~~~~will~assume~that~you~can~do~this,~although~it~is~not~critical~(it - -~~~~is~needed~only~to~test~the~installation~and~to~use~the~\char`\"{}show\_hits\char`\"{} - -~~~~utility).~~Perl~is~not~required~to~actually~install~and~run - -~~~~scan\_for\_matches.~ - -~~~~If~you~do~not~have~Perl,~I~suggest~you~get~it~and~install~it~(it - -~~~~is~a~wonderful~utility).~~Information~about~Perl~and~how~to~get~it - -~~~~can~be~found~in~the~book~\char`\"{}Programming~Perl\char`\"{}~by~Larry~Wall~and - -~~~~Randall~L.~Schwartz,~published~by~O'Reilly~\&~Associates,~Inc. - -~~~~To~get~started,~you~will~need~to~compile~the~program.~~~I~do~this - -~~~~using~ - -~~~~~~~~gcc~-O~-o~scan\_for\_matches~~ggpunit.c~scan\_for\_matches.c - -~~~~If~you~do~not~use~GNU~C,~use~ - -~~~~~~~~cc~-O~-DCC~-o~scan\_for\_matches~~ggpunit.c~scan\_for\_matches.c - -~~~~which~works~on~my~Sun.~~ - -~~~~Once~you~have~compiled~scan\_for\_matches,~you~can~verify~that~it - -~~~~works~with - -~~~~~~~~clone\%~run\_tests~tmp - -~~~~~~~~clone\%~diff~tmp~test\_output - -~~~~You~may~get~a~few~strange~lines~of~the~sort - -~~~~~~~~clone\%~run\_tests~tmp - -~~~~~~~~rm:~tmp:~No~such~file~or~directory - -~~~~~~~~clone\%~diff~tmp~test\_output - -~~~~These~should~cause~no~concern.~~However,~if~the~\char`\"{}diff\char`\"{}~shows~that - -~~~~tmp~and~test\_output~are~different,~contact~me~(you~have~a - -~~~~problem).~ - -~~~~You~should~now~be~able~to~use~scan\_for\_matches~by~following~the - -~~~~instructions~given~below~(which~is~all~the~normal~user~should~have - -~~~~to~understand,~once~things~are~installed~properly). 
- -~============================================================== - -How~to~run~scan\_for\_matches: - -~~~~To~run~the~program,~you~need~to~create~two~files - -~~~~1.~~the~first~file~contains~the~pattern~you~wish~to~scan~for;~I'll - -~~~~~~~~call~this~file~pat\_file~in~what~follows~(but~any~name~is~ok) - -~~~~2.~~the~second~file~contains~a~set~of~sequences~to~scan.~~These - -~~~~~~~~should~be~in~\char`\"{}fasta~format\char`\"{}.~~Just~look~at~the~contents~of - -~~~~~~~~test\_dna\_input~to~see~examples~of~this~format.~~Basically, - -~~~~~~~~each~sequence~begins~with~a~line~of~the~form - -~~~~~~~~~~~>sequence\_id - -~~~~~~~~and~is~followed~by~one~or~more~lines~containing~the~sequence. - -~~~~Once~these~files~have~been~created,~you~just~use - -~~~~~~~~scan\_for\_matches~pat\_file~<~input\_file - -~~~~to~scan~all~of~the~input~sequences~for~the~given~pattern.~~As~an - -~~~~example,~suppose~that~pat\_file~contains~a~single~line~of~the~form - -~~~~~~~~~~~~~~~~p1=4...7~3...8~\textasciitilde{}p1 - -~~~~Then, - -~~~~~~~~~~~~~~~~scan\_for\_matches~pat\_file~<~test\_dna\_input - -~~~~should~produce~two~\char`\"{}hits\char`\"{}.~~When~I~run~this~on~my~machine,~I~get - -~~~~~~~~clone\%~scan\_for\_matches~pat\_file~<~test\_dna\_input - -~~~~~~~~>tst1:{[}6,27] - -~~~~~~~~cguaacc~ggttaacc~gguuacg~ - -~~~~~~~~>tst2:{[}6,27] - -~~~~~~~~CGUAACC~GGTTAACC~GGUUACG~ - -~~~~~~~~clone\%~ - -Simple~Patterns~Built~by~Matching~Ranges~and~Reverse~Complements - -~~~~Let~me~first~explain~this~simple~pattern: - -~~~~~~~~~~~~~~~~ - -~~~~~~~~~~~~~~~~p1=4...7~3...8~\textasciitilde{}p1 - -~~~~The~pattern~consists~of~three~\char`\"{}pattern~units\char`\"{}~separated~by~spaces. 
- -~~~~The~first~pattern~unit~is - -~~~~~~~~~~~~~~~~p1=4...7 - -~~~~which~means~\char`\"{}match~4~to~7~characters~and~call~them~p1\char`\"{}.~~The - -~~~~second~pattern~unit~is - -~~~~~~~~~~~~~~~~3...8 - -~~~~which~means~\char`\"{}then~match~3~to~8~characters\char`\"{}.~~The~last~pattern~unit - -~~~~is~ - -~~~~~~~~~~~~~~~~\textasciitilde{}p1 - -~~~~which~means~\char`\"{}match~the~reverse~complement~of~p1\char`\"{}.~~The~first - -~~~~reported~hit~is~shown~as - -~~~~~~~~>tst1:{[}6,27] - -~~~~~~~~cguaacc~ggttaacc~gguuacg~ - -~~~~which~states~that~characters~6~through~27~of~sequence~tst1~were - -~~~~matched.~~\char`\"{}cguaacc\char`\"{}~matched~the~first~pattern~unit,~\char`\"{}ggttaacc\char`\"{}~the - -~~~~second,~and~\char`\"{}gguuacg\char`\"{}~the~third.~~This~is~an~example~of~a~common - -~~~~type~of~pattern~used~to~search~for~sections~of~DNA~or~RNA~that - -~~~~would~fold~into~a~hairpin~loop. - -Searching~Both~Strands - -~~~~Now~for~a~short~aside:~scan\_for\_matches~only~searched~the - -~~~~sequences~in~the~input~file;~it~did~not~search~the~opposite - -~~~~strand.~~With~a~pattern~of~the~sort~we~just~used,~there~is~no - -~~~~need~to~search~the~opposite~strand.~~However,~it~is~normally~the - -~~~~case~that~you~will~wish~to~search~both~the~sequence~and~the - -~~~~opposite~strand~(i.e.,~the~reverse~complement~of~the~sequence). - -~~~~To~do~that,~you~would~just~use~the~\char`\"{}-c\char`\"{}~command~line~option.~~For~example, - -~~~~~~~~scan\_for\_matches~-c~pat\_file~<~test\_dna\_input - -~~~~Hits~on~the~opposite~strand~will~show~a~beginning~location~greater - -~~~~than~the~end~location~of~the~match. 
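The hairpin hit reported above can be checked by hand. The following is a minimal sketch, not code from scan\_for\_matches: the helper name `reverse_complement` is hypothetical, and folding U into T before comparing is an assumption made here so that RNA and DNA spellings compare equal.

```python
# Illustrative sketch only; not part of scan_for_matches.
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    """Return the reverse complement, ignoring case and treating U as T."""
    normalized = seq.lower().replace("u", "t")
    return normalized.translate(COMPLEMENT)[::-1]


# The three matched units of the reported hit tst1:[6,27].
p1, spacer, tail = "cguaacc", "ggttaacc", "gguuacg"

# ~p1 must spell the reverse complement of p1 (compared with U folded to T).
is_hairpin = reverse_complement(p1) == tail.lower().replace("u", "t")

# The three units together span 27 - 6 + 1 = 22 characters.
span = len(p1) + len(spacer) + len(tail)
print(is_hairpin, span)
```

The same helper applied to a whole sequence gives the opposite strand that the -c option also searches.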
- -Defining~Pairing~Rules~and~Allowing~Mismatches,~Insertions,~and~Deletions - -~~~~Let~us~stop~now~and~ask~\char`\"{}What~additional~features~would~one~need~to - -~~~~really~find~the~kinds~of~loop~structures~that~characterize~tRNAs, - -~~~~rRNAs,~and~so~forth?\char`\"{}~~I~can~immediately~think~of~two: - -~~~~~~~~a)~you~will~need~to~be~able~to~allow~non-standard~pairings - -~~~~~~~~~~~(those~other~than~G-C~and~A-U),~and - -~~~~~~~~b)~you~will~need~to~be~able~to~tolerate~some~number~of - -~~~~~~~~~~~mismatches~and~bulges. - -~~~~~~~~ - -~~~~Let~me~first~show~you~how~to~handle~non-standard~\char`\"{}rules~for - -~~~~pairing~in~reverse~complements\char`\"{}.~~Consider~the~following~pattern, - -~~~~which~I~show~as~two~lines~(you~may~use~as~many~lines~as~you~like~in - -~~~~forming~a~pattern,~although~you~can~only~break~a~pattern~at~points - -~~~~where~space~would~be~legal): - -~~~~~~~~~~~~r1=\{au,ua,gc,cg,gu,ug,ga,ag\}~ - -~~~~~~~~~~~~p1=2...3~0...4~p2=2...5~1...5~r1\textasciitilde{}p2~0...4~\textasciitilde{}p1~~~~~~~~ - -~~~~The~first~\char`\"{}pattern~unit\char`\"{}~does~not~actually~match~anything;~rather, - -~~~~it~defines~a~\char`\"{}pairing~rule\char`\"{}~in~which~standard~pairings~are - -~~~~allowed,~as~well~as~G-A~and~A-G~(in~case~you~wondered,~Us~and~Ts - -~~~~and~upper~and~lower~case~can~be~used~interchangeably;~for~example - -~~~~r1=\{AT,UA,gc,cg\}~could~be~used~to~define~the~\char`\"{}standard~rule\char`\"{}~for - -~~~~pairings).~~The~second~line~consists~of~seven~pattern~units~which - -~~~~may~be~interpreted~as~follows: - -~~~~~~~~~~~~p1=2...3~~~~~match~2~or~3~characters~(call~it~p1) - -~~~~~~~~~~~~0...4~~~~~~~~match~0~to~4~characters - -~~~~~~~~~~~~p2=2...5~~~~~match~2~to~5~characters~(call~it~p2) - -~~~~~~~~~~~~1...5~~~~~~~~match~1~to~5~characters - -~~~~~~~~~~~~r1\textasciitilde{}p2~~~~~~~~match~the~reverse~complement~of~p2, - -~~~~~~~~~~~~~~~~~~~~~~~~~~~~allowing~G-A~and~A-G~pairs - -~~~~~~~~~~~~0...4~~~~~~~~match~0~to~4~characters~~~~~~~~ - 
-~~~~~~~~~~~~\textasciitilde{}p1~~~~~~~~~~match~the~reverse~complement~of~p1 - -~~~~~~~~~~~~~~~~~~~~~~~~~~~~allowing~only~G-C,~C-G,~A-T,~and~T-A~pairs - -~~~~Thus,~r1\textasciitilde{}p2~means~\char`\"{}match~the~reverse~complement~of~p2~using~rule~r1\char`\"{}. - -~~~~Now~let~us~consider~the~issue~of~tolerating~mismatches~and~bulges. - -~~~~You~may~add~a~\char`\"{}qualifier\char`\"{}~to~the~pattern~unit~that~gives~the - -~~~~tolerable~number~of~\char`\"{}mismatches,~deletions,~and~insertions\char`\"{}. - -~~~~Thus, - -~~~~~~~~~~~~~~~~p1=10...10~3...8~\textasciitilde{}p1{[}1,2,1] - -~~~~means~that~the~third~pattern~unit~must~match~10~characters, - -~~~~allowing~one~\char`\"{}mismatch\char`\"{}~(a~pairing~other~than~G-C,~C-G,~A-T,~or - -~~~~T-A),~two~deletions~(a~deletion~is~a~character~that~occurs~in~p1, - -~~~~but~has~been~\char`\"{}deleted\char`\"{}~from~the~string~matched~by~\textasciitilde{}p1),~and~one - -~~~~insertion~(an~\char`\"{}insertion\char`\"{}~is~a~character~that~occurs~in~the~string - -~~~~matched~by~\textasciitilde{}p1,~but~for~which~no~corresponding~character - -~~~~occurs~in~p1).~~In~this~case,~the~pattern~would~match - -~~~~~~~~~~~~~~ACGTACGTAC~GGGGGGGG~GCGTTACCT - -~~~~which~is,~you~must~admit,~a~fairly~weak~loop.~~It~is~common~to - -~~~~allow~mismatches,~but~you~will~find~yourself~using~insertions~and - -~~~~deletions~much~more~rarely.~~In~any~event,~you~should~note~that - -~~~~allowing~mismatches,~insertions,~and~deletions~does~force~the - -~~~~program~to~try~many~additional~possible~pairings,~so~it~does~slow - -~~~~things~down~a~bit. 
- -How~Patterns~Are~Matched - -~~~~Now~is~as~good~a~time~as~any~to~discuss~the~basic~flow~of~control - -~~~~when~matching~patterns.~~Recall~that~a~\char`\"{}pattern\char`\"{}~is~a~sequence~of - -~~~~\char`\"{}pattern~units\char`\"{}.~~Suppose~that~the~pattern~units~were - -~~~~~~~~u1~u2~u3~u4~...~un - -~~~~The~scan~of~a~sequence~S~begins~by~setting~the~current~position - -~~~~to~1.~~Then,~an~attempt~is~made~to~match~u1~starting~at~the - -~~~~current~position.~~Each~attempt~to~match~a~pattern~unit~can - -~~~~succeed~or~fail.~~If~it~succeeds,~then~an~attempt~is~made~to~match - -~~~~the~next~unit.~~If~it~fails,~then~an~attempt~is~made~to~find~an - -~~~~alternative~match~for~the~immediately~preceding~pattern~unit.~~If - -~~~~this~succeeds,~then~we~proceed~forward~again~to~the~next~unit.~~If - -~~~~it~fails~we~go~back~to~the~preceding~unit.~~This~process~is~called - -~~~~\char`\"{}backtracking\char`\"{}.~~If~there~are~no~previous~units,~then~the~current - -~~~~position~is~incremented~by~one,~and~everything~starts~again.~~This - -~~~~proceeds~until~either~the~current~position~goes~past~the~end~of - -~~~~the~sequence~or~all~of~the~pattern~units~succeed.~~On~success, - -~~~~scan\_for\_matches~reports~the~\char`\"{}hit\char`\"{},~the~current~position~is~set - -~~~~just~past~the~hit,~and~an~attempt~is~made~to~find~another~hit. - -~~~~If~you~wish~to~limit~the~scan~to~simply~finding~a~maximum~of,~say, - -~~~~10~hits,~you~can~use~the~-n~option~(-n~10~would~set~the~limit~to - -~~~~10~reported~hits).~~For~example, - -~~~~~~~~scan\_for\_matches~-c~-n~1~pat\_file~<~test\_dna\_input - -~~~~would~search~for~just~the~first~hit~(and~would~stop~searching~the - -~~~~current~sequences~or~any~that~follow~in~the~input~file). 
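The control flow described above can be sketched in a few lines. This is an illustrative toy, not the scan\_for\_matches implementation: the tuple encoding of pattern units, the names `match_units` and `scan`, and the choice to try shorter ranges before longer ones are all assumptions made for the example. It supports only range units and exact reverse complements.

```python
# Toy backtracking matcher; an illustration, not the real scan_for_matches code.
COMPLEMENT = str.maketrans("acgt", "tgca")


def normalize(seq):
    """Lower-case the sequence and treat U as T."""
    return seq.lower().replace("u", "t")


def match_units(seq, pos, units, named):
    """Try to match all units starting at 0-based pos.

    Returns the 0-based end position (exclusive) on success, or None.
    A failure in a later unit makes the loop below try the next
    alternative of the current unit, which is the backtracking step.
    """
    if not units:
        return pos
    kind, *args = units[0]
    if kind == "range":
        name, shortest, longest = args
        for length in range(shortest, longest + 1):
            if pos + length > len(seq):
                break
            if name is not None:
                named[name] = seq[pos:pos + length]
            end = match_units(seq, pos + length, units[1:], named)
            if end is not None:
                return end
        return None
    if kind == "revcomp":
        target = named[args[0]].translate(COMPLEMENT)[::-1]
        if seq.startswith(target, pos):
            return match_units(seq, pos + len(target), units[1:], named)
        return None
    raise ValueError("unknown pattern unit: " + kind)


def scan(seq, units, max_hits=None):
    """Return hits as 1-based inclusive (begin, end) pairs."""
    seq = normalize(seq)
    hits = []
    pos = 0
    while pos < len(seq) and (max_hits is None or len(hits) < max_hits):
        end = match_units(seq, pos, units, {})
        if end is None:
            pos += 1                   # no match here: advance by one
        else:
            hits.append((pos + 1, end))
            pos = max(end, pos + 1)    # resume just past the hit
    return hits


# Hypothetical encoding of the pattern  p1=4...7 3...8 ~p1
HAIRPIN = [("range", "p1", 4, 7), ("range", None, 3, 8), ("revcomp", "p1")]

# Made-up test sequence: five filler bases, the hairpin from the README, two more.
print(scan("ccccc" + "cguaacc" + "ggttaacc" + "gguuacg" + "tt", HAIRPIN))
```

The `max_hits` argument plays the role of the -n option: the scan stops as soon as that many hits have been collected.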
- -Searching~for~repeats: - -~~~~In~the~last~section,~I~discussed~almost~all~of~the~details - -~~~~required~to~allow~you~to~look~for~repeats.~~Consider~the~following - -~~~~set~of~patterns: - -~~~~~~~~p1=6...6~3...8~p1~~~(find~exact~6~character~repeat~separated - -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~by~3~to~8~characters) - -~~~~~~~~p1=6...6~3...8~p1{[}1,0,0]~~~(allow~one~mismatch) - -~~~~~~~~p1=3...3~p1{[}1,0,0]~p1{[}1,0,0]~p1{[}1,0,0]~~ - -~~~~~~~~~~~~~~~~~~~~~~~~~~~~(match~12~characters~that~are~the~remains - -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~of~a~3-character~sequence~occurring~4~times) - -~~~~~~~~~~~~~~~~ - -~~~~~~~~p1=4...8~0...3~p2=6...8~p1~0...3~p2 - -~~~~~~~~~~~~~~~~~~~~~~~~~~~~(This~would~match~things~like - -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ATCT~G~TCTTT~ATCT~TG~TCTTT - -~~~~~~~~~~~~~~~~~~~~~~~~~~~~) - -Searching~for~particular~sequences: - -~~~~Occasionally,~one~wishes~to~match~a~specific,~known~sequence. - -~~~~In~such~a~case,~you~can~just~give~the~sequence~(along~with~an - -~~~~optional~statement~of~the~allowable~mismatches,~insertions,~and - -~~~~deletions).~~Thus, - -~~~~~~~~p1=6...8~GAGA~\textasciitilde{}p1~~~~(match~a~hairpin~with~GAGA~as~the~loop) - -~~~~~~~~RRRRYYYY~~~~~~~~~~~~~(match~4~purines~followed~by~4~pyrimidines) - -~~~~~~~~TATAA{[}1,0,0]~~~~~~~~~(match~TATAA,~allowing~1~mismatch) - -~~~~~~~~ - -Matches~against~a~\char`\"{}weight~matrix\char`\"{}: - -~~~~I~will~conclude~my~examples~of~the~types~of~pattern~units - -~~~~available~for~matching~against~nucleotide~sequences~by~discussing~a - -~~~~crude~implementation~of~matching~using~a~\char`\"{}weight~matrix\char`\"{}.~~While~I - -~~~~am~less~than~overwhelmed~with~the~syntax~that~I~chose,~I~think~that - -~~~~the~reader~should~be~aware~that~I~was~thinking~of~generating - -~~~~patterns~containing~such~pattern~units~automatically~from - -~~~~alignments~(and~did~not~really~plan~on~typing~such~things~in~by - -~~~~hand~very~often).~~Anyway,~suppose~that~you~wanted~to~match~a - 
-~~~~sequence~of~eight~characters.~~The~\char`\"{}consensus\char`\"{}~of~these~eight - -~~~~characters~is~GRCACCGS,~but~the~actual~\char`\"{}frequencies~of~occurrence\char`\"{} - -~~~~are~given~in~the~matrix~below.~~Thus,~the~first~character~is~an~A - -~~~~16\%~the~time~and~a~G~84\%~of~the~time.~~The~second~is~an~A~57\%~of - -~~~~the~time,~a~C~10\%~of~the~time,~a~G~29\%~of~the~time,~and~a~T~4\%~of - -~~~~the~time.~~ - -~~~~~~~~~~~~~C1~~~~~C2~~~~C3~~~~C4~~~C5~~~~C6~~~~C7~~~~C8 - -~~~~ - -~~~~~~~A~~~~~16~~~~~57~~~~~0~~~~95~~~~0~~~~18~~~~~0~~~~~0 - -~~~~~~~C~~~~~~0~~~~~10~~~~80~~~~~0~~100~~~~60~~~~~0~~~~50 - -~~~~~~~G~~~~~84~~~~~29~~~~~0~~~~~0~~~~0~~~~20~~~100~~~~50 - -~~~~~~~T~~~~~~0~~~~~~4~~~~20~~~~~5~~~~0~~~~~2~~~~~0~~~~~0~~~ - -~~~~ - -~~~~One~could~use~the~following~pattern~unit~to~search~for~inexact - -~~~~matches~related~to~such~a~\char`\"{}weight~matrix\char`\"{}: - -~~~~~~~~\{(16,0,84,0),(57,10,29,4),(0,80,0,20),(95,0,0,5), - -~~~~~~~~~(0,100,0,0),(18,60,20,2),(0,0,100,0),(0,50,50,0)\}~>~450 - -~~~~This~pattern~unit~will~attempt~to~match~exactly~eight~characters. - -~~~~For~each~character~in~the~sequence,~the~entry~in~the~corresponding - -~~~~tuple~is~added~to~an~accumulated~sum.~~If~the~sum~is~greater~than - -~~~~450,~the~match~succeeds;~else~it~fails. - -~~~~Recently,~this~feature~was~upgraded~to~allow~ranges.~~Thus, - -~~600~>~~\{(16,0,84,0),(57,10,29,4),(0,80,0,20),(95,0,0,5), - -~~~~~~~~~(0,100,0,0),(18,60,20,2),(0,0,100,0),(0,50,50,0)\}~>~450 - -~~~~will~work,~as~well. 
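The scoring rule described above is simple enough to reproduce by hand. The sketch below is an illustration rather than the program's own code: the names `WEIGHTS`, `score` and `matches` are hypothetical, the tuple order (A, C, G, T) is read off the matrix shown above, and the strict comparisons mirror the way the thresholds are written in the pattern unit.

```python
# Illustrative weight-matrix scoring; not taken from scan_for_matches.
# One tuple per position, in the order (A, C, G, T), copied from the README.
WEIGHTS = [
    (16, 0, 84, 0), (57, 10, 29, 4), (0, 80, 0, 20), (95, 0, 0, 5),
    (0, 100, 0, 0), (18, 60, 20, 2), (0, 0, 100, 0), (0, 50, 50, 0),
]
COLUMN = {"A": 0, "C": 1, "G": 2, "T": 3}


def score(window):
    """Sum the matrix entry selected by each character of the window."""
    bases = window.upper().replace("U", "T")
    return sum(row[COLUMN[base]] for row, base in zip(WEIGHTS, bases))


def matches(window, lower=450, upper=None):
    """True if the window has the matrix length and its score is in range."""
    if len(window) != len(WEIGHTS):
        return False
    total = score(window)
    return total > lower and (upper is None or total < upper)


# GACACCGC is one reading of the consensus GRCACCGS (R = A, S = C).
print(score("GACACCGC"), matches("GACACCGC"))
```

With the ranged form, passing `upper=600` rejects windows that score too high as well as those that score too low.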
- -Allowing~Alternatives: - -~~~~Very~occasionally,~you~may~wish~to~allow~alternative~pattern~units - -~~~~(i.e.,~\char`\"{}match~either~A~or~B\char`\"{}).~~You~can~do~this~using~something - -~~~~like - -~~~~~~~~~~~~~~~~(~GAGA~|~GCGCA) - -~~~~which~says~\char`\"{}match~either~GAGA~or~GCGCA\char`\"{}.~~You~may~take - -~~~~alternatives~of~a~list~of~pattern~units,~for~example - -~~~~~~~~(p1=3...3~3...8~\textasciitilde{}p1~|~p1=5...5~4...4~\textasciitilde{}p1~GGG) - -~~~~would~match~one~of~two~sequences~of~pattern~units.~~There~is~one - -~~~~clumsy~aspect~of~the~syntax:~to~match~a~list~of~alternatives,~you - -~~~~need~to~fully~parenthesize~the~request.~~Thus, - -~~~~~~~~(GAGA~|~(GCGCA~|~TTCGA)) - -~~~~would~be~needed~to~try~the~three~alternatives. - -One~Minor~Extension - -~~~~Sometimes~a~pattern~will~contain~a~sequence~of~distinct~ranges, - -~~~~and~you~might~wish~to~limit~the~sum~of~the~lengths~of~the~matched - -~~~~subsequences.~~~For~example,~suppose~that~you~basically~wanted~to - -~~~~match~something~like - -~~~~ARRYYTT~p1=0...5~GCA{[}1,0,0]~p2=1...6~\textasciitilde{}p1~4...8~\textasciitilde{}p2~p3=4...10~CCT - -~~~~but~that~the~sum~of~the~lengths~of~p1,~p2,~and~p3~must~not~exceed - -~~~~eight~characters.~~To~do~this,~you~could~add~ - -~~~~~~~~length(p1+p2+p3)~<~9 - -~~~~as~the~last~pattern~unit.~~It~will~just~succeed~or~fail~(but~does - -~~~~not~actually~match~any~characters~in~the~sequence). 
- -~~~~ - -Matching~Protein~Sequences - -~~~~Suppose~that~the~input~file~contains~protein~sequences.~~In~this - -~~~~case,~you~must~invoke~scan\_for\_matches~with~the~\char`\"{}-p\char`\"{}~option.~~You - -~~~~cannot~use~aspects~of~the~language~that~relate~directly~to - -~~~~nucleotide~sequences~(e.g.,~the~-c~command~line~option~or~pattern - -~~~~constructs~referring~to~the~reverse~complement~of~a~previously - -~~~~matched~unit).~~ - -~~~~You~also~have~two~additional~constructs~that~allow~you~to~match - -~~~~either~\char`\"{}one~of~a~set~of~amino~acids\char`\"{}~or~\char`\"{}any~amino~acid~other~than - -~~~~those~a~given~set\char`\"{}.~~For~example, - -~~~~~~~~p1=0...4~any(HQD)~1...3~notany(HK)~p1 - -~~~~would~successfully~match~a~string~like - -~~~~~~~~~~~YWV~D~AA~C~YWV - -Using~the~show\_hits~Utility - -~~~~When~viewing~a~large~set~of~complex~matches,~you~might~find~it - -~~~~convenient~to~post-process~the~scan\_for\_matches~output~to~get~a - -~~~~more~readable~version.~~We~provide~a~simple~post-processor~called - -~~~~\char`\"{}show\_hits\char`\"{}.~~To~see~its~effect,~just~pipe~the~output~of~a - -~~~~scan\_for\_matches~into~show\_hits: - -~~~~~Normal~Output: - -~~~~~~~~clone\%~scan\_for\_matches~-c~pat\_file~<~tmp - -~~~~~~~~>tst1:{[}1,28] - -~~~~~~~~gtacguaacc~~ggttaac~cgguuacgtac~ - -~~~~~~~~>tst1:{[}28,1] - -~~~~~~~~gtacgtaacc~~ggttaac~cggttacgtac~ - -~~~~~~~~>tst2:{[}2,31] - -~~~~~~~~CGTACGUAAC~C~GGTTAACC~GGUUACGTACG~ - -~~~~~~~~>tst2:{[}31,2] - -~~~~~~~~CGTACGTAAC~C~GGTTAACC~GGTTACGTACG~ - -~~~~~~~~>tst3:{[}3,32] - -~~~~~~~~gtacguaacc~g~gttaactt~cgguuacgtac~ - -~~~~~~~~>tst3:{[}32,3] - -~~~~~~~~gtacgtaacc~g~aagttaac~cggttacgtac~ - -~~~~~Piped~Through~show\_hits: - -~~~~ - -~~~~~~~~clone\%~scan\_for\_matches~-c~pat\_file~<~tmp~|~show\_hits - -~~~~~~~~tst1:{[}1,28]:~~gtacguaacc~~~ggttaac~~cgguuacgtac - -~~~~~~~~tst1:{[}28,1]:~~gtacgtaacc~~~ggttaac~~cggttacgtac - -~~~~~~~~tst2:{[}2,31]:~~CGTACGUAAC~C~GGTTAACC~GGUUACGTACG - 
-~~~~~~~~tst2:{[}31,2]:~~CGTACGTAAC~C~GGTTAACC~GGTTACGTACG - -~~~~~~~~tst3:{[}3,32]:~~gtacguaacc~g~gttaactt~cgguuacgtac - -~~~~~~~~tst3:{[}32,3]:~~gtacgtaacc~g~aagttaac~cggttacgtac - -~~~~~~~~clone\%~ - -~~~~Optionally,~you~can~specify~which~of~the~\char`\"{}fields\char`\"{}~in~the~matches - -~~~~you~wish~to~sort~on,~and~show\_hits~will~sort~them.~~The~field - -~~~~numbers~start~with~0.~~So,~you~might~get~something~like - -~~~~~~~~clone\%~scan\_for\_matches~-c~pat\_file~<~tmp~|~show\_hits~2~1 - -~~~~~~~~tst2:{[}2,31]:~~CGTACGUAAC~C~GGTTAACC~GGUUACGTACG - -~~~~~~~~tst2:{[}31,2]:~~CGTACGTAAC~C~GGTTAACC~GGTTACGTACG - -~~~~~~~~tst3:{[}32,3]:~~gtacgtaacc~g~aagttaac~cggttacgtac - -~~~~~~~~tst1:{[}1,28]:~~gtacguaacc~~~ggttaac~~cgguuacgtac - -~~~~~~~~tst1:{[}28,1]:~~gtacgtaacc~~~ggttaac~~cggttacgtac - -~~~~~~~~tst3:{[}3,32]:~~gtacguaacc~g~gttaactt~cgguuacgtac - -~~~~~~~~clone\%~ - -~~~~In~this~case,~the~hits~have~been~sorted~on~fields~2~and~1~(that~is, - -~~~~the~third~and~second~matched~subfields). - -~~~~show\_hits~is~just~one~possible~little~post-processor,~and~you - -~~~~might~well~wish~to~write~a~customized~one~for~yourself. 
- -Reducing~the~Cost~of~a~Search - -~~~~The~scan\_for\_matches~utility~uses~a~fairly~simple~search,~and~may - -~~~~consume~large~amounts~of~CPU~time~for~complex~patterns.~~Someday, - -~~~~I~may~decide~to~optimize~the~code.~~However,~until~then,~let~me - -~~~~mention~one~useful~technique.~~ - -~~~~When~you~have~a~complex~pattern~that~includes~a~number~of~varying - -~~~~ranges,~imprecise~matches,~and~so~forth,~it~is~useful~to - -~~~~\char`\"{}pipeline\char`\"{}~matches.~~That~is,~form~a~simpler~pattern~that~can~be - -~~~~used~to~scan~through~a~large~database~extracting~sections~that - -~~~~might~be~matched~by~the~more~complex~pattern.~~Let~me~illustrate - -~~~~with~a~short~example.~~Suppose~that~you~really~wished~to~match~the - -~~~~pattern~ - -~~~~p1=3...5~0...8~\textasciitilde{}p1{[}1,1,0]~p2=6...7~3...6~AGC~3...5~RYGC~\textasciitilde{}p2{[}1,0,0] - -~~~~In~this~case,~the~pattern~units~AGC~3...5~RYGC~can~be~used~to~rapidly - -~~~~constrain~the~overall~search.~~You~can~preprocess~the~overall - -~~~~database~using~the~pattern: - -~~~~~~~~~~31...31~AGC~3...5~RYGC~7...7 - -~~~~Put~the~complex~pattern~in~pat\_file1~and~the~simpler~pattern~in - -~~~~pat\_file2.~~Then~use, - -~~~~~~~~scan\_for\_matches~-c~pat\_file2~<~nucleotide\_database~| - -~~~~~~~~scan\_for\_matches~pat\_file1 - -~~~~The~output~will~show~things~like - -~~~~>seqid:{[}232,280]{[}2,47] - -~~~~matches~pieces - -~~~~Then,~the~actual~section~of~the~sequence~that~was~matched~can~be - -~~~~easily~computed~as~{[}233,278]~(remember,~the~positions~start~from - -~~~~1,~not~0). 
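The coordinate arithmetic for pipelined hits can be written out explicitly. This is a small sketch under stated assumptions: the function name `absolute_span` is hypothetical, both spans are taken to be 1-based and inclusive as in the README, and only a forward-strand first-stage hit is handled.

```python
# Illustrative helper; assumes 1-based inclusive spans on the forward strand.
def absolute_span(outer, inner):
    """Map a span reported within a first-stage hit back onto the original sequence.

    outer -- (begin, end) of the first-stage hit in the original sequence
    inner -- (begin, end) of the second-stage hit within that piece
    """
    outer_begin, _outer_end = outer
    inner_begin, inner_end = inner
    # Position 1 of the piece is position outer_begin of the original.
    return (outer_begin + inner_begin - 1, outer_begin + inner_end - 1)


# The README example: >seqid:[232,280][2,47]
print(absolute_span((232, 280), (2, 47)))
```

A first-stage hit on the opposite strand (begin greater than end) would need the offsets subtracted instead; that case is left out of the sketch.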
- -~~~~Let~me~finally~add,~you~should~do~a~few~short~experiments~to~see - -~~~~whether~or~not~such~pipelining~actually~improves~performance~-{}-~it - -~~~~is~not~always~obvious~where~the~time~is~going,~and~I~have - -~~~~sometimes~found~that~the~added~complexity~of~pipelining~actually - -~~~~slowed~things~up.~~It~gets~its~best~improvements~when~there~are - -~~~~exact~matches~of~more~than~just~a~few~characters~that~can~be - -~~~~rapidly~used~to~eliminate~large~sections~of~the~database. - -============= - -Additions: - -Feb~9,~1995:~~~the~pattern~units~\textasciicircum{}~and~\$~now~work~as~in~normal~regular - -~~~~~~~~~~~~~~~expressions.~~That~is - -~~~~~~~~~~~~~~~~~~~~~~~~TTF~\$ - -~~~~~~~~~~~~~~~matches~only~TTF~at~the~end~of~the~string~and~ - -~~~~~~~~~~~~~~~~~~~~~~~~\textasciicircum{}~TTF~ - -~~~~~~~~~~~~~~~matches~only~an~initial~TTF - -~~~~~~~~~~~~~~~The~pattern~unit~ - -~~~~~~~~~~~~~~~~~~~~~~~~> matrix makepattern -/Pat1 exch def -<< Tile8x8 - /PaintProc {0.5 setlinewidth pop 0 0 M 8 8 L 0 8 M 8 0 L stroke - 0 4 M 4 8 L 8 4 L 4 0 L 0 4 L stroke} ->> matrix makepattern -/Pat2 exch def -<< Tile8x8 - /PaintProc {0.5 setlinewidth pop 0 0 M 0 8 L - 8 8 L 8 0 L 0 0 L fill} ->> matrix makepattern -/Pat3 exch def -<< Tile8x8 - /PaintProc {0.5 setlinewidth pop -4 8 M 8 -4 L - 0 12 M 12 0 L stroke} ->> matrix makepattern -/Pat4 exch def -<< Tile8x8 - /PaintProc {0.5 setlinewidth pop -4 0 M 8 12 L - 0 -4 M 12 8 L stroke} ->> matrix makepattern -/Pat5 exch def -<< Tile8x8 - /PaintProc {0.5 setlinewidth pop -2 8 M 4 -4 L - 0 12 M 8 -4 L 4 12 M 10 0 L stroke} ->> matrix makepattern -/Pat6 exch def -<< Tile8x8 - /PaintProc {0.5 setlinewidth pop -2 0 M 4 12 L - 0 -4 M 8 12 L 4 -4 M 10 8 L stroke} ->> matrix makepattern -/Pat7 exch def -<< Tile8x8 - /PaintProc {0.5 setlinewidth pop 8 -2 M -4 4 L - 12 0 M -4 8 L 12 4 M 0 10 L stroke} ->> matrix makepattern -/Pat8 exch def -<< Tile8x8 - /PaintProc {0.5 setlinewidth pop 0 -2 M 12 4 L - -4 0 M 12 8 L -4 4 M 8 10 L stroke} ->> 
matrix makepattern -/Pat9 exch def -/Pattern1 {PatternBgnd KeepColor Pat1 setpattern} bind def -/Pattern2 {PatternBgnd KeepColor Pat2 setpattern} bind def -/Pattern3 {PatternBgnd KeepColor Pat3 setpattern} bind def -/Pattern4 {PatternBgnd KeepColor Landscape {Pat5} {Pat4} ifelse setpattern} bind def -/Pattern5 {PatternBgnd KeepColor Landscape {Pat4} {Pat5} ifelse setpattern} bind def -/Pattern6 {PatternBgnd KeepColor Landscape {Pat9} {Pat6} ifelse setpattern} bind def -/Pattern7 {PatternBgnd KeepColor Landscape {Pat8} {Pat7} ifelse setpattern} bind def -} def -% -% -%End of PostScript Level 2 code -% -/PatternBgnd { - TransparentPatterns {} {gsave 1 setgray fill grestore} ifelse -} def -% -% Substitute for Level 2 pattern fill codes with -% grayscale if Level 2 support is not selected. -% -/Level1PatternFill { -/Pattern1 {0.250 Density} bind def -/Pattern2 {0.500 Density} bind def -/Pattern3 {0.750 Density} bind def -/Pattern4 {0.125 Density} bind def -/Pattern5 {0.375 Density} bind def -/Pattern6 {0.625 Density} bind def -/Pattern7 {0.875 Density} bind def -} def -% -% Now test for support of Level 2 code -% -Level1 {Level1PatternFill} {Level2PatternFill} ifelse -% -/Symbol-Oblique /Symbol findfont [1 0 .167 1 0 0] makefont -dup length dict begin {1 index /FID eq {pop pop} {def} ifelse} forall -currentdict end definefont pop -end -%%EndProlog -%%Page: 1 1 -gnudict begin -gsave -50 50 translate -0.100 0.100 scale -90 rotate -0 -5040 translate -0 setgray -newpath -(Helvetica) findfont 100 scalefont setfont -1.000 UL -LTb -410 660 M -63 0 V -6557 0 R --63 0 V -350 660 M -( 0) Rshow -1.000 UL -LTb -410 1243 M -63 0 V -6557 0 R --63 0 V --6617 0 R -( 20) Rshow -1.000 UL -LTb -410 1826 M -63 0 V -6557 0 R --63 0 V --6617 0 R -( 40) Rshow -1.000 UL -LTb -410 2409 M -63 0 V -6557 0 R --63 0 V --6617 0 R -( 60) Rshow -1.000 UL -LTb -410 2991 M -63 0 V -6557 0 R --63 0 V --6617 0 R -( 80) Rshow -1.000 UL -LTb -410 3574 M -63 0 V -6557 0 R --63 0 V --6617 0 R -( 100) Rshow 
-1.000 UL -LTb -410 4157 M -63 0 V -6557 0 R --63 0 V --6617 0 R -( 120) Rshow -1.000 UL -LTb -410 4740 M -63 0 V -6557 0 R --63 0 V --6617 0 R -( 140) Rshow -1.000 UL -LTb -698 660 M -0 -60 R -currentpoint gsave translate 90 rotate 0 0 M -(chr1) Rshow -grestore -1.000 UL -LTb -986 660 M -0 -60 R -currentpoint gsave translate 90 rotate 0 0 M -(chr2) Rshow -grestore -1.000 UL -LTb -1273 660 M -0 -60 R -currentpoint gsave translate 90 rotate 0 0 M -(chr3) Rshow -grestore -1.000 UL -LTb -1561 660 M -0 -60 R -currentpoint gsave translate 90 rotate 0 0 M -(chr4) Rshow -grestore -1.000 UL -LTb -1849 660 M -0 -60 R -currentpoint gsave translate 90 rotate 0 0 M -(chr5) Rshow -grestore -1.000 UL -LTb -2137 660 M -0 -60 R -currentpoint gsave translate 90 rotate 0 0 M -(chr6) Rshow -grestore -1.000 UL -LTb -2425 660 M -0 -60 R -currentpoint gsave translate 90 rotate 0 0 M -(chr7) Rshow -grestore -1.000 UL -LTb -2713 660 M -0 -60 R -currentpoint gsave translate 90 rotate 0 0 M -(chr8) Rshow -grestore -1.000 UL -LTb -3000 660 M -0 -60 R -currentpoint gsave translate 90 rotate 0 0 M -(chr9) Rshow -grestore -1.000 UL -LTb -3288 660 M -0 -60 R -currentpoint gsave translate 90 rotate 0 0 M -(chr10) Rshow -grestore -1.000 UL -LTb -3576 660 M -0 -60 R -currentpoint gsave translate 90 rotate 0 0 M -(chr11) Rshow -grestore -1.000 UL -LTb -3864 660 M -0 -60 R -currentpoint gsave translate 90 rotate 0 0 M -(chr12) Rshow -grestore -1.000 UL -LTb -4152 660 M -0 -60 R -currentpoint gsave translate 90 rotate 0 0 M -(chr13) Rshow -grestore -1.000 UL -LTb -4440 660 M -0 -60 R -currentpoint gsave translate 90 rotate 0 0 M -(chr14) Rshow -grestore -1.000 UL -LTb -4727 660 M -0 -60 R -currentpoint gsave translate 90 rotate 0 0 M -(chr15) Rshow -grestore -1.000 UL -LTb -5015 660 M -0 -60 R -currentpoint gsave translate 90 rotate 0 0 M -(chr16) Rshow -grestore -1.000 UL -LTb -5303 660 M -0 -60 R -currentpoint gsave translate 90 rotate 0 0 M -(chr17) Rshow -grestore -1.000 UL -LTb -5591 660 M -0 -60 
R -currentpoint gsave translate 90 rotate 0 0 M -(chr18) Rshow -grestore -1.000 UL -LTb -5879 660 M -0 -60 R -currentpoint gsave translate 90 rotate 0 0 M -(chr19) Rshow -grestore -1.000 UL -LTb -6167 660 M -0 -60 R -currentpoint gsave translate 90 rotate 0 0 M -(chrX) Rshow -grestore -1.000 UL -LTb -6454 660 M -0 -60 R -currentpoint gsave translate 90 rotate 0 0 M -(chr5_random) Rshow -grestore -1.000 UL -LTb -6742 660 M -0 -60 R -currentpoint gsave translate 90 rotate 0 0 M -(chrY_random) Rshow -grestore -1.000 UL -LTb -1.000 UL -LTb -410 4740 N -410 660 L -6620 0 V -0 4080 V --6620 0 V -Z stroke -3720 4890 M -(Chromosome Distribution) Cshow -1.000 UP -1.000 UL -LTb -1.000 UL -LT0 -/Helvetica findfont 100 scalefont setfont -1.000 698 660 97 2012 BoxColFill -698 660 N -0 2011 V -96 0 V -794 660 L --96 0 V -Z stroke -1.000 986 660 97 4023 BoxColFill -986 660 N -0 4022 V -96 0 V -0 -4022 V --96 0 V -Z stroke -1.000 1273 660 97 1196 BoxColFill -1273 660 N -0 1195 V -96 0 V -0 -1195 V --96 0 V -Z stroke -1.000 1561 660 97 2332 BoxColFill -1561 660 N -0 2331 V -96 0 V -0 -2331 V --96 0 V -Z stroke -1.000 1849 660 97 2857 BoxColFill -1849 660 N -0 2856 V -96 0 V -0 -2856 V --96 0 V -Z stroke -1.000 2137 660 97 2187 BoxColFill -2137 660 N -0 2186 V -96 0 V -0 -2186 V --96 0 V -Z stroke -1.000 2425 660 97 3148 BoxColFill -2425 660 N -0 3147 V -96 0 V -0 -3147 V --96 0 V -Z stroke -1.000 2713 660 97 1021 BoxColFill -2713 660 N -0 1020 V -96 0 V -0 -1020 V --96 0 V -Z stroke -1.000 3000 660 97 3178 BoxColFill -3000 660 N -0 3177 V -96 0 V -0 -3177 V --96 0 V -Z stroke -1.000 3288 660 97 2274 BoxColFill -3288 660 N -0 2273 V -96 0 V -0 -2273 V --96 0 V -Z stroke -1.000 3576 660 97 1371 BoxColFill -3576 660 N -0 1370 V -96 0 V -0 -1370 V --96 0 V -Z stroke -1.000 3864 660 97 1954 BoxColFill -3864 660 N -0 1953 V -96 0 V -0 -1953 V --96 0 V -Z stroke -1.000 4152 660 97 1458 BoxColFill -4152 660 N -0 1457 V -96 0 V -0 -1457 V --96 0 V -Z stroke -1.000 4440 660 97 1400 
BoxColFill -4440 660 N -0 1399 V -96 0 V -0 -1399 V --96 0 V -Z stroke -1.000 4727 660 97 2041 BoxColFill -4727 660 N -0 2040 V -96 0 V -0 -2040 V --96 0 V -Z stroke -1.000 5015 660 97 817 BoxColFill -5015 660 N -0 816 V -96 0 V -0 -816 V --96 0 V -Z stroke -1.000 5303 660 97 2566 BoxColFill -5303 660 N -0 2565 V -96 0 V -0 -2565 V --96 0 V -Z stroke -1.000 5591 660 97 1400 BoxColFill -5591 660 N -0 1399 V -96 0 V -0 -1399 V --96 0 V -Z stroke -1.000 5879 660 97 730 BoxColFill -5879 660 N -0 729 V -96 0 V -0 -729 V --96 0 V -Z stroke -1.000 6167 660 96 1138 BoxColFill -6167 660 N -0 1137 V -95 0 V -0 -1137 V --95 0 V -Z stroke -1.000 6454 660 97 59 BoxColFill -6454 660 N -0 58 V -96 0 V -0 -58 V --96 0 V -Z stroke -1.000 6742 660 97 817 BoxColFill -6742 660 N -0 816 V -96 0 V -0 -816 V --96 0 V -Z stroke -1.000 UL -LTb -410 4740 N -410 660 L -6620 0 V -0 4080 V --6620 0 V -Z stroke -1.000 UP -1.000 UL -LTb -stroke -grestore -end -showpage -%%Trailer -%%DocumentFonts: Helvetica -%%Pages: 1 diff --git a/bp_doc/chrdist_ascii.png b/bp_doc/chrdist_ascii.png deleted file mode 100644 index 32b367f..0000000 Binary files a/bp_doc/chrdist_ascii.png and /dev/null differ diff --git a/bp_doc/dotplot.pdf b/bp_doc/dotplot.pdf deleted file mode 100644 index ba921d5..0000000 Binary files a/bp_doc/dotplot.pdf and /dev/null differ diff --git a/bp_doc/dotplot.png b/bp_doc/dotplot.png deleted file mode 100644 index c4fc4d1..0000000 Binary files a/bp_doc/dotplot.png and /dev/null differ diff --git a/bp_doc/dotplot.ps b/bp_doc/dotplot.ps deleted file mode 100644 index c77b14c..0000000 --- a/bp_doc/dotplot.ps +++ /dev/null @@ -1,13881 +0,0 @@ -%!PS-Adobe-2.0 -%%Creator: gnuplot 4.2 patchlevel 0 -%%CreationDate: Thu Nov 13 17:03:30 2008 -%%DocumentFonts: (atend) -%%BoundingBox: 50 50 554 770 -%%Orientation: Landscape -%%Pages: (atend) -%%EndComments -%%BeginProlog -/gnudict 256 dict def -gnudict begin -% -% The following 6 true/false flags may be edited by hand if required -% The unit line 
width may also be changed -% -/Color false def -/Blacktext false def -/Solid false def -/Dashlength 1 def -/Landscape true def -/Level1 false def -/Rounded false def -/TransparentPatterns false def -/gnulinewidth 5.000 def -/userlinewidth gnulinewidth def -% -/vshift -46 def -/dl1 { - 10.0 Dashlength mul mul - Rounded { currentlinewidth 0.75 mul sub dup 0 le { pop 0.01 } if } if -} def -/dl2 { - 10.0 Dashlength mul mul - Rounded { currentlinewidth 0.75 mul add } if -} def -/hpt_ 31.5 def -/vpt_ 31.5 def -/hpt hpt_ def -/vpt vpt_ def -Level1 {} { -/SDict 10 dict def -systemdict /pdfmark known not { - userdict /pdfmark systemdict /cleartomark get put -} if -SDict begin [ - /Title () - /Subject (gnuplot plot) - /Creator (gnuplot 4.2 patchlevel 0) - /Author (Martin Hansen) -% /Producer (gnuplot) -% /Keywords () - /CreationDate (Thu Nov 13 17:03:30 2008) - /DOCINFO pdfmark -end -} ifelse -% -% Gnuplot Prolog Version 4.2 (August 2006) -% -/M {moveto} bind def -/L {lineto} bind def -/R {rmoveto} bind def -/V {rlineto} bind def -/N {newpath moveto} bind def -/Z {closepath} bind def -/C {setrgbcolor} bind def -/f {rlineto fill} bind def -/vpt2 vpt 2 mul def -/hpt2 hpt 2 mul def -/Lshow {currentpoint stroke M 0 vshift R - Blacktext {gsave 0 setgray show grestore} {show} ifelse} def -/Rshow {currentpoint stroke M dup stringwidth pop neg vshift R - Blacktext {gsave 0 setgray show grestore} {show} ifelse} def -/Cshow {currentpoint stroke M dup stringwidth pop -2 div vshift R - Blacktext {gsave 0 setgray show grestore} {show} ifelse} def -/UP {dup vpt_ mul /vpt exch def hpt_ mul /hpt exch def - /hpt2 hpt 2 mul def /vpt2 vpt 2 mul def} def -/DL {Color {setrgbcolor Solid {pop []} if 0 setdash} - {pop pop pop 0 setgray Solid {pop []} if 0 setdash} ifelse} def -/BL {stroke userlinewidth 2 mul setlinewidth - Rounded {1 setlinejoin 1 setlinecap} if} def -/AL {stroke userlinewidth 2 div setlinewidth - Rounded {1 setlinejoin 1 setlinecap} if} def -/UL {dup gnulinewidth mul 
/userlinewidth exch def - dup 1 lt {pop 1} if 10 mul /udl exch def} def -/PL {stroke userlinewidth setlinewidth - Rounded {1 setlinejoin 1 setlinecap} if} def -% Default Line colors -/LCw {1 1 1} def -/LCb {0 0 0} def -/LCa {0 0 0} def -/LC0 {1 0 0} def -/LC1 {0 1 0} def -/LC2 {0 0 1} def -/LC3 {1 0 1} def -/LC4 {0 1 1} def -/LC5 {1 1 0} def -/LC6 {0 0 0} def -/LC7 {1 0.3 0} def -/LC8 {0.5 0.5 0.5} def -% Default Line Types -/LTw {PL [] 1 setgray} def -/LTb {BL [] LCb DL} def -/LTa {AL [1 udl mul 2 udl mul] 0 setdash LCa setrgbcolor} def -/LT0 {PL [] LC0 DL} def -/LT1 {PL [4 dl1 2 dl2] LC1 DL} def -/LT2 {PL [2 dl1 3 dl2] LC2 DL} def -/LT3 {PL [1 dl1 1.5 dl2] LC3 DL} def -/LT4 {PL [6 dl1 2 dl2 1 dl1 2 dl2] LC4 DL} def -/LT5 {PL [3 dl1 3 dl2 1 dl1 3 dl2] LC5 DL} def -/LT6 {PL [2 dl1 2 dl2 2 dl1 6 dl2] LC6 DL} def -/LT7 {PL [1 dl1 2 dl2 6 dl1 2 dl2 1 dl1 2 dl2] LC7 DL} def -/LT8 {PL [2 dl1 2 dl2 2 dl1 2 dl2 2 dl1 2 dl2 2 dl1 4 dl2] LC8 DL} def -/Pnt {stroke [] 0 setdash gsave 1 setlinecap M 0 0 V stroke grestore} def -/Dia {stroke [] 0 setdash 2 copy vpt add M - hpt neg vpt neg V hpt vpt neg V - hpt vpt V hpt neg vpt V closepath stroke - Pnt} def -/Pls {stroke [] 0 setdash vpt sub M 0 vpt2 V - currentpoint stroke M - hpt neg vpt neg R hpt2 0 V stroke - } def -/Box {stroke [] 0 setdash 2 copy exch hpt sub exch vpt add M - 0 vpt2 neg V hpt2 0 V 0 vpt2 V - hpt2 neg 0 V closepath stroke - Pnt} def -/Crs {stroke [] 0 setdash exch hpt sub exch vpt add M - hpt2 vpt2 neg V currentpoint stroke M - hpt2 neg 0 R hpt2 vpt2 V stroke} def -/TriU {stroke [] 0 setdash 2 copy vpt 1.12 mul add M - hpt neg vpt -1.62 mul V - hpt 2 mul 0 V - hpt neg vpt 1.62 mul V closepath stroke - Pnt} def -/Star {2 copy Pls Crs} def -/BoxF {stroke [] 0 setdash exch hpt sub exch vpt add M - 0 vpt2 neg V hpt2 0 V 0 vpt2 V - hpt2 neg 0 V closepath fill} def -/TriUF {stroke [] 0 setdash vpt 1.12 mul add M - hpt neg vpt -1.62 mul V - hpt 2 mul 0 V - hpt neg vpt 1.62 mul V closepath fill} def -/TriD {stroke 
[] 0 setdash 2 copy vpt 1.12 mul sub M - hpt neg vpt 1.62 mul V - hpt 2 mul 0 V - hpt neg vpt -1.62 mul V closepath stroke - Pnt} def -/TriDF {stroke [] 0 setdash vpt 1.12 mul sub M - hpt neg vpt 1.62 mul V - hpt 2 mul 0 V - hpt neg vpt -1.62 mul V closepath fill} def -/DiaF {stroke [] 0 setdash vpt add M - hpt neg vpt neg V hpt vpt neg V - hpt vpt V hpt neg vpt V closepath fill} def -/Pent {stroke [] 0 setdash 2 copy gsave - translate 0 hpt M 4 {72 rotate 0 hpt L} repeat - closepath stroke grestore Pnt} def -/PentF {stroke [] 0 setdash gsave - translate 0 hpt M 4 {72 rotate 0 hpt L} repeat - closepath fill grestore} def -/Circle {stroke [] 0 setdash 2 copy - hpt 0 360 arc stroke Pnt} def -/CircleF {stroke [] 0 setdash hpt 0 360 arc fill} def -/C0 {BL [] 0 setdash 2 copy moveto vpt 90 450 arc} bind def -/C1 {BL [] 0 setdash 2 copy moveto - 2 copy vpt 0 90 arc closepath fill - vpt 0 360 arc closepath} bind def -/C2 {BL [] 0 setdash 2 copy moveto - 2 copy vpt 90 180 arc closepath fill - vpt 0 360 arc closepath} bind def -/C3 {BL [] 0 setdash 2 copy moveto - 2 copy vpt 0 180 arc closepath fill - vpt 0 360 arc closepath} bind def -/C4 {BL [] 0 setdash 2 copy moveto - 2 copy vpt 180 270 arc closepath fill - vpt 0 360 arc closepath} bind def -/C5 {BL [] 0 setdash 2 copy moveto - 2 copy vpt 0 90 arc - 2 copy moveto - 2 copy vpt 180 270 arc closepath fill - vpt 0 360 arc} bind def -/C6 {BL [] 0 setdash 2 copy moveto - 2 copy vpt 90 270 arc closepath fill - vpt 0 360 arc closepath} bind def -/C7 {BL [] 0 setdash 2 copy moveto - 2 copy vpt 0 270 arc closepath fill - vpt 0 360 arc closepath} bind def -/C8 {BL [] 0 setdash 2 copy moveto - 2 copy vpt 270 360 arc closepath fill - vpt 0 360 arc closepath} bind def -/C9 {BL [] 0 setdash 2 copy moveto - 2 copy vpt 270 450 arc closepath fill - vpt 0 360 arc closepath} bind def -/C10 {BL [] 0 setdash 2 copy 2 copy moveto vpt 270 360 arc closepath fill - 2 copy moveto - 2 copy vpt 90 180 arc closepath fill - vpt 0 360 arc closepath} 
bind def -/C11 {BL [] 0 setdash 2 copy moveto - 2 copy vpt 0 180 arc closepath fill - 2 copy moveto - 2 copy vpt 270 360 arc closepath fill - vpt 0 360 arc closepath} bind def -/C12 {BL [] 0 setdash 2 copy moveto - 2 copy vpt 180 360 arc closepath fill - vpt 0 360 arc closepath} bind def -/C13 {BL [] 0 setdash 2 copy moveto - 2 copy vpt 0 90 arc closepath fill - 2 copy moveto - 2 copy vpt 180 360 arc closepath fill - vpt 0 360 arc closepath} bind def -/C14 {BL [] 0 setdash 2 copy moveto - 2 copy vpt 90 360 arc closepath fill - vpt 0 360 arc} bind def -/C15 {BL [] 0 setdash 2 copy vpt 0 360 arc closepath fill - vpt 0 360 arc closepath} bind def -/Rec {newpath 4 2 roll moveto 1 index 0 rlineto 0 exch rlineto - neg 0 rlineto closepath} bind def -/Square {dup Rec} bind def -/Bsquare {vpt sub exch vpt sub exch vpt2 Square} bind def -/S0 {BL [] 0 setdash 2 copy moveto 0 vpt rlineto BL Bsquare} bind def -/S1 {BL [] 0 setdash 2 copy vpt Square fill Bsquare} bind def -/S2 {BL [] 0 setdash 2 copy exch vpt sub exch vpt Square fill Bsquare} bind def -/S3 {BL [] 0 setdash 2 copy exch vpt sub exch vpt2 vpt Rec fill Bsquare} bind def -/S4 {BL [] 0 setdash 2 copy exch vpt sub exch vpt sub vpt Square fill Bsquare} bind def -/S5 {BL [] 0 setdash 2 copy 2 copy vpt Square fill - exch vpt sub exch vpt sub vpt Square fill Bsquare} bind def -/S6 {BL [] 0 setdash 2 copy exch vpt sub exch vpt sub vpt vpt2 Rec fill Bsquare} bind def -/S7 {BL [] 0 setdash 2 copy exch vpt sub exch vpt sub vpt vpt2 Rec fill - 2 copy vpt Square fill Bsquare} bind def -/S8 {BL [] 0 setdash 2 copy vpt sub vpt Square fill Bsquare} bind def -/S9 {BL [] 0 setdash 2 copy vpt sub vpt vpt2 Rec fill Bsquare} bind def -/S10 {BL [] 0 setdash 2 copy vpt sub vpt Square fill 2 copy exch vpt sub exch vpt Square fill - Bsquare} bind def -/S11 {BL [] 0 setdash 2 copy vpt sub vpt Square fill 2 copy exch vpt sub exch vpt2 vpt Rec fill - Bsquare} bind def -/S12 {BL [] 0 setdash 2 copy exch vpt sub exch vpt sub vpt2 vpt Rec fill 
Bsquare} bind def -/S13 {BL [] 0 setdash 2 copy exch vpt sub exch vpt sub vpt2 vpt Rec fill - 2 copy vpt Square fill Bsquare} bind def -/S14 {BL [] 0 setdash 2 copy exch vpt sub exch vpt sub vpt2 vpt Rec fill - 2 copy exch vpt sub exch vpt Square fill Bsquare} bind def -/S15 {BL [] 0 setdash 2 copy Bsquare fill Bsquare} bind def -/D0 {gsave translate 45 rotate 0 0 S0 stroke grestore} bind def -/D1 {gsave translate 45 rotate 0 0 S1 stroke grestore} bind def -/D2 {gsave translate 45 rotate 0 0 S2 stroke grestore} bind def -/D3 {gsave translate 45 rotate 0 0 S3 stroke grestore} bind def -/D4 {gsave translate 45 rotate 0 0 S4 stroke grestore} bind def -/D5 {gsave translate 45 rotate 0 0 S5 stroke grestore} bind def -/D6 {gsave translate 45 rotate 0 0 S6 stroke grestore} bind def -/D7 {gsave translate 45 rotate 0 0 S7 stroke grestore} bind def -/D8 {gsave translate 45 rotate 0 0 S8 stroke grestore} bind def -/D9 {gsave translate 45 rotate 0 0 S9 stroke grestore} bind def -/D10 {gsave translate 45 rotate 0 0 S10 stroke grestore} bind def -/D11 {gsave translate 45 rotate 0 0 S11 stroke grestore} bind def -/D12 {gsave translate 45 rotate 0 0 S12 stroke grestore} bind def -/D13 {gsave translate 45 rotate 0 0 S13 stroke grestore} bind def -/D14 {gsave translate 45 rotate 0 0 S14 stroke grestore} bind def -/D15 {gsave translate 45 rotate 0 0 S15 stroke grestore} bind def -/DiaE {stroke [] 0 setdash vpt add M - hpt neg vpt neg V hpt vpt neg V - hpt vpt V hpt neg vpt V closepath stroke} def -/BoxE {stroke [] 0 setdash exch hpt sub exch vpt add M - 0 vpt2 neg V hpt2 0 V 0 vpt2 V - hpt2 neg 0 V closepath stroke} def -/TriUE {stroke [] 0 setdash vpt 1.12 mul add M - hpt neg vpt -1.62 mul V - hpt 2 mul 0 V - hpt neg vpt 1.62 mul V closepath stroke} def -/TriDE {stroke [] 0 setdash vpt 1.12 mul sub M - hpt neg vpt 1.62 mul V - hpt 2 mul 0 V - hpt neg vpt -1.62 mul V closepath stroke} def -/PentE {stroke [] 0 setdash gsave - translate 0 hpt M 4 {72 rotate 0 hpt L} repeat - closepath 
stroke grestore} def -/CircE {stroke [] 0 setdash - hpt 0 360 arc stroke} def -/Opaque {gsave closepath 1 setgray fill grestore 0 setgray closepath} def -/DiaW {stroke [] 0 setdash vpt add M - hpt neg vpt neg V hpt vpt neg V - hpt vpt V hpt neg vpt V Opaque stroke} def -/BoxW {stroke [] 0 setdash exch hpt sub exch vpt add M - 0 vpt2 neg V hpt2 0 V 0 vpt2 V - hpt2 neg 0 V Opaque stroke} def -/TriUW {stroke [] 0 setdash vpt 1.12 mul add M - hpt neg vpt -1.62 mul V - hpt 2 mul 0 V - hpt neg vpt 1.62 mul V Opaque stroke} def -/TriDW {stroke [] 0 setdash vpt 1.12 mul sub M - hpt neg vpt 1.62 mul V - hpt 2 mul 0 V - hpt neg vpt -1.62 mul V Opaque stroke} def -/PentW {stroke [] 0 setdash gsave - translate 0 hpt M 4 {72 rotate 0 hpt L} repeat - Opaque stroke grestore} def -/CircW {stroke [] 0 setdash - hpt 0 360 arc Opaque stroke} def -/BoxFill {gsave Rec 1 setgray fill grestore} def -/Density { - /Fillden exch def - currentrgbcolor - /ColB exch def /ColG exch def /ColR exch def - /ColR ColR Fillden mul Fillden sub 1 add def - /ColG ColG Fillden mul Fillden sub 1 add def - /ColB ColB Fillden mul Fillden sub 1 add def - ColR ColG ColB setrgbcolor} def -/BoxColFill {gsave Rec PolyFill} def -/PolyFill {gsave Density fill grestore grestore} def -/h {rlineto rlineto rlineto gsave fill grestore} bind def -% -% PostScript Level 1 Pattern Fill routine for rectangles -% Usage: x y w h s a XX PatternFill -% x,y = lower left corner of box to be filled -% w,h = width and height of box -% a = angle in degrees between lines and x-axis -% XX = 0/1 for no/yes cross-hatch -% -/PatternFill {gsave /PFa [ 9 2 roll ] def - PFa 0 get PFa 2 get 2 div add PFa 1 get PFa 3 get 2 div add translate - PFa 2 get -2 div PFa 3 get -2 div PFa 2 get PFa 3 get Rec - gsave 1 setgray fill grestore clip - currentlinewidth 0.5 mul setlinewidth - /PFs PFa 2 get dup mul PFa 3 get dup mul add sqrt def - 0 0 M PFa 5 get rotate PFs -2 div dup translate - 0 1 PFs PFa 4 get div 1 add floor cvi - {PFa 4 get mul 0 M 0 
PFs V} for - 0 PFa 6 get ne { - 0 1 PFs PFa 4 get div 1 add floor cvi - {PFa 4 get mul 0 2 1 roll M PFs 0 V} for - } if - stroke grestore} def -% -/languagelevel where - {pop languagelevel} {1} ifelse - 2 lt - {/InterpretLevel1 true def} - {/InterpretLevel1 Level1 def} - ifelse -% -% PostScript level 2 pattern fill definitions -% -/Level2PatternFill { -/Tile8x8 {/PaintType 2 /PatternType 1 /TilingType 1 /BBox [0 0 8 8] /XStep 8 /YStep 8} - bind def -/KeepColor {currentrgbcolor [/Pattern /DeviceRGB] setcolorspace} bind def -<< Tile8x8 - /PaintProc {0.5 setlinewidth pop 0 0 M 8 8 L 0 8 M 8 0 L stroke} ->> matrix makepattern -/Pat1 exch def -<< Tile8x8 - /PaintProc {0.5 setlinewidth pop 0 0 M 8 8 L 0 8 M 8 0 L stroke - 0 4 M 4 8 L 8 4 L 4 0 L 0 4 L stroke} ->> matrix makepattern -/Pat2 exch def -<< Tile8x8 - /PaintProc {0.5 setlinewidth pop 0 0 M 0 8 L - 8 8 L 8 0 L 0 0 L fill} ->> matrix makepattern -/Pat3 exch def -<< Tile8x8 - /PaintProc {0.5 setlinewidth pop -4 8 M 8 -4 L - 0 12 M 12 0 L stroke} ->> matrix makepattern -/Pat4 exch def -<< Tile8x8 - /PaintProc {0.5 setlinewidth pop -4 0 M 8 12 L - 0 -4 M 12 8 L stroke} ->> matrix makepattern -/Pat5 exch def -<< Tile8x8 - /PaintProc {0.5 setlinewidth pop -2 8 M 4 -4 L - 0 12 M 8 -4 L 4 12 M 10 0 L stroke} ->> matrix makepattern -/Pat6 exch def -<< Tile8x8 - /PaintProc {0.5 setlinewidth pop -2 0 M 4 12 L - 0 -4 M 8 12 L 4 -4 M 10 8 L stroke} ->> matrix makepattern -/Pat7 exch def -<< Tile8x8 - /PaintProc {0.5 setlinewidth pop 8 -2 M -4 4 L - 12 0 M -4 8 L 12 4 M 0 10 L stroke} ->> matrix makepattern -/Pat8 exch def -<< Tile8x8 - /PaintProc {0.5 setlinewidth pop 0 -2 M 12 4 L - -4 0 M 12 8 L -4 4 M 8 10 L stroke} ->> matrix makepattern -/Pat9 exch def -/Pattern1 {PatternBgnd KeepColor Pat1 setpattern} bind def -/Pattern2 {PatternBgnd KeepColor Pat2 setpattern} bind def -/Pattern3 {PatternBgnd KeepColor Pat3 setpattern} bind def -/Pattern4 {PatternBgnd KeepColor Landscape {Pat5} {Pat4} ifelse setpattern} bind def 
-/Pattern5 {PatternBgnd KeepColor Landscape {Pat4} {Pat5} ifelse setpattern} bind def -/Pattern6 {PatternBgnd KeepColor Landscape {Pat9} {Pat6} ifelse setpattern} bind def -/Pattern7 {PatternBgnd KeepColor Landscape {Pat8} {Pat7} ifelse setpattern} bind def -} def -% -% -%End of PostScript Level 2 code -% -/PatternBgnd { - TransparentPatterns {} {gsave 1 setgray fill grestore} ifelse -} def -% -% Substitute for Level 2 pattern fill codes with -% grayscale if Level 2 support is not selected. -% -/Level1PatternFill { -/Pattern1 {0.250 Density} bind def -/Pattern2 {0.500 Density} bind def -/Pattern3 {0.750 Density} bind def -/Pattern4 {0.125 Density} bind def -/Pattern5 {0.375 Density} bind def -/Pattern6 {0.625 Density} bind def -/Pattern7 {0.875 Density} bind def -} def -% -% Now test for support of Level 2 code -% -Level1 {Level1PatternFill} {Level2PatternFill} ifelse -% -/Symbol-Oblique /Symbol findfont [1 0 .167 1 0 0] makefont -dup length dict begin {1 index /FID eq {pop pop} {def} ifelse} forall -currentdict end definefont pop -end -%%EndProlog -%%Page: 1 1 -gnudict begin -gsave -50 50 translate -0.100 0.100 scale -90 rotate -0 -5040 translate -0 setgray -newpath -(Helvetica) findfont 140 scalefont setfont -gsave % colour palette begin -/maxcolors 0 def -/HSV2RGB { exch dup 0.0 eq {pop exch pop dup dup} % achromatic gray - { /HSVs exch def /HSVv exch def 6.0 mul dup floor dup 3 1 roll sub - /HSVf exch def /HSVi exch cvi def /HSVp HSVv 1.0 HSVs sub mul def - /HSVq HSVv 1.0 HSVs HSVf mul sub mul def - /HSVt HSVv 1.0 HSVs 1.0 HSVf sub mul sub mul def - /HSVi HSVi 6 mod def 0 HSVi eq {HSVv HSVt HSVp} - {1 HSVi eq {HSVq HSVv HSVp}{2 HSVi eq {HSVp HSVv HSVt} - {3 HSVi eq {HSVp HSVq HSVv}{4 HSVi eq {HSVt HSVp HSVv} - {HSVv HSVp HSVq} ifelse} ifelse} ifelse} ifelse} ifelse - } ifelse} def -/Constrain { - dup 0 lt {0 exch pop}{dup 1 gt {1 exch pop} if} ifelse} def -/YIQ2RGB { - 3 copy -1.702 mul exch -1.105 mul add add Constrain 4 1 roll - 3 copy -0.647 mul exch -0.272 
mul add add Constrain 5 1 roll - 0.621 mul exch -0.956 mul add add Constrain 3 1 roll } def -/CMY2RGB { 1 exch sub exch 1 exch sub 3 2 roll 1 exch sub 3 1 roll exch } def -/XYZ2RGB { 3 copy -0.9017 mul exch -0.1187 mul add exch 0.0585 mul exch add - Constrain 4 1 roll 3 copy -0.0279 mul exch 1.999 mul add exch - -0.9844 mul add Constrain 5 1 roll -0.2891 mul exch -0.5338 mul add - exch 1.91 mul exch add Constrain 3 1 roll} def -/SelectSpace {ColorSpace (HSV) eq {HSV2RGB}{ColorSpace (XYZ) eq { - XYZ2RGB}{ColorSpace (CMY) eq {CMY2RGB}{ColorSpace (YIQ) eq {YIQ2RGB} - if} ifelse} ifelse} ifelse} def -/InterpolatedColor false def -/cF7 {sqrt} bind def % sqrt(x) -/cF5 {dup dup mul mul} bind def % x^3 -/cF15 {360 mul sin} bind def % sin(360x) -/pm3dround {maxcolors 0 gt {dup 1 ge - {pop 1} {maxcolors mul floor maxcolors 1 sub div} ifelse} if} def -/pm3dGamma 1.0 1.5 div def -/ColorSpace (RGB) def -Color true and { % COLOUR vs. GRAY map - InterpolatedColor { %% Interpolation vs. RGB-Formula - /g {stroke pm3dround /grayv exch def interpolate - SelectSpace setrgbcolor} bind def - }{ - /g {stroke pm3dround dup cF7 Constrain exch dup cF5 Constrain exch cF15 Constrain - SelectSpace setrgbcolor} bind def - } ifelse -}{ - /g {stroke pm3dround pm3dGamma exp setgray} bind def -} ifelse -1.000 UL -LTb -1.000 UL -LTa -1113 483 M -5849 0 V -stroke -LTb -1113 483 M --63 0 V -5912 0 R -63 0 V -966 483 M -( 0) Rshow -1.000 UL -LTb -1.000 UL -LTa -1113 979 M -5849 0 V -stroke -LTb -1113 979 M --63 0 V -5912 0 R -63 0 V -966 979 M -( 200000) Rshow -1.000 UL -LTb -1.000 UL -LTa -1113 1475 M -5849 0 V -stroke -LTb -1113 1475 M --63 0 V -5912 0 R -63 0 V --6059 0 R -( 400000) Rshow -1.000 UL -LTb -1.000 UL -LTa -1113 1971 M -5849 0 V -stroke -LTb -1113 1971 M --63 0 V -5912 0 R -63 0 V --6059 0 R -( 600000) Rshow -1.000 UL -LTb -1.000 UL -LTa -1113 2467 M -5849 0 V -stroke -LTb -1113 2467 M --63 0 V -5912 0 R -63 0 V --6059 0 R -( 800000) Rshow -1.000 UL -LTb -1.000 UL -LTa -1113 2963 M -5849 
0 V -stroke -LTb -1113 2963 M --63 0 V -5912 0 R -63 0 V --6059 0 R -( 1e+06) Rshow -1.000 UL -LTb -1.000 UL -LTa -1113 3460 M -5849 0 V -stroke -LTb -1113 3460 M --63 0 V -5912 0 R -63 0 V --6059 0 R -( 1.2e+06) Rshow -1.000 UL -LTb -1.000 UL -LTa -1113 3956 M -5849 0 V -stroke -LTb -1113 3956 M --63 0 V -5912 0 R -63 0 V --6059 0 R -( 1.4e+06) Rshow -1.000 UL -LTb -1.000 UL -LTa -1113 4452 M -5849 0 V -stroke -LTb -1113 4452 M --63 0 V -5912 0 R -63 0 V --6059 0 R -( 1.6e+06) Rshow -1.000 UL -LTb -1.000 UL -LTa -1113 483 M -0 4137 V -stroke -LTb -1113 483 M -0 -63 V -0 4200 R -0 63 V -0 -4403 R -( 0) Cshow -1.000 UL -LTb -1.000 UL -LTa -1825 483 M -0 4137 V -stroke -LTb -1825 483 M -0 -63 V -0 4200 R -0 63 V -0 -4403 R -( 200000) Cshow -1.000 UL -LTb -1.000 UL -LTa -2537 483 M -0 4137 V -stroke -LTb -2537 483 M -0 -63 V -0 4200 R -0 63 V -0 -4403 R -( 400000) Cshow -1.000 UL -LTb -1.000 UL -LTa -3249 483 M -0 4137 V -stroke -LTb -3249 483 M -0 -63 V -0 4200 R -0 63 V -0 -4403 R -( 600000) Cshow -1.000 UL -LTb -1.000 UL -LTa -3961 483 M -0 4137 V -stroke -LTb -3961 483 M -0 -63 V -0 4200 R -0 63 V -0 -4403 R -( 800000) Cshow -1.000 UL -LTb -1.000 UL -LTa -4673 483 M -0 4137 V -stroke -LTb -4673 483 M -0 -63 V -0 4200 R -0 63 V -0 -4403 R -( 1e+06) Cshow -1.000 UL -LTb -1.000 UL -LTa -5385 483 M -0 4137 V -stroke -LTb -5385 483 M -0 -63 V -0 4200 R -0 63 V -0 -4403 R -( 1.2e+06) Cshow -1.000 UL -LTb -1.000 UL -LTa -6096 483 M -0 4137 V -stroke -LTb -6096 483 M -0 -63 V -0 4200 R -0 63 V -0 -4403 R -( 1.4e+06) Cshow -1.000 UL -LTb -1.000 UL -LTa -6808 483 M -0 4137 V -stroke -LTb -6808 483 M -0 -63 V -0 4200 R -0 63 V -0 -4403 R -( 1.6e+06) Cshow -1.000 UL -LTb -1.000 UL -LTb -1113 4620 N -0 -4137 V -5849 0 V -0 4137 V --5849 0 V -Z stroke -LCb setrgbcolor -140 2551 M -currentpoint gsave translate 90 rotate 0 0 M -(Helicobacter_pylori_26695) Cshow -grestore -LTb -LCb setrgbcolor -4037 70 M -(Helicobacter_pylori_J99 ) Cshow -LTb -4037 4830 M -(plot_matches) Cshow 
-1.000 UP -1.000 UL -LTb -2.000 UL -LT0 -0.00 1.00 0.00 C /Helvetica findfont 140 scalefont setfont -1113 483 M -0 1 V -1 0 V -1 1 V -1 0 R -0 1 R -1 0 V -1 1 V -1 0 R -0 1 V -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V --1 1661 R -1 -1660 R -1 0 V -0 1 V -1 0 V -1 1 R -0 1 R -0 -1 R -0 3169 R -0 -3169 R -1 0 V -0 3170 R -0 -3170 R -0 1 V -0 -1 R -0 1 R -1 0 V -0 3255 R -0 -3254 R -1 0 V -0 3254 R -0 1 R -0 -919 R -0 -1527 R -1 1 R -0 2445 R -0 -918 R -0 -2337 R -0 2337 R -0 -1527 R -0 2446 R -0 -919 R -0 -2337 R -1 0 V -0 45 R -0 534 R -0 -9 R -0 -525 R -0 2293 R -0 -1768 R -0 1350 R -0 -1920 R -0 1 V -0 2337 R -0 918 R -0 -2445 R -0 -810 R -1 0 R -0 570 R -0 -570 R -0 2338 R -0 918 R -0 -2446 R -0 1109 R -0 -1919 R -0 3256 R -0 -918 R -0 -1528 R -0 1 V -0 -811 R -1 1 V -1 0 V -0 1 V -0 3543 R -0 -3543 R -0 810 R -1 0 V --1 -810 R -1 0 V -0 1 R -1 0 V -0 3256 R -0 -3256 R -1 0 V --1 45 R -1 -45 R -0 1 R -0 3071 R -0 -734 R -0 -2292 R -0 765 R -0 2446 R -0 -3211 R -0 1874 R -0 -1919 R -1 0 V --1 2338 R -1 -2338 R -0 2338 R -0 -2338 R -1140 502 L -1 0 V -1 1 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 R -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 R -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 1 V -0 1146 R -0 -1146 R -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 1 R -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 R -1 0 R -0 1 V -1 0 R -1 0 V -0 1 R -1 0 V -0 1 V -0 617 R -0 -617 R -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 1 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 V -1 0 R -0 1 V -0 3671 R -0 272 R -0 -3943 R -1 3943 R -0 -3943 R -0 1 V -1 765 R -0 2445 R -0 -918 R -0 -2337 R -0 3256 R -0 -2446 R -0 -765 R -0 3027 R -0 -735 R -1197 2834 L -0 -2338 R -0 579 R -0 -9 R -0 -525 R -0 525 R -0 -525 R -0 2293 R -1 0 V --1 918 R -1 0 V --1 -2445 R -1 0 V --1 
-232 R -1 1 V --1 -580 R -1 1 V -0 44 R -0 1 V -0 3026 R -0 -3026 R -0 2292 R -0 919 R -0 -2446 R -0 1109 R -0 -1874 R -1 0 V --1 534 R -1 0 V --1 2677 R -1 0 V --1 -918 R -1 0 V --1 -1528 R -1 0 V -0 -765 R -0 1 V -1 0 R -0 765 R -0 1528 R -0 -2293 R -1 0 V -0 1 V -1 1 R -0 2292 R -1 1769 R -0 -4061 R -0 2293 R -0 918 R -0 -2445 R -0 -766 R -0 -45 R -0 3257 R -0 -3212 R -0 1 V -1 0 R -0 2292 R -0 -2337 R -0 45 R -0 1874 R -0 -1874 R -0 -45 R -0 45 R -1 0 V -0 1 V -0 1873 R -0 1688 R -0 -3561 R -1 0 R -0 3717 R -0 -664 R -0 -2694 R -0 735 R -1 -1093 R -1 0 R -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 R -0 1 V -1 0 V -1 1 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 R -1 0 R -3 1018 R -1 -1015 R -0 1 R -1 0 V -1 0 V -0 1 R -1 0 R -1 1 R -1 1 R -1 0 R -0 1 R -1 0 R -0 1 R -1 0 R -1235 567 L -0 1 V -1 0 V -0 1 V -1 0 V -1 1 R -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -1 0 R -0 1 R -1 0 R -0 1 R -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 R -1 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 R -0 1 R -1 0 V -0 1 V -1 0 V -1 1 V -0 3260 R -0 -3260 R -1 0 R -0 1 V -1 0 V -1 1 R -0 2200 R -0 -2200 R -1 0 R -0 1 R -1 0 R -1 1 V -1 0 V -0 1 R -1 0 R -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 4 R -0 1 V -1 0 V -8 0 R -1 9 R -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1307 625 L -1 1 R -1 0 V -0 1 V -1 0 R -0 1 R -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 R -0 1 V -1 0 V -0 1 V -1 0 R -1 1 V -1 0 R -0 1 V -1 0 R -1 1 V -1 0 V -0 1 V -0 2314 R -0 -2314 R -0 1181 R -0 -1181 R -1 0 V -0 1 V -1 0 V -1 3 R -0 -2 R -0 2 R -0 -2 R -1 2 R -0 -2 R -1 1 R -1 0 R -0 1 R -2 1 R -1 1 R -1 0 V -0 2749 R -0 -2749 R -1 0 R -0 1 R -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 1 R 
-1 0 R -0 1 R -1 0 V -1 1 R -1 0 V -0 1 R -1 0 V -1 1 R -1 1 R -1 0 R -0 1 V -1 0 R -1 0 V -0 1 V -1 0 R -0 1 R -1 0 R -1 1 R -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 0 R -0 1 R -1 0 V -0 1 V -1 0 R -1 1 R -1 0 V -0 1 R -1 0 V -1 1 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 R -0 1 R -1373 670 L -0 1 V -1 0 V -1 1 V -1 0 R -0 1 R -1 0 R -1 1 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 R -1 0 R -0 1 R -0 2684 R -0 1 V -1 -2685 R -1 1 V -1 0 R -0 1 V -1 0 R -0 1 R -1 0 V -1 0 V -0 1 R -1 0 R -1 1 V -1 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 R -0 3 R -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 R -0 1 V -1 0 V -1 1 R -1 0 V -0 1 V -1 0 R -1 1 R -1 1 V -0 55 R -0 1303 R -0 -1358 R -1 0 V -0 1 V -1 0 V -0 3879 R -0 -3879 R -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -1 1 V -1 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -0 2721 R -0 -703 R -0 -2018 R -1 0 R -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 R -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 1 V -1 0 R -1436 718 L -1 0 V -0 2478 R -0 1 V -0 -2479 R -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 R -0 1 R -1 0 R -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 R -1 0 R -1 1 V -0 666 R -0 -666 R -1 0 R -0 1 V -1 0 R -0 1 R -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 R -0 1 V -1 0 R -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -0 3069 R -0 -3069 R -1 1 V -0 1537 R -0 -1537 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 1 V -1 0 R -0 1303 R -0 -1358 R -0 55 R -0 1 V -0 -56 R -0 1358 R -0 -1302 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1495 758 L -0 1 V -1 0 R -0 1 R -1 0 R -1 0 R -0 1 V -1 0 V -0 1 R -1 0 R -1 1 R -1 0 V -0 1 R -1 0 
V -0 1 R -1 0 R -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 R -1 0 R -1 1 R -1 0 R -0 1 R -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 R -0 1 V -1 0 V -0 2857 R -0 -2857 R -1 1 V -1 0 V -0 1 R -1 0 V -1 1 R -1 0 V -0 1 R -1 0 R -0 1 R -1 0 R -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -0 1580 R -0 -1580 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -0 1 V -0 3264 R -1 -3264 R -1 0 V -0 1 R -1 0 V -0 2800 R -0 4 R -0 -2800 R -0 4 R -0 2792 R -0 4 R -0 -2800 R -0 4 R -0 -8 R -1554 801 L -0 2799 R -1 1 V --1 3 R -1 0 V -1554 804 M -1 0 V --1 4 R -1 0 V --1 -7 R -1 0 V -0 2803 R -0 -3 R -0 -2797 R -0 4 R -0 1 V -0 2795 R -0 -2803 R -0 3 R -0 1 V -0 4 R -0 2795 R -0 -3 R -0 -2800 R -0 4 R -0 4 R -0 2792 R -0 -2800 R -0 4 R -0 4 R -0 2795 R -0 1 V -0 -4 R -1 0 V -1555 805 M -0 4 R -0 -8 R -1 0 V --1 2804 R -0 -2800 R -0 4 R -0 -4 R -1 0 V --1 4 R -1 0 V --1 2796 R -1 0 V -0 -2803 R -0 2800 R -0 -2797 R -0 2800 R -0 -2796 R -0 1 V -1 0 V --1 2792 R -0 -2797 R -0 1 R -1 2799 R -0 -2795 R -0 -4 R -0 2799 R -0 -3 R -0 -2792 R -0 2795 R -0 -2795 R -0 2792 R -0 -2800 R -0 2800 R -0 -2800 R -0 8 R -1 0 V --1 2795 R -0 1 V -1 0 V --1 -4 R -1 1 V -1557 802 M -1 1 V -0 2803 R -0 -2796 R -0 1 V -0 2792 R -0 -2800 R -0 2803 R -0 -3 R -0 -2800 R -0 8 R -0 2795 R -0 -2795 R -0 2795 R -1 0 V --1 -3 R -1 0 V -1558 803 M -1 0 V --1 8 R -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 1 V -1566 816 L -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -0 1121 R -1 0 V -0 -860 R -0 -261 R -0 1 R -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 2 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 
V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 R -1 0 V -0 1 R -1 0 R -1 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 R -1 0 R -0 1 R -0 639 R -1 -639 R -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 R -1 0 R -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -1 1 R -1 0 V -0 1 R -0 4 R -0 -4 R -0 4 R -0 -4 R -1 0 V -0 4 R -0 -4 R -0 1 V -1 4 R -0 -4 R -1 0 R -0 1 V -0 4 R -0 -4 R -1628 859 L -0 1 R -1 0 V --1 4 R -1 -4 R -1 1 V -0 -4 R -0 4 R -1 0 V -0 -4 R -0 4 R -0 -4 R -0 4 R -0 1 V -0 -5 R -0 1 V -0 4 R -1 0 V -0 -4 R -0 4 R -1 1 V -0 -4 R -0 4 R -1 0 V -0 1 V -0 -5 R -0 5 R -1 0 R -0 1 V -1 0 R -1 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 R -1 0 R -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 R -0 1 V -1 0 R -1 0 V -0 1 R -1 0 R -0 1 V -1 0 V -1 0 R -0 1 R -1 0 R -0 1 V -3 0 R -1 1 R -1 0 R -0 1 V -1 0 V -0 1 R -1 0 R -1 0 V -0 1 V -1 0 V -0 1 R -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 R -1 1 V -1686 898 L -0 1 V -1 0 V -1 1 V -1 0 R -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -0 -1 R -0 1 R -1 0 R -0 1 R -1 0 R -1 1951 R -1 -1950 R -0 1982 R -1 -1981 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 R -1 0 V -0 2068 R -1 -2068 R -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 1 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 831 R -0 -831 R -0 1 V -1 0 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 R -0 1 R -1 0 V -1 0 V -1 1 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 R -0 1 R -11 0 R -0 1 R -1 0 V -0 1 V -1 0 R -1760 943 L -0 1 V -1 0 R -0 1 V -1 0 V -0 1056 R -0 -1056 R -0 
3350 R -0 -1931 R -0 -1419 R -1 0 V -0 1 V -1 0 V -0 1 R -1 0 R -1 1 R -1 0 R -0 1 R -1 0 V -0 1 R -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 R -1 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 R -1 0 V -1 1 R -1 0 R -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 R -0 1 R -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 R -0 1 R -1 0 V -0 1 R -1 0 V -1 0 R -0 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 R -0 1 R -1 0 R -0 1 R -1 0 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 V -1 1 R -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 R -1 0 R -0 1 V -1 0 R -0 1 V -1 0 V -1 1 R -1 0 V -0 1 R -1824 988 L -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 R -1 0 R -1 0 R -0 1 R -1 0 V -0 1 V -1 0 R -1 0 R -0 1 V -1 0 V -0 1 R -1 0 R -1 0 V -0 1 V -2 0 R -0 1 V -1 0 V -0 1 V -1 0 R -0 3305 R -0 -3305 R -1 0 R -0 1 R -1 0 V -0 1 R -1 0 V -1 1 V -1 0 R -0 1 V -1 0 V -1 1 V -1 1 R -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1389 R -0 -1389 R -0 1 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -1 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1679 R -0 -1679 R -0 1 V -0 2255 R -0 -2255 R -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -0 2332 R -0 -2332 R -1 0 V -0 2333 R -0 -2333 R -1 0 V -0 1 R -1 0 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -1884 1029 L -1 0 V -1 1 V -1 0 V -0 1 R -1 0 R -1 0 V -0 1 R -1 0 R -0 1 V -1 0 V -0 1 R -1 0 V -0 1628 R -0 -1628 R -1 0 R -0 1 R -1 0 R -0 1 V -1 0 R -1 1 V -1 0 V -0 1 V -1 0 R -1 1 R -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -0 1188 R -0 -1188 R -1 0 R -0 1 V -1 0 R -0 1 V -1 0 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 V -1 0 R -0 1 R -1 0 R -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 R -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V 
-1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -0 1792 R -0 572 R -0 -702 R -0 -1662 R -1 0 V -0 1 R -1 0 R -0 1 V -1 0 R -1 0 V -0 1515 R -0 -1514 R -0 -525 R -0 -45 R -0 3072 R -0 -1153 R -0 1337 R -0 -2446 R -0 1527 R -0 1 V -0 -1759 R -0 2493 R -1939 3568 L -0 -2502 R -0 1 V -0 2501 R -0 -2501 R -0 1767 R -0 919 R -0 -2446 R -0 -240 R -1 0 V --1 2502 R -1 0 V --1 -735 R -1 0 V -0 -1767 R -0 -570 R -0 3072 R -0 -3072 R -0 3256 R -0 -918 R -0 -1528 R -0 -765 R -0 534 R -0 -9 R -1 1 V -1 0 R -0 1 R -1 0 R -0 1 R -1 0 V -1 1 R -0 2501 R -0 -734 R -0 -2337 R -0 570 R -1 0 R -0 -525 R -0 525 R -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 R -0 -9 R -0 9 R -0 1341 R -0 -1341 R -0 1 V -1 0 V -0 -9 R -0 9 R -0 -534 R -0 3211 R -0 -918 R -0 -1528 R -0 -231 R -0 1 V -1 0 V -1 0 R -0 1 R -1 0 V -0 233 R -0 2446 R -0 -2679 R -0 -577 R -0 577 R -0 1 V -0 1341 R -0 418 R -1 -1759 R -0 1341 R -0 1152 R -0 -2493 R -1 0 R -0 1 V -1 0 R -0 1 R -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 1 V -1 0 R -0 1 V -0 251 R -0 -251 R -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 3355 R -0 -2043 R -0 1220 R -0 -2531 R -0 1703 R -0 -1703 R -1973 1090 L -1 0 V -0 1 R -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 1 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -0 419 R -0 -419 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 R -0 1 R -1 0 R -0 1296 R -0 -1294 R -0 1 V -0 -3 R -1 1 V -1 0 R -0 1 V -1 0 R -0 1 V -0 -3 R -0 1 V -0 2 R -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -2034 1132 L -0 1 V -1 0 R 
-0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 R -1 1 V -1 0 V --1 1929 R -1 0 V -0 -1929 R -0 1 V -0 1928 R -0 -1928 R -0 1928 R -0 -1928 R -0 1928 R -0 1 V -0 -1929 R -1 0 V -0 1 R -1 0 R -1 0 R -0 1 V -1 0 V -0 1 R -1 0 V -1 1 V -0 1925 R -0 -1925 R -0 1925 R -0 -1925 R -0 1925 R -0 -1925 R -1 0 V -0 1 V -1 0 V --1 -617 R -1 617 R -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 R -0 1 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -1 1 V -1 0 R -0 1 V -1 0 V -0 -595 R -0 595 R -1 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 R -1 1 V -1 0 V -0 1 R -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -0 1 R -2089 1171 L -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 -338 R -0 338 R -0 1 R -1 0 V -0 1577 R -0 -1577 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 R -0 1 R -1 0 R -0 1 R -1 0 V -1 1 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 1 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -0 2173 R -0 -2173 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -1 3369 R -0 -3368 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -1 1 V -1 0 R -0 1 V -1 0 R -0 1 R -1 0 V -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -2152 1214 L -1 0 V -0 1 V -1 0 V -0 1 R -1 0 R -1 1 R -1 0 R -0 1 V -1 0 V -1 1 R -1 0 V -0 1 R -1 0 R -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -2 1 R -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 1 V -1 0 R -0 1 V -1 0 R -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 R -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 -124 R -0 124 R -0 1 V -1 0 V -1 1 R -1 1 V -1 0 V -0 1 R -1 0 V -1 0 
V -0 1 R -1 0 V -0 1 V -1 0 R -1 1 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 R -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 R -1 0 V -0 1 R -1 0 V -2218 1260 L -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 R -1 1 R -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 R -0 1 V -1 0 V -0 1 V -1 0 R -1 1 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 R -0 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 810 R -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 R -2287 2117 L -0 -805 R -1 1 V -1 0 V -0 1 V -1 0 R -0 1584 R -1 -173 R -0 173 R -0 -1583 R -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -1 1 V -1 1 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 R -1 0 R -0 1 R -1 0 V -1 1 V -1 0 R -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 0 R -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 R -1 0 R -1 1 V -1 0 V -0 1 V -1 0 R -1 1 V -1 0 R -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 R -1 0 R -1 1 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 1 R -3 1 R -1 0 V -0 1 R -0 2352 R -0 -2563 R -0 211 R -1 0 R -0 1 V -1 0 V -1 9 R -1 0 V -0 1 V -1 0 V -1 1 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -2355 1367 L -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -0 -300 R -1 301 R -1 0 R -0 1 V -1 0 R -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 R -1 0 V -0 
1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 1 V -1 0 R -0 1 V -1 0 R -1 1 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -38 2926 R -72 144 R -13 -838 R -38 -2238 R -0 1 V -17 1754 R -0 -2456 R -22 1310 R -7 1398 R -2617 580 M -42 2455 R -29 -1362 R -1 0 V -0 1 V -1 0 V -2 1 R -1 1 R -1 0 V -0 1 V -1 0 V --1 0 R -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 R -0 1 R -1 0 V -1 0 R -0 1 V -2 1 R -1 1 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -0 2436 R -1 -2436 R -0 1 V -1 0 R -2710 1687 L -0 1 R -1 0 R -0 1 V -1 0 R -1 1 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -1 1 V -1 0 V -0 1 R -1 0 V -0 1 R -1 0 R -1 0 R -0 1 V -1 0 R -0 1 V -1 0 V -1 0 R -0 1 V -0 1727 R -0 1 V -0 -1934 R -0 206 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -0 1210 R -1 -1210 R -0 1211 R -0 -1211 R -0 1211 R -0 -1211 R -0 1211 R -1 -1210 R -0 1210 R -0 -1210 R -0 1210 R -1 -1210 R -0 1211 R -0 -1211 R -0 1211 R -0 -1211 R -0 1 V -0 1210 R -0 -1210 R -0 1210 R -1 -1210 R -0 1211 R -0 -1211 R -0 1211 R -0 -1211 R -0 1211 R -0 -1211 R -0 1211 R -0 -1211 R -0 1211 R -0 -1211 R -2761 1724 L -0 1210 R -0 -1210 R -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 R -1 1 R -1 0 R -0 1 R -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 4 R -0 -3 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -0 1 R -1 0 V -1 0 R -0 1 V -1 0 R -0 1 R -1 0 V -1 1 R -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 
1 V -1 0 V -1 1 R -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 0 R -0 1 V -1 0 R -2 2 R -1 0 R -0 1 R -1 0 V -0 1 R -1 0 R -1 1 V -1 0 R -0 1 R -1 -5 R -1 0 V -0 1 R -1 5 R -1 1 R -1 2 R -1 0 R -0 1 V -1 0 R -0 1 V -1 0 R -0 2284 R -0 -2284 R -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 R -2827 1771 L -1 0 R -1 1 R -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 1 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 1 V -1 0 V -1 0 V -0 1 R -0 1073 R -0 -1073 R -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 1 V -0 2768 R -0 -2768 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 R -0 1 R -0 -1 R -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 R -1 0 R -1 1 R -1 0 V -0 1 V -1 0 V -0 1 R -1 0 R -2892 1815 L -0 1 V -2 1 R -0 -1181 R -1 1434 R -1 1357 R -0 324 R -0 -3045 R -1 1112 R -1 0 R -0 1 V -1 0 V -0 1 R -1 0 R -1 1 V -1 0 V -0 1 V -1 0 V -1 1 R -1 0 V -0 1 V -1 0 V -1 1 R -1 0 V -0 1 V -1 0 V -0 1 R -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 R -0 1 V -1 0 R -0 -615 R -0 2065 R -0 -1450 R -1 0 V -1 1 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 1 R -1 0 V -1 1 V -1 0 R -1 0 V -0 1 R -1 0 V -1 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -1 1 R -1 0 R -1 0 R -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 R -1 1 V -1 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 R -2963 1862 L -0 3 R -0 -1 R -0 2 R -0 1 R -0 -3 R -0 -2 R -0 1 R -0 1 R -0 3 R -0 -1 R -0 -3 R -0 -1 R -0 1 R -0 1 R -1 3 R -0 -1 R -0 -3 R -0 -1 R -0 1 R -0 1 R -0 3 
R -0 -1 R -0 -1 R -0 1 R -0 -3 R -0 2 R -0 -1 R -0 2 R -0 1 R -0 -5 R -0 1 R -0 1 R -0 3 R -0 -2 R -0 1 V -0 -1 R -0 -2 R -0 -1 R -0 2 R -0 -2 R -0 1 R -0 -1 R -0 1 R -0 4 R -0 -2 R -0 -1 R -0 2 R -0 1 R -0 -2 R -0 1 R -0 -3 R -0 -1 R -0 1 R -0 1 R -1 -1 R -0 1 V -0 3 R -0 -1 R -0 -2 R -0 1 V -0 2 R -0 -1 R -0 1 V -0 -2 R -0 2 R -0 -3 R -0 1 R -0 2 R -0 1 R -0 -2 R -0 1 R -0 -5 R -0 1 R -0 1 R -0 1 R -0 -1 R -0 2 R -0 1 R -0 -3 R -0 3 R -0 -1 R -0 -3 R -0 -1 R -0 1 R -0 1 R -0 1 R -1 0 V --1 -1 R -1 0 V --1 2 R -1 0 V --1 1 R -1 0 V --1 -3 R -1 0 V --1 -2 R -1 0 V --1 1 R -1 0 V --1 1 R -1 0 V --1 3 R -1 0 V -0 -1 R -0 -3 R -0 -1 R -0 1 R -0 1 R -0 1 R -0 -1 R -0 2 R -0 -3 R -0 2 R -0 2 R -0 -3 R -2966 1865 L -0 -3 R -0 5 R -0 -1 R -0 -3 R -0 1 V -0 1 R -0 1 R -0 1 V -0 -2 R -0 2 R -0 -3 R -0 1 R -0 2 R -0 1 R -0 -2 R -1 -1 R -0 -1 R -0 -2 R -0 1 R -0 2 R -0 -1 R -0 2 R -0 1 R -0 -3 R -0 3 R -0 -1 R -0 -3 R -0 -1 R -0 1 R -0 1 R -0 1 R -0 -1 R -0 2 R -0 -3 R -0 2 R -0 2 R -0 -2 R -0 1 V -0 -1 R -0 -2 R -0 -1 R -0 2 R -0 3 R -0 -5 R -0 1 R -0 1 R -0 1 R -0 -1 R -0 2 R -0 1 R -0 -3 R -0 3 R -1 0 V --1 -1 R -1 0 V --1 -3 R -1 0 V --1 -1 R -1 0 V --1 1 R -1 0 V --1 1 R -1 0 V --1 0 R -1 0 V -0 1 R -0 -1 R -0 2 R -0 -3 R -0 2 R -0 2 R -0 -2 R -0 1 V -0 -1 R -0 -2 R -0 -1 R -0 2 R -0 3 R -0 -5 R -0 1 R -0 1 R -0 1 R -0 -1 R -0 2 R -0 1 R -0 -3 R -0 3 R -0 -1 R -0 -3 R -0 -1 R -0 1 R -0 1 R -1 1 R -0 -1 R -0 2 R -0 -3 R -0 2 R -0 2 R -0 -3 R -0 1 V -0 -3 R -0 5 R -0 -1 R -0 -3 R -0 2 R -0 2 R -0 1 R -0 -2 R -0 -2 R -0 1 R -0 -1 R -0 -2 R -0 1 R -0 2 R -0 -1 R -0 2 R -0 1 R -0 -3 R -1 3 R -0 -2 R -2970 1866 L -0 -1 R -0 1 R -0 1 R -0 -1 R -0 -1 R -0 1 R -0 -3 R -0 2 R -0 -1 R -0 2 R -0 1 R -0 -5 R -0 1 R -0 1 R -0 3 R -0 -2 R -0 1 V -0 -1 R -0 -2 R -0 -1 R -0 1 R -0 1 R -0 1 R -0 -1 R -0 2 R -0 -4 R -0 1 R -0 2 R -0 -1 R -0 2 R -0 1 R -0 -3 R -0 2 R -0 -1 R -0 1 R -0 -3 R -0 2 R -0 -1 R -0 2 R -0 -2 R -0 3 R -1 0 V -0 -5 R -0 5 R -0 1 R -0 -2 R -0 -2 R -0 1 R -0 -1 R -0 -2 
R -0 1 R -0 2 R -0 -1 R -0 2 R -0 1 R -0 -3 R -0 3 R -0 -1 R -0 -3 R -0 -1 R -0 1 R -0 1 R -0 3 R -0 1 V -0 -1 R -1 0 V --1 1 R -1 0 V --1 0 R -1 0 V --1 -2 R -1 0 V --1 -2 R -1 0 V --1 1 R -1 0 V -0 3 R -0 -1 R -0 1 R -0 -2 R -0 -2 R -0 1 R -0 3 R -1 0 V -0 1 V -1 0 V -0 1 V -0 2 R -1 0 V --1 -1 R -1 0 V --1 -1 R -1 0 V -0 2 R -0 -1 R -0 -1 R -0 2 R -0 -1 R -0 -1 R -0 2 R -0 -1 R -0 -1 R -0 1 R -0 -1 R -0 1 R -0 -1 R -1 1 R -0 -1 R -0 1 R -0 -1 R -2976 1871 L -0 1 V -0 -1 R -0 1 R -0 -1 R -0 -1 R -0 2 R -0 -1 R -0 -1 R -0 2 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 R -1 1 V -1 0 R -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 R -1 0 R -0 1 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 1 V -0 2373 R -0 -2373 R -1 0 V -0 1 V -1 0 R -1 1 R -1 0 V -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -3037 1914 L -1 0 R -0 1 V -1 0 V -2 1 V -0 1 R -1 0 R -1 0 V -0 1 V -0 1583 R -0 -1583 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 1 R -1 0 R -0 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -0 2 R -1 0 V --1 -2 R -1 0 V -0 1 R -1 0 R -0 1 R -0 -1 R -0 1 V -1 0 R -0 1 V -1 0 R -1 0 V -0 26 R -0 1178 R -0 -1596 R -0 1 V -0 417 R -0 1178 R -0 -1595 R -0 393 R -1 0 V -1 1 V -0 -93 R -1 93 R -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 V -1 1 V -1 0 R -0 1 V -1 0 R -1 1 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 1 V -1 0 R -0 1 V -1 0 R -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -3096 1956 L -1 1 V 
-0 1057 R -0 -1057 R -0 1057 R -0 1 V -0 -1058 R -2 1 V -0 1 V -1 0 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -1 1 R -1 0 V -0 1 V -1 0 R -0 1 R -1 0 R -1 1 V -1 0 R -2 1 R -1 0 V -0 1 V -1 0 R -1 0 V -0 1 R -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -1 1 V -1 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 R -0 1 V -1 0 V -1 0 R -0 1 V -1 0 R -0 1 R -1 0 R -0 1039 R -0 -1039 R -1 0 R -0 1 V -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 R -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 R -1 1 V -1 0 V -3163 2002 L -1 0 V -1 1 V -1 0 R -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 R -0 1 R -1 0 R -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 R -1 1 R -1 0 R -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -7 13 R -1 0 V -0 1 R -1 0 V -0 1 R -1 0 R -1 1 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 1 V -1 0 R -0 1 V -1 0 R -1 1 V -1 0 R -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -0 -1358 R -0 55 R -0 -55 R -0 1358 R -1 1 V -1 0 V -0 1 V -1 0 V -3235 2060 L -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 R -0 1 V -1 0 R -0 1 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -0 1086 R -0 -1086 R -1 0 V -0 1 V -1 0 R -0 1 V -1 0 R -1 0 V -0 1 R -1 0 R -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -1 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 V -1 1 R -1 
0 R -0 7 R -0 -7 R -0 1 V -1 0 R -0 1 R -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 R -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 R -1 0 R -0 1 V -1 0 V -1 0 V -0 1 R -1 0 R -0 1 R -1 0 R -1 0 R -0 1 V -1 0 R -0 1 R -1 0 V -1 1 R -3300 2104 L -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -1 0 R -0 1 R -1 0 V -0 1 V -1 0 R -0 819 R -0 -819 R -0 819 R -0 -819 R -1 0 V -0 1 V -0 819 R -0 -819 R -1 0 V -0 1 V -1 0 V --1 819 R -1 -819 R -0 1 R -1 0 V -1 2 R -1 1 R -0 1 R -1 0 R -0 1 R -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 1 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 R -1 0 V -0 1 R -1 0 V -0 1 R -1 0 R -1 1 R -1 1 R -1 1 R -1 0 V -0 -3 R -0 1 R -1 0 R -1 0 V -0 1 R -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -0 2169 R -0 -2169 R -1 0 V -0 1 V -1 0 R -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -3358 2145 L -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 -1661 R -0 1661 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 R -0 1 V -0 625 R -0 -437 R -0 -188 R -1 0 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 R -1 0 R -0 1 R -1 0 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 1 V -1 0 R -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 R -1 0 R -0 1 V -1 0 R -1 0 R -0 1 V -0 285 R -0 -1452 R -0 1167 R -1 0 V -0 1 V -1 0 V -1 1 V -0 -836 R -1 836 R -1 0 R -0 1 R -1 0 V -0 1 V -1 0 V -3420 2187 L -1 0 R -0 1 V -0 -245 R -0 245 R -1 0 R -0 1994 R -0 -1993 R -1 0 V -3 65 R -1 -65 R -1 0 V -0 1 R -1 0 R -1 1 V -1 0 R -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V 
-1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -1 1 V -1 1 V -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 R -0 1 R -0 -1 R -0 1 V -0 -1 R -0 1 V -0 -1 R -0 1 V -0 -1 R -0 1 V -0 -1 R -0 1 V -0 -1 R -0 1 V -0 -1 R -0 1 V -0 -1 R -0 1 R -1 0 V -1 1 R -1 0 V -0 1 V -1 0 R -0 1 R -1 0 R -1 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -0 2063 R -0 -2063 R -0 1 R -3479 2225 L -1 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 R -1 1 V -1 0 V -0 1 V -0 -1188 R -0 1188 R -1 0 V -1 1 V -1 0 R -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 R -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -1 1 V -1 0 R -0 1 V -1 0 V -1 1 V -0 1374 R -1 -1374 R -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 1 R -1 0 V -0 1 V -1 0 V -0 333 R -0 -333 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 R -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 R -1 1 V -1 0 V -0 1 R -1 0 V -0 1 R -1 0 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 R -1 0 R -0 1 R -1 0 R -0 1 R -1 0 R -1 1 R -1 0 V -0 1 V -1 0 R -1 0 V -0 1 R -1 0 R -0 1 R -1 0 V -1 1 R -1 0 R -0 1 R -1 0 R -1 1 R -1 0 R -0 4 R -1 0 V -0 1 V -3544 2273 L -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 R -1 1 R -3 5 R -1 0 R -0 1 R -1 0 V -0 -1537 R -0 1537 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 R -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -1 1 V -1 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 R -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -0 313 R -1 -313 R -0 1 V -1 0 V -0 1 V -1 0 V -1 1 R -1 0 R -0 1 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 0 V -0 1 R -1 
0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -45 27 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 R -0 1 V -3657 2349 L -0 1 V -1 0 V -1 1 V -1 0 R -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 R -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V --1 990 R -1 0 V -0 -990 R -0 1 V -1 0 R -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V --1 948 R -1 0 V -3677 945 M -1 0 V -0 1419 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 R -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 1 V -1 0 V -0 -1580 R -0 1580 R -2 1 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -0 -3 R -1 0 V -0 3 R -1 1 V -0 1880 R -0 -1880 R -0 1880 R -0 -1880 R -1 0 V -0 1 V -0 -950 R -0 950 R -1 0 R -1 1 V -0 1881 R -0 -1881 R -0 -949 R -1 949 R -0 1 V -0 572 R -0 -1522 R -0 1522 R -0 -572 R -0 1880 R -0 1 V -0 -1881 R -1 0 V --1 572 R -1 0 V -0 -572 R -0 -949 R -0 2830 R -0 -1881 R -3705 2383 L -0 -950 R -0 2830 R -0 -1880 R -1 0 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 R -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -0 1888 R -1 0 V -0 -1888 R -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 -1186 R -0 1187 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 1 V -0 -1389 R -0 1389 R -1 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1555 R -0 -1555 R -0 1 V -0 1554 R -0 -1554 R -1 0 V -1 1 R -0 1554 R -9 -2897 R -0 2502 R -0 -1153 R -0 -1874 R -0 2292 R -0 1 V -0 734 R -1 0 V -0 -1152 R -0 -1874 R -1 0 V --1 1874 R -1 0 V -0 -1919 R -0 1919 R -0 1153 R -0 -2502 R -0 -570 R -0 1919 R -0 -1349 R -0 1349 R -0 1 V -1 0 R -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -1 1 V -0 -1342 R -0 1 V -0 -578 R -0 1919 R -0 -1874 R -0 -45 R -0 2338 R -0 1334 R -0 -3672 R -0 2338 R -0 -419 R -0 -1341 R -0 2493 R -0 1 R -0 -1153 R -3760 2420 L --1 -1049 R -1 0 V --1 2737 R -1 0 V --1 -1688 R -1 0 V -0 1 R -1 0 V -0 1 V -0 
-1296 R -0 1296 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 R -1 0 V -0 1 R -1 0 R -0 1 R -1 0 R -1 1 V -1 0 V -0 1 V -1 0 R -1 1 R -1 0 V -0 1 R -1 0 R -0 1 R -1 0 V -1 1 R -1 0 V -0 1 R -1 0 R -1 1 R -2 1 R -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -0 1 V -0 -1843 R -1 0 V --1 1843 R -1 0 V -1 0 R -0 1 R -1 0 R -0 1 V -1 0 V -1 1 V -0 -48 R -0 48 R -1 0 V -0 1 R -1 0 V -1 1 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 R -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 R -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 R -1 0 R -3822 2465 L -1 0 V -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 1 R -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 R -1 1 V -0 1689 R -0 -3644 R -0 2124 R -0 1 V -0 -170 R -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 1 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 656 R -0 -656 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 R -1 1 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 1 R -1 0 R -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 R -1 1 R -0 -1596 R -0 1596 R -3884 2507 L -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -1 0 R -0 1 V -1 1 R -1 0 R -3 2 R -1 1 R -1 0 R -1 1 R -0 1 R -1 0 V -0 1 V -1 0 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 R -1 0 V -0 1 R -1 0 V -0 1 R -1 0 R -1 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 R -0 1 R -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 R -1 1 R -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 1 R -1 0 V -0 1 R -1 0 V -1 1 V -1 0 R -0 1 V -1 0 V -1 1 V -1 0 R -0 1 V -1 0 V 
-0 1 R -1 0 V -1 0 R -0 1 R -1 0 V -0 -179 R -0 180 R -1 0 R -1 1 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 R -1 1 R -3954 2555 L -0 1 V -1 0 R -0 1 V -1 0 R -1 0 V -0 1 V -1 0 R -0 1 V -1 0 R -0 983 R -0 -983 R -1 0 V -0 1 R -1 0 R -0 1 R -1 0 R -1 1 V -1 0 V -0 1 V -1 0 R -1 1 V -1 0 R -0 1 R -1 -1313 R -0 1313 R -0 1 R -1 0 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 R -1 1 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 R -0 1 V -1 0 R -0 1 R -1 0 R -0 135 R -1 0 V --1 -133 R -1 0 V --1 135 R -1 0 V --1 -136 R -1 0 V -1 137 R -0 -136 R -0 135 R -1 0 V --1 -135 R -1 0 V -0 135 R -0 -135 R -0 135 R -0 -135 R -0 135 R -0 -135 R -0 135 R -0 -135 R -2 136 R -0 -136 R -0 134 R -0 -132 R -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -4007 2592 L -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -0 -1145 R -0 1145 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 R -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 R -1 0 V -0 1 V -1 0 V -0 1 R -1 0 R -1 0 R -0 1 V -1 0 R -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 0 R -0 1 R -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -1 1 R -0 1 V -1 0 R -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 R -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 R -1 1 V -1 1 V -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 R -1 0 V -4070 2637 L -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 R -1 0 V -0 1 R -1 0 R -0 1 V -1 0 R -5 -1 R -1 0 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 R -1 0 R -0 -169 R -0 -1955 R -0 
2124 R -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 1 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 R -1 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 R -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 1 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 R -1 0 R -4141 2680 L -0 1 V -1 0 V -0 1 R -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -0 -1149 R -1 1149 R -1 1 R -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 R -1 1 R -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 R -0 1 R -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 R -0 1 V -1 0 R -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -0 -756 R -0 1 V -0 755 R -1 0 R -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -1 1 R -1 0 V -0 1 R -1 0 R -0 1 V -1 0 R -1 0 R -0 1 V -1 0 V -0 1 R -1 0 V -1 0 R -0 1 R -1 0 V -0 1 V -1 -135 R -0 135 R -0 -135 R -0 144 R -0 174 R -0 -174 R -0 1 V -0 -145 R -1 0 R -0 135 R -0 -134 R -0 134 R -0 -133 R -0 135 R -0 -136 R -0 134 R -0 1 V -1 0 V -0 -134 R -0 136 R -1 0 V --1 -135 R -4194 2583 L --1 -1 R -1 0 V -0 136 R -0 1 R -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 1 R -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 R -0 1 V -1 0 V -0 1 R -1 0 R -1 1 R -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -0 882 R -0 -882 R -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -0 7 R -1 1 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 R -1 0 V -0 1 V -1 0 R -0 1 R -1 0 R -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 1 V -2 1 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 
V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 R -4255 2769 L -1 0 R -1 0 V -0 1 V -1 0 V -0 1 R -1 0 R -1 1 V -1 0 V -0 1 R -1 0 R -1 1 V -1 0 R -0 1 R -1 0 R -0 1 R -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 R -0 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -0 -626 R -0 626 R -1 0 V -0 1 R -1 0 R -0 1 V -1 0 V -1 1 R -1 0 R -0 1 R -1 0 V -0 1 V -1 0 R -1 0 R -0 1 V -1 0 V --1 1630 R -1 -1630 R -0 1 V -1 0 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 V -1 0 R -0 1 V -0 613 R -1 0 V -1 5 R -2 -1050 R -0 990 R -0 -990 R -3 436 R -1 1 V -1 0 R -0 1 V -1 0 V -0 -373 R -0 373 R -1 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 1 V -1 0 R -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 R -1 0 R -4321 2814 L -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -4 -1805 R -1 1805 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 R -1 0 V -0 1 V -1 0 R -1 0 V -0 -1525 R -0 1526 R -0 1 R -0 1 V -1 302 R -0 1 V -0 -1596 R -0 418 R -1 0 R -0 1178 R -0 -1596 R -0 1 V -0 1595 R -0 -1595 R -0 1595 R -0 -1595 R -0 1595 R -1 0 V --1 -1595 R -1 0 V -0 1595 R -0 -1595 R -0 1595 R -0 1 V -0 -1596 R -0 1 V -0 1595 R -0 -1595 R -1 1595 R -0 -1595 R -0 1595 R -0 -1595 R -0 1595 R -0 -1595 R -0 1595 R -0 -1595 R -0 1596 R -0 -1596 R -0 1596 R -1 0 V --1 -1596 R -1 1 V -0 1595 R -0 -1595 R -0 1595 R -0 -1595 R -0 1595 R -0 -1595 R -0 1595 R -0 -1595 R -1 1596 R -0 -1596 R -0 1 V -0 1595 R -0 -1595 R -0 1595 R -0 -1595 R -0 1596 R -1 0 V --1 -1596 R -1 1 V -0 1595 R -0 -1595 R -0 1595 R -0 -1595 R -0 1595 R -0 -1595 R -0 1595 R -0 -1595 R -0 1595 R -0 -1595 R -2 -239 R -2 0 R -0 1527 R -0 -1527 R -0 1527 R -0 -1 R -0 -1 R -0 -1526 R -0 1527 R -1 -1 R -4360 2830 L -0 172 R -0 -172 R -0 2 R -0 -1527 R -0 1527 R -0 -1527 R 
-0 1527 R -1 0 V --1 -1527 R -1 0 V --1 1525 R -1 0 V -1 922 R -0 -3256 R -0 2337 R -0 919 R -0 -2446 R -0 1109 R -0 1337 R -0 -2446 R -0 1528 R -0 918 R -1 0 V --1 -2446 R -1 0 V --1 1528 R -1 0 V --1 -1759 R -1 0 V -0 1759 R -0 -1768 R -0 1768 R -0 918 R -0 -2445 R -0 -811 R -0 1 V -0 2337 R -0 918 R -0 -2445 R -0 -232 R -0 1 V -0 1758 R -0 918 R -0 -2445 R -0 1527 R -0 -2293 R -0 1 V -0 3026 R -0 185 R -0 -2446 R -0 1527 R -0 -2337 R -0 45 R -0 2292 R -1 0 V -0 919 R -0 -2446 R -0 1527 R -0 1 V -0 918 R -0 -2446 R -0 -810 R -0 579 R -0 3310 R -0 -1551 R -0 918 R -0 -918 R -0 918 R -0 -2445 R -0 1527 R -0 918 R -0 -2445 R -0 1527 R -0 918 R -0 -2445 R -0 1527 R -1 918 R -0 1 V -0 -2446 R -0 1527 R -0 919 R -0 -2446 R -0 1527 R -0 -2337 R -0 3256 R -0 -2446 R -0 1527 R -0 919 R -0 -2446 R -0 1527 R -0 1 V -0 918 R -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 -2445 R -0 1527 R -0 -1527 R -0 -766 R -0 -45 R -0 3257 R -0 -2446 R -0 1527 R -0 -1527 R -0 2446 R -1 -2446 R -4370 1312 L -0 2445 R -0 -919 R -0 -2337 R -0 570 R -0 1767 R -0 1 V -0 -1527 R -0 2445 R -0 -918 R -0 -1768 R -0 241 R -0 2445 R -0 -2445 R -1 0 V --1 2446 R -1 0 V -0 725 R -0 -1903 R -0 259 R -1 0 V -0 1 V -1 0 V -1 1 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 R -1 0 R -0 1 R -1 0 V -1 0 V -0 1 R -1 0 V -0 -1543 R -0 453 R -0 1090 R -0 1 R -1 0 V -0 1 R -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -0 -659 R -0 659 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 R -0 1748 R -0 -1748 R -1 0 R -0 1 R -1 0 V -0 1 R -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 1 R -1 0 V -0 1 V -1 0 R -4422 2874 L -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 266 R -0 -266 R -0 -2190 R -0 2190 R -1 0 V -0 266 R -0 -266 R -0 266 R -1 -2456 R -0 2190 R -0 1 R -1 0 V -1 1 V -1 0 V -0 1 R -0 73 R -0 -73 R -1 0 V 
-1 1 V -1 0 V --1 4 R -1 -4 R -0 1 V -1 0 V -0 1 R -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -0 -4 R -0 4 R -1 0 R -0 1 V -1 0 R -0 1 R -1 0 V -0 -1885 R -0 1885 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 R -0 1217 R -0 -1217 R -1 1 R -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 -1574 R -0 1574 R -0 1 V -1 0 R -1 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 1 V -1 0 V -0 1 V -4479 2912 L -0 1 R -1 0 V -1 0 V -0 1 R -1 0 R -0 1 V -0 -1 R -0 1 R -0 -1 R -1 1 R -1 0 V -0 1 R -0 1 R -0 -1 R -1 0 V --1 1 R -1 0 V -0 -1 R -0 2 R -1 0 R -1 0 V -0 1 V -1 0 V -0 1 R -1 0 R -1 1 V -0 -1 R -0 1 R -1 0 V -0 1 V -1 0 R -1 1 V -1 0 V -0 -819 R -0 819 R -0 1 V -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 R -1 0 R -1 1 V -1 0 V -0 1 R -1 0 V -0 -819 R -0 819 R -1 1 V -1 0 V -0 1 R -0 -819 R -0 819 R -1 0 V -0 -1210 R -0 1211 R -0 -1211 R -1 0 V --1 1211 R -1 0 V -0 -1211 R -0 1211 R -0 -1210 R -0 1210 R -0 -1210 R -1 0 V --1 1210 R -1 0 V -0 -1210 R -0 1211 R -0 -1211 R -1 0 V --1 1211 R -1 0 V -0 -1211 R -0 1 V -0 1210 R -0 -1210 R -0 1210 R -0 -1210 R -0 1211 R -0 -1211 R -0 1211 R -1 -1211 R -0 1211 R -0 -1211 R -0 1211 R -0 -1211 R -0 1211 R -0 -1211 R -0 1 V -0 1210 R -1 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -4517 2939 L -0 1 V -1 0 V -0 1 V -1 0 R -1 0 R -0 1 V -1 0 V -0 -1 R -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -0 -73 R -0 73 R -1 0 V -0 1 R -1 0 V -1 0 V -0 1 R -0 -576 R -0 -3 R -1 0 V -0 3 R -1 1 V -0 1880 R -0 -1880 R -0 -950 R -0 950 R -1 0 V -0 -949 R -0 950 R -1 0 R -0 1 V -1 0 R -0 1881 R -0 -1881 R -1 0 V --1 -949 R -1 0 V -0 949 R -0 1 V -0 572 R -0 -1522 R -0 1522 R -1 0 V --1 -572 R -1 0 
V --1 1880 R -0 1 V -1 -1881 R -0 572 R -0 -572 R -0 -949 R -0 2830 R -0 -1881 R -0 1 V -0 -950 R -0 2830 R -0 -1880 R -1 0 V -0 572 R -1 0 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 R -1 0 V -0 1 V -1 0 R -1 0 V -0 1 R -1 0 V -1 0 R -0 1 R -1 0 V -1 0 V -0 1 V -4562 2963 L -0 1 V -1 0 R -1 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 R -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 1 R -1 0 R -0 1 V -1 0 R -0 1206 R -0 -1206 R -0 1 V -1 0 R -1 0 V -0 1 V -1 0 R -0 1 R -0 1 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 R -1 0 R -0 1 R -1 -1894 R -0 1894 R -1 0 V -0 1 R -1 0 V -0 350 R -0 -349 R -1 420 R -0 -420 R -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 R -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 R -1 0 R -0 1 R -1 0 V -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 0 R -0 1 V -1 0 V -0 1 R -1 0 V -1 1 V -1 0 R -0 1 V -1 0 R -1 0 R -0 1 V -4624 3007 L -1 1 V -1 1 V -1 0 R -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -1 1 V -1 1 V -1 -290 R -0 -1944 R -0 1800 R -2 435 R -1 1 R -0 14 R -0 -14 R -1 0 V -0 1 R -1 0 R -0 1 V -1 0 V -0 1 R -1 0 R -1 1 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -1 1 R -0 -654 R -0 1246 R -0 -591 R -1 0 V -0 -14 R -0 14 R -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 1 V -1 0 V -0 -775 R -0 775 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 R -0 1 V -1 0 R -0 1 R -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 R -0 1 V -1 0 R -1 0 V -0 1 R -1 0 R -1 1 R -1 0 V -0 1 V -1 0 R -4689 3054 L -1 0 R -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 1 R -1 0 R -0 1 V -1 0 V -0 1155 R -0 1 V -0 -1156 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 
V -0 1 V -1 0 R -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 R -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 R -0 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 R -5 -2510 R -10 2558 R -4 -543 R -54 542 R -1 0 V -28 -23 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -4 3 R -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 1 V -1 -18 R -1 0 V -0 1 V -1 0 V -1 1 V -1 1 V -14 -1394 R -15 2815 R -4 -3489 R -1 3109 R -1 0 V -0 1 R -1 0 V -2 -1028 R -1 1 R -1 1033 R -1 0 V -0 1 V -1 0 V -1 1 V -1 1 V -2 1 V -1 1 V -1 0 R -1 1 V -2 1 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 R -1 1 R -1 0 V -7 -413 R -1 425 R -1 0 V -0 1 R -24 419 R -68 -172 R -21 -2146 R -5014 2264 L -12 2164 R -15 -3250 R -11 1261 R -19 -228 R -1 0 V -0 1 R -0 -1 R -0 1 V -0 -1 R -0 1 V -0 -1 R -0 1 V -0 -1 R -0 1 V -0 -1 R -0 1 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -5097 497 M -19 34 R -39 2826 R -0 -471 R -1 471 R -0 -2332 R -0 2332 R -0 1 V -1 0 V -0 -2333 R -0 2333 R -1 0 R -0 1 V -1 0 V -0 1 R -1 0 V -1 1 R -1 0 R -0 1 V -1 0 R -0 1 R -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 R -0 1 V -1 0 V -0 -2173 R -0 2173 R -1 1 R -1 0 R -0 1 R -1 0 V -0 1 V -1 0 R -1 0 R -0 1 V -1 0 V -0 1 R -1 0 V -1 0 R -0 1 V -1 0 R -0 1 R -1 0 V -1 1 R -2 1 R -0 1 R -1 0 R -1 1 V -1 -3 R -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 R -1 1 V -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -1 0 R -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -0 1 V -5203 3387 L -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 R -1 0 R -0 1 V -1 0 V -0 1 R -1 0 R -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -1 1 R -1 -1447 R -0 1447 R -1 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 R -1 1 R -1 0 R -0 1 V -5 1 R -1 0 V -1 1 R -1 0 V -0 1 V -0 -614 R -0 1 V -1 0 V --1 613 R -1 0 V -0 1 R -1 0 V -1 0 V -0 1 R -1 0 V -0 1 R -1 0 R -1 -615 R -0 616 R -0 -616 R -0 616 R -1 0 R -0 1 V -1 0 V -1 1 R -1 0 V -0 1 R -1 0 V -0 1 V -1 
0 V -1 1 V -1 0 V -0 1 R -1 0 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 1 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -3 0 R -1 1 R -1 0 V -0 1 R -1 0 R -1 1 V -1 0 R -0 1 R -1 0 R -0 1 R -1 0 V -5276 3432 L -0 1 V -1 0 V -0 1 V -1 0 R -1 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 1 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -1 1 V -1 0 R -0 1 R -1 0 V -0 1 V -1 0 R -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -0 1 R -1 0 R -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 1 V -1 0 V -0 24 R -0 -24 R -0 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 R -0 1 R -1 0 R -2 1 R -0 753 R -0 -753 R -0 753 R -0 -753 R -5342 3479 L -0 752 R -0 -752 R -1 0 V --1 752 R -1 1 V -0 -753 R -0 753 R -0 -753 R -0 753 R -0 -753 R -0 753 R -1 0 V --1 -753 R -1 1 V -0 753 R -1 0 V --1 -753 R -1 0 V -0 753 R -2 2 V --2 -755 R -0 1 V -2 1 V -0 753 R -0 -753 R -0 753 R -0 -753 R -0 753 R -0 -753 R -1 0 V --1 753 R -1 0 V -0 -753 R -0 753 R -0 -752 R -0 752 R -0 1 V -0 -753 R -1 0 V --1 753 R -1 0 V -0 -753 R -0 753 R -0 -753 R -0 753 R -0 -752 R -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -0 -2288 R -0 -294 R -0 2185 R -0 -1480 R -0 1275 R -0 1 V -0 -2052 R -0 2053 R -0 -2053 R -0 71 R -1 0 V -0 -70 R -0 2050 R -0 1 V -0 601 R -1 0 R -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 R -0 1 R -1 0 R -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -0 687 R -0 -687 R -1 1 V -1 0 V -0 1 R -1 0 V -5376 3502 L -1 0 V -1 0 R -0 1 R -1 0 V -1 1 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 R -1 0 R -0 1 V -1 0 V -1 
0 V -0 1 R -1 0 R -0 1 R -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 R -1 0 R -0 1 V -1 0 R -1 1 V -1 0 V -0 1 V -1 0 V -1 1 R -1 0 V -0 1 V -1 0 V -0 1 R -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -1 1 R -1 0 V -0 1 V -1 0 R -1 1 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -1 1 V -5426 882 M -1 0 V -0 2655 R -1 0 V -0 1 V -1 0 V -1 1 R -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 R -2 0 R -0 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 1 R -1 0 R -5446 3550 L -1 0 V -0 1 V -1 0 V -1 1 V -1 0 R -0 1 R -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 1 R -0 -1153 R -0 1153 R -0 -3027 R -0 1875 R -0 -1350 R -0 2502 R -0 -734 R -1 0 V --1 918 R -1 0 V --1 -2445 R -1 0 V --1 -232 R -1 1 V --1 -580 R -1 1 V -0 3071 R -0 -3071 R -0 569 R -0 1 V -0 2501 R -0 1 V -0 -735 R -0 1 V -0 918 R -0 -2446 R -0 1109 R -1 0 V -0 1 V -0 1152 R -1 -733 R -0 -1527 R -0 2261 R -1 0 V -0 1 V -1 0 V -1 1 R -0 -734 R -0 -2337 R -0 577 R -0 1 V -0 2493 R -1 0 V -0 -2493 R -0 1341 R -0 1152 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 R -1 1 V -0 -2429 R -0 2429 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 1 V -1 0 R -0 1 V -5490 3580 L -0 1 V -1 0 V -1 0 R -0 1 V -0 82 R -1 -82 R -0 1 V -1 0 V -1 0 R -0 1 R -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 1 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 0 R -0 1 V -1 0 R -1 1 V -1 0 R -0 1 V -1 4 R -0 -2800 R -0 4 R -0 2792 R -0 4 R -0 -2800 R -0 4 R -0 -8 R -0 2800 R -0 1 V -0 3 R -0 -2800 R -0 4 R -0 -7 R -0 2800 R -0 3 R -0 -2800 R -0 4 R -0 -7 R -0 3 R -0 4 R -0 
2793 R -1 0 V -5519 804 M -0 4 R -0 1 V -0 -8 R -0 2803 R -0 -2803 R -1 0 V --1 3 R -1 0 V --1 5 R -1 0 V --1 2795 R -1 0 V -0 -2803 R -0 3 R -0 1 V -0 4 R -0 2792 R -0 -2800 R -0 4 R -0 4 R -0 2795 R -0 1 V -0 -2800 R -0 4 R -0 -8 R -0 2804 R -0 -2800 R -0 4 R -0 -8 R -0 2800 R -0 -2796 R -0 4 R -0 2796 R -0 -2804 R -0 2800 R -0 -2796 R -0 4 R -0 2796 R -0 -2800 R -5521 805 L --1 4 R -1 0 V -0 -7 R -0 2800 R -0 3 R -0 -2800 R -0 1 R -0 2799 R -0 -2799 R -0 -4 R -0 2800 R -0 -2797 R -0 1 V -0 2796 R -0 -2800 R -0 4 R -0 2796 R -0 -2800 R -0 3 R -0 1 R -0 2796 R -0 -2800 R -0 2800 R -0 -2792 R -0 -4 R -0 2796 R -0 -2800 R -0 3 R -0 1 R -0 2796 R -0 -2800 R -0 2800 R -0 -2792 R -0 -4 R -0 2796 R -0 -2800 R -0 3 R -0 1 R -0 2796 R -0 -2800 R -0 2800 R -0 -2792 R -0 -4 R -0 2796 R -0 -2800 R -0 3 R -0 1 R -0 2796 R -0 -2800 R -0 2800 R -0 -2792 R -0 -4 R -0 2796 R -0 -2800 R -0 3 R -0 1 R -0 2796 R -0 -2800 R -0 2800 R -0 -2792 R -1 2795 R -0 -2795 R -0 -4 R -0 1 V -1 0 V -0 2796 R -0 3 R -0 1 V -0 -2804 R -0 5 R -0 2799 R -0 -4 R -0 -2800 R -0 1 V -0 4 R -0 2869 R -0 -70 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 -882 R -0 882 R -0 1 V -1 0 R -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 108 R -1 -104 R -0 1 R -1 0 R -5542 3623 L -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -0 -151 R -0 151 R -1 1 V -1 0 V -0 36 R -0 -36 R -0 1 V -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 -2857 R -0 2857 R -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -0 -865 R -0 865 R -1 1 V -1 0 V -0 1 R -1 0 R -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 R -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V 
-0 1 V -1 0 R -0 1 R -1 0 V -0 -36 R -0 36 R -1 0 V -0 1 V -1 0 V -0 -3169 R -0 3169 R -5600 3664 L -1 0 V --1 -82 R -1 0 V -0 -3088 R -0 3170 R -1 1 R -1 0 R -0 1 V -1 0 R -1 1 V -1 0 R -0 1 V -1 0 V -1 1 V -1 0 V -0 1 R -1 0 R -0 1 R -1 0 R -1 1 R -1 0 V -0 1 V -1 0 V -1 -773 R -2 775 R -0 1 V -1 0 R -1 0 V -0 1 R -1 0 V -0 1 R -0 -1 R -0 1 V -1 0 V --1 -1 R -1 1 R -1 1 V -1 0 V -0 1 V -0 -2180 R -0 2180 R -1 0 R -0 1 V -1 0 V -1 1 R -1 0 V -0 1 R -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 R -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 V -1 1 V -1 0 R -0 1 V -1 0 R -1 1 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 -1305 R -0 1305 R -0 1 R -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 R -1 1 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 R -5663 3707 L -0 1 R -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 R -0 1 V -1 0 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 1 V -1 0 R -0 1 V -1 0 R -1 1 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -0 -1031 R -0 1031 R -0 1 V -1 0 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 R -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -3 0 R -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 1 V -1 0 R -0 1 V -1 0 V -1 1 V -1 -3254 R -0 3254 R -0 1 V -0 -919 R -0 -1527 R -1 -335 R -0 -474 R -0 2337 R -0 -1527 R -0 2446 R -0 -1 R -0 -2445 R -0 2445 R -0 -2445 R -1 2445 R -0 -2445 R -1 -810 R -0 2337 R -0 919 R -0 -2446 R -0 1109 R -0 1337 R -0 -2446 R -0 1528 R -0 918 R -0 -2446 R -0 1528 R -0 -1759 R -0 1759 R -0 -1768 R -0 1768 R -0 918 R -0 -2445 R -0 -811 R -5730 497 L -0 2337 R -0 918 R -0 -2445 R -0 -232 R -0 1 V -0 1758 R -0 918 R -0 -2445 R -0 1527 
R -0 -2293 R -1 1 V --1 3026 R -1 185 R -0 -2446 R -0 1527 R -0 -2337 R -0 45 R -0 2292 R -0 919 R -0 -2446 R -0 1527 R -0 1 V -0 918 R -0 -2446 R -0 -810 R -0 579 R -0 3310 R -0 -1551 R -0 918 R -0 -918 R -1 918 R -0 -2445 R -0 1527 R -0 918 R -0 -2445 R -0 1527 R -0 918 R -0 -2445 R -0 1527 R -0 918 R -0 1 V -0 -2446 R -0 1527 R -0 919 R -0 -2446 R -0 1527 R -0 -2337 R -0 3256 R -0 -2446 R -0 1527 R -0 919 R -1 0 V --1 -2446 R -1 0 V --1 1527 R -1 1 V -0 -2338 R -0 810 R -0 1528 R -1 0 R -0 1 R -0 -2293 R -1 2293 R -0 1 V -1 0 V -0 -2293 R -0 2293 R -1 0 R -0 -2337 R -0 45 R -0 765 R -0 2446 R -0 -2446 R -0 1 V -0 2445 R -0 -919 R -0 1 V -0 1334 R -0 -3672 R -0 2338 R -0 1334 R -0 -415 R -1 0 V -1 1 V -1 0 R -0 1 V -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 R -0 1 V -5753 3768 L -0 1 V -1 0 V -1 1 V -1 0 R -0 1 V -0 -2278 R -0 2278 R -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 R -0 1 R -1 0 R -1 1 V -1 0 R -0 1 V -1 0 V -1 1 V -1 0 V -0 1 R -1 0 R -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 R -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 1 V -1 0 R -0 1 V -1 0 R -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 1 R -5821 3815 L -0 1 V -0 -3070 R -0 3070 R -1 0 V -0 1 V -1 0 R -1 0 V -0 1 R -1 0 R -0 1 V -1 0 V -1 1 V -1 0 R -0 1 R -1 0 R -1 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 R -1 0 R -1 1 R -1 0 R -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 1 R -2 0 R -0 1 V -0 1 R -1 0 V -1 0 R -0 1 V -1 0 V 
-0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 R -0 -3260 R -0 3260 R -0 -3260 R -0 3260 R -1 5 R -1 0 R -0 1 V -1 0 V -1 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -0 1 V -5885 3864 L -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -1 1 V -2 1 V -1 1 V -1 1 V -1 0 V -0 1 R -1 0 V -1 1 R -2 1 R -0 -1019 R -0 -1392 R -1 2412 R -1 0 V -1 1 V -1 1 V -1 0 V -0 1 V -2 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -1 1 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 R -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -1 1 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 R -0 1 V -0 -1 R -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 R -1 0 R -5961 3917 L -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 R -1 0 R -1 1 R -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 R -1 0 R -0 -2132 R -0 2132 R -1 1 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 R -1 0 R -1 1 V -1 0 R -0 1 R -1 0 V -1 1 R -1 0 V -0 1 V -1 0 R -0 1 R -1 0 R -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -1 1 R -1 0 V -0 1 V -1 555 R -0 -555 R -0 1 V -1 0 V -0 2 R -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 0 V -0 1 V -1 0 R -0 1 R -1 0 R -1 1 R -1 0 R -0 1 V -1 0 V -0 1 V -1 -451 R -0 451 R -1 0 R -0 1 V -0 -1554 R -1 0 V -0 1554 R -6025 3964 L -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V 
-1 0 R -1 0 R -0 1 R -1 0 R -0 1 V -1 0 V -1 1 V -1 0 R -0 1 V -1 0 R -1 1 V -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 R -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 R -1 0 R -0 1 V -1 0 V -1 0 V -0 1 R -0 5 R -0 -5 R -1 0 V -0 1 R -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 1 R -1 0 V -0 1 V -1 0 V -0 -5 R -0 5 R -0 1 R -1 0 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 1 R -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -1 1 R -1 0 R -0 1 R -1 0 R -0 1 R -1 0 V -1 0 V -0 1 V -1 0 R -1 1 R -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 R -6092 4010 L -0 1 V -1 0 V -1 0 R -0 1 R -1 0 R -0 1 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -1 1 R -1 0 V -0 1 V -1 0 R -0 1 R -1 0 R -1 0 R -0 1 R -1 0 V -0 1 R -1 0 R -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 8 R -0 -8 R -0 8 R -0 -8 R -0 9 R -1 0 R -0 1 V -1 0 V -0 1 R -1 0 R -2 2 R -1 0 V -0 1 V -1 0 R -1 0 R -0 1 V -0 -1089 R -0 1089 R -1 0 V -0 1 V -1 0 R -1 1 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 R -1 0 R -0 1 V -1 0 V -0 1 V -1 0 R -1 1 R -1 0 V -0 1 V -1 0 V -1 0 V --1 482 R -1 -481 R -1 0 R -0 1 V -1 0 R -0 1 R -1 0 V -1 0 R -0 1 R -0 -2284 R -0 2284 R -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 1 R -1 0 R -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -6154 4063 L -1 0 V -6154 798 M -1 0 V -0 3265 R -1 1 V -0 -880 R -0 880 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 V -4 383 R -8 -1431 R -6 1109 R -1 0 V -0 1 V -1 0 R -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -1 1 V -1 0 R -0 1 V -1 1 R -0 1 V -1 0 R -1 1 V -1 1 V -2 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -1 1 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V --1 0 R -1 0 V --1 0 R -1 0 V -1 1 R -1 0 V -0 1 R -1 0 V -1 1 V -1 0 R -0 1 V -1 0 R -0 1 R -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 R -0 1 R -1 0 V 
-0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -0 1 R -1 0 R -2 2 R -1 0 V -1 1 V -1 0 R -0 1 V -1 0 R -0 1 R -1 0 V -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -0 -3028 R -0 3028 R -6238 4169 L -0 1 V -1 0 R -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -0 -1334 R -0 -2338 R -0 2338 R -0 1334 R -1 1 V -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -1 0 R -0 1 V -1 0 R -1 1 V -1 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 R -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -0 -687 R -0 687 R -1 0 R -0 1 R -1 0 R -1 0 V -0 1 R -1 0 R -0 1 V -1 0 R -1 1 V -1 0 V -0 1 R -1 0 R -0 1 R -1 0 R -1 0 R -0 1 R -1 3 R -1 0 V -0 1 V -2 0 R -18 15 R -1 -1 R -0 272 R -0 -3943 R -0 3943 R -0 -3943 R -0 3672 R -0 271 R -1 -272 R -0 272 R -0 -3943 R -0 3671 R -0 272 R -0 -272 R -0 1 V -0 271 R -0 -271 R -0 271 R -1 -271 R -1 0 R -0 1 V -1 0 R -0 1 R -1 1 R -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 0 V -0 1 V -1 0 R -0 1 R -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -6312 4223 L -0 1 R -1 0 R -0 -1148 R -0 1148 R -0 -2627 R -0 2627 R -1 1 R -0 1 R -1 0 V -1 0 V -1 3 R -1 0 V -0 1 V -1 0 R -2 -752 R -0 753 R -0 -753 R -0 753 R -0 -753 R -0 1 V -0 752 R -0 -752 R -1 0 V --1 752 R -1 1 V -0 -753 R -0 753 R -0 -753 R -0 753 R -0 -753 R -0 753 R -1 0 V --1 -753 R -1 1 V -0 753 R -1 0 V --1 -753 R -1 0 V -0 753 R -2 2 V --2 -755 R -0 1 V -2 1 V -0 753 R -0 -753 R -0 753 R -0 -753 R -0 753 R -0 -753 R -1 0 V --1 753 R -1 0 V -0 -753 R -0 753 R -0 -752 R -0 752 R -0 1 V -0 -753 R -1 0 V --1 753 R -1 0 V -0 -753 R -0 753 R -0 -753 R -0 753 R -0 1 V -1 0 V -1 0 R -0 1 V -1 0 R -0 1 R -1 0 V -1 0 V -0 1 R -1 0 V -0 1 R -1 0 R -1 1 V -1 0 R -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -0 -3593 R -1 0 V -0 3593 R -0 1 R -0 -2577 R -0 2577 R -1 0 V -0 1 V -1 0 R -1 1 R -1 0 V -0 1 V -1 0 V -0 1 V -1 -2796 R -0 2796 R -6350 4251 L -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 R -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 R -1 0 V -1 1 
R -0 -2830 R -0 950 R -1 1880 R -0 1 V -1 0 V -0 -1880 R -0 -950 R -0 2830 R -1 1 V -0 -2830 R -0 2830 R -1 1 V -1 0 V -0 -1880 R -0 -950 R -0 2830 R -0 1 V -1 0 V -1 0 R -0 1 V -0 -2831 R -0 2831 R -0 -2830 R -0 949 R -0 1881 R -1 0 V --1 -2830 R -1 0 V -0 2830 R -0 1 V -1 1 R -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 0 R -0 1 V -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 R -1 1 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 R -0 1 R -1 0 V -0 -2063 R -0 2063 R -1 281 R -1 -280 R -1 0 V -0 1 V -1 0 V -0 1 R -1 0 R -6405 4291 L -1 0 V -1 0 R -0 1 R -1 0 V -1 1 V -1 0 R -0 1 R -1 0 V -0 1 V -0 -3350 R -0 3350 R -1 0 V -1 0 V -0 1 R -1 0 R -0 1 R -1 0 V -0 1 R -1 0 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 R -1 0 V --1 -2169 R -1 2169 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -0 1 R -0 -1 R -1 0 R -0 1 V -1 0 R -0 -1 R -0 1 R -0 -1 R -2 2 R -0 1 R -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 1 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 1 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 R -0 1 R -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 1 R -6467 4333 L -0 1 R -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -1 1 R -1 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 1 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 R -1 0 R -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -0 -1410 R -0 1410 
R -1 0 R -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 0 R -0 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 R -1 1 V -6533 4379 L -0 1 V -1 0 V -0 1 R -1 0 V -1 0 R -0 1 V -1 0 V -0 1 R -1 0 R -1 0 V -0 1 R -1 0 R -0 1 R -1 0 R -0 1 V -1 0 R -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 R -0 1 V -1 -623 R -0 623 R -0 1 V -1 0 V -1 1 V -1 0 R -0 1 V -1 0 V --1 -2936 R -1 0 V -0 2936 R -0 1 V -1 0 V -1 0 R -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 R -1 0 R -0 1 R -1 0 R -1 1 R -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 R -0 1 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 R -1 1 V -1 0 V -0 1 R -1 0 R -0 1 V -1 0 R -1 0 V -0 1 V -1 0 R -0 1 V -1 0 R -1 1 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 R -1 0 R -0 1 V -1 0 V -0 1 V -0 -1631 R -1 1631 R -1 0 V -0 1 R -1 0 V -0 1 R -1 0 R -1 0 V -0 1 V -6596 4423 L -0 1 V -1 0 R -1 1 R -1 0 R -0 1 V -1 0 V -1 -3404 R -0 3109 R -1 0 R -2 296 R -1 0 R -1 0 V -1 0 V -0 2 R -1 0 V -1 1 R -1 0 V -0 1 V -1 0 V -1 1 V -2 1 R -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -2 2 R -0 1 V -1 0 R -0 6 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -1 0 R -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 R -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -2 1 R -1 1 V -1 0 V -0 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -0 -3398 R -0 3398 R -1 0 V -0 1 V -1 0 R -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 1 V -0 -33 R -0 33 R -1 0 R -0 1 V -1 0 V -0 1 V -1 0 R -1 0 V -0 1 R -1 0 R -0 1 V -1 0 R -1 0 V -0 1 R -1 0 R -0 1 R -1 0 V -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -6664 4475 L -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -0 -272 R -1 0 V --1 272 R -1 0 V -0 1 R -6 -2448 R -20 901 R -5 -631 R -0 1 V -5 316 R -3 -219 R -0 379 R -0 -626 R -14 
1219 R -7 -1667 R -1 1 V -1 2594 R -5 -3565 R -85 1092 R -29 2714 R -1 0 R -0 1 R -1 0 R -1 1 R -1 0 R -0 1 V -1 0 R -0 1 V -1 0 V -1 0 V -0 1 R -1 0 R -0 1 V -1 0 R -1 1 V -1 0 V -0 1 R -1 0 V -1 1 V -1 0 R -0 1 R -1 0 R -0 1 V -1 0 R -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -0 3 R -0 -3 R -0 3 R -1 -3 R -0 1 V -0 2 R -0 1 V -0 -3 R -0 3 R -0 -3 R -0 3 R -0 -3 R -1 0 V -0 3 R -0 -3 R -0 3 R -0 -2 R -1 0 V -0 3 R -0 -3 R -0 3 R -0 -3 R -1 1 V -0 -3 R -0 3 R -0 -3 R -1 0 V --1 3 R -1 0 V -0 -3 R -0 3 R -0 -3 R -0 3 R -0 -3 R -0 1 V -0 2 R -0 1 V -0 -3 R -0 3 R -0 -3 R -6882 4559 L --1 3 R -1 0 R -0 -3 R -0 3 R -0 -3 R -0 3 R -0 -3 R -0 3 R -0 1 V -1 0 R -0 -3 R -0 3 R -0 -3 R -0 3 R -1 0 R -0 1 V -0 -4 R -0 4 R -1 0 V -0 1 V -1 0 V -1 1 R -1 0 V -0 1 V -1 0 V -1 1 R -1 0 R -0 1 R -1 0 V -0 1 V -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 R -0 1 R -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -0 -3879 R -1 3879 R -0 1 V -1 0 V -0 1 R -1 0 R -1 1 V -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -0 1 V -1 0 V -1 0 V -0 1 V -1 0 V -0 1 R -1 0 V -1 1 V -1 0 V -0 1 R -1 0 V -1 1 V -1 0 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -0 -2417 R -0 595 R -1 1825 R -1 0 R -0 1 R -1 0 V -0 1 R -1 0 V -1 0 R -0 1 R -1 0 V -0 1 R -1 0 V -1 0 V -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -6939 4604 L -0 1 V -1 0 V -1 1 R -1 -3993 R -1 3993 R -0 1 V -1 0 V -0 1 V -1 0 V -1 0 V -0 1 R -4 0 R -0 1 V -1 0 R -0 1 V -1 0 V -1 1 V -1 0 R -0 1 V -1 0 V -0 6 R -0 -6 R -0 1 V -0 3 R -0 2 R -0 -5 R -1 0 V -0 5 R -0 1 V -0 -6 R -0 6 R -0 -4137 R -1 0 V --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -1 0 V -0 4131 R -1 1 R -1 0 R -0 
1 V -0 2 R -0 -2 R -1 0 R -1 1 V --1 2 R -1 0 V --1 -6 R -1 0 V -0 4 R -1 0 V -0 1 V -stroke -LT0 -1.00 0.00 0.00 C /Helvetica findfont 140 scalefont setfont -6941 3743 M --39 84 R --12 657 R --28 -817 R --12 824 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 0 R -1 0 V --1 1 R -1 -1 V --1 1 R --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R --1 0 R -1 0 V --1 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 -488 R -0 488 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -0 1 R -0 -1 V --1 3 R -0 -2 R -1 0 V --1 0 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -0 1 R -6823 4510 L --1 1 R -1 0 V --1 0 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -0 1 R -0 -1 V --1 1 R --1 0 R -0 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -0 1 R -0 -1 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 1 R -0 -1 V -0 1 R --1 -482 R -1 0 V --1 482 R -0 1 R --1 0 R --1 0 R -1 0 V --1 1 R --1 0 R -0 1 R -6793 4531 L -0 1 R --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 0 R -0 1 R --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 0 R -1 0 V 
--1 0 R -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R -0 1 R --1 0 R --1 1 R -1 -1 V --1 1 R --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R --1 1 R -1 -1 V --1 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 0 R -1 0 V --1 0 R -0 1 R --1 -522 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R -0 8 R -0 -8 R -0 8 R -0 -8 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R -1 0 V --1 0 R --1 0 R -0 1 R -0 45 R --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -6765 4074 L --1 0 R --1 1 R -1 0 V --2 0 R -0 1 R --1 0 R --1 1 R -0 -1 V -0 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R --3 5 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R --1 1 R --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -0 1 R -0 -1 V --1 1 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R --1 1 R -0 -1 V -0 1 R --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R --1 1 R --1 1 R -0 -1 V --1 1 R --1 0 R -0 1 R --6 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 0 R -0 1 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R --1 1 R -6720 4104 L --1 1 R -1 0 V --1 0 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R --1 0 R -0 1 R -0 -1688 R -0 1688 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R -0 -1 V -0 1 R -1 0 V --2 0 R -1 0 V --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 1 R -0 -1 V -0 1 R --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R 
--1 0 R -1 0 V --1 0 R --1 0 R -1 0 V --1 1 R --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R -1 -1 V --1 1 R --1 0 R -0 1 R -0 -1 V -0 1 R -0 -2436 R -0 2436 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R --1 0 R -0 1 R -0 -1 V -0 1 R -0 -3427 R -0 3427 R --1 0 R -0 1 R -0 -1 V --1 1 R -6691 4125 L --1 0 R --1 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 0 R -1 0 V --1 1 R -1 -1 V --1 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R -1 0 V --1 0 R --2 -59 R -0 1 R -0 -1 V -0 1 R -0 58 R --2 -59 R -0 1 R -0 58 R --2 -59 R --29 -796 R -6638 1343 M --2 695 R --2 1 R --1 1 R --15 1783 R --29 -930 R --6 3 R -6553 1296 M --11 1518 R --17 -512 R --40 1765 R -6464 2871 M --10 1592 R --7 -3294 R --19 2846 R --3 -2877 R --1 2957 R --6 -1190 R -6411 522 M -0 2125 R -0 -1 V -0 -169 R --10 950 R -0 -703 R --82 1 R -0 173 R -6313 595 M --20 3602 R -0 1 R -0 -1 V --1 1 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -1 0 V --1 1 R --1 0 R -1 0 V --1 0 R --1 0 R -1 0 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -6283 4204 L -0 1 R --1 0 R --1 0 R -1 0 V --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R -0 1 R -0 -1 V --1 1 R --1 1 R -0 -1 V -0 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -0 1 R -0 -1 V --1 1 R --1 0 R -0 1 R -0 -1 V --1 0 R -6243 1015 M --13 555 R --3 2 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R --1 0 R -0 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R --1 1 R -1 -1 V --1 1 R -1 0 V --1 0 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R --1 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R --1 1 R -1 0 V --1 0 R --1 0 R -0 1 R --1 0 R -0 1 R --1 0 R --1 1 R --1 0 R -1 0 V --1 1 R --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 1 R -2 -1 V --3 2 R -1 -1 V --2 2 R -1 -1 V --2 1 R --1 1 R -1 -1 V --1 
1 R --1 2 R -3 -2 V --6 3 R -6196 1594 L --1 2944 R --1 -2943 R --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -0 1 R -1 -1 V --1 1481 R -1 -1 V --16 758 R --5 203 R -0 -1 V --2 2 R --2 -2755 R --4 3273 R --7 -943 R -6079 2373 M --42 538 R -1 0 V -6025 496 M -0 570 R -0 2502 R -0 -1153 R -0 1337 R -0 -2446 R --1 1528 R -1 -1 V --1 -1758 R -0 2493 R -0 -2502 R -0 1 R -0 -1 V --1 2502 R -1 0 V --1 -2501 R -1 0 V --1 1767 R -0 919 R -0 -2446 R -0 -240 R -0 2502 R -0 -735 R -0 -1767 R -0 -570 R -0 3072 R -0 -3072 R -0 3256 R -0 -918 R -0 -1528 R -0 -765 R -0 534 R --1 -9 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -0 1 R --1 0 R --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 2501 R -0 -734 R -0 -2337 R --1 570 R -1 0 V --1 0 R -0 -525 R -0 2237 R -0 94 R --26 -427 R --28 -242 R --19 784 R --44 629 R --7 -1323 R --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R -1 0 V --1 0 R --1 1566 R --14 625 R -5847 680 M --15 3756 R -5821 2751 M --12 1803 R -5783 1711 M --24 2330 R -5737 2441 M -0 724 R -0 -2150 R -0 1389 R -0 -1389 R --7 2948 R -0 -1554 R --15 2197 R -5716 4606 L -5651 2055 M --1 455 R -5622 1460 M --2 1519 R -1 0 V --1 0 R --6 101 R --69 186 R --6 -2203 R -0 -283 R --2 2836 R -5479 2404 M --5 3 R -0 477 R -0 -476 R -0 -1 V --1 1555 R --1 1 R -1 0 V --1 -1555 R -0 1 R --38 908 R --16 178 R --8 -318 R --32 250 R --10 103 R --10 -41 R --3 -1849 R -0 1 R -0 2623 R -0 -3358 R -0 734 R -0 -734 R -0 734 R -0 2625 R -0 -3359 R -0 734 R --1 1 R -0 -1 V -0 2624 R -0 -664 R -0 -2694 R -0 735 R -0 2848 R -0 -1371 R -0 239 R -0 908 R -5246 1755 M --1 0 R -1 0 V --1 0 R --1 1 R --1 0 R -0 1 R -0 338 R -0 -338 R --1 1 R --1 0 R --1 0 R -1 0 V --1 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R -1 0 V --1 0 R --6 1 R -0 1 R -0 -1 V --1 1 R --1 1 R --8 1987 R --3 -324 R -0 -703 R -5159 1355 M -0 -1 V --4 40 R --1 0 R -1 0 V --1 1 R -1 -1 V --2 1 R -0 3 R -0 -3 R --1 1 R -0 -1 V -0 1 R --1 0 R -0 1 R --1 0 R --1 1 R --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R --3 1 R -1 -1 V --1 
1 R --1 0 R -5144 1401 L --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R --1 0 R -1 0 V --1 1 R -1 0 V --1 0 R --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --2 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 0 R -1 0 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R -1 -1 V --1 1 R --1 0 R -0 1 R -0 1666 R -0 -1666 R --1 0 R -1 0 V --1 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 0 R -0 1 R -1 -1 V --1 1 R --1 0 R -0 1 R -0 -1 V --1 1 R --1 0 R -5115 1421 L --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R -1 -1 V --1 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -0 1 R -0 -1 V --1 1 R --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 2830 R -0 -2830 R --1 1 R -0 2830 R -0 -2830 R --1 2831 R -0 -2830 R --1 0 R --1 2831 R -0 -2830 R -0 950 R -0 -950 R -0 2830 R --1 -2830 R -0 2831 R -0 -2831 R -0 2831 R -0 -2831 R -0 1 R -0 949 R -0 -949 R --1 0 R -1 0 V --1 0 R -1 0 V --1 950 R -0 -950 R --1 1 R -1 -1 V --1 2831 R -1 0 V --1 -1881 R -0 -949 R -0 2830 R -0 -2830 R -0 2830 R -0 -2830 R --1 0 R -0 1 R --1 0 R --1 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 0 R -5088 1439 L --1 1 R --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R --1 1 R -1 -1 V --1 1 R -1 0 V --1 0 R --1 0 R -0 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R --1 1 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R 
-0 1 R --1 0 R --1 0 R -1 0 V --1 1 R --1 0 R -0 1 R --1 0 R --1 1 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R --1 1 R -0 -1 V -0 1 R -1 0 V --1 0 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -0 1 R -0 -1 V --1 1 R --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -0 1 R -0 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 1 R --1 2261 R -0 -2261 R -0 1 R --1 0 R -0 1966 R -0 -2721 R --1 755 R -0 1 R -0 -1 V -0 1 R --1 0 R --1 1 R -0 -1 V -0 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -5051 1464 L --1 1 R -1 0 V --1 0 R --1 0 R -1 0 V --1 0 R -0 1 R -0 196 R -0 -196 R --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R --1 0 R -1 0 V --1 1 R -1 -1 V --1 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 0 R -1 0 V --1 1 R -1 0 V --1 0 R --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R --1 1 R -1 0 V --1 0 R --1 0 R -1 0 V --1 0 R -0 1 R -1 -1 V --2 1 R --1 1 R --1 0 R -0 1 R -0 -1 V -0 1 R -1 0 V --1 0 R --1 6 R -0 -6 R --1 1 R -1 0 V --1 0 R --1 0 R -1 0 V --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -5025 1482 L --1 1 R -1 0 V --1 0 R --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 0 R -1 0 V --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 0 R -1 0 V --1 1 R -1 0 V --1 0 R --1 0 R -1 0 V --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 
R --1 0 R -1 0 V --1 0 R --1 1 R -1 -1 V --1 1 R --1 0 R -0 1 R --1 0 R -0 1 R -0 -1 V --1 1 R --1 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R --1 0 R -4998 1501 L --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R -1 -1 V --1 1 R -1 0 V --1 0 R --1 0 R -0 1 R --1 0 R --1 1 R -1 0 V --1 0 R --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R -1 0 V --1 0 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R --1 0 R -0 1 R -1 -1 V --1 1 R --1 0 R -0 1 R -1 -1 V --1 1 R --1 0 R --1 1 R -0 -1 V -0 1 R --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 0 R -0 1 R --1 0 R -4967 1523 L --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R --1 1 R --1 0 R -0 1 R --1 0 R -0 -365 R -0 366 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R --1 1 R -1 -1 V --1 1 R --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R --1 0 R -1 0 V --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 1149 R -0 -1 V -0 -1147 R --1 2483 R --1 -2478 R --1 1 R -1 0 V --1 0 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R -1 -1 V --1 1 R -1 0 V --1 0 R --1 0 R -1 0 V --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R --1 1 R -1 -1 V --1 1 R --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -4934 1550 L -0 1 R --1 -719 R -1 0 V --1 719 R -1 0 V --1 0 R --1 1 R -0 -1 V -0 1 R -1 0 V --2 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 
1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R --1 0 R -0 1 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -0 1620 R -0 -1620 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R --1 1 R -1 0 V --1 0 R --1 1 R -1 -1 V --1 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R -1 -1 V --1 1 R --5 2 R --1 0 R -1 0 V --5 1881 R --4 -1868 R --1 0 R --1 1 R --1 1 R --1 0 R -4889 1587 L --1 0 R -0 1 R -0 -1 V -0 1 R -1 0 V --2 0 R -1 0 V --1 1 R -1 -1 V --3 2 R -1 -1 V --1 2 R -1 -1 V --3 1 R -1 0 V --1 0 R -1 0 V --1 1 R -0 -1 V --1 1 R -0 2 R -3 -2 V --6 3 R -1 -1 V --1 2944 R --1 -2943 R --1 0 R -0 1 R -0 -1 V -0 1481 R -0 -1 V -4862 1508 M --15 1598 R -0 -1 V --1 1 R -1 0 V --1 0 R -1 0 V --1 0 R --1 1 R -1 0 V --2 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R --1 2 R -3 -2 V --5 2 R -0 1 R -0 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 1 R -0 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -1 0 V --5 -1454 R --1 1 R -1 -1 V --1 1 R --2 1106 R -4794 1108 M --4 514 R --47 -8 R --6 6 R --3 2 R --6 1489 R --4 -31 R -0 1 R -0 -1 V -0 1 R --1 0 R --1 0 R -0 1 R -1 -1 V --1 1 R --1 0 R -1 0 V -4708 1823 M --45 1340 R --8 1298 R -0 -1606 R -0 -1392 R -0 -683 R -0 1944 R -0 -144 R -0 144 R -0 1759 R -0 -3702 R -0 -1 V -0 2118 R --3 1273 R --4 -1489 R --14 -307 R -4624 1245 M --20 -90 R --2 2750 R --9 -2143 R --10 1916 R -0 -1 R -4507 1617 M --30 481 R --8 2201 R -4449 2783 M -4443 906 M --1 3359 R --9 -512 R -0 -918 R -0 -1528 R -0 -765 R --5 3416 R --1 -2285 R -4428 1673 L --1 0 R --1 0 R --3 2095 R -1 0 V --6 563 R -4408 1798 M --16 2808 R -0 -1 V --22 -647 R --7 5 R --1 -1554 R --1 1879 R -4354 697 M -1 0 V --6 1143 R -1 0 V -4348 698 M -1 -1 V --42 2610 R --2 
1270 R --7 -2497 R --6 -323 R --1 5 R --1 0 R -1 0 V --15 2346 R -0 -2737 R -0 2587 R -4234 1810 M --1 0 R -1 0 V --1 0 R --12 88 R -1 0 V --2 -92 R --5 -344 R --9 -566 R -0 317 R --38 3247 R --7 -1712 R --12 679 R --8 -1110 R -0 -96 R --2 20 R -1 0 V --25 375 R -4091 945 M -0 3350 R -4083 740 M --27 2384 R --6 985 R -4040 2662 M -0 1 R --13 673 R -4015 1666 M --4 2259 R -0 -1 V -3943 720 M --25 3781 R -3914 893 M --27 1500 R -1 0 V -3858 544 M -0 2293 R -3840 945 M -0 3350 R -3830 1362 M --26 1143 R --44 -101 R --1 0 R -0 -1389 R -0 1389 R --2 1 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 0 R -1 0 V --1 1 R --1 0 R -1 0 V --1 0 R -1 0 V --1 1555 R -0 -1554 R -0 -1 V -0 1555 R -0 -1554 R --1 0 R -1 0 V --1 0 R --1 1 R -0 1554 R -0 -1554 R --1 0 R -1 0 V --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -3750 2410 L --1 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -0 -1349 R -0 2502 R -0 -1153 R -0 -1874 R -0 2293 R -0 -1 V --1 735 R -0 -1152 R -3741 542 M -1 0 V --1 1874 R -1 0 V -3741 497 M -0 1919 R -0 1153 R -0 -2502 R -0 -570 R -0 1919 R -0 -1349 R -0 1350 R -0 -1 V -0 1 R --1 0 R -0 1 R -0 -1 V --1 1 R --1 0 R --2 2 R -0 -1919 R -0 1919 R -0 -1341 R -0 -1 V -0 -577 R -0 1919 R -0 -1874 R -0 -45 R -0 1919 R -0 -1919 R -0 2338 R -0 1334 R -0 -3672 R -0 2338 R -0 -419 R -0 -1341 R --1 1342 R --5 2069 R --8 -1110 R --2 -870 R -3693 781 M -3679 502 M -1 0 V --8 961 R --13 2948 R --7 -2091 R --1 0 R -1 0 V --1 0 R --1 1 R -1 -1 V --1 1 R --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R --1 0 R -3644 2325 L --1 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -0 1012 R -0 -1012 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R -1 0 V --1 0 R --1 0 R 
-1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -0 145 R --1 -145 R -1 0 V --1 1 R -1 -1 V --1 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R --1 0 R -1 0 V --1 1 R --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R --12 336 R --5 -293 R -3598 2387 L --3 1722 R --13 485 R --3 -727 R --1 1 R --1 0 R -0 1 R -0 -1 V --1 1 R -3547 2275 M --28 -772 R --45 1177 R -3464 1448 M --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R --1 0 R -1 0 V --1 0 R --4 2468 R -0 -1 V -3431 496 M -0 2228 R --1 1150 R --15 -649 R --3 -386 R -0 1334 R -0 -3672 R -0 2338 R --35 1269 R --44 -747 R --5 -1742 R --16 2915 R -3246 788 M -1 0 V --15 -42 R --49 3613 R --12 -184 R --3 83 R -3156 1285 M --37 2065 R -3080 1561 M --14 2759 R -3062 817 M --129 864 R --3 1453 R -0 -1595 R --1 1595 R -1 0 V --1 -1595 R -1 0 V --1 1596 R -0 -1596 R --14 2313 R --6 -949 R -2896 834 M --1 2053 R -0 -1982 R --10 1848 R --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R --5 -9 R --5 507 R -2865 784 M --18 1326 R --13 604 R -2833 650 M --9 3591 R --6 -2377 R -1 0 V --1 1541 R --1 1 R --1 5 R -1 0 V --1 0 R -0 1 R --1 0 R --1 -5 R -0 1 R --1 0 R -0 1 R -0 -1 V --1 1 R -0 -613 R -0 613 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --2 -615 R --1 0 R -1 0 V --1 617 R --1 0 R --17 -451 R --18 700 R -0 -36 R -2759 1617 M --69 1524 R -0 1 R -2690 3141 L --1 1 R -0 -266 R -0 266 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 0 R -1 0 V --1 0 R -0 1 R --1 -656 R -1 0 V --1 656 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R --1 1 R -1 -1 V --1 1 R -1 0 
V --1 0 R --1 1 R -1 -1 V --1 1 R --1 0 R --1 1 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -0 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -1 0 V --1 -1086 R -0 1086 R --1 1 R --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R --1 0 R -1 0 V --1 1 R -1 -1 V --1 1 R --1 0 R -2662 3161 L --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 0 R -1 0 V --1 0 R -0 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R --1 0 R --1 1 R -1 -1 V --1 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R --1 0 R -2634 3180 L --1 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 880 R -0 -1 V --1 -878 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 -1620 R -0 1621 R --1 0 R -0 1 R -0 -1 V --1 1 R --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R --1 0 R -1 0 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R --1 0 R -1 0 V --1 1 R -1 0 V --2 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R -1 0 V --1 0 R 
--1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R --1 0 R -1 0 V --1 1 R --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R --1 0 R -2605 3200 L --1 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --2 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R --1 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R -1 0 V --1 0 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -0 1 R -0 -1 V --1 1 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R -0 -2348 R -0 2349 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R -1 -1 V --1 1 R -1 0 V --1 0 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 1 R -1 -1 V --1 1 R --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R -1 0 V --1 0 R --1 1 R --1 0 R -0 -2516 R -0 2455 R -0 64 R --1 -1552 R -0 1550 R -2572 3222 L --1 1 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -0 1 R --1 0 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R -0 5 R --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R -1 0 V --1 0 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -1 0 V --1 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R --1 0 R -1 0 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -0 1 R -0 -1 V --1 1 R --1 0 R -2546 3246 L --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R --1 1 R --1 0 R -1 0 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 1 R 
--1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 -8 R -0 8 R --1 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R --1 1 R -1 -1 V --1 1 R -1 0 V --1 0 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R -0 -1 V -0 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -0 1 R -0 -1 V --1 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R --1 0 R -2514 3268 L --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R -1 0 V --1 0 R --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -0 -2256 R -0 156 R --1 2100 R -0 1 R -0 -1 V -0 1 R --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R -1 0 V --1 0 R -0 -2065 R -0 615 R -0 1450 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R --1 1 R -1 0 V --1 0 R --1 0 R -0 1 R -2486 3287 L -0 1 R --1 0 R -0 1 R -0 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R --1 0 R -0 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -1 0 V --2 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R --1 1 R -1 0 V --1 0 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R --1 0 R -1 0 V --1 1 R --1 0 R -0 1 R -0 
-1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 1 R --1 0 R --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R --1 1 R -1 -1 V --1 1 R --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R -0 -1 V -0 1 R --1 0 R -0 1 R -2453 3310 L -0 1 R --1 0 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R --1 1 R -1 0 V --1 0 R --1 0 R -1 0 V --1 0 R -0 1 R -1 -1 V --1 1 R --1 0 R -0 1 R -0 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 0 R -0 1 R --1 0 R --1 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R --1 1 R -1 0 V --1 0 R --1 0 R -0 1 R --1 0 R -0 1 R -0 -1 V --1 1 R --1 0 R -0 1 R -2421 3332 L -0 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -0 1 R -0 -1 V --1 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R --1 1 R --1 0 R -1 0 V --1 1 R -0 -1 V --1 1 R -0 1 R -0 -1 V -0 1 R --1 0 R --1 0 R -1 0 V --1 1 R -1 -1 V --1 1 R --1 0 R -1 0 V --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R --1 1 R -1 0 V --1 0 R --1 83 R -0 -1662 R -0 1579 R -0 1 R -0 -1 V --1 1 R -1 0 V --1 0 R --1 0 R -1 0 V --1 0 R -1 0 V --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R --1 0 R -1 0 V --1 0 R -0 1 R --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R -0 1 R -0 -1 V --1 1 R --1 -989 R -0 990 R --10 275 R -1 0 V --1 -387 R --23 -835 R --14 66 R --8 254 R --16 -696 R --20 1928 R --23 578 R --4 -1033 R --6 892 R -2256 1997 M --15 -869 R --1 1 R -0 -1 V --2 1306 R -1 0 V --25 993 R -2215 3427 
L --1 -703 R -1 0 V --1 1150 R --18 -866 R --38 -264 R -0 -1 V --4 -1527 R -0 -1 V --3 1512 R -2131 1063 M -0 47 R -0 2317 R -0 -847 R -0 -1800 R -0 221 R -0 -220 R -0 -1 V -0 2118 R -0 1586 R -0 -1586 R -0 -173 R --21 -213 R --30 1878 R --38 -86 R -0 -3762 R --50 1861 R -1976 902 M --4 1953 R --1 1695 R -1958 641 M --1 1763 R --1 0 R --3 1107 R -0 451 R --1 -1554 R -0 1555 R --6 -5 R --1 0 R --1 1 R -1 0 V --1 0 R --1 1 R --1 0 R -1 0 V --1 0 R -0 1 R -0 -1 V -0 1 R --1 0 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 -450 R -0 451 R --1 0 R -1 0 V --1 1 R -0 -1 V -0 1 R --1 -1554 R -0 1554 R -0 -2838 R --4 2496 R -0 -1336 R -1900 519 M -0 -1 V --36 2321 R -0 1334 R -0 -3672 R -0 2338 R -0 1334 R -1762 2477 M -0 -1955 R -0 2125 R -0 -1 V --4 1397 R --24 74 R --1 -711 R -1711 1463 M -0 2411 R -0 -3168 R -0 2192 R -0 -1435 R -0 1964 R -0 324 R -1696 906 M -0 734 R -0 2625 R -0 -3359 R --1 2715 R -0 -2531 R --3 2 R --9 1635 R --13 180 R -1636 1143 M --26 2344 R -0 -2288 R -0 2764 R -1609 780 M -0 -74 R -0 295 R --9 2440 R -1585 1463 M --1 -682 R --17 1151 R --30 135 R -1489 746 M -1489 745 L -0 1 R --13 -48 R -1 0 V --1 1 R -1 0 V --1 1358 R -0 -1303 R -0 1 R --18 2273 R --12 -784 R --12 617 R -1409 746 M -0 -1 V -0 1 R --2 1210 R -0 1182 R -0 -1595 R --18 897 R --2 1394 R --23 707 R -1330 3475 M --35 128 R -0 -2799 R -0 4 R -0 2799 R -1274 1597 M -0 1479 R -0 1148 R -1253 3192 M --28 966 R -1206 834 M -1 0 V --1 2052 R -0 -1 V --1 -481 R --1 0 R --5 -1266 R -0 2373 R --1 -1103 R --1 1555 R -1183 1420 M --18 -381 R --1 590 R -0 -1 V --25 776 R -1 0 V --1 0 R --5 4 R -0 -1 V -0 1555 R --1 -2529 R -1 0 V --1 975 R -0 1555 R -1122 935 M -stroke -1.000 UL -LTb -1113 4620 N -0 -4137 V -5849 0 V -0 4137 V --5849 0 V -Z stroke -1.000 UP -1.000 UL -LTb -stroke -grestore -end -showpage -%%Trailer -%%DocumentFonts: Helvetica -%%Pages: 1 diff --git a/bp_doc/dotplot.svg b/bp_doc/dotplot.svg deleted file mode 100644 index 682a4fe..0000000 --- a/bp_doc/dotplot.svg +++ /dev/null @@ -1,1359 
+0,0 @@ -[dotplot.svg: SVG markup lost in extraction. Surviving text: "Produced by GNUPLOT 4.2 patchlevel 0"; tick labels 0, 200000, 400000, 600000, 800000, 1e+06, 1.2e+06, 1.4e+06, 1.6e+06 on each of the two axes; labels Helicobacter_pylori_26695, Helicobacter_pylori_J99, plot_matches] diff --git a/bp_doc/histogram.png b/bp_doc/histogram.png deleted file mode 100644 index fb5325e..0000000 Binary files a/bp_doc/histogram.png and /dev/null differ diff --git a/bp_doc/karyogram.png b/bp_doc/karyogram.png deleted file mode 100644 index 40c72f1..0000000 Binary files a/bp_doc/karyogram.png and /dev/null differ diff --git a/bp_doc/karyogram.svg b/bp_doc/karyogram.svg deleted file mode 100644 index b5a4b1d..0000000 --- a/bp_doc/karyogram.svg +++ /dev/null @@ -1,1182 +0,0 @@ -[karyogram.svg: SVG markup lost in extraction. Surviving text: chromosome labels 1 through 22, X, Y] \ No newline at end of file diff --git a/bp_doc/lendist.pdf b/bp_doc/lendist.pdf deleted file mode 100644 index fe82147..0000000 Binary files a/bp_doc/lendist.pdf and /dev/null differ diff --git a/bp_doc/lendist.png b/bp_doc/lendist.png deleted file mode 100644 index d1752cc..0000000 Binary files a/bp_doc/lendist.png and /dev/null differ diff --git a/bp_doc/lendist.ps b/bp_doc/lendist.ps deleted file mode 100644 index 97c878f..0000000 ---
a/bp_doc/lendist.ps +++ /dev/null @@ -1,817 +0,0 @@ -%!PS-Adobe-2.0 -%%Creator: gnuplot 4.2 patchlevel 0 -%%CreationDate: Mon Sep 3 10:28:29 2007 -%%DocumentFonts: (atend) -%%BoundingBox: 50 50 554 770 -%%Orientation: Landscape -%%Pages: (atend) -%%EndComments -%%BeginProlog -/gnudict 256 dict def -gnudict begin -% -% The following 6 true/false flags may be edited by hand if required -% The unit line width may also be changed -% -/Color false def -/Blacktext false def -/Solid false def -/Dashlength 1 def -/Landscape true def -/Level1 false def -/Rounded false def -/TransparentPatterns false def -/gnulinewidth 5.000 def -/userlinewidth gnulinewidth def -% -/vshift -33 def -/dl1 { - 10.0 Dashlength mul mul - Rounded { currentlinewidth 0.75 mul sub dup 0 le { pop 0.01 } if } if -} def -/dl2 { - 10.0 Dashlength mul mul - Rounded { currentlinewidth 0.75 mul add } if -} def -/hpt_ 31.5 def -/vpt_ 31.5 def -/hpt hpt_ def -/vpt vpt_ def -Level1 {} { -/SDict 10 dict def -systemdict /pdfmark known not { - userdict /pdfmark systemdict /cleartomark get put -} if -SDict begin [ - /Title () - /Subject (gnuplot plot) - /Creator (gnuplot 4.2 patchlevel 0) - /Author (Martin Hansen) -% /Producer (gnuplot) -% /Keywords () - /CreationDate (Mon Sep 3 10:28:29 2007) - /DOCINFO pdfmark -end -} ifelse -% -% Gnuplot Prolog Version 4.2 (August 2006) -% -/M {moveto} bind def -/L {lineto} bind def -/R {rmoveto} bind def -/V {rlineto} bind def -/N {newpath moveto} bind def -/Z {closepath} bind def -/C {setrgbcolor} bind def -/f {rlineto fill} bind def -/vpt2 vpt 2 mul def -/hpt2 hpt 2 mul def -/Lshow {currentpoint stroke M 0 vshift R - Blacktext {gsave 0 setgray show grestore} {show} ifelse} def -/Rshow {currentpoint stroke M dup stringwidth pop neg vshift R - Blacktext {gsave 0 setgray show grestore} {show} ifelse} def -/Cshow {currentpoint stroke M dup stringwidth pop -2 div vshift R - Blacktext {gsave 0 setgray show grestore} {show} ifelse} def -/UP {dup vpt_ mul /vpt exch def hpt_ mul /hpt 
exch def - /hpt2 hpt 2 mul def /vpt2 vpt 2 mul def} def -/DL {Color {setrgbcolor Solid {pop []} if 0 setdash} - {pop pop pop 0 setgray Solid {pop []} if 0 setdash} ifelse} def -/BL {stroke userlinewidth 2 mul setlinewidth - Rounded {1 setlinejoin 1 setlinecap} if} def -/AL {stroke userlinewidth 2 div setlinewidth - Rounded {1 setlinejoin 1 setlinecap} if} def -/UL {dup gnulinewidth mul /userlinewidth exch def - dup 1 lt {pop 1} if 10 mul /udl exch def} def -/PL {stroke userlinewidth setlinewidth - Rounded {1 setlinejoin 1 setlinecap} if} def -% Default Line colors -/LCw {1 1 1} def -/LCb {0 0 0} def -/LCa {0 0 0} def -/LC0 {1 0 0} def -/LC1 {0 1 0} def -/LC2 {0 0 1} def -/LC3 {1 0 1} def -/LC4 {0 1 1} def -/LC5 {1 1 0} def -/LC6 {0 0 0} def -/LC7 {1 0.3 0} def -/LC8 {0.5 0.5 0.5} def -% Default Line Types -/LTw {PL [] 1 setgray} def -/LTb {BL [] LCb DL} def -/LTa {AL [1 udl mul 2 udl mul] 0 setdash LCa setrgbcolor} def -/LT0 {PL [] LC0 DL} def -/LT1 {PL [4 dl1 2 dl2] LC1 DL} def -/LT2 {PL [2 dl1 3 dl2] LC2 DL} def -/LT3 {PL [1 dl1 1.5 dl2] LC3 DL} def -/LT4 {PL [6 dl1 2 dl2 1 dl1 2 dl2] LC4 DL} def -/LT5 {PL [3 dl1 3 dl2 1 dl1 3 dl2] LC5 DL} def -/LT6 {PL [2 dl1 2 dl2 2 dl1 6 dl2] LC6 DL} def -/LT7 {PL [1 dl1 2 dl2 6 dl1 2 dl2 1 dl1 2 dl2] LC7 DL} def -/LT8 {PL [2 dl1 2 dl2 2 dl1 2 dl2 2 dl1 2 dl2 2 dl1 4 dl2] LC8 DL} def -/Pnt {stroke [] 0 setdash gsave 1 setlinecap M 0 0 V stroke grestore} def -/Dia {stroke [] 0 setdash 2 copy vpt add M - hpt neg vpt neg V hpt vpt neg V - hpt vpt V hpt neg vpt V closepath stroke - Pnt} def -/Pls {stroke [] 0 setdash vpt sub M 0 vpt2 V - currentpoint stroke M - hpt neg vpt neg R hpt2 0 V stroke - } def -/Box {stroke [] 0 setdash 2 copy exch hpt sub exch vpt add M - 0 vpt2 neg V hpt2 0 V 0 vpt2 V - hpt2 neg 0 V closepath stroke - Pnt} def -/Crs {stroke [] 0 setdash exch hpt sub exch vpt add M - hpt2 vpt2 neg V currentpoint stroke M - hpt2 neg 0 R hpt2 vpt2 V stroke} def -/TriU {stroke [] 0 setdash 2 copy vpt 1.12 mul add M - hpt 
neg vpt -1.62 mul V - hpt 2 mul 0 V - hpt neg vpt 1.62 mul V closepath stroke - Pnt} def -/Star {2 copy Pls Crs} def -/BoxF {stroke [] 0 setdash exch hpt sub exch vpt add M - 0 vpt2 neg V hpt2 0 V 0 vpt2 V - hpt2 neg 0 V closepath fill} def -/TriUF {stroke [] 0 setdash vpt 1.12 mul add M - hpt neg vpt -1.62 mul V - hpt 2 mul 0 V - hpt neg vpt 1.62 mul V closepath fill} def -/TriD {stroke [] 0 setdash 2 copy vpt 1.12 mul sub M - hpt neg vpt 1.62 mul V - hpt 2 mul 0 V - hpt neg vpt -1.62 mul V closepath stroke - Pnt} def -/TriDF {stroke [] 0 setdash vpt 1.12 mul sub M - hpt neg vpt 1.62 mul V - hpt 2 mul 0 V - hpt neg vpt -1.62 mul V closepath fill} def -/DiaF {stroke [] 0 setdash vpt add M - hpt neg vpt neg V hpt vpt neg V - hpt vpt V hpt neg vpt V closepath fill} def -/Pent {stroke [] 0 setdash 2 copy gsave - translate 0 hpt M 4 {72 rotate 0 hpt L} repeat - closepath stroke grestore Pnt} def -/PentF {stroke [] 0 setdash gsave - translate 0 hpt M 4 {72 rotate 0 hpt L} repeat - closepath fill grestore} def -/Circle {stroke [] 0 setdash 2 copy - hpt 0 360 arc stroke Pnt} def -/CircleF {stroke [] 0 setdash hpt 0 360 arc fill} def -/C0 {BL [] 0 setdash 2 copy moveto vpt 90 450 arc} bind def -/C1 {BL [] 0 setdash 2 copy moveto - 2 copy vpt 0 90 arc closepath fill - vpt 0 360 arc closepath} bind def -/C2 {BL [] 0 setdash 2 copy moveto - 2 copy vpt 90 180 arc closepath fill - vpt 0 360 arc closepath} bind def -/C3 {BL [] 0 setdash 2 copy moveto - 2 copy vpt 0 180 arc closepath fill - vpt 0 360 arc closepath} bind def -/C4 {BL [] 0 setdash 2 copy moveto - 2 copy vpt 180 270 arc closepath fill - vpt 0 360 arc closepath} bind def -/C5 {BL [] 0 setdash 2 copy moveto - 2 copy vpt 0 90 arc - 2 copy moveto - 2 copy vpt 180 270 arc closepath fill - vpt 0 360 arc} bind def -/C6 {BL [] 0 setdash 2 copy moveto - 2 copy vpt 90 270 arc closepath fill - vpt 0 360 arc closepath} bind def -/C7 {BL [] 0 setdash 2 copy moveto - 2 copy vpt 0 270 arc closepath fill - vpt 0 360 arc closepath} 
bind def -/C8 {BL [] 0 setdash 2 copy moveto - 2 copy vpt 270 360 arc closepath fill - vpt 0 360 arc closepath} bind def -/C9 {BL [] 0 setdash 2 copy moveto - 2 copy vpt 270 450 arc closepath fill - vpt 0 360 arc closepath} bind def -/C10 {BL [] 0 setdash 2 copy 2 copy moveto vpt 270 360 arc closepath fill - 2 copy moveto - 2 copy vpt 90 180 arc closepath fill - vpt 0 360 arc closepath} bind def -/C11 {BL [] 0 setdash 2 copy moveto - 2 copy vpt 0 180 arc closepath fill - 2 copy moveto - 2 copy vpt 270 360 arc closepath fill - vpt 0 360 arc closepath} bind def -/C12 {BL [] 0 setdash 2 copy moveto - 2 copy vpt 180 360 arc closepath fill - vpt 0 360 arc closepath} bind def -/C13 {BL [] 0 setdash 2 copy moveto - 2 copy vpt 0 90 arc closepath fill - 2 copy moveto - 2 copy vpt 180 360 arc closepath fill - vpt 0 360 arc closepath} bind def -/C14 {BL [] 0 setdash 2 copy moveto - 2 copy vpt 90 360 arc closepath fill - vpt 0 360 arc} bind def -/C15 {BL [] 0 setdash 2 copy vpt 0 360 arc closepath fill - vpt 0 360 arc closepath} bind def -/Rec {newpath 4 2 roll moveto 1 index 0 rlineto 0 exch rlineto - neg 0 rlineto closepath} bind def -/Square {dup Rec} bind def -/Bsquare {vpt sub exch vpt sub exch vpt2 Square} bind def -/S0 {BL [] 0 setdash 2 copy moveto 0 vpt rlineto BL Bsquare} bind def -/S1 {BL [] 0 setdash 2 copy vpt Square fill Bsquare} bind def -/S2 {BL [] 0 setdash 2 copy exch vpt sub exch vpt Square fill Bsquare} bind def -/S3 {BL [] 0 setdash 2 copy exch vpt sub exch vpt2 vpt Rec fill Bsquare} bind def -/S4 {BL [] 0 setdash 2 copy exch vpt sub exch vpt sub vpt Square fill Bsquare} bind def -/S5 {BL [] 0 setdash 2 copy 2 copy vpt Square fill - exch vpt sub exch vpt sub vpt Square fill Bsquare} bind def -/S6 {BL [] 0 setdash 2 copy exch vpt sub exch vpt sub vpt vpt2 Rec fill Bsquare} bind def -/S7 {BL [] 0 setdash 2 copy exch vpt sub exch vpt sub vpt vpt2 Rec fill - 2 copy vpt Square fill Bsquare} bind def -/S8 {BL [] 0 setdash 2 copy vpt sub vpt Square fill Bsquare} 
bind def -/S9 {BL [] 0 setdash 2 copy vpt sub vpt vpt2 Rec fill Bsquare} bind def -/S10 {BL [] 0 setdash 2 copy vpt sub vpt Square fill 2 copy exch vpt sub exch vpt Square fill - Bsquare} bind def -/S11 {BL [] 0 setdash 2 copy vpt sub vpt Square fill 2 copy exch vpt sub exch vpt2 vpt Rec fill - Bsquare} bind def -/S12 {BL [] 0 setdash 2 copy exch vpt sub exch vpt sub vpt2 vpt Rec fill Bsquare} bind def -/S13 {BL [] 0 setdash 2 copy exch vpt sub exch vpt sub vpt2 vpt Rec fill - 2 copy vpt Square fill Bsquare} bind def -/S14 {BL [] 0 setdash 2 copy exch vpt sub exch vpt sub vpt2 vpt Rec fill - 2 copy exch vpt sub exch vpt Square fill Bsquare} bind def -/S15 {BL [] 0 setdash 2 copy Bsquare fill Bsquare} bind def -/D0 {gsave translate 45 rotate 0 0 S0 stroke grestore} bind def -/D1 {gsave translate 45 rotate 0 0 S1 stroke grestore} bind def -/D2 {gsave translate 45 rotate 0 0 S2 stroke grestore} bind def -/D3 {gsave translate 45 rotate 0 0 S3 stroke grestore} bind def -/D4 {gsave translate 45 rotate 0 0 S4 stroke grestore} bind def -/D5 {gsave translate 45 rotate 0 0 S5 stroke grestore} bind def -/D6 {gsave translate 45 rotate 0 0 S6 stroke grestore} bind def -/D7 {gsave translate 45 rotate 0 0 S7 stroke grestore} bind def -/D8 {gsave translate 45 rotate 0 0 S8 stroke grestore} bind def -/D9 {gsave translate 45 rotate 0 0 S9 stroke grestore} bind def -/D10 {gsave translate 45 rotate 0 0 S10 stroke grestore} bind def -/D11 {gsave translate 45 rotate 0 0 S11 stroke grestore} bind def -/D12 {gsave translate 45 rotate 0 0 S12 stroke grestore} bind def -/D13 {gsave translate 45 rotate 0 0 S13 stroke grestore} bind def -/D14 {gsave translate 45 rotate 0 0 S14 stroke grestore} bind def -/D15 {gsave translate 45 rotate 0 0 S15 stroke grestore} bind def -/DiaE {stroke [] 0 setdash vpt add M - hpt neg vpt neg V hpt vpt neg V - hpt vpt V hpt neg vpt V closepath stroke} def -/BoxE {stroke [] 0 setdash exch hpt sub exch vpt add M - 0 vpt2 neg V hpt2 0 V 0 vpt2 V - hpt2 neg 0 V 
closepath stroke} def -/TriUE {stroke [] 0 setdash vpt 1.12 mul add M - hpt neg vpt -1.62 mul V - hpt 2 mul 0 V - hpt neg vpt 1.62 mul V closepath stroke} def -/TriDE {stroke [] 0 setdash vpt 1.12 mul sub M - hpt neg vpt 1.62 mul V - hpt 2 mul 0 V - hpt neg vpt -1.62 mul V closepath stroke} def -/PentE {stroke [] 0 setdash gsave - translate 0 hpt M 4 {72 rotate 0 hpt L} repeat - closepath stroke grestore} def -/CircE {stroke [] 0 setdash - hpt 0 360 arc stroke} def -/Opaque {gsave closepath 1 setgray fill grestore 0 setgray closepath} def -/DiaW {stroke [] 0 setdash vpt add M - hpt neg vpt neg V hpt vpt neg V - hpt vpt V hpt neg vpt V Opaque stroke} def -/BoxW {stroke [] 0 setdash exch hpt sub exch vpt add M - 0 vpt2 neg V hpt2 0 V 0 vpt2 V - hpt2 neg 0 V Opaque stroke} def -/TriUW {stroke [] 0 setdash vpt 1.12 mul add M - hpt neg vpt -1.62 mul V - hpt 2 mul 0 V - hpt neg vpt 1.62 mul V Opaque stroke} def -/TriDW {stroke [] 0 setdash vpt 1.12 mul sub M - hpt neg vpt 1.62 mul V - hpt 2 mul 0 V - hpt neg vpt -1.62 mul V Opaque stroke} def -/PentW {stroke [] 0 setdash gsave - translate 0 hpt M 4 {72 rotate 0 hpt L} repeat - Opaque stroke grestore} def -/CircW {stroke [] 0 setdash - hpt 0 360 arc Opaque stroke} def -/BoxFill {gsave Rec 1 setgray fill grestore} def -/Density { - /Fillden exch def - currentrgbcolor - /ColB exch def /ColG exch def /ColR exch def - /ColR ColR Fillden mul Fillden sub 1 add def - /ColG ColG Fillden mul Fillden sub 1 add def - /ColB ColB Fillden mul Fillden sub 1 add def - ColR ColG ColB setrgbcolor} def -/BoxColFill {gsave Rec PolyFill} def -/PolyFill {gsave Density fill grestore grestore} def -/h {rlineto rlineto rlineto gsave fill grestore} bind def -% -% PostScript Level 1 Pattern Fill routine for rectangles -% Usage: x y w h s a XX PatternFill -% x,y = lower left corner of box to be filled -% w,h = width and height of box -% a = angle in degrees between lines and x-axis -% XX = 0/1 for no/yes cross-hatch -% -/PatternFill {gsave /PFa [ 9 
2 roll ] def - PFa 0 get PFa 2 get 2 div add PFa 1 get PFa 3 get 2 div add translate - PFa 2 get -2 div PFa 3 get -2 div PFa 2 get PFa 3 get Rec - gsave 1 setgray fill grestore clip - currentlinewidth 0.5 mul setlinewidth - /PFs PFa 2 get dup mul PFa 3 get dup mul add sqrt def - 0 0 M PFa 5 get rotate PFs -2 div dup translate - 0 1 PFs PFa 4 get div 1 add floor cvi - {PFa 4 get mul 0 M 0 PFs V} for - 0 PFa 6 get ne { - 0 1 PFs PFa 4 get div 1 add floor cvi - {PFa 4 get mul 0 2 1 roll M PFs 0 V} for - } if - stroke grestore} def -% -/languagelevel where - {pop languagelevel} {1} ifelse - 2 lt - {/InterpretLevel1 true def} - {/InterpretLevel1 Level1 def} - ifelse -% -% PostScript level 2 pattern fill definitions -% -/Level2PatternFill { -/Tile8x8 {/PaintType 2 /PatternType 1 /TilingType 1 /BBox [0 0 8 8] /XStep 8 /YStep 8} - bind def -/KeepColor {currentrgbcolor [/Pattern /DeviceRGB] setcolorspace} bind def -<< Tile8x8 - /PaintProc {0.5 setlinewidth pop 0 0 M 8 8 L 0 8 M 8 0 L stroke} ->> matrix makepattern -/Pat1 exch def -<< Tile8x8 - /PaintProc {0.5 setlinewidth pop 0 0 M 8 8 L 0 8 M 8 0 L stroke - 0 4 M 4 8 L 8 4 L 4 0 L 0 4 L stroke} ->> matrix makepattern -/Pat2 exch def -<< Tile8x8 - /PaintProc {0.5 setlinewidth pop 0 0 M 0 8 L - 8 8 L 8 0 L 0 0 L fill} ->> matrix makepattern -/Pat3 exch def -<< Tile8x8 - /PaintProc {0.5 setlinewidth pop -4 8 M 8 -4 L - 0 12 M 12 0 L stroke} ->> matrix makepattern -/Pat4 exch def -<< Tile8x8 - /PaintProc {0.5 setlinewidth pop -4 0 M 8 12 L - 0 -4 M 12 8 L stroke} ->> matrix makepattern -/Pat5 exch def -<< Tile8x8 - /PaintProc {0.5 setlinewidth pop -2 8 M 4 -4 L - 0 12 M 8 -4 L 4 12 M 10 0 L stroke} ->> matrix makepattern -/Pat6 exch def -<< Tile8x8 - /PaintProc {0.5 setlinewidth pop -2 0 M 4 12 L - 0 -4 M 8 12 L 4 -4 M 10 8 L stroke} ->> matrix makepattern -/Pat7 exch def -<< Tile8x8 - /PaintProc {0.5 setlinewidth pop 8 -2 M -4 4 L - 12 0 M -4 8 L 12 4 M 0 10 L stroke} ->> matrix makepattern -/Pat8 exch def -<< Tile8x8 - 
/PaintProc {0.5 setlinewidth pop 0 -2 M 12 4 L - -4 0 M 12 8 L -4 4 M 8 10 L stroke} ->> matrix makepattern -/Pat9 exch def -/Pattern1 {PatternBgnd KeepColor Pat1 setpattern} bind def -/Pattern2 {PatternBgnd KeepColor Pat2 setpattern} bind def -/Pattern3 {PatternBgnd KeepColor Pat3 setpattern} bind def -/Pattern4 {PatternBgnd KeepColor Landscape {Pat5} {Pat4} ifelse setpattern} bind def -/Pattern5 {PatternBgnd KeepColor Landscape {Pat4} {Pat5} ifelse setpattern} bind def -/Pattern6 {PatternBgnd KeepColor Landscape {Pat9} {Pat6} ifelse setpattern} bind def -/Pattern7 {PatternBgnd KeepColor Landscape {Pat8} {Pat7} ifelse setpattern} bind def -} def -% -% -%End of PostScript Level 2 code -% -/PatternBgnd { - TransparentPatterns {} {gsave 1 setgray fill grestore} ifelse -} def -% -% Substitute for Level 2 pattern fill codes with -% grayscale if Level 2 support is not selected. -% -/Level1PatternFill { -/Pattern1 {0.250 Density} bind def -/Pattern2 {0.500 Density} bind def -/Pattern3 {0.750 Density} bind def -/Pattern4 {0.125 Density} bind def -/Pattern5 {0.375 Density} bind def -/Pattern6 {0.625 Density} bind def -/Pattern7 {0.875 Density} bind def -} def -% -% Now test for support of Level 2 code -% -Level1 {Level1PatternFill} {Level2PatternFill} ifelse -% -/Symbol-Oblique /Symbol findfont [1 0 .167 1 0 0] makefont -dup length dict begin {1 index /FID eq {pop pop} {def} ifelse} forall -currentdict end definefont pop -end -%%EndProlog -%%Page: 1 1 -gnudict begin -gsave -50 50 translate -0.100 0.100 scale -90 rotate -0 -5040 translate -0 setgray -newpath -(Helvetica) findfont 100 scalefont setfont -1.000 UL -LTb -410 263 M -63 0 V -6557 0 R --63 0 V -350 263 M -( 0) Rshow -1.000 UL -LTb -410 903 M -63 0 V -6557 0 R --63 0 V -350 903 M -( 20) Rshow -1.000 UL -LTb -410 1542 M -63 0 V -6557 0 R --63 0 V --6617 0 R -( 40) Rshow -1.000 UL -LTb -410 2182 M -63 0 V -6557 0 R --63 0 V --6617 0 R -( 60) Rshow -1.000 UL -LTb -410 2821 M -63 0 V -6557 0 R --63 0 V --6617 0 R -( 
80) Rshow -1.000 UL -LTb -410 3461 M -63 0 V -6557 0 R --63 0 V --6617 0 R -( 100) Rshow -1.000 UL -LTb -410 4100 M -63 0 V -6557 0 R --63 0 V --6617 0 R -( 120) Rshow -1.000 UL -LTb -410 4740 M -63 0 V -6557 0 R --63 0 V --6617 0 R -( 140) Rshow -1.000 UL -LTb -571 263 M -0 -63 V -0 -100 R -( 0) Cshow -1.000 UL -LTb -1379 263 M -0 -63 V -0 -100 R -( 5) Cshow -1.000 UL -LTb -2186 263 M -0 -63 V -0 -100 R -( 10) Cshow -1.000 UL -LTb -2993 263 M -0 -63 V -0 -100 R -( 15) Cshow -1.000 UL -LTb -3801 263 M -0 -63 V -0 -100 R -( 20) Cshow -1.000 UL -LTb -4608 263 M -0 -63 V -0 -100 R -( 25) Cshow -1.000 UL -LTb -5415 263 M -0 -63 V -0 -100 R -( 30) Cshow -1.000 UL -LTb -6223 263 M -0 -63 V -0 -100 R -( 35) Cshow -1.000 UL -LTb -7030 263 M -0 -63 V -0 -100 R -( 40) Cshow -1.000 UL -LTb -1.000 UL -LTb -410 4740 N -410 263 L -6620 0 V -0 4477 V --6620 0 V -Z stroke -3720 4890 M -(Length Distribution) Cshow -1.000 UP -1.000 UL -LTb -1.000 UL -LT0 -/Helvetica findfont 100 scalefont setfont -1.000 531 263 82 1 BoxColFill -531 263 N -81 0 V --81 0 V -Z stroke -1.000 693 263 81 1 BoxColFill -693 263 N -80 0 V --80 0 V -Z stroke -1.000 854 263 82 1 BoxColFill -854 263 N -81 0 V --81 0 V -Z stroke -1.000 1015 263 82 1 BoxColFill -1015 263 N -81 0 V --81 0 V -Z stroke -1.000 1177 263 82 1 BoxColFill -1177 263 N -81 0 V --81 0 V -Z stroke -1.000 1338 263 82 1 BoxColFill -1338 263 N -81 0 V --81 0 V -Z stroke -1.000 1500 263 82 1 BoxColFill -1500 263 N -81 0 V --81 0 V -Z stroke -1.000 1661 263 82 1 BoxColFill -1661 263 N -81 0 V --81 0 V -Z stroke -1.000 1823 263 82 1 BoxColFill -1823 263 N -81 0 V --81 0 V -Z stroke -1.000 1984 263 82 1 BoxColFill -1984 263 N -81 0 V --81 0 V -Z stroke -1.000 2146 263 81 1 BoxColFill -2146 263 N -80 0 V --80 0 V -Z stroke -1.000 2307 263 82 1 BoxColFill -2307 263 N -81 0 V --81 0 V -Z stroke -1.000 2469 263 81 1 BoxColFill -2469 263 N -80 0 V --80 0 V -Z stroke -1.000 2630 263 82 1 BoxColFill -2630 263 N -81 0 V --81 0 V -Z stroke -1.000 2792 263 
81 1 BoxColFill -2792 263 N -80 0 V --80 0 V -Z stroke -1.000 2953 263 82 1 BoxColFill -2953 263 N -81 0 V --81 0 V -Z stroke -1.000 3115 263 81 1 BoxColFill -3115 263 N -80 0 V --80 0 V -Z stroke -1.000 3276 263 82 1 BoxColFill -3276 263 N -81 0 V --81 0 V -Z stroke -1.000 3437 263 82 97 BoxColFill -3437 263 N -0 96 V -81 0 V -0 -96 V --81 0 V -Z stroke -1.000 3599 263 82 1 BoxColFill -3599 263 N -81 0 V --81 0 V -Z stroke -1.000 3760 263 82 129 BoxColFill -3760 263 N -0 128 V -81 0 V -0 -128 V --81 0 V -Z stroke -1.000 3922 263 82 225 BoxColFill -3922 263 N -0 224 V -81 0 V -0 -224 V --81 0 V -Z stroke -1.000 4083 263 82 257 BoxColFill -4083 263 N -0 256 V -81 0 V -0 -256 V --81 0 V -Z stroke -1.000 4245 263 81 417 BoxColFill -4245 263 N -0 416 V -80 0 V -0 -416 V --80 0 V -Z stroke -1.000 4406 263 82 385 BoxColFill -4406 263 N -0 384 V -81 0 V -0 -384 V --81 0 V -Z stroke -1.000 4568 263 81 1120 BoxColFill -4568 263 N -0 1119 V -80 0 V -0 -1119 V --80 0 V -Z stroke -1.000 4729 263 82 1408 BoxColFill -4729 263 N -0 1407 V -81 0 V -0 -1407 V --81 0 V -Z stroke -1.000 4891 263 81 2527 BoxColFill -4891 263 N -0 2526 V -80 0 V -0 -2526 V --80 0 V -Z stroke -1.000 5052 263 82 2943 BoxColFill -5052 263 N -0 2942 V -81 0 V -0 -2942 V --81 0 V -Z stroke -1.000 5214 263 81 4094 BoxColFill -5214 263 N -0 4093 V -80 0 V -0 -4093 V --80 0 V -Z stroke -1.000 5375 263 82 4062 BoxColFill -5375 263 N -0 4061 V -81 0 V -0 -4061 V --81 0 V -Z stroke -1.000 5536 263 82 1824 BoxColFill -5536 263 N -0 1823 V -81 0 V -0 -1823 V --81 0 V -Z stroke -1.000 5698 263 82 353 BoxColFill -5698 263 N -0 352 V -81 0 V -0 -352 V --81 0 V -Z stroke -1.000 5859 263 82 129 BoxColFill -5859 263 N -0 128 V -81 0 V -0 -128 V --81 0 V -Z stroke -1.000 6021 263 82 33 BoxColFill -6021 263 N -0 32 V -81 0 V -0 -32 V --81 0 V -Z stroke -1.000 6182 263 82 33 BoxColFill -6182 263 N -0 32 V -81 0 V -0 -32 V --81 0 V -Z stroke -1.000 6344 263 82 33 BoxColFill -6344 263 N -0 32 V -81 0 V -0 -32 V --81 0 V -Z 
stroke -1.000 6505 263 82 65 BoxColFill -6505 263 N -0 64 V -81 0 V -0 -64 V --81 0 V -Z stroke -1.000 UL -LTb -410 4740 N -410 263 L -6620 0 V -0 4477 V --6620 0 V -Z stroke -1.000 UP -1.000 UL -LTb -stroke -grestore -end -showpage -%%Trailer -%%DocumentFonts: Helvetica -%%Pages: 1 diff --git a/bp_doc/lendist_ascii.png b/bp_doc/lendist_ascii.png deleted file mode 100644 index d74f589..0000000 Binary files a/bp_doc/lendist_ascii.png and /dev/null differ diff --git a/bp_doc/logo.svg b/bp_doc/logo.svg deleted file mode 100644 index 1759293..0000000 --- a/bp_doc/logo.svg +++ /dev/null @@ -1,334 +0,0 @@ -[logo.svg: SVG markup lost in extraction. Surviving text: sequence-logo letter glyphs N, A, C, G, U; axis ticks 2, 1, 0; axis label "bits"] \ No newline at end of file diff --git a/bp_doc/seqlogo.png b/bp_doc/seqlogo.png deleted file mode 100644 index 3d0f952..0000000 Binary files a/bp_doc/seqlogo.png and /dev/null differ diff --git a/bp_doc/seqlogo.svg b/bp_doc/seqlogo.svg deleted file mode 100644 index 9f2d94a..0000000 --- a/bp_doc/seqlogo.svg +++ /dev/null @@ -1,334 +0,0 @@ -[seqlogo.svg: SVG markup lost in extraction. Surviving text: sequence-logo letter glyphs N, A, C, G, U; axis ticks 2, 1, 0; axis label "bits"]