X-Git-Url: https://git.donarmstrong.com/?a=blobdiff_plain;f=README.md;h=5acd5937db2f573f9e2a3900baa9bb5b84cdec3c;hb=019648be71e0b8ea5772530b5496720fcb841bba;hp=0c900b725107ac55b3074fa6cc0fa5e1b628c978;hpb=18c158c73b8f7f0afe4db97ac51b520149d33c12;p=rsem.git

diff --git a/README.md b/README.md
index 0c900b7..5acd593 100644
--- a/README.md
+++ b/README.md
@@ -40,9 +40,13 @@ variable.
 
 ### Prerequisites
 
+C++ and Perl must be installed.
+
 To take advantage of RSEM's built-in support for the Bowtie alignment
 program, you must have [Bowtie](http://bowtie-bio.sourceforge.net) installed.
 
+If you want to plot the model learned by RSEM, you should also install R.
+
 ## Usage
 
 ### I. Preparing Reference Sequences
@@ -125,21 +129,29 @@ Refer to the [UCSC custom track help page](http://genome.ucsc.edu/goldenPath/hel
 
 #### c) Visualize the model learned by RSEM
 
-RSEM provides an R script, plotModel.R, for visulazing the model learned.
+RSEM provides an R script, 'rsem-plot-model', for visualizing the model learned.
 
 Usage:
 
-    plotModel.R modelF outF
+    rsem-plot-model modelF outF
 
-modelF: the sample_name.model file generated by RSEM
-outF: the file name for plots generated from the model. It is a pdf file
+modelF: the sample_name.model file generated by RSEM  
+outF: the file name for plots generated from the model. It is a pdf file  
 
 The plots generated depends on read type and user configuration. It
 may include fragment length distribution, mate length distribution,
-read start position distribution (RSPD), quality score vs percentage
-of sequecing error given the reference base, position vs percentage of
-sequencing errro given the reference base.
+read start position distribution (RSPD), quality score vs observed
+quality given a reference base, position vs percentage of sequencing
+error given a reference base.
 
+
+Fragment length distribution and mate length distribution: the x-axis is the fragment/mate length, and the y-axis is the probability of generating a fragment/mate of that length.
+RSPD (read start position distribution): the x-axis is the bin number, and the y-axis is the probability of each bin. The RSPD can be used as an indicator of 3' bias.
+
+Quality score vs. observed quality given a reference base: the x-axis is the Phred quality score associated with the data, and the y-axis is the "observed quality", i.e. the Phred quality score learned by RSEM from the data. Phred quality scores are defined by Q = -10 log10(P), where Q is the Phred quality score and P is the probability of a sequencing error at a particular base.
+
+Position vs. percentage of sequencing error given a reference base: the x-axis is the position, and the y-axis is the percentage of sequencing error.
+
 ## Example
 
 Suppose we download the mouse genome from UCSC Genome Browser. We will
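To make the quality-score plot easier to read, the Phred relation Q = -10 log10(P) described in the added README text can be sketched in Python. This is a minimal illustration under stated assumptions: all function names are hypothetical and not part of RSEM, and `empirical_quality` computes a plain mismatch-rate estimate to convey what an "observed quality" means. It is not how RSEM learns that quantity.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability P for a Phred quality score Q, from Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred quality score Q for an error probability P in (0, 1]."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


def empirical_quality(mismatches: int, total: int) -> float:
    """Hypothetical helper: the Phred score implied by an observed mismatch rate.

    Given how many of `total` bases sharing one reported quality score
    disagreed with the reference base, return the Phred score that this
    error rate corresponds to. This only illustrates the idea behind the
    observed-quality axis; it is NOT RSEM's estimation procedure.
    """
    if not 0 < mismatches <= total:
        raise ValueError("need 0 < mismatches <= total")
    return error_prob_to_phred(mismatches / total)


if __name__ == "__main__":
    for q in (10, 20, 30):
        print(f"Q={q}: P={phred_to_error_prob(q):.3g}")
    # Bases reported as Q30 but mismatching 10 times in 1000 behave like Q20.
    print(f"observed quality: {empirical_quality(10, 1000):.1f}")
```

For example, a base reported at Q20 is expected to be miscalled about once in 100 times, while Q30 corresponds to about once in 1000.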