X-Git-Url: https://git.donarmstrong.com/?a=blobdiff_plain;f=README.md;h=2bb60152c0492f6b8260169a72edfe877b5938c0;hb=3ec78aa9af79921c44d62b65f88865a4b65880be;hp=0c74b6eb387158835f7cbea1005c1273ecab7d87;hpb=a35ba4aedd7ef00cf33fc50a1e4c9454635d0f4a;p=rsem.git

diff --git a/README.md b/README.md
index 0c74b6e..2bb6015 100644
--- a/README.md
+++ b/README.md
@@ -27,7 +27,7 @@ the EM algorithm, single-end and paired-end read data, quality scores,
 variable-length reads and RSPD estimation. It can also generate
 genomic-coordinate BAM files and UCSC wiggle files for visualization. In
 addition, it provides posterior mean and 95% credibility interval
-estimates for expression levels.
+estimates for expression levels. 
 
 ## Compilation & Installation
 
@@ -40,9 +40,13 @@ variable.
 
 ### Prerequisites
 
+C++ and Perl must be installed.
+
 To take advantage of RSEM's built-in support for the Bowtie alignment
 program, you must have [Bowtie](http://bowtie-bio.sourceforge.net) installed.
 
+If you want to plot the model learned by RSEM, you should also install R.
+
 ## Usage
 
 ### I. Preparing Reference Sequences
@@ -96,6 +100,12 @@ and provide the SAM or BAM file as an argument.
 When using an alternative aligner, you may also want to provide the
 --no-bowtie option to rsem-prepare-reference so that the Bowtie indices are not built.
 
+However, please note that RSEM does **not** support gapped
+alignments, so make sure that your aligner does not produce alignments
+with insertions/deletions. Also, please make sure that you use
+'reference_name.idx.fa', which is generated by RSEM, to build your
+aligner's indices.
+
 ### III. Visualization
 
 RSEM contains a version of samtools in the 'sam' subdirectory. When
@@ -125,21 +135,31 @@ Refer to the [UCSC custom track help page](http://genome.ucsc.edu/goldenPath/hel
 
 #### c) Visualize the model learned by RSEM
 
-RSEM provides an R script, plotModel.R, for visulazing the model learned.
+RSEM provides an R script, 'rsem-plot-model', for visualizing the model learned.
 
 Usage:
 
-    plotModel.R modelF outF
+    rsem-plot-model sample_name outF
 
-modelF: the sample_name.model file generated by RSEM
+sample_name: the name of the sample analyzed
 outF: the file name for plots generated from the model. It is a pdf file
 
 The plots generated depends on read type and user configuration. It may
 include fragment length distribution, mate length distribution,
-read start position distribution (RSPD), quality score vs percentage
-of sequecing error given the reference base, position vs percentage of
-sequencing errro given the reference base.
+read start position distribution (RSPD), quality score vs observed
+quality given a reference base, position vs percentage of sequencing
+error given a reference base and histogram of read alignments.
+
+Fragment length distribution and mate length distribution: x-axis is fragment/mate length, y-axis is the probability of generating a fragment/mate with the associated length.
+RSPD: Read Start Position Distribution. x-axis is bin number, y-axis is the probability of each bin. RSPD can be used as an indicator of 3' bias.
+
+Quality score vs. observed quality given a reference base: x-axis is the Phred quality score associated with the data, y-axis is the "observed quality", i.e. the Phred quality score learned by RSEM from the data. Q = -10 log_10(P), where Q is the Phred quality score and P is the probability of a sequencing error for a particular base.
+
+Position vs. percentage sequencing error given a reference base: x-axis is position and y-axis is percentage sequencing error.
+
+Histogram of read alignments: x-axis is the number of alignments a read has and y-axis is the number of such reads.
+
 
 ## Example
 
 Suppose we download the mouse genome from UCSC Genome Browser. We will
@@ -187,9 +207,7 @@ output_name.sim.isoforms.results, output_name.sim.genes.results : Results estima
 
 ## Acknowledgements
 
-RSEM uses randomc.h and mersenne.cpp from
- for random number generation. RSEM
-also uses the [Boost C++](http://www.boost.org) and
+RSEM uses the [Boost C++](http://www.boost.org) and
 [samtools](http://samtools.sourceforge.net) libraries.
 
 ## License
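The README change above warns that RSEM does not support gapped alignments from an alternative aligner. Below is a minimal sketch of how such alignments could be spotted before running RSEM. It assumes plain-text SAM input and scans the CIGAR string (the sixth SAM column) for insertion (`I`) or deletion (`D`) operations. `has_indel` and `gapped_reads` are hypothetical helper names, not part of RSEM or samtools.

```python
import re

# A CIGAR string is a run of <length><operation> pairs (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def has_indel(cigar: str) -> bool:
    """Return True if the CIGAR string contains an insertion (I) or deletion (D)."""
    if cigar == "*":  # CIGAR unavailable, e.g. for an unmapped read
        return False
    return any(op in "ID" for _, op in CIGAR_OP.findall(cigar))


def gapped_reads(sam_lines):
    """Yield the read name (QNAME) of every alignment record whose CIGAR has I or D."""
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        if has_indel(fields[5]):
            yield fields[0]
```

For example, iterating `gapped_reads` over the lines of an opened SAM file lists the reads whose alignments RSEM would not handle. A BAM file would first need converting to SAM, e.g. with the bundled samtools.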
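The plot description added above relies on the Phred relation Q = -10 log_10(P). The short sketch below converts between an error probability P and a quality score Q in both directions. It only illustrates that formula and is not code taken from RSEM or 'rsem-plot-model'. The function names are hypothetical.

```python
import math


def phred_from_error(p: float) -> float:
    """Phred quality Q = -10 * log10(P), where P is the probability of a sequencing error."""
    if not 0.0 < p <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(p)


def error_from_phred(q: float) -> float:
    """Inverse of the relation above: P = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)
```

Under this relation, Q = 30 corresponds to an error probability of 0.001 (0.1%) and Q = 20 to 0.01 (1%).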